Add STRING_AGG as alias of groupConcat #105125
Merged
PostgreSQL/SQL-standard `STRING_AGG(expr, sep)` matches ClickHouse's existing `groupConcat(expr, sep)` exactly when the separator is passed as a regular argument. Expose `STRING_AGG` as a case-insensitive alias so PostgreSQL-dialect queries (e.g., the SQLStorm corpus) do not need rewriting. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
alexey-milovidov
added a commit that referenced this pull request on May 16, 2026
Recent compatibility PRs added case-insensitive aliases and parser sugar that make several of the SQLStorm rewrites unnecessary:

- `STDDEV` -> `stddevPop` (#105120)
- `array_to_string` -> `arrayStringConcat` (#105121)
- `REGEXP_SUBSTR` -> `regexpExtract` (#105122)
- `CARDINALITY` -> `length` (#105123)
- `unnest()` function -> `arrayJoin()` (#105124)
- `STRING_AGG` -> `groupConcat` (#105125)
- `date_part(unit,e)` -> `EXTRACT(unit FROM e)` (#105127)
- `expr OP ANY/ALL(array_literal)` (#105129)

`ARRAY_AGG`, `TRANSLATE`, and `EXTRACT(EPOCH|DOW|... FROM ...)` were already supported by ClickHouse before these PRs.

Removed the corresponding rewrite calls and helper functions (`rewrite_string_agg`, `rewrite_array_agg`, `rewrite_date_part`, `rewrite_stddev`, `rewrite_extract_epoch`, the EXTRACT(DOW) inline rewrite, `rewrite_any_comparison`, and the trailing `unnest(...) -> arrayJoin(...)` substitution). Also dropped the unreferenced no-op helpers (`rewrite_extract_unit`, `rewrite_fetch_offset`, `rewrite_interval`, `rewrite_cast_timestamp`, `rewrite_current_timestamp`, `rewrite_bool_literals`, `rewrite_ilike`, `rewrite_no_supertype`).

The PostgreSQL `LATERAL` / `CROSS JOIN UNNEST(...)` table-source forms, `arrayJoin(...)` in JOIN position, PG-specific casts, `AT TIME ZONE`, `STRING_AGGDistinct` (a mangled-name artifact), and the still-unsupported function rewrites (`string_to_array`, `regexp_split_to_array`, `RANDOM`, `TO_TIMESTAMP`, `ARRAY_LENGTH`, `SPLIT_PART`, `age`) are still rewritten.

Net change: -329 lines from rewrite_queries.py and -75 lines from the tests (the `TestRewriteAnyComparison` class is removed since the rewrite it covered no longer exists).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
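For illustration, a text-level rewrite of the kind that the `STRING_AGG` alias makes unnecessary could look roughly like the sketch below. It is not the removed `rewrite_string_agg` helper from `rewrite_queries.py`. The names `split_top_level_args` and `rewrite_string_agg_sketch` are hypothetical, and the sketch assumes simple quoting (no escaped quotes) and no `DISTINCT` or `ORDER BY` inside the call.

```python
import re

# Hypothetical illustration only; not the helper removed from rewrite_queries.py.
CALL_RE = re.compile(r"\bstring_agg\s*\(", re.IGNORECASE)


def split_top_level_args(arg_text: str) -> list[str]:
    """Split argument text on commas that are neither nested nor quoted."""
    args, depth, quote, start = [], 0, None, 0
    for i, ch in enumerate(arg_text):
        if quote:
            if ch == quote:          # assumption: no escaped quotes
                quote = None
        elif ch in "'\"":
            quote = ch
        elif ch in "([":
            depth += 1
        elif ch in ")]":
            depth -= 1
        elif ch == "," and depth == 0:
            args.append(arg_text[start:i].strip())
            start = i + 1
    args.append(arg_text[start:].strip())
    return args


def rewrite_string_agg_sketch(sql: str) -> str:
    """Turn STRING_AGG(expr, sep) into groupConcat(sep)(expr); leave other forms alone."""
    out, pos = [], 0
    while True:
        m = CALL_RE.search(sql, pos)
        if not m:
            out.append(sql[pos:])
            return "".join(out)
        # Scan forward to the parenthesis that closes this call.
        depth, quote, i = 1, None, m.end()
        while i < len(sql) and depth > 0:
            ch = sql[i]
            if quote:
                if ch == quote:
                    quote = None
            elif ch in "'\"":
                quote = ch
            elif ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            i += 1
        if depth != 0:               # unbalanced input: leave the rest untouched
            out.append(sql[pos:])
            return "".join(out)
        args = split_top_level_args(sql[m.end():i - 1])
        out.append(sql[pos:m.start()])
        if len(args) == 2:
            out.append(f"groupConcat({args[1]})({args[0]})")
        else:                        # 1-argument or unexpected form: keep as written
            out.append(sql[m.start():i])
        pos = i


print(rewrite_string_agg_sketch("SELECT STRING_AGG(name, ', ') FROM t"))
```

With the alias available in the server, a pre-processing step of this kind is no longer needed for the plain 2-argument form.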
The analyzer rewrites the 2-argument call `groupConcat(expr, sep)` into the parameterized form `groupConcat(sep)(expr)` only for names listed in `GroupConcatImpl<false>::getNameAndAliases` (see `QueryTreeBuilder::setSecondArgumentAsParameter`). The previous alias was registered through `registerAlias` only, so `STRING_AGG(expr, sep)` hit the unary check in `createAggregateFunctionGroupConcat` and failed with `NUMBER_OF_ARGUMENTS_DOESNT_MATCH`.

Add `string_agg` to `getNameAndAliases` so the rewrite triggers, and register every alias from that list in `registerAggregateFunctionGroupConcat`.

Failure: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=105125&sha=28eefc28ebcdb622ae3d79b79d96e3563e495faa&name_0=PR&name_1=Fast%20test

PR: #105125

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
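The failure mode described above can be pictured with a small toy model. This is a sketch under stated assumptions, not ClickHouse code: `resolve_call`, `ArgumentCountError`, and the two lookup tables are hypothetical stand-ins, the order of steps is assumed, and lowercasing the name is a simplification of case-insensitive alias matching.

```python
class ArgumentCountError(Exception):
    """Hypothetical stand-in for NUMBER_OF_ARGUMENTS_DOESNT_MATCH."""


def resolve_call(name, args, rewrite_names, aliases):
    """Toy model: return (function, parameters, arguments) or raise.

    rewrite_names models the list consulted by the analyzer rewrite;
    aliases models names registered with the function factory.
    """
    key = name.lower()  # simplification of case-insensitive matching
    params = []

    # Step 1 (analyzer): move the second argument into the parameter list,
    # but only for names present in the rewrite list.
    if key in rewrite_names and len(args) == 2:
        params, args = [args[1]], [args[0]]

    # Step 2 (factory): resolve the alias to the canonical function.
    canonical = aliases.get(key, key)
    if canonical != "groupconcat":
        raise KeyError(name)

    # Step 3 (creator): the aggregate function itself accepts one argument.
    if len(args) != 1:
        raise ArgumentCountError(f"{name}: expected 1 argument, got {len(args)}")
    return "groupConcat", params, args


aliases = {"string_agg": "groupconcat"}

# Before the fix: the alias resolves, but the rewrite list does not know it.
try:
    resolve_call("STRING_AGG", ["expr", "sep"], {"groupconcat"}, aliases)
except ArgumentCountError as exc:
    print("before:", exc)

# After the fix: the alias is also in the rewrite list, so the call succeeds.
print("after:", resolve_call("STRING_AGG", ["expr", "sep"],
                             {"groupconcat", "string_agg"}, aliases))
```

In this model the 1-argument call succeeds either way, because it never depends on the rewrite step.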
m-selmi approved these changes on May 18, 2026
The 2-argument form `STRING_AGG(expr, sep)` (and equivalently `groupConcat(expr, sep)`) is rewritten into the parameterized form `STRING_AGG(sep)(expr)` only by the new analyzer (see `QueryTreeBuilder::setSecondArgumentAsParameter`). The old analyzer sees two positional arguments and fails with `NUMBER_OF_ARGUMENTS_DOESNT_MATCH`.

The test fails under "Stateless tests (amd_llvm_coverage, old analyzer, s3 storage, DatabaseReplicated, WasmEdge, parallel)": https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=105125&sha=d5baa1fd637c78da6f35bf52e576d0c77223ea48&name_0=PR&name_1=Stateless%20tests%20%28amd_llvm_coverage%2C%20old%20analyzer%2C%20s3%20storage%2C%20DatabaseReplicated%2C%20WasmEdge%2C%20parallel%29

Pin the 2-argument queries to `enable_analyzer=1`, matching the existing convention in `03156_group_concat.sql`. The 1-argument form remains analyzer-agnostic.

PR: #105125

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
LLVM Coverage Report
Changed lines: 100.00% (7/7)

PostgreSQL/SQL-standard `STRING_AGG(expr, sep)` matches ClickHouse's existing `groupConcat(expr, sep)` exactly when the separator is passed as a regular argument. Adding a case-insensitive alias spares PostgreSQL-dialect workloads (such as the SQLStorm corpus) a rewrite.

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added `STRING_AGG` as a case-insensitive alias of `groupConcat` for PostgreSQL/SQL-standard compatibility.

Documentation entry for user-facing changes
Version info
26.5.1.850