Fix duplicate ALIAS columns causing column count mismatch in Distributed#101789
Fix duplicate ALIAS columns causing column count mismatch in Distributed#101789Onyx2406 wants to merge 13 commits into
Conversation
When two ALIAS columns share the same expression, the Planner deduplicated them causing a column count mismatch. Wrap duplicate ALIAS expressions in `identity()` to keep them distinct while preserving semantics. Fixes ClickHouse#85895
Rename `04073_distributed_duplicate_alias_columns` to `04079` since number 04073 is already taken on master by other tests.
… avoid number collision with upstream master
…NS_FOR_ALIAS` The previous revision decided per *visit* whether to wrap an expanded `ALIAS` column in `identity`: the first time an expression hash was seen it stayed bare, and every later visit wrapped. That broke ordinary queries that reference the same `ALIAS` column more than once (e.g. `SELECT user_level, grouping(user_level) GROUP BY user_level`), because the SELECT occurrence expanded to `multiIf(...)` while the GROUP BY occurrence expanded to `identity(multiIf(...))`, and the Planner raised `MULTIPLE_EXPRESSIONS_FOR_ALIAS` for the single alias now bound to two different expressions. Regressions were visible on `03259_grouping_sets_aliases`, `02494_optimize_group_by_function_keys_and_alias_columns`, `00594_alias_in_distributed`, and `03034_normalized_ast`. Key the wrap decision on the alias name, not the visit order. The first visit of a new alias decides `wrap` vs `no-wrap` based on whether the expression hash was already used by another alias, and every subsequent visit of the same alias reuses that decision. Different aliases that share an expression still get `identity` on the later one, which is the behavior ClickHouse#85895 requires.
The previous revision set the alias on `column_expression` up-front and
then, when it decided to wrap, passed the already-aliased expression into
`identity`. That produced `identity(X AS e) AS e` — the same alias on both
the inner expression and the outer wrapper. The analyzer later saw two
distinct expressions for alias `e` and threw
`MULTIPLE_EXPRESSIONS_FOR_ALIAS` on `04090_distributed_duplicate_alias_columns`
and `00594_alias_in_distributed`.
Factor the expansion tail into `finalizeExpansion`, which either:
- attaches the alias to the bare expression (when the alias is seen for
the first time), or
- strips any lingering alias from the inner expression and attaches the
alias only to the `identity` wrapper.
The alias is therefore bound to exactly one node, matching what the
Planner and analyzer expect.
Audit of the `identity()` cache fix pointed out that the regression test only exercised the original bug (two distinct aliases that share one expression). The class of query the cache specifically guards against — the *same* ALIAS column referenced from more than one clause — wasn't covered, so a regression of that logic would slip through. Add four queries that each reference the same ALIAS column from multiple positions: - SELECT + GROUP BY - SELECT + GROUP BY + HAVING - SELECT + ORDER BY - Two aliases `d` and `e`, each referenced more than once in the output, so the visitor both hits the cache *and* the identity wrap in the same query. If the visitor ever returns a mismatched expansion between visits of the same alias, these cases raise MULTIPLE_EXPRESSIONS_FOR_ALIAS and fail.
…licate-alias-columns
|
Merged current
Root cause (reproduced on Design note. The current fix disambiguates by wrapping later occurrences in I've held off on further changes to the |
The internal function `__actionName(expression, 'name')` overrides the action name of `expression` with the constant 'name' (see `PlannerActionsVisitor`). The 2024 hardening against direct calls restricted both arguments to `String`, which was fine while the function was generated only for constants. To reuse it for keeping duplicate `ALIAS` column expansions distinct in queries sent to shards of Distributed tables, the first argument (the renamed expression, which is returned unchanged) may now have any type. The second argument (the name) must still be a genuine `String`, constant and non-empty, so the protection against malformed direct calls is preserved. Related: ClickHouse#85895 Related: ClickHouse#101789
Replace the `identity` wrapping of duplicate `ALIAS` column expansions in `ReplaceAliasColumnsVisitor` (also fixing the `Replase` typo in its name) with the internal function `__actionName(expression, 'name')`, which exists exactly for pinning planner action names and is a no-op at runtime: - Any number of `ALIAS` columns sharing one expression stay distinct, because every column gets its own name. The `identity` approach kept only two: the third and subsequent duplicates produced identical `identity(...)` trees that were collapsed again, and the query failed with the same `NUMBER_OF_COLUMNS_DOESNT_MATCH` exception. - The wrap decision is computed by a deterministic two-pass traversal scoped per column source instead of a visit-order-dependent cache keyed by alias name alone, so equal alias names of different tables cannot be conflated. - Occurrences in WHERE and PREWHERE are expanded bare: filters are computed entirely on the shards, and an opaque wrapper would prevent primary key and skipping index analysis there. `identity` disables index analysis by design, so the previous approach pessimized filtering by a duplicate `ALIAS` column to a full scan. The test now also covers three duplicates, GROUP BY keys, subqueries, GLOBAL JOIN of two distributed tables, `ALIAS` columns missing on the shard table, equal alias names with different expressions in two sources, and verifies that primary key analysis still prunes granules when filtering by a duplicate `ALIAS` column. Closes: ClickHouse#85895
|
I reworked the fix to address the review feedback (commits
The test now covers three duplicates, Note: a |
|
@groeneai, investigate the failure: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=101789&sha=9761f4206efb6c6de7509e18e9beef83c44444f2&name_0=PR&name_1=Integration%20tests%20%28amd_msan%2C%204%2F6%29 ( |
|
|
||
| column_expression->removeAlias(); | ||
|
|
||
| auto action_name = fmt::format("__aliasColumn_{}_{}", source_ordinals.at(column_source), column_name); |
There was a problem hiding this comment.
The generated action name can collide with a real input column name. For a valid table with a physical column named __aliasColumn_0_d plus duplicate aliases d ALIAS b + c, e ALIAS b + c, a query like SELECT __aliasColumn_0_d, d, e FROM dist puts the physical column and the wrapper for d in the same ActionsDAG namespace. PlannerActionsVisitor / ActionsScopeNode::addInputColumnIfNecessary and addFunctionIfNecessary both reuse an existing node by result name, so whichever expression is visited first wins and the other column can silently return the wrong values. Please generate these names through a namespace that cannot collide with user columns, for example by using the planner's column identifiers or unique-id machinery and checking GlobalPlannerContext::hasColumnIdentifier, and add a regression with a physical __aliasColumn_0_d column.
|
@alexey-milovidov Confirmed: this is the known Fix PR (in progress): #107086 (test-only, CI clean, awaiting review). Root cause. CIDB (30d). 14 Fix in #107086. Replace the one-shot KILL + bare join with a bounded loop that re-issues |
…licate-alias-columns
The fix for issue ClickHouse#85895 lives entirely in the new analyzer's distributed query-tree rewrite (`buildQueryTreeDistributed`, wrapping duplicate `ALIAS` expansions into the internal `__actionName` function). The old analyzer uses a separate AST-based path that cannot resolve `ALIAS` columns of a `GLOBAL JOIN` right-hand table (`UNKNOWN_IDENTIFIER` in `ExpressionAnalyzer` / `GlobalSubqueriesVisitor`), which is an unrelated, pre-existing limitation. Tag the test `no-old-analyzer` so it runs only where the fix applies. Failure: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=101789&sha=dfd7798ef2e74556fed1933e5f312fd160322ac2&name_0=PR&name_1=Stateless%20tests%20%28amd_llvm_coverage%2C%20old%20analyzer%2C%20s3%20storage%2C%20DatabaseReplicated%2C%20WasmEdge%2C%20parallel%29 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Merged
One unresolved review thread remains (the generated |
| void replace(QueryTreeNodePtr & node, bool is_filter_context) | ||
| { | ||
| if (auto column_expression = getColumnNodeAliasExpression(node, is_filter_context)) | ||
| node = std::move(column_expression); |
There was a problem hiding this comment.
This only expands the alias column once. If an alias points at another alias, for example d ALIAS b + c, e ALIAS d, getColumnNodeAliasExpression for e returns the ColumnNode for d; after this assignment the visitor descends into d's children, but it never replaces the new root ColumnNode d with b + c. The shard query can still contain d AS e, which breaks the stated contract when aliases exist only on the Distributed table and are missing on the shard table. It also means duplicate detection misses aliases that become equal only after nested ALIAS expansion.
Please expand recursively until the replacement root is no longer an ALIAS ColumnNode (or normalize recursively before computing duplicate groups), and add a regression with nested aliases, ideally with the shard table lacking both alias columns.
| static FunctionPtr create(ContextPtr) { return std::make_shared<FunctionActionName>(); } | ||
| size_t getNumberOfArguments() const override { return 2; } | ||
| ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {0, 1}; } | ||
| ColumnNumbers getArgumentsThatAreAlwaysConstant() const override { return {1}; } |
There was a problem hiding this comment.
Relaxing the first argument from constant to arbitrary expressions makes direct __actionName calls able to forge any ActionsDAG node name. PlannerActionsVisitor trusts the second argument as the result name, while ActionsScopeNode::addFunctionIfNecessary reuses an existing node with that name, so a direct query such as SELECT __actionName(number + 1, 'number') FROM numbers(3) can resolve to the existing input node number instead of computing number + 1. This was previously rejected because the first argument had to be constant, and the tests now explicitly allow non-constant first arguments.
Please keep the direct-call hardening: either make the arbitrary-expression form generated-only, or reject/namespace __actionName names that collide with existing action nodes so malformed direct calls fail instead of returning wrong results.
…licate-alias-columns
|
Merged |
LLVM Coverage ReportChanged lines: Changed C/C++ lines covered by tests: 73/73 (100.00%) | Lost baseline coverage: none · Uncovered code |

What this PR does
When several
ALIAScolumns of one table share the same expression (e.g.d Float64 ALIAS b + c, e Float64 ALIAS b + c), the query rewrite for Distributed tables expanded them into identical expression trees with identical action names, the planner collapsed them into a single column of the block sent over the network, and the positional conversion against the header expected by the initiator failed withNUMBER_OF_COLUMNS_DOESNT_MATCH.The fix wraps such expansions into the internal function
__actionName(expression, 'name'), which exists exactly for pinning planner action names (seePlannerActionsVisitor) and is a no-op at runtime. EveryALIAScolumn gets its own action name, so any number of duplicates stay distinct, while all occurrences of oneALIAScolumn resolve to the same name and are correctly recognized as one column. The duplicate detection is a deterministic two-pass traversal scoped per column source. Occurrences inWHERE/PREWHEREare expanded bare, so primary key and skipping index analysis on the shards is unaffected. The first argument of__actionName(returned unchanged) may now have any type; the name argument must still be a constant, non-empty, genuineString.Closes: #85895
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix
NUMBER_OF_COLUMNS_DOESNT_MATCHerror when querying a Distributed table with severalALIAScolumns sharing the same expression.Documentation entry for user-facing changes