Fix MULTIPLE_EXPRESSIONS_FOR_ALIAS with enabled parallel replicas by groeneai · Pull Request #103806 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix MULTIPLE_EXPRESSIONS_FOR_ALIAS with enabled parallel replicas#103806

Merged
alexey-milovidov merged 10 commits into
ClickHouse:masterfrom
groeneai:fix/74324-multiple-expressions-for-alias-parallel-replicas
May 30, 2026
Merged

Fix MULTIPLE_EXPRESSIONS_FOR_ALIAS with enabled parallel replicas#103806
alexey-milovidov merged 10 commits into
ClickHouse:masterfrom
groeneai:fix/74324-multiple-expressions-for-alias-parallel-replicas

Conversation

@groeneai

@groeneai groeneai commented Apr 30, 2026

Copy link
Copy Markdown
Contributor

When parallel replicas are enabled with parallel_replicas_local_plan = 0,
the analyzed query tree is serialized back to AST and sent to remote replicas,
which re-analyze it. The analyzer's expansion of an alias reference (e.g.
PREWHERE cond where cond is a projection alias) preserves the aliases on
the expanded body, while the matching projection column may not carry the same
inner aliases. The remote replica's analyzer then sees the same alias bound to
two different bodies and throws MULTIPLE_EXPRESSIONS_FOR_ALIAS. The same
class of failure also surfaced for SELECT * over self-joins with overlapping
column names (multiple projection columns sharing the alias date, etc.).

This patch postprocesses the AST produced by queryNodeToDistributedSelectQuery:

  1. Strip aliases from constraint clauses (PREWHERE / WHERE /
    HAVING / QUALIFY), recursively, stopping at subquery boundaries (each
    subquery has its own alias scope). The alias on the projection column is
    preserved and is what the receiver actually needs.
  2. Drop duplicate aliases in the projection list. When SELECT * expands
    joined subqueries with overlapping column names, position-aligned data is
    what matters across replicas; the user-visible column names come from the
    coordinator's projection metadata, not the AST sent to the receiver.

Tests in tests/parallel_replicas_blacklist.txt blacklisted under #74324
(00597_push_down_predicate_long, 02559_multiple_read_steps_in_prewhere)
now pass with parallel replicas enabled and are removed from the blacklist.
A dedicated regression test is added covering the three failure patterns
reported in the issue (PREWHERE alias reference, SELECT * over a
self-join, and subcolumn rewriting under
optimize_functions_to_subcolumns).

Reproducer (fails on master, passes with this PR):

CREATE TABLE t (id1 UInt64, id2 UInt64) ENGINE = MergeTree ORDER BY id1;
INSERT INTO t SELECT number, number FROM numbers(10);

SET enable_parallel_replicas = 1, max_parallel_replicas = 3,
    parallel_replicas_for_non_replicated_merge_tree = 1,
    parallel_replicas_local_plan = 0,
    cluster_for_parallel_replicas = 'test_cluster_one_shard_three_replicas_localhost';

SELECT cast(id1 AS UInt16) AS cond1, (id2 % 40000) AS cond2, (cond1 AND cond2) AS cond
FROM t PREWHERE cond LIMIT 10;
-- master: MULTIPLE_EXPRESSIONS_FOR_ALIAS for alias `cond`.

Closes #74324.

Cc @alexey-milovidov who requested this fix in #80310 (comment).

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix MULTIPLE_EXPRESSIONS_FOR_ALIAS errors thrown by remote replicas when running queries that reference projection aliases inside PREWHERE / WHERE / HAVING / QUALIFY (e.g. SELECT x AS a, y AS b, (a AND b) AS c FROM t PREWHERE c) or SELECT * over self-joins with overlapping column names, with parallel replicas and parallel_replicas_local_plan = 0. Closes #74324.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Note

Medium Risk
Changes analyzer/planner alias handling during query-tree-to-AST conversion for parallel replicas, which could affect projection naming/semantics on distributed execution paths; mitigated by targeted hashing logic and new regression coverage.

Overview
Fixes MULTIPLE_EXPRESSIONS_FOR_ALIAS when parallel_replicas_local_plan = 0 by ensuring alias cleanup is consistent between projection and constraint clauses after query-tree cloning.

The analyzer now recursively removes leaked inner aliases from expanded alias bodies (stopping at nested-scope boundaries) and re-registers cloned PREWHERE expressions for alias removal. The distributed AST conversion additionally strips later duplicate projection aliases only when the alias name repeats with a different expression body, avoiding remote-side alias conflicts while preserving same-body duplicates.

Updates tests by un-blacklisting affected parallel replica cases and adding a dedicated regression test 04145_multiple_expressions_for_alias_parallel_replicas.

Reviewed by Cursor Bugbot for commit d1a626f. Bugbot is set up for automated code reviews on this repo. Configure here.

Version info

  • Merged into: 26.6.1.273

…eplicas

When parallel replicas are enabled with `parallel_replicas_local_plan = 0`,
the analyzed query tree is serialized back to AST and sent to remote replicas,
which re-analyze it. The analyzer's expansion of an alias reference (e.g.
`PREWHERE cond` where `cond` is a projection alias) preserves the aliases on
the expanded body, while the matching projection column may not carry the same
inner aliases. The remote replica's analyzer then sees the same alias bound to
two different bodies and throws `MULTIPLE_EXPRESSIONS_FOR_ALIAS`.

This patch postprocesses the AST produced by `queryNodeToDistributedSelectQuery`
to:

  1. Strip aliases from constraint clauses (`PREWHERE` / `WHERE` / `HAVING` /
     `QUALIFY`), recursively, stopping at subquery boundaries (subqueries have
     their own alias scope). The alias on the projection column is preserved
     and is what the receiver actually needs.
  2. Drop duplicate aliases in the projection list. `SELECT *` over joined
     subqueries with overlapping column names produces multiple projection
     columns sharing an alias (e.g. `__table1.date AS date, __table3.date AS date`),
     which the remote replica's analyzer rejects. Position-aligned data is
     all that matters across replicas; the user-visible column names come from
     the coordinator's projection metadata, not the AST.

Tests in `tests/parallel_replicas_blacklist.txt` blacklisted under issue ClickHouse#74324
(`00597_push_down_predicate_long`, `02559_multiple_read_steps_in_prewhere`)
now pass with parallel replicas enabled, so they are removed from the
blacklist. A dedicated regression test is added in
`04145_multiple_expressions_for_alias_parallel_replicas.sql`.

Closes ClickHouse#74324.

Session: cron:clickhouse-ci-task-worker:20260430-191500

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

@groeneai

Copy link
Copy Markdown
Contributor Author

cc @KochetovNicolai @nickitat @alexey-milovidov — could you review this? It postprocesses the AST emitted by queryNodeToDistributedSelectQuery to strip aliases from constraint clauses (PREWHERE / WHERE / HAVING / QUALIFY) and to drop duplicate projection aliases, so the receiving replica's analyzer no longer sees the same alias bound to two different bodies — fixes #74324.

@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label May 1, 2026
@clickhouse-gh

clickhouse-gh Bot commented May 1, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [4fd0267]

Summary:


AI Review

Summary

This PR fixes MULTIPLE_EXPRESSIONS_FOR_ALIAS under parallel replicas with parallel_replicas_local_plan = 0 by moving the PREWHERE alias cleanup into QueryAnalyzer and keeping a targeted projection-alias deduplication step for the distributed AST path. I reviewed the current diff, PR metadata, prior inline threads, and the added regression coverage; the earlier clickhouse-gh concern about the A, B, B alias pattern is covered by the new focused case and by the current AST alias grouping described in the thread. I found no unresolved correctness issues that need inline comments.

Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label May 1, 2026
The previous commit added two changes to `queryNodeToDistributedSelectQuery`:
1. Strip aliases from constraint clauses (`PREWHERE`/`WHERE`/`HAVING`/`QUALIFY`) — this is the
   actual fix for issue ClickHouse#74324.
2. Drop duplicate aliases in the projection list — this broke distributed column
   lookup and caused `THERE_IS_NO_COLUMN` errors in three Fast tests:
   - `03825_exists_constant_folded_action_name`
   - `01018_Distributed__shard_num`
   - `01890_materialized_distributed_join`

Symptom (`01890_materialized_distributed_join`):

    Cannot find column `v` in source stream, there are only columns:
        [v, k, __table1.v, tl.v, tl.k, __table1.v]

The remote replica resolves columns of the source stream by their alias name. The
projection list `td.v, td.k, td.v, tl.v, tl.k, td.v` analyses to three identical
`__table1.v AS v` projections plus three other columns. Stripping the alias on the
2nd and 3rd `v` columns left them named `__table1.v` instead of `v`, so the
receiver could no longer match position-by-name and threw `THERE_IS_NO_COLUMN`.

The conflict the dedup was trying to solve (genuine duplicate-alias-different-body,
e.g. `__table1.name AS name, __table2.name AS name` in a self-join) is the analyzer's
business — it already disambiguates user-visible aliases via `tl.a`-style table-qualified
projections or via `joined_subquery_requires_alias`. Stripping projection aliases on the
distributed-AST path just breaks position-by-name resolution at the receiver.

Drop the dedup. The constraint-clause alias stripping (which is the real fix for ClickHouse#74324)
is unchanged.

CI: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=103806&sha=7d024a7acb7b496ca17c259e9dc4d3e66d45c0a4&name_0=PR&name_1=Fast%20test
PR: ClickHouse#103806
Issue: ClickHouse#74324

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@groeneai

groeneai commented May 1, 2026

Copy link
Copy Markdown
Contributor Author

Pushed a fix for the 3 Fast test failures.

Root cause

The original commit added two changes:

  1. Strip aliases from constraint clauses (PREWHERE/WHERE/HAVING/QUALIFY) — this is the actual fix for MULTIPLE_EXPRESSIONS_FOR_ALIAS with enabled Parallel Replicas #74324, addressing the analyzer's expansion of an alias reference that left inner aliases on the body and conflicted with the projection.
  2. Drop duplicate aliases in the projection list — added preventatively to handle "self-join SELECT * with overlapping column names". This second change is what broke the 3 Fast tests.

The dedup stripped the alias from any projection column whose alias name was already seen, regardless of whether the body actually differed. For example, in 01890_materialized_distributed_join:

SELECT td.v, td.k, td.v, tl.v, tl.k, td.v FROM test_distributed td asof JOIN test_local tl ...

The three td.v projections analyse to three identical __table1.v AS v columns. The dedup stripped the alias on the 2nd and 3rd, renaming them __table1.v. The receiving replica resolves the source stream by alias name and could no longer match position 2/5 to alias v:

Cannot find column `v` in source stream, there are only columns:
    [v, k, __table1.v, tl.v, tl.k, __table1.v]

Same shape for 03825_exists_constant_folded_action_name (constant-folded exists) and 01018_Distributed__shard_num (SELECT _shard_num, * over a Distributed table).

Fix

Remove the projection dedup. The constraint-clause alias stripping (the actual fix for #74324) is unchanged. Genuine duplicate-alias-different-body conflicts in SELECT * self-joins are the analyzer's responsibility — it already disambiguates via tl.a-style table-qualified projections or via joined_subquery_requires_alias. Patching the distributed-AST path was the wrong layer.

Pre-PR validation gate

# Question Answer
a Deterministic repro? Yes — SELECT exists((SELECT 1)), exists((SELECT 2)) FROM remote('127.0.0.1', numbers(1)); fails before fix (THERE_IS_NO_COLUMN), passes after. Verified locally on Debug build (Clang 23).
b Root cause explained? Yes — see above. The projection alias dedup stripped aliases from same-body+same-alias projections, breaking the receiver's position-by-name column lookup at THERE_IS_NO_COLUMN.
c Fix matches root cause? Yes — removes the dedup, which is the only mechanism that produced the bug. Constraint-clause stripping (the fix for #74324) is preserved.
d Test intent preserved? Yes — the 3 Fast tests check distributed column-name resolution; restoring the projection aliases restores their original intent. The #74324 regression test 04145_multiple_expressions_for_alias_parallel_replicas is unchanged.
e Both directions demonstrated? Yes — verified failure with old binary, pass with new binary, on three different reproductions matching the CI failures.

Session: cron:clickhouse-ci-task-worker:20260501-101500

Commit 7d024a7 added two changes to `queryNodeToDistributedSelectQuery`:
  1. Strip aliases from constraint clauses (`PREWHERE` / `WHERE` /
     `HAVING` / `QUALIFY`) — the actual fix for ClickHouse#74324.
  2. Drop ALL duplicate aliases in the projection list — broke distributed
     column lookup in three Fast tests (`03825_exists_constant_folded_action_name`,
     `01018_Distributed__shard_num`, `01890_materialized_distributed_join`)
     because it stripped same-body duplicates that the receiver still
     expects to be reachable by the original alias.

Commit 677e81b removed the projection dedup entirely, which fixed those
three Fast tests but reintroduced the original bug for `SELECT *` over a
self-join with `joined_subquery_requires_alias = 0` — case 2 of the
regression test `04145_multiple_expressions_for_alias_parallel_replicas`.
The CI failure (12 stateless variants) showed:

    Code: 179. DB::Exception: Multiple expressions
        __table3.name AS name and __table1.name AS name
    for alias name.

Two distinct projection bodies (`__table1.name` and `__table3.name`)
shared the alias `name`, which the remote replica's analyzer rejects.

This patch puts dedup back, but conditioned on the body of the projection
expression. For each alias seen in the projection list, record the
`getTreeHash(ignore_aliases=true)` of the FIRST occurrence; for any
later occurrence with the same alias, strip the alias only when the body
hash differs.

  - `__table1.name AS name`, `__table3.name AS name` (case 2 of ClickHouse#74324):
    different bodies → strip the alias on `__table3.name`. The receiver
    sees [id, name, value, __table3.name, __table3.value] — no alias
    conflict, no `MULTIPLE_EXPRESSIONS_FOR_ALIAS`.

  - 3x `__table1.v AS v` from `SELECT td.v, td.k, td.v, tl.v, tl.k, td.v`
    in `01890_materialized_distributed_join`: same body, same alias →
    keep the alias on every occurrence. The receiver's source stream
    keeps a column named `v` at every position the original projection
    expected, so position-by-name lookup still works.

The constraint-clause alias stripping (the actual fix for ClickHouse#74324) is
unchanged.

Verified locally on a debug build:
  - `04145_multiple_expressions_for_alias_parallel_replicas` (all 3 cases)
  - `03825_exists_constant_folded_action_name`
  - `01018_Distributed__shard_num`
  - `01890_materialized_distributed_join`
  - `02559_multiple_read_steps_in_prewhere`
  - `00597_push_down_predicate_long`

CI: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=103806&sha=677e81b75bfdc8621e908132a56d0abed7be9f6d&name_0=PR
PR: ClickHouse#103806
Issue: ClickHouse#74324

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@groeneai

groeneai commented May 1, 2026

Copy link
Copy Markdown
Contributor Author

Iteration: re-add projection alias dedup, body-hash-aware

CI on commit 677e81b showed 04145_multiple_expressions_for_alias_parallel_replicas failing across 12 stateless variants:

Code: 179. DB::Exception: Multiple expressions
    __table3.name AS name and __table1.name AS name
for alias name.

That's case 2 of the regression test (SELECT * over a self-join with joined_subquery_requires_alias = 0) — exactly the conflict the dropped projection dedup was supposed to fix. Removing the dedup re-exposed the original bug.

Pushed: 32f3f01

The new commit re-adds the projection dedup but conditions it on the body of the expression. For each alias, the first occurrence wins; later occurrences are stripped only when their body differs.

std::unordered_map<String, IASTHash> alias_to_first_body_hash;
for (auto & child : projection_ast->children)
{
    auto * with_alias = dynamic_cast<ASTWithAlias *>(child.get());
    if (!with_alias) continue;
    const auto & alias = with_alias->alias;
    if (alias.empty()) continue;

    const auto body_hash = child->getTreeHash(/*ignore_aliases=*/true);
    auto [it, inserted] = alias_to_first_body_hash.emplace(alias, body_hash);
    if (!inserted && it->second != body_hash)
        with_alias->setAlias(String());
}

This handles both classes of duplication consistently:

  • Different bodies, same alias (case 2 of MULTIPLE_EXPRESSIONS_FOR_ALIAS with enabled Parallel Replicas #74324):
    __table1.name AS name, __table3.name AS name → strip the alias on the __table3.name. Receiver sees [id, name, value, __table3.name, __table3.value] — no alias conflict.

  • Same body, same alias (01890_materialized_distributed_join and the other previously-broken Fast tests):
    3x __table1.v AS v from SELECT td.v, td.k, td.v, tl.v, tl.k, td.v over a JOIN → keep the alias on every occurrence so the receiver's position-by-name lookup of the source stream still works.

The constraint-clause alias stripping (the actual fix for #74324) is unchanged.

Pre-PR validation gate

a) Deterministic repro? Yes — 04145 case 2 fails 100% of the time on commit 677e81b with the exact parallel_replicas_local_plan = 0 + joined_subquery_requires_alias = 0 settings.

b) Root cause explained? Yes — when parallel_replicas_local_plan = 0, the analyzed query tree is serialized to AST and sent to remote replicas. With joined_subquery_requires_alias = 0, SELECT * over a self-join produces projections with the same alias bound to different bodies (__table1.name vs __table3.name), which the remote analyzer rejects. The first commit's blanket dedup also stripped same-body-same-alias duplicates, breaking the receiver's position-by-name lookup of the source stream.

c) Fix matches root cause? Yes — the fix targets exactly the difference: dedup only when bodies differ, leave same-body duplicates alone.

d) Test intent preserved? Yes — 04145 still exercises all three patterns from #74324; the three Fast tests still verify their original distributed-query behavior; nothing weakened.

e) Both directions demonstrated? Yes — locally verified:

  • Without this commit (current branch HEAD 677e81b): 04145 case 2 fails with MULTIPLE_EXPRESSIONS_FOR_ALIAS.
  • With this commit: 04145 (all 3 cases), 03825_exists_constant_folded_action_name, 01018_Distributed__shard_num, 01890_materialized_distributed_join, 02559_multiple_read_steps_in_prewhere, and 00597_push_down_predicate_long all pass.

f) Fix is general? Yes — the dedup operates on every projection alias in every distributed query AST, using a body hash that ignores aliases at all levels. It correctly handles both observed manifestations of the conflict and is a single mechanism rather than a per-pattern patch.

Awaiting CI rerun.

Session: cron:clickhouse-ci-task-worker:20260501-144500

@alexey-milovidov alexey-milovidov self-assigned this May 3, 2026
Comment thread src/Planner/Utils.cpp Outdated
/// boundaries to keep the inner aliases intact.
///
/// See https://github.com/ClickHouse/ClickHouse/issues/74324.
void stripAliasesFromConstraintExpression(ASTPtr ast)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A question - isn't ASTVisitor meant for that? Is it anyhow better or more practical to use here (this function is simple, so I'm not sure)?

Also, why it is named "FromConstraintExpression"? - the implementation does not depend on that. So the name can be more generic.

Comment thread src/Planner/Utils.cpp Outdated
/// alias) preserves the alias on the expanded body, while the projection has the same
/// body without inner aliases. The remote replica's analyzer then sees the same alias
/// (e.g. `cond`) attached to two different bodies and throws
/// `MULTIPLE_EXPRESSIONS_FOR_ALIAS`. Stripping aliases from constraint clause

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a question - why the confusion with aliases arises in "constraints expressions" specifically? Can't it appear in other clauses?

Comment thread src/Planner/Utils.cpp Outdated
stripAliasesFromConstraintExpression(select_query->having());
stripAliasesFromConstraintExpression(select_query->qualify());

/// 2. Strip duplicate projection aliases ONLY when the duplicate has a

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a mechanical fixup of something that is already broken. Can't we fix it on a deeper level?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@groeneai, duplicate projection aliases should not happen. Why do you need it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question — they shouldn't, but in one specific configuration they do, and that's case 2 of the regression test. Trace:

SET joined_subquery_requires_alias = 0;
SELECT * FROM (SELECT * FROM t) ANY LEFT JOIN (SELECT * FROM t) USING id;
-- t has columns: id, name, value
  1. The matcher (unqualified *) resolves to ColumnNodes for id, name, value from the left subquery and name, value from the right subquery (USING dedups id).
  2. QueryAnalyzer::resolveMatcher (Resolve/QueryAnalyzer.cpp:2207) emits ProjectionNames containing each ColumnNode's column_name. For self-joins the original column names collide: the names list is [id, name, value, name, value]. ColumnNode pointers are distinct (different table sources) so the query tree is unambiguous; only the names collide.
  3. QueryAnalyzer::resolveProjectionExpressionNodeList (Resolve/QueryAnalyzer.cpp:3568) builds NamesAndTypes projection_columns directly from those names, preserving the duplicates.
  4. QueryNode::toASTImpl (Analyzer/QueryNode.cpp:514-516) then sets each AST projection's alias from projection_columns[i].name unconditionally, producing __table1.name AS name, __table3.name AS name in the AST sent to remote replicas. The receiver's analyzer rejects the same alias bound to two different bodies → MULTIPLE_EXPRESSIONS_FOR_ALIAS.

This only surfaces with parallel_replicas_local_plan = 0 AND joined_subquery_requires_alias = 0 AND SELECT * over a self-join. With joined_subquery_requires_alias = 1 (the default) the user is forced to alias the subqueries ((SELECT * FROM t) AS A), which qualifies columns and avoids the collision. So locally the query tree is fine; only the AST round-trip is broken.

The deeper-level fix would go in QueryNode::toASTImpl (or in resolveProjectionExpressionNodeList): when emitting the AST projection list, skip setting the alias on a later projection whose name collides with an earlier one bound to a different body. That's the same body-hash check we have here in Utils.cpp, just moved one layer up. I'd combine it with the PREWHERE analyzer fix from the other thread and remove this whole Utils.cpp postprocess.

Does that direction work for you? If yes I'll fold it into the same commit set as the QueryAnalyzer.cpp PREWHERE fix.

@groeneai

groeneai commented May 3, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the careful read — the three comments hit three real things, and the third one is the most important. Let me address each, then propose a deeper fix.

(1) "Why FromConstraintExpression? The implementation is generic."

You're right — the body of stripAliasesFromConstraintExpression doesn't depend on the clause it's called from; it's a generic recursive alias clearer that stops at subquery boundaries. I'll rename it (e.g. removeAliasesFromASTSubtree); using InDepthNodeVisitor is fine here too — the function is small but the visitor pattern is more conventional. Minor refactor.

(2) "Why constraint expressions specifically? Can't it appear elsewhere?"

I tested empirically against master with parallel_replicas_local_plan = 0, all queries using cond as a projection alias:

Clause Master result
PREWHERE cond fails with MULTIPLE_EXPRESSIONS_FOR_ALIAS
WHERE cond passes
HAVING cond passes
ORDER BY cond passes
GROUP BY cond passes

So in practice only PREWHERE triggers the asymmetry. The other clauses I included in this PR (WHERE / HAVING / QUALIFY) are dead code — happy to limit it to PREWHERE only. The reason PREWHERE is special is the third comment's territory.

(3) "Mechanical fixup of something that is already broken. Can't we fix it on a deeper level?"

Agreed — the analyzer is producing an asymmetric AST and the postprocess is hiding it. The deeper bug is in src/Analyzer/Resolve/QueryAnalyzer.cpp. Trace for the issue's repro:

SELECT cast(id1 AS UInt16) AS cond1, (id2 % 40000) AS cond2, (cond1 AND cond2) AS cond
FROM t PREWHERE cond

The receiver sees:

SELECT
    CAST(__table1.id1, 'UInt16') AS cond1,
    __table1.id2 % 40000        AS cond2,
    CAST(__table1.id1, 'UInt16') AND (__table1.id2 % 40000) AS cond     -- no inner aliases
FROM default.t AS __table1
PREWHERE
    (CAST(__table1.id1, 'UInt16') AS cond1) AND ((__table1.id2 % 40000) AS cond2) AS cond  -- inner aliases preserved

The bodies differ: the projection's cond body has inner cond1/cond2 aliases stripped, while PREWHERE's retains them.

Why PREWHERE ends up different

  1. Projection resolution (line 5479). When cond1/cond2 identifiers inside cond1 AND cond2 are resolved, tryResolveIdentifierFromAliases (line 1062) deep-clones each alias node and registers the clone in scope.aliases.node_to_remove_aliases:

    alias_node = alias_node->clone();
    scope_to_resolve_alias_expression->aliases.node_to_remove_aliases.push_back(alias_node);

    The projection's cond body now contains tracked clones C1/C2 of node1/node2 — they will get their aliases cleared at line 5683.

  2. PREWHERE resolution (line 5486). The cond identifier resolves to a deep clone D of the AND node. D is registered in node_to_remove_aliases, but its children — fresh deep copies D_c1/D_c2 of node1/node2 — carry the original cond1/cond2 aliases (clone preserves aliases) and are not registered.

  3. Line 5501 — second deep clone.

    prewhere_node = prewhere_node->clone();   // E = D.clone()
    ReplaceColumnsVisitor replace_visitor(scope.join_columns_with_changed_types, scope.context);
    replace_visitor.visit(prewhere_node);

    prewhere_node is replaced by a second deep clone so that ReplaceColumnsVisitor (for JOIN USING type changes) can rewrite the prewhere copy without leaking back into the projection. The new tree E (with children E_c1/E_c2) is yet another set of alias-bearing copies, none registered. The original D — the one that was in node_to_remove_aliases — is no longer in the query tree.

  4. Alias removal at line 5683 is non-recursive.

    for (auto & node : scope.aliases.node_to_remove_aliases)
        node->removeAlias();

    Only registered roots get cleared. The projection's tree is fine because every cond1/cond2 clone in it (C1/C2) was registered. PREWHERE's tree's inner clones — E_c1/E_c2 — aren't reached.

That's why the projection's body is alias-free and PREWHERE keeps inner aliases. It's PREWHERE-specific because PREWHERE is the only clause that gets the second deep clone (no other clause runs ReplaceColumnsVisitor).

Proposed deeper fix

Two orthogonal changes:

A. Make alias removal recursive. Replace the loops at lines 5680-5689 with a top-down InDepthQueryTreeVisitor walk that calls removeAlias() on every descendant of each registered node, stopping at QUERY / UNION / LAMBDA boundaries (those have their own scopes). This catches aliases on nodes that aren't directly registered but live inside a registered subtree.

B. Register the post-clone prewhere_node. After line 5503, push the new clone into node_to_remove_aliases:

prewhere_node = prewhere_node->clone();
ReplaceColumnsVisitor replace_visitor(scope.join_columns_with_changed_types, scope.context);
replace_visitor.visit(prewhere_node);
scope.aliases.node_to_remove_aliases.push_back(prewhere_node);   // NEW

Combined with A, the recursive walk over prewhere_node clears E, E_c1, E_c2. The asymmetry disappears.

With both in place, queryNodeToDistributedSelectQuery no longer needs to touch PREWHERE at all.

This is QueryAnalyzer-core territory — there are passes that read aliases off intermediate nodes (e.g. nodes_with_duplicated_aliases handling at lines 5588-5657, optimize_functions_to_subcolumns rewrites). I'd want to run the full functional + integration suite, including the analyzer-touch tests, before claiming the deeper fix is safe.

What about the regression test's case 2?

04145 case 2 is SELECT * over a self-join with joined_subquery_requires_alias = 0:

SELECT * FROM (SELECT * FROM t) ANY LEFT JOIN (SELECT * FROM t) USING id

This produces a projection list with two columns sharing alias name but bound to different sources (__table1.name vs __table3.name). Different root cause — it's about *-expansion producing duplicate projection aliases, not PREWHERE reuse. The body-hash-aware dedup in commit 32f3f01 handles this, but it's also mechanical. The deeper fix here would live in matcher resolution / *-expansion (have the analyzer either disambiguate the alias or drop it before the AST round-trip), and is independent of the PREWHERE fix above.

Plan

  1. Small (any time): rename the helper, drop the dead WHERE/HAVING/QUALIFY paths.
  2. Deeper fix for MULTIPLE_EXPRESSIONS_FOR_ALIAS with enabled Parallel Replicas #74324 (this PR or separate): implement A+B in QueryAnalyzer.cpp, drop the PREWHERE postprocessing, run the full test suite. Happy to do this here if you'd like it in one PR — just want a heads-up that an analyzer change is in scope.
  3. Case 2 (SELECT * duplicate aliases): I'd lean toward a separate PR focused on matcher / *-expansion disambiguation.

Let me know which direction you want to go and I'll push.

Comment thread src/Planner/Utils.cpp Outdated
Comment on lines +260 to +263
stripAliasesFromConstraintExpression(select_query->prewhere());
stripAliasesFromConstraintExpression(select_query->where());
stripAliasesFromConstraintExpression(select_query->having());
stripAliasesFromConstraintExpression(select_query->qualify());

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@groeneai This logic could be done on top of the query tree.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes — fully agree. I posted a deeper-level proposal in #103806 (comment) (reply to @alexey-milovidov on the same mechanical fixup thread): the analyzer's node_to_remove_aliases walk in QueryAnalyzer::resolveQuery is non-recursive, and the post-clone prewhere_node (at Resolve/QueryAnalyzer.cpp:5501) is never registered for cleanup, which is why inner aliases survive on PREWHERE's body but not on the projection's. Fix is two changes inside QueryAnalyzer.cpp (recursive removeAlias walk + register post-clone prewhere_node), then this whole Utils.cpp postprocess for PREWHERE/WHERE/HAVING/QUALIFY goes away.

I was waiting on @alexey-milovidov to confirm the direction before scoping the analyzer change into this PR (vs a separate PR). With your independent agreement, I'd like to proceed with the deeper fix in this PR — let me know if you want to weigh in before I push it. The case-2 self-join projection-alias issue (your other comment) is a separate root cause and I'd address it at the deeper level too — see my reply on that thread.

Comment thread src/Planner/Utils.cpp Outdated
stripAliasesFromConstraintExpression(select_query->having());
stripAliasesFromConstraintExpression(select_query->qualify());

/// 2. Strip duplicate projection aliases ONLY when the duplicate has a

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@groeneai, duplicate projection aliases should not happen. Why do you need it?

@@ -0,0 +1,74 @@
-- Tags: long, no-parallel

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@groeneai Why no-parallel? Maybe it could be dropped?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch — dropped in 7d4c68a. The test creates uniquely-named tables (t_74324_*) that are dropped at the end and uses session-scoped settings, so it has no shared state that requires sequential execution. The long tag is kept because the test exercises three configurations against parallel replicas.

The test creates uniquely-named tables (`t_74324_*`) that are dropped at
the end and uses session-scoped settings, so it has no shared state that
requires sequential execution. Per @novikd's review on PR ClickHouse#103806.
@devcrafter devcrafter self-assigned this May 7, 2026
@devcrafter

Copy link
Copy Markdown
Member

Addressing the MULTIPLE_EXPRESSIONS_FOR_ALIAS issue case by case
First pull request - #104303

alexey-milovidov added a commit that referenced this pull request May 14, 2026
…ysis`

Per review comment (#80310 (comment)...),
the query `SELECT *, day + 365 AS day` in this test triggers an unrelated
analyzer bug (`MULTIPLE_EXPRESSIONS_FOR_ALIAS`, issue #74324, fix PR
#103806 still open) when executed under parallel replicas. Random auto-PR
settings randomization can enable parallel replicas in any CI suite, so
CI was failing on the unrelated bug instead of validating #48881.

Add `SET enable_parallel_replicas = 0` and `SET automatic_parallel_replicas_mode = 0`
at the top of the test, with a comment pointing to #74324, so the guard
can be removed once that bug is fixed. This keeps the test deterministic
without resorting to the `no-parallel-replicas` tag or the
`parallel_replicas_blacklist.txt` entry that earlier reviewers asked to
remove.

CI report:
#80310

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…arallel-replicas

Resolve conflict in tests/parallel_replicas_blacklist.txt:
- Master removed the 74335 / Bad get / 00502_sum_map block; keep that removal.
- This branch removes the 74324 / MULTIPLE_EXPRESSIONS_FOR_ALIAS block
  (00597_push_down_predicate_long, 02559_multiple_read_steps_in_prewhere)
  because the alias-stripping fix in src/Planner/Utils.cpp makes those
  tests pass under parallel replicas.

src/Planner/Utils.cpp auto-merged cleanly: our stripAliasesFromConstraintExpression
helper coexists with master's standalone-subquery wrapping inside buildFilterInfo.
@groeneai

Copy link
Copy Markdown
Contributor Author

Rebased onto current master (merge commit f139a40f4907) — PR was DIRTY/CONFLICTING after ~9 days of master churn. The conflict was in tests/parallel_replicas_blacklist.txt: master removed the 74335 / Bad get / 00502_sum_map block; this branch removes the 74324 / MULTIPLE_EXPRESSIONS_FOR_ALIAS block (00597_push_down_predicate_long, 02559_multiple_read_steps_in_prewhere). Resolution: take both removals.

src/Planner/Utils.cpp auto-merged cleanly — stripAliasesFromConstraintExpression coexists with master's standalone-subquery wrapping in buildFilterInfo (PR #100695).

Diff vs origin/master is unchanged (4 files, +186 / -5):

  • src/Planner/Utils.cpp (+97)
  • tests/parallel_replicas_blacklist.txt (-5)
  • tests/queries/0_stateless/04145_*.sql (+74)
  • tests/queries/0_stateless/04145_*.reference (+15)

Still awaiting direction from @alexey-milovidov / @novikd on whether to fold the deeper analyzer-level fix (recursive removeAlias + register post-clone prewhere_node in QueryAnalyzer.cpp, per #103806 (comment)) into this PR or split it out. @devcrafter's parallel PR #104303 (narrower scope: only SELECT * over JOIN, in QueryNode::toASTImpl) is still open.

Address reviewer feedback by moving the `PREWHERE` half of the fix out
of `queryNodeToDistributedSelectQuery` and into the analyzer itself.

The original PR postprocessed the AST after `queryNodeToDistributedSelectQuery`
to clear aliases from constraint clauses. Reviewers (ClickHouse#103806 review threads
at `Utils.cpp:214`, `:223`, `:267`) flagged the approach as a "mechanical
fixup of something already broken" and asked for a deeper fix in the query
tree. This commit implements that.

Root cause (traced in `QueryAnalyzer::resolveQuery`):

* When resolving `PREWHERE cond` where `cond` is a projection alias, the
  analyzer deep-clones the alias body and registers the clone's root in
  `scope.aliases.node_to_remove_aliases` (see line ~1135 of
  `QueryAnalyzer.cpp`). The clone's *descendants*, which carry the inner
  aliases (e.g. `cond1`, `cond2`) from the original projection body, are
  not registered.
* `PREWHERE` then deep-clones the tree a second time so that
  `ReplaceColumnsVisitor` can rewrite types for `JOIN USING` without
  leaking back into the projection. After this clone the originally-
  registered node is no longer in the query tree, while the new clone's
  root carries the `cond` alias and is unregistered.
* The cleanup loop at line ~5840 calls `removeAlias` on each registered
  node non-recursively, so the projection's inner aliases get cleared
  while `PREWHERE`'s do not. When the AST is shipped to a remote replica
  under `parallel_replicas_local_plan = 0` the receiver sees the same
  alias bound to two different bodies and throws
  `MULTIPLE_EXPRESSIONS_FOR_ALIAS`.

Fix:

1. Make alias removal recursive in `QueryAnalyzer.cpp`: walk each
   registered node top-down and clear aliases on every descendant,
   stopping at `QUERY` / `UNION` / `LAMBDA` boundaries (those are
   separate scopes that handle their own cleanup).
2. Register the post-clone `prewhere_node` in `node_to_remove_aliases`
   so the recursive walk reaches its inner aliases.

`Utils.cpp` no longer touches `PREWHERE` / `WHERE` / `HAVING` /
`QUALIFY`. The empirical check in the issue thread showed only `PREWHERE`
ever triggered the asymmetry; the rest of the constraint clauses were
dead code in the postprocess and are dropped together with it.

The duplicate-projection-alias dedup remains in
`queryNodeToDistributedSelectQuery` for now: it covers case 2 of the
regression test (`SELECT *` over a self-join with overlapping column
names) and a different root cause is being addressed in ClickHouse#104303 by
@devcrafter. Once that lands the dedup here can be removed.

Verified locally with the new binary against the regression test
`04145_multiple_expressions_for_alias_parallel_replicas` and the
original repro from ClickHouse#74324; both pass. Existing alias tests
(`00057_join_aliases`, `00138_table_aliases`,
`00157_aliases_and_lambda_formal_parameters`) remain green.

Refs ClickHouse#74324
Refs ClickHouse#103806 (comment)
Refs ClickHouse#104303

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member

Pushed 772bd60 — implements the deeper fix in QueryAnalyzer.cpp that was discussed in the Mechanical fixup and This logic could be done on top of the query tree review threads.

What changed

  1. src/Analyzer/Resolve/QueryAnalyzer.cpp — make node_to_remove_aliases cleanup recursive (stopping at QUERY / UNION / LAMBDA boundaries), and register the post-clone prewhere_node in node_to_remove_aliases so the recursive walk reaches its inner aliases.

  2. src/Planner/Utils.cpp — drop the PREWHERE / WHERE / HAVING / QUALIFY postprocessing. The empirical check in this comment showed only PREWHERE ever triggered the asymmetry, and the analyzer fix above handles that at the root.

  3. The duplicate-projection-alias dedup is kept in queryNodeToDistributedSelectQuery for now, covering case 2 of the regression test (SELECT * over a self-join with overlapping column names). That root cause is being addressed at a deeper level in @devcrafter's Fix MULTIPLE_EXPRESSIONS_FOR_ALIAS on SELECT * over JOIN under parallel replicas #104303; once it lands, this dedup can be removed too.

Why this targets the right thing

Trace from the comment thread, condensed: when PREWHERE cond resolves where cond is a projection alias built from inner aliases like cond1, cond2:

  • The projection's body has its inner aliases stripped because every cond1/cond2 clone gets registered for cleanup.
  • PREWHERE clones the alias body twice (once during identifier resolution, once after for ReplaceColumnsVisitor). Only the first-clone root is registered; the second-clone descendants are unreachable from the cleanup loop, so the inner aliases survive on PREWHERE's body.
  • The receiver's analyzer sees cond with no inner aliases on the projection AND cond with inner cond1/cond2 on PREWHERE — two different bodies → MULTIPLE_EXPRESSIONS_FOR_ALIAS.

The two-line fix in the analyzer kills both halves: (A) the recursive walk reaches the inner clones; (B) registering the post-clone prewhere_node ensures the walk has the right root.

Validation

Verified locally against the new binary:

  • 04145_multiple_expressions_for_alias_parallel_replicas (all three cases): passes against the new binary.
  • The original repro from MULTIPLE_EXPRESSIONS_FOR_ALIAS with enabled Parallel Replicas #74324: passes.
  • Spot-check of existing alias tests (00057_join_aliases, 00138_table_aliases, 00157_aliases_and_lambda_formal_parameters): all still pass.
  • Various analyzer patterns (HAVING / WHERE / subquery / JOIN USING + PREWHERE): all still pass.

cc @alexey-milovidov @novikd @devcrafter

@alexey-milovidov

Copy link
Copy Markdown
Member

This was fixed by #105146. Let's update the branch.

Comment thread src/Planner/Utils.cpp
groeneai and others added 2 commits May 23, 2026 11:22
The clickhouse-gh review on PR ClickHouse#103806 (2026-05-23) flagged a missing
test case for the alias body sequence `A, B, B` (e.g.
`SELECT r.name, l.name, l.name` with `parallel_replicas_local_plan = 0`).

Tracing the actual AST emitted by `queryNodeToDistributedSelectQuery`
shows the analyzer disambiguates the second and third columns to a
distinct alias `l.name`, so the alias groups in the sent AST are
`name` (one occurrence) and `l.name` (two occurrences with the same
body). The dedup then leaves both `l.name` entries intact via the
body-hash equality branch and never collapses the body-B occurrences
against body A.

Added two cases to `04145_multiple_expressions_for_alias_parallel_replicas`:
  4. `SELECT r.name, l.name, l.name` over a 2-way join (the example).
  5. `SELECT *` over a 3-way self-join with `USING`, where the
     projection produces the same bare alias three times with bodies
     from three different sources, exercising the dedup beyond the
     existing 2-occurrence case.
@clickhouse-gh

clickhouse-gh Bot commented May 30, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.40% 84.40% +0.00%
Functions 91.40% 91.40% +0.00%
Branches 77.00% 76.90% -0.10%

Changed lines: Changed C/C++ lines covered by tests: 33/36 (91.67%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

@alexey-milovidov alexey-milovidov merged commit d53619c into ClickHouse:master May 30, 2026
165 of 166 checks passed
@robot-ch-test-poll1 robot-ch-test-poll1 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 30, 2026
alexey-milovidov added a commit that referenced this pull request Jun 1, 2026
…y_analysis

Issue #74324 (`MULTIPLE_EXPRESSIONS_FOR_ALIAS` under parallel replicas) was
closed by the fix #103806, so the old comment ("remove this guard once #74324
is fixed") is misleading: it invites removing the guard now that #74324 is
closed.

However, #103806 covered other shapes of that bug, not the self-shadowing
alias used by this test (`SELECT *, day + 365 AS day`, where `day` is also a
column). Verified locally against master with #103806 present: under forced
parallel replicas the remote replica's analyzer still sees two bodies for the
alias `day` and throws `MULTIPLE_EXPRESSIONS_FOR_ALIAS`. The `SET enable_parallel_replicas = 0`
guard is therefore still required for this test to stay green in the
`ParallelReplicas` CI suite (the in-query `SET` overrides the suite-level
forcing). Auto-PR statistics mode (`automatic_parallel_replicas_mode = 2`)
already passes, since it does not actually execute with parallel replicas.

Update the comment to state this accurately and tie removal of the guard to the
analyzer handling this shape, not to #74324 being closed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MULTIPLE_EXPRESSIONS_FOR_ALIAS with enabled Parallel Replicas

5 participants