Fix LOGICAL_ERROR when a column transformer sets an alias on an asterisk by groeneai · Pull Request #109223 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix LOGICAL_ERROR when a column transformer sets an alias on an asterisk#109223

Open
groeneai wants to merge 8 commits into
ClickHouse:masterfrom
groeneai:fix/109214-column-transformer-alias-on-asterisk
Open

Fix LOGICAL_ERROR when a column transformer sets an alias on an asterisk#109223
groeneai wants to merge 8 commits into
ClickHouse:masterfrom
groeneai:fix/109214-column-transformer-alias-on-asterisk

Conversation

@groeneai

@groeneai groeneai commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Closes: #109214

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixed a LOGICAL_ERROR ("Can't set alias of ... of QualifiedAsterisk", server abort in debug/sanitizer builds) when a column transformer that assigns a name (APPLY with a name prefix, or REPLACE) is applied to an expression that expands to an asterisk or COLUMNS matcher, e.g. SELECT * APPLY (x -> t.*, 'p_'). Such queries now return a clear BAD_ARGUMENTS error.

Description

Found by the AST fuzzer (AST fuzzer (amd_debug, targeted), STID 1493-3ccb):
report https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=109082&sha=62139e418dccb913277c31a483b2547ed4e230e7&name_0=PR&name_1=AST%20fuzzer%20%28amd_debug%2C%20targeted%29

Fuzzed query:

SELECT DISTINCT * APPLY (x -> compound_value.*, 'f_') FROM test_table__fuzz_23 FINAL PREWHERE ...

Root cause: ASTColumnsApplyTransformer::transform and ASTColumnsReplaceTransformer::transform call setAlias on the transformer's result node. Only ASTWithAlias nodes (identifiers, functions, literals, subqueries) support aliases. When the APPLY lambda body (or the REPLACE replacement) expands to an ASTAsterisk / ASTQualifiedAsterisk / ASTColumnsMatcher, the default IAST::setAlias throws LOGICAL_ERROR, which aborts the server in debug/sanitizer builds. This is the old-analyzer path (enable_analyzer=0, via TranslateQualifiedNamesVisitor); the analyzer resolves the lambda body via resolveLambda and already returns a clean exception.

Fix: a small setTransformerAlias helper checks the node is an ASTWithAlias before setting the alias, otherwise throws a clear BAD_ARGUMENTS. This mirrors the existing precedent for the same class of error (asterisk with column aliases in CREATE VIEW, and the EXCEPT transformer defensive handling).

Reproducers that aborted before this change and now return BAD_ARGUMENTS (all with enable_analyzer=0):

SELECT * APPLY (x -> compound_value.*, 'f_') FROM (SELECT 1 AS a);
SELECT * APPLY (x -> *, 'f_') FROM (SELECT 1 AS a);
SELECT * APPLY (x -> COLUMNS('a'), 'f_') FROM (SELECT 1 AS a);
SELECT * REPLACE (compound_value.* AS a) FROM (SELECT 1 AS a);

@groeneai

groeneai commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

@groeneai

groeneai commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

cc @evillique @george-larionov — could you review this? It converts a LOGICAL_ERROR (server abort in debug/sanitizer) into a clean BAD_ARGUMENTS when a column transformer that assigns a name (APPLY with a prefix, or REPLACE) is applied to an expression expanding to an asterisk / qualified asterisk / COLUMNS matcher. The fix guards the three setAlias call sites in ASTColumnsTransformers.cpp with an ASTWithAlias check.

@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Jul 2, 2026
@clickhouse-gh

clickhouse-gh Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [72e4ca8]

Summary:

job_name test_name status info comment
AST fuzzer (amd_debug, targeted, old_compatibility) FAIL
Logical error: Block structure mismatch in A stream: different columns: (STID: 0993-2cc2) FAIL cidb, issue

AI Review

Summary

This PR hardens named column transformers around asterisk / matcher expansions and restores old-analyzer parity for prefixed APPLY / named REPLACE in the default analyzer, including the untuple naming edge cases that came up during review. I re-checked the current head against the full prior discussion and did not find any remaining correctness or parity issues in the modified code.

Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Jul 2, 2026
…fix in the analyzer

Two related column-transformer issues:

1. Old analyzer crash. APPLY (with a name prefix) and REPLACE assign an
   alias to their result node. Only ASTWithAlias nodes support aliases, so
   when the transformer expression expands to an asterisk / qualified
   asterisk / COLUMNS matcher (e.g. SELECT * APPLY (x -> t.*, 'p_')), the
   default IAST::setAlias threw a LOGICAL_ERROR and aborted the server in
   debug/sanitizer builds. Reject such queries with a clear BAD_ARGUMENTS.
   Found by the AST fuzzer (STID 1493-3ccb).

2. New (default) analyzer dropped the APPLY name prefix. APPLY (expr, 'p_')
   is documented and works under the old analyzer (result named p_ + column
   name), but the analyzer never carried column_name_prefix from the parser
   into ApplyColumnTransformerNode, so the prefix was silently ignored (e.g.
   SELECT * APPLY (toString, 'f_') produced toString(a) instead of f_a).
   Carry the prefix QueryTreeBuilder -> ApplyColumnTransformerNode ->
   resolveMatcher / toASTImpl so the projection name matches the old analyzer.

Closes: ClickHouse#109214

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai groeneai force-pushed the fix/109214-column-transformer-alias-on-asterisk branch from 9213084 to 81bdafb Compare July 3, 2026 00:04
@groeneai

groeneai commented Jul 3, 2026

Copy link
Copy Markdown
Contributor Author
Pre-PR validation gate (click to expand) — analyzer follow-up
# Question Answer
a Deterministic repro? Yes. clickhouse-local -q "SELECT * APPLY (toString, 'f_') FROM (SELECT 1 AS a, 2 AS b) FORMAT TSVWithNames SETTINGS enable_analyzer=1" printed toString(a) toString(b) (prefix dropped) before the fix; now prints f_a f_b, matching enable_analyzer=0.
b Root cause explained? buildColumnTransformers (QueryTreeBuilder.cpp) built ApplyColumnTransformerNode from only the lambda/function and never copied the parser's column_name_prefix. The node had no field for it, resolveMatcher used only the lambda/function projection name, and toASTImpl could not reconstruct it. So APPLY (expr, 'prefix') (documented, working under the old analyzer) silently ignored the prefix under the default analyzer.
c Fix matches root cause? Yes. Carry column_name_prefix end to end: parser AST -> ApplyColumnTransformerNode (new field + ctor arg, plumbed through hash/equal/clone) -> resolveMatcher prepends it to the accumulated projection name -> toASTImpl restores it for round-trips.
d Test intent preserved / new tests added? Extended 04401_columns_transformer_alias_on_asterisk with an enable_analyzer=1 section asserting the prefix is applied for APPLY (func, 'f_'), APPLY (lambda, 'f_'), chained APPLY (p_) APPLY (q_), REPLACE, and an EXPLAIN QUERY TREE round-trip. The existing old-analyzer crash guards are unchanged.
e Both directions demonstrated? Yes. Pre-fix binary (Build ID 314fd246): the new enable_analyzer=1 assertions FAIL (diff shows toString(a), plus(a, 1), plus(plus(a, 1), 1), EXPLAIN 0). Post-fix binary (12ab4e8c): test passes 30/30 with randomization.
f Fix is general across code paths? Yes. Both APPLY variants (lambda and function) carry the prefix via the single ctor argument; resolveMatcher applies it in the one finalization point shared by both. The old-analyzer setAlias guard from the first commit still covers the asterisk-expansion crash on that path.
g Fix generalizes across inputs? Yes. Verified old-vs-new analyzer parity (name and value) across: single function/lambda prefix, chained uniform prefixes, EXCEPT+prefixed-APPLY, REPLACE+prefixed-APPLY, aggregate APPLY (sum, 's_'), qualified a.*, empty prefix, and multi-row tables. Exotic mixed prefix / no-prefix aggregate chains keep a pre-existing analyzer projection-name difference that exists without any prefix too; values are identical and those cases are not asserted.
h Backward compatible? Yes. No setting, format, or on-disk change. It makes the default analyzer match the documented old-analyzer naming that had been silently dropped; queries that worked keep working (parity verified).
i Invariants and contracts preserved? Yes. The prefix is prepended only when an APPLY transformer with a non-empty prefix executed (execute_apply_transformer && !prefix.empty()); otherwise the existing node_projection_names[0] path is untouched. updateTreeHashImpl / isEqualImpl now include the prefix so equal trees stay equal and differing prefixes hash apart. toASTImpl restores the prefix so QueryTree -> AST round-trips are stable.

Session id: cron:clickhouse-worker-slot-3:20260702-233000

Comment thread src/Analyzer/Resolve/QueryAnalyzer.cpp Outdated
…e names

The new-analyzer APPLY name-prefix support added earlier in this PR took the
prefix base from node_to_projection_name, which qualifyColumnNodesWithProjectionNames
may set to a qualified name (t1.a) in a scope that requires qualification. The
legacy AST path prefixes ASTIdentifier::shortName(). So `t1.* APPLY (toString, 'f_')`
in a qualifying scope produced f_t1.a instead of f_a.

Fix: accumulate the prefixed display name from the short column name, independent
of the (possibly qualified) node_to_projection_name value.

The prefix also failed to reach a chained transformer when the first transformer
was an identity lambda (`APPLY (x -> x, 'p_') APPLY toString` gave toString(a),
not toString(p_a)): the identity lambda leaves the resolved node pointing at the
original matched node, which is already present in both node_to_projection_name
and resolved_expressions, so an emplace did not update the name. Overwrite both
maps when the resolved node pointer is reused.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

groeneai commented Jul 3, 2026

Copy link
Copy Markdown
Contributor Author
Pre-PR validation gate (click to expand)
# Question Answer
a Deterministic repro? Yes. SELECT 99 AS a, x.* APPLY (toString, 'f_') FROM (SELECT 1 AS a, 2 AS b) AS x (enable_analyzer=1) deterministically gave f_x.a; SELECT * APPLY (x -> x, 'p_') APPLY toString FROM (SELECT 1 AS a) gave toString(a).
b Root cause explained? Yes. (1) The prefix was applied to result_projection_names.back(), seeded from node_to_projection_name which qualifyColumnNodesWithProjectionNames sets to a qualified name (x.a) in a qualifying scope -> f_x.a. (2) An identity lambda leaves the resolved node pointing at the original matched node, already present in node_to_projection_name (so emplace was a no-op) and cached in resolved_expressions (read by resolveExpressionNode before the map) -> chained transformer saw the pre-prefix name.
c Fix matches root cause? Yes. (1) Accumulate the prefixed name from the short column_name in a separate variable. (2) insert_or_assign the map AND refresh the existing resolved_expressions entry for the reused node. No symptom-masking.
d Test intent preserved / new tests added? Yes. Existing cases unchanged. Added enable_analyzer=1 coverage: qualifying-scope short name (f_a not f_x.a), and chained-identity for both function and lambda forms (toString(p_a)).
e Both directions demonstrated? Yes. Built a pre-fix binary (Build ID 18f5722) and a fixed binary (48825ab). Pre-fix: f_x.a / toString(a). Fixed: f_a / toString(p_a).
f Fix is general across code paths? Yes. Both APPLY branches (LAMBDA and FUNCTION) flow through the single shared prefix/emplace block that was fixed. REPLACE is unaffected (no prefix).
g Fix generalizes across inputs? Yes. Verified chain orderings: APPLY toString APPLY (upper, 'p_') -> p_toString(a); APPLY (identity, 'p_') APPLY toString APPLY (upper, 'q_') -> q_toString(p_a); APPLY (x -> x, 'p_') APPLY (x -> x, 'q_') -> q_p_a. Non-prefixed transformers correctly reset the accumulator; standalone column refs (SELECT a, ...) are not contaminated by the map overwrite.
h Backward compatible? Yes. Projection-name-only change on the new analyzer that makes it match the legacy (enable_analyzer=0) semantics that were the intended behavior. No setting default, on-disk, wire, or metadata format change.
i Invariants and contracts preserved? Yes. node_to_projection_name and resolved_expressions stay consistent (both updated to the same value); result_projection_names.size() and the 1-projection-name invariant are unchanged; the overwrite is keyed by the specific reused node pointer, so it does not affect other nodes.

Session id: cron:clickhouse-worker-slot-1:20260703-004300

…sion, not the prefix alias

The earlier analyzer fix stored the prefixed display name (f_a, p_a) into
node_to_projection_name for every APPLY result, including freshly created
function/lambda nodes. A later unprefixed APPLY then formatted its argument
from that prefix alias instead of the actual previous expression, so the
analyzer diverged from the legacy AST path for function-form chains:

  SELECT * APPLY (identity, 'p_') APPLY toString  -> toString(p_a)   (analyzer)
                                                  -> toString(identity(a))  (legacy)
  SELECT * APPLY (toString, 'f_') APPLY upper     -> upper(f_a)      (analyzer)
                                                  -> upper(toString(a))     (legacy)

Split the two roles:
- Terminal display name of this column keeps the accumulated prefix (f_a, q_p_a).
- The name a chained transformer observes for this node is the natural resolved
  name for a freshly created node (toString(identity(a))), matching the legacy
  path. Only a reused node (an identity lambda x -> x that resolves back to the
  original matched node, which has no legacy equivalent) carries the accumulated
  prefix forward, so a chained transformer still sees toString(p_a) there.

The overwrite of node_to_projection_name / resolved_expressions is now gated on
node-pointer reuse, so it no longer rewrites the cached name of a fresh function
node. The short-name accumulator for qualified scopes (f_a, not f_x.a) is kept.

Regression coverage in 04401 asserts the analyzer section matches the legacy
section for both function-form chains and the identity-lambda reuse case.
@groeneai

groeneai commented Jul 3, 2026

Copy link
Copy Markdown
Contributor Author
Pre-PR validation gate (click to expand)
# Question Answer
a Deterministic repro? Yes. SELECT * APPLY (identity, 'p_') APPLY toString FROM (SELECT 1 AS a) and SELECT * APPLY (toString, 'f_') APPLY upper FROM (SELECT 1 AS a) with enable_analyzer=1 deterministically produce toString(p_a) / upper(f_a) on the pre-fix binary; legacy enable_analyzer=0 produces toString(identity(a)) / upper(toString(a)).
b Root cause explained? The earlier analyzer fix stored the prefixed display name into node_to_projection_name (and refreshed resolved_expressions) for every APPLY result, including freshly created function nodes. A later unprefixed APPLY f reads that cached name as its argument, so it formatted f(p_a) instead of f(<previous expr>), diverging from the legacy AST path.
c Fix matches root cause? Yes. Split the display name (keeps the accumulated prefix) from the chaining name (natural resolved name for a fresh node). The overwrite of node_to_projection_name / resolved_expressions is gated on node-pointer reuse, so it no longer rewrites a fresh function node's cached name.
d Test intent preserved / new tests added? Yes. 04401 now asserts the analyzer section matches the legacy section for function-form chains (upper(toString(a)), toString(identity(a))) and preserves the identity-lambda reuse case (toString(p_a)). No assertions weakened.
e Both directions demonstrated? Yes. Pre-fix binary (Build ID 48825ab) FAILS 04401 (upper(f_a) / toString(p_a)); fixed binary (8b87c12) PASSES, 30/30 with randomization and 5/5 with --no-random-*. Only the analyzer section changed; the legacy section already produced the correct values.
f Fix is general across code paths? Yes. The single accumulation site in resolveMatcher handles both APPLY forms (lambda and function) and chained transformers uniformly; the reuse gate is computed once per transformer iteration from the input node pointer, covering identity function, identity lambda, and wrapping function/lambda alike. No sibling code path carries this logic.
g Fix generalizes across inputs? Yes. Verified across chain shapes: prefixed-then-unprefixed (upper(toString(a))), unprefixed-then-prefixed (q_toString(a)), prefixed-prefixed (q_p_a, g_f_a), triple chains (q_toString(identity(a))), identity function vs identity lambda, and qualifying scope (short name f_a, not f_x.a). REPLACE and plain columns unaffected.
h Backward compatible? N/A. No setting default, on-disk/wire format, or validation change. This corrects analyzer projection-name output to match the legacy path for a feature added earlier in this same PR (never shipped).
i Invariants and contracts preserved? Yes. The invariant is legacy/analyzer parity of projection names, which is now restored for every chain shape. node_to_projection_name and resolved_expressions continue to map a node pointer to its correct display name; the fresh-node branch uses emplace (a fresh pointer is never already present), the reuse branch uses insert_or_assign to update the existing entry.

Session id: cron:clickhouse-worker-slot-1:20260703-031100

Comment thread src/Analyzer/Resolve/QueryAnalyzer.cpp
…BAD_ARGUMENTS

The default analyzer accepted a named transformer whose expression is a bare
matcher/asterisk (APPLY (x -> *, 'prefix'), APPLY (x -> COLUMNS(...), 'prefix'),
REPLACE (COLUMNS(...) AS a)): on a one-column input it renamed the reused column
(f_a), and on a multi-column input it threw a generic UNSUPPORTED_METHOD. The
legacy analyzer rejects all of these with BAD_ARGUMENTS (setAlias on a
non-aliasable asterisk node).

Reject them in resolveMatcher before the matcher is expanded, when the lambda
body (for a prefixed APPLY) or the REPLACE replacement is itself a MatcherNode.
A matcher nested inside a function (x -> tuple(*)) is not a bare matcher and
stays allowed, matching the legacy path.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

groeneai commented Jul 3, 2026

Copy link
Copy Markdown
Contributor Author
Pre-PR validation gate (click to expand)
# Question Answer
a Deterministic repro? Yes. SELECT * APPLY (x -> *, 'f_') FROM (SELECT 1 AS a) SETTINGS enable_analyzer=1 returned f_a (wrongly accepted); the multi-column form returned UNSUPPORTED_METHOD; REPLACE (COLUMNS('a') AS a) returned a. All should be BAD_ARGUMENTS.
b Root cause explained? In resolveMatcher, a named transformer whose expression is a bare matcher/asterisk is resolved into a ListNode; the block only rejected size != 1 and flattened node_list_nodes[0]. On a one-column input it renamed the single reused column (f_a); on multi-column it hit the generic UNSUPPORTED_METHOD. Neither mirrors the legacy path, which rejects with BAD_ARGUMENTS when setTransformerAlias runs on a non-aliasable asterisk node.
c Fix matches root cause? Yes. Reject up front, before the matcher is expanded: for a prefixed APPLY when the lambda body is a MatcherNode, and for REPLACE when the replacement expression is a MatcherNode, throw BAD_ARGUMENTS. This is the analyzer-side equivalent of the legacy setTransformerAlias guard.
d Test intent preserved / new tests added? Yes. 04401 gains enable_analyzer=1 negative coverage for every case (one-column and multi-column APPLY/REPLACE over *, t.*, COLUMNS(...)), plus a positive x -> tuple(*) case in both sections to guard against over-rejection. Prior positive cases unchanged.
e Both directions demonstrated? Yes. Pre-fix binary (Build ID 8b87c12) accepts/misreports; fixed binary (57dcee9) rejects with BAD_ARGUMENTS. Full .sql output matches the reference on the fixed binary and differs on the pre-fix binary.
f Fix is general across code paths? Yes. Both named-transformer sites are covered: the prefixed-APPLY lambda branch and the REPLACE branch. The function-form APPLY expression is never a bare matcher, so it needs no guard.
g Fix generalizes across inputs? Yes. Verified for *, qualified t.*, and COLUMNS('a'); for one-column and multi-column inputs; and for both APPLY-lambda and REPLACE. Over-rejection boundary checked: a matcher nested in a function (x -> tuple(*), x -> [*]) and untuple stay accepted, matching the legacy path.
h Backward compatible? Yes. This tightens the default analyzer to match the long-standing legacy-analyzer BAD_ARGUMENTS behavior for a query shape that either errored differently or silently renamed a column. No setting/format change; no SettingsChangesHistory entry needed.
i Invariants and contracts preserved? Yes. The guard runs before node is mutated, so the downstream size != 1 / single-projection-name invariants are unchanged for all still-valid inputs. The rejection is a plain early throw with no state left behind.

Session id: cron:clickhouse-worker-slot-2:20260703-044900

Comment thread src/Analyzer/Resolve/QueryAnalyzer.cpp
A named APPLY over untuple (SELECT * APPLY (x -> untuple(x), 'f_')) resolved
to a ListNode of tupleElement calls, one per tuple field, and the matcher
rejected it with a generic UNSUPPORTED_METHOD (size != 1). The legacy analyzer
instead expands each element with the prefix applied to the short column name
(f_a.1, f_a.id). Expand the list into sibling projection columns to match, but
only when untuple is the terminal transformer (a transformer chained after
untuple is rejected by both analyzers).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

groeneai commented Jul 3, 2026

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate (commit 247a715)

Pre-PR validation gate (click to expand)
# Question Answer
a Deterministic repro? Yes. SET enable_analyzer=1; SELECT * APPLY (x -> untuple(x), 'f_') FROM (SELECT (1,2) AS a) throws UNSUPPORTED_METHOD ("resolved as list node with size 2. Expected 1") every time; the old analyzer returns f_a.1 f_a.2.
b Root cause explained? untuple resolves to a ListNode of tupleElement calls (one per tuple field). resolveMatcher's named-APPLY path only handles a single result node: the size != 1 branch throws UNSUPPORTED_METHOD. The legacy analyzer instead expands each element and prefixes the short column name (f_a.1, f_a.id), so the new analyzer diverged from the PR's old/new parity invariant.
c Fix matches root cause? Yes. In the size != 1 block, when the result is an untuple expansion (every element is tupleElement(arg, const_name)) and this is the terminal transformer, expand the list into sibling projection columns named <prefix><short_column>.<field> (field taken from the constant second arg, exactly as the legacy path builds it). Not a bounds-widening or tag band-aid.
d Test intent preserved / new tests added? Preserved. 04401 gains enable_analyzer=1 coverage for unnamed/named/multi-column/function-form untuple plus a chained-after-untuple negative (serverError UNSUPPORTED_METHOD). Old-analyzer section unchanged; existing untuple/projection/transformer tests still pass.
e Both directions demonstrated? Yes, at the real test-harness level (not just clickhouse local). Pre-fix binary (Build ID 57dcee9): 04401 FAILS with the cited UNSUPPORTED_METHOD. Fixed binary (Build ID c6d5741): PASSES, 20/20 with randomization.
f Fix is general across code paths? Yes. The single ListNode expansion site in resolveMatcher is where every named APPLY result funnels (both LAMBDA and FUNCTION forms), so both APPLY (x -> untuple(x), 'f_') and APPLY (untuple, 'f_') are covered. No sibling site duplicates this logic.
g Fix generalizes across inputs (params/datatypes/wrappers)? Yes. Verified parity for unnamed tuples (f_a.1), named tuples (f_a.id), multi-column matchers (f_a.1 f_a.2 f_b.1 f_b.2), 3-field and mismatched-arity tuples, and function-form untuple. Non-tuple input still errors identically in both analyzers (untuple argument must have compound type, from resolveFunction, unchanged).
h Backward compatible? Yes. No setting default, on-disk/wire format, or replication metadata changed. The change only makes the new analyzer accept a query the old analyzer already accepted, converging behavior (no SettingsChangesHistory.cpp entry needed).
i Invariants and contracts preserved? Yes. The matcher output list / result_projection_names count invariant is upheld: k tupleElement nodes are pushed as k sibling nodes with k names (k-1 in the loop + the last via the existing loop tail), so resolveExpressionNodeList sees matching node/name counts. The expansion fires only when untuple is the terminal transformer, so a chained transformer still hits the unchanged size != 1 throw, matching the old analyzer's rejection of chained untuple.

Session id: cron:clickhouse-worker-slot-2:20260703-063600

Comment thread src/Analyzer/Resolve/QueryAnalyzer.cpp Outdated
…y name

A named untuple chained after an earlier transformer prefixed each output
element from the original short column name (q_a.N), diverging from the legacy
path which uses the accumulated display name of the node feeding untuple.
Base the element names on apply_prefixed_projection_name so chained forms come
out as q_identity(a).N / q_p_a.N (matching enable_analyzer=0), while a direct
untuple stays f_a.N.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

groeneai commented Jul 3, 2026

Copy link
Copy Markdown
Contributor Author
Pre-PR validation gate (click to expand)
# Question Answer
a Deterministic repro? Yes. SET enable_analyzer=1; SELECT * APPLY identity APPLY (untuple, 'q_') FROM (SELECT (1, 2) AS a) produced q_a.1 q_a.2 (Build ID c6d5741); should be q_identity(a).1 q_identity(a).2 per the legacy path.
b Root cause explained? The untuple special-case in resolveMatcher built each element name as apply_column_name_prefix + column_name + '.' + field, using the original short column name. It ignored apply_prefixed_projection_name, the display-name accumulator prior transformers update, so a chained untuple hardcoded q_a.N instead of following the node feeding untuple.
c Fix matches root cause? Yes. Base each element on apply_prefixed_projection_name (the accumulated display name) instead of the original short column_name. Untuple is the terminal transformer, so the accumulator holds exactly the pre-untuple display name.
d Test intent preserved / new tests added? Yes. 04401 gains a chained-untuple regression under both enable_analyzer=0 and =1 (q_identity(a).N, q_p_a.N), both sections asserting identical output (parity). Prior cases unchanged.
e Both directions demonstrated? Yes, Build-ID-verified. Pre-fix (c6d5741): both chains give q_a.N. Post-fix (23357d2): q_identity(a).N / q_p_a.N, matching enable_analyzer=0. Test FAILs before / PASSes after (20/20 runs, deterministic + randomized).
f Fix is general across code paths? Yes. This is the single site that names untuple-expanded elements. The direct-untuple case (accumulator = short column name) is unchanged; only the chained case is corrected. No sibling path builds these names.
g Fix generalizes across inputs? Yes. Verified parity for longer/nested chains (q_r_p_a.N, q_identity(identity(a)).N), function-form and prefix-form chains, named-field tuples (q_p_a.id/q_p_a.v), 3-field tuples, and multi-column inputs (q_p_a.N q_p_b.N). Direct untuple stays f_a.N.
h Backward compatible? N/A behavior-wise (both prior behaviors were buggy/inconsistent). This makes enable_analyzer=1 match the long-standing enable_analyzer=0 naming; no setting default, on-disk, or wire format changes.
i Invariants and contracts preserved? Yes. node/name count invariant into resolveExpressionNodeList is unchanged (only the name string changes). The untuple branch still requires is_last_transformer; node_to_projection_name per-element entries still use the same per-element name. The old/new analyzer parity invariant this PR upholds is restored for chained untuple.

Session id: cron:clickhouse-worker-slot-0:20260703-090000

Comment thread src/Analyzer/Resolve/QueryAnalyzer.cpp
The untuple list-expansion special-case in QueryAnalyzer::resolveMatcher only
ran for execute_apply_transformer, so a named REPLACE over untuple such as
SELECT * REPLACE (untuple(a) AS a) FROM (SELECT (1, 2) AS a) fell through to the
size != 1 UNSUPPORTED_METHOD throw under enable_analyzer=1, while enable_analyzer=0
accepted it and expanded to a.1, a.2 (ActionsVisitor::doUntuple aliases the untuple
to the REPLACE target name).

Relax the guard to also fire for execute_replace_transformer. For REPLACE the
prefix is empty and the display-name accumulator holds the replaced column name,
so the element names come out as the REPLACE target base (a.1, a.id), matching the
old analyzer. The is_last_transformer gate is unchanged: a transformer chained
after untuple is rejected by both analyzers.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

groeneai commented Jul 3, 2026

Copy link
Copy Markdown
Contributor Author
Pre-PR validation gate (click to expand)
# Question Answer
a Deterministic repro? Yes. SELECT * REPLACE (untuple(a) AS a) FROM (SELECT (1, 2) AS a) SETTINGS enable_analyzer=1 throws UNSUPPORTED_METHOD ("resolved as list node with size 2") on the pre-fix binary, every run.
b Root cause explained? The untuple list-expansion branch in QueryAnalyzer::resolveMatcher was gated on execute_apply_transformer && !prefix.empty(). A named REPLACE sets execute_replace_transformer instead, so it skipped the expansion and hit the size != 1 throw, while the legacy path (ASTColumnsReplaceTransformer::transform + ActionsVisitor::doUntuple) expands to a.1, a.2.
c Fix matches root cause? Yes. The guard now also fires for execute_replace_transformer; no separate code path.
d Test intent preserved / new tests added? Yes. Added enable_analyzer=0/=1 REPLACE-untuple cases to 04401 (anon fields, named fields, other-columns-preserved, target-vs-source base). No existing assertion weakened.
e Both directions demonstrated? Yes. Pre-fix (Build ID 23357d2) 04401 FAILS; fixed (1f0b2f5) PASSES 20/20 randomized + 5/5 deterministic.
f Fix is general across code paths? Yes. The APPLY and REPLACE untuple cases now share the single expansion branch. The sibling bare-matcher rejection (both APPLY and REPLACE) is unchanged and still fires before expansion.
g Fix generalizes across inputs? Yes. Verified anonymous 2/3-field tuples (a.1 a.2 a.3), explicitly named fields (a.id a.v), multi-column input (other cols preserved), and target-vs-source naming (REPLACE (untuple(b) AS a) -> a.1 a.2, uses the target a not the source b). A transformer chained after untuple stays rejected in both analyzers.
h Backward compatible? Yes. No setting, format, or metadata change. This only makes enable_analyzer=1 accept a query the old analyzer already accepts (old/new parity), producing identical output.
i Invariants and contracts preserved? Yes. The node/name count invariant for resolveExpressionNodeList is preserved (k-1 nodes pushed in the loop, last via the loop tail, k names in result_projection_names), same as the existing APPLY-untuple expansion. apply_prefixed_projection_name (empty prefix for REPLACE) yields the target column name as the element base.

Session id: cron:clickhouse-worker-slot-4:20260703-100700

Comment thread src/Analyzer/Resolve/QueryAnalyzer.cpp
… tuples

The named-untuple expansion only ran when untuple resolved to a multi-element
list (node_list_nodes_size != 1). A single-field tuple resolves to a one-element
list, so it skipped the expansion and the generic single-node path dropped the
field suffix: APPLY (untuple, 'f_') over Tuple(id UInt8) produced f_a instead of
f_a.id, and REPLACE (untuple(a) AS a) produced a instead of a.id.

The legacy transformer path (ActionsVisitor::doUntuple) keeps the field suffix
for every tuple, including single-field ones (see 02890_untuple_column_names).
Drop the size != 1 requirement; the isUntupleExpansion guard (every element is
tupleElement(arg, const)) already restricts this to genuine untuple results, so
the expansion now covers one-element tuples too.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

groeneai commented Jul 3, 2026

Copy link
Copy Markdown
Contributor Author
Pre-PR validation gate (click to expand)
# Question Answer
a Deterministic repro? Yes. SELECT * APPLY (untuple, 'f_') FROM (SELECT CAST(tuple(1), 'Tuple(id UInt8)') AS a) -> analyzer gave f_a, old analyzer f_a.id. Reproduced 100% on pre-fix binary 1f0b2f5.
b Root cause explained? The named-untuple expansion in resolveMatcher was gated on node_list_nodes_size != 1. A single-field tuple resolves to a one-element tupleElement list, so it skipped the expansion and fell to the generic single-node path, which renames the reused column to the accumulated display name (f_a / a) and drops the .field suffix.
c Fix matches root cause? Yes. Removed the != 1 requirement from the expand_named_untuple guard so single-element untuple lists take the same per-element expansion path.
d Test intent preserved / new tests added? Yes. Added single-field-tuple coverage (f_a.id, f_a.1, a.id, a.1) to both the enable_analyzer=0 and enable_analyzer=1 sections of 04401, asserting old/new parity. No existing assertions weakened.
e Both directions demonstrated? Yes, at the harness level. Pre-fix (1f0b2f5): 04401 FAILS on exactly the new lines. Fixed (f6f81b1): PASSES 30/30 randomized + deterministic.
f Fix is general across code paths? Yes. The single expand_named_untuple guard is the only place the size gate lived; it covers both the APPLY-prefix and REPLACE untuple paths. No sibling code path has the same != 1 gate.
g Fix generalizes across inputs? Yes. Verified single-field named (Tuple(id)) and anonymous (tuple(1)) tuples, multi-field (2/3-field) unchanged, explicit tupleElement/tuple(*)/plain single-column APPLY still format as f_a, and chained-after-untuple still rejected (both analyzers).
h Backward compatible? Yes. No setting, format, or on-disk change. It aligns the default analyzer's projection names with the old analyzer (a naming bug fix); no gate needed.
i Invariants and contracts preserved? Yes. The expansion keeps the node/name count invariant for resolveExpressionNodeList (pushes k names and k nodes, last via the loop tail), and isUntupleExpansion guarantees every element is a tupleElement(arg, const) so getTupleElementName reads a valid constant.

Session id: cron:clickhouse-worker-slot-4:20260703-111800

@clickhouse-gh

clickhouse-gh Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 85.60% 85.60% +0.00%
Functions 92.70% 92.70% +0.00%
Branches 77.80% 77.80% +0.00%

Changed lines: Changed C/C++ lines covered: 119/131 (90.84%) · Uncovered code

Full report · Diff report

@groeneai

groeneai commented Jul 3, 2026

Copy link
Copy Markdown
Contributor Author

CI finish ledger — 72e4ca8

Every failure below has an owner: a fixing PR (ours or external). Only CH Inc sync is exempt.

Check / test Reason Owner / fixing PR
AST fuzzer (amd_debug, targeted, old_compatibility) / Block structure mismatch in IntersectOrExceptStep (STID 0993-2cc2) chronic trunk fuzzer finding (65 unrelated PRs / 5 master hits / 30d — appears on master and across many unrelated PRs; the failing fuzzed query EXPLAIN PIPELINE SELECT DISTINCT ... INTERSECT ... is unrelated to this PR's analyzer APPLY-prefix change) fixing PR #107719 (ours, open) — tracked in the 0993 block-structure-mismatch umbrella

No PR-caused failure. This PR's diff (analyzer column-transformer APPLY-prefix naming in ColumnsTransformer) does not touch the IntersectOrExceptStep block-structure path; the STID predates this PR and fires on master.

Session id: cron:our-pr-ci-monitor:20260703-163000

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Logical error: Can't set alias of A of B (STID: 1493-3ccb)

2 participants