Fix LOGICAL_ERROR when a column transformer sets an alias on an asterisk#109223
Fix LOGICAL_ERROR when a column transformer sets an alias on an asterisk#109223groeneai wants to merge 8 commits into
Conversation
|
cc @evillique @george-larionov — could you review this? It converts a |
|
Workflow [PR], commit [72e4ca8] Summary: ❌
AI ReviewSummaryThis PR hardens named column transformers around asterisk / matcher expansions and restores old-analyzer parity for prefixed Final VerdictStatus: ✅ Approve |
…fix in the analyzer Two related column-transformer issues: 1. Old analyzer crash. APPLY (with a name prefix) and REPLACE assign an alias to their result node. Only ASTWithAlias nodes support aliases, so when the transformer expression expands to an asterisk / qualified asterisk / COLUMNS matcher (e.g. SELECT * APPLY (x -> t.*, 'p_')), the default IAST::setAlias threw a LOGICAL_ERROR and aborted the server in debug/sanitizer builds. Reject such queries with a clear BAD_ARGUMENTS. Found by the AST fuzzer (STID 1493-3ccb). 2. New (default) analyzer dropped the APPLY name prefix. APPLY (expr, 'p_') is documented and works under the old analyzer (result named p_ + column name), but the analyzer never carried column_name_prefix from the parser into ApplyColumnTransformerNode, so the prefix was silently ignored (e.g. SELECT * APPLY (toString, 'f_') produced toString(a) instead of f_a). Carry the prefix QueryTreeBuilder -> ApplyColumnTransformerNode -> resolveMatcher / toASTImpl so the projection name matches the old analyzer. Closes: ClickHouse#109214 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
9213084 to
81bdafb
Compare
Pre-PR validation gate (click to expand) — analyzer follow-up
Session id: cron:clickhouse-worker-slot-3:20260702-233000 |
…e names The new-analyzer APPLY name-prefix support added earlier in this PR took the prefix base from node_to_projection_name, which qualifyColumnNodesWithProjectionNames may set to a qualified name (t1.a) in a scope that requires qualification. The legacy AST path prefixes ASTIdentifier::shortName(). So `t1.* APPLY (toString, 'f_')` in a qualifying scope produced f_t1.a instead of f_a. Fix: accumulate the prefixed display name from the short column name, independent of the (possibly qualified) node_to_projection_name value. The prefix also failed to reach a chained transformer when the first transformer was an identity lambda (`APPLY (x -> x, 'p_') APPLY toString` gave toString(a), not toString(p_a)): the identity lambda leaves the resolved node pointing at the original matched node, which is already present in both node_to_projection_name and resolved_expressions, so an emplace did not update the name. Overwrite both maps when the resolved node pointer is reused. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pre-PR validation gate (click to expand)
Session id: cron:clickhouse-worker-slot-1:20260703-004300 |
…sion, not the prefix alias
The earlier analyzer fix stored the prefixed display name (f_a, p_a) into
node_to_projection_name for every APPLY result, including freshly created
function/lambda nodes. A later unprefixed APPLY then formatted its argument
from that prefix alias instead of the actual previous expression, so the
analyzer diverged from the legacy AST path for function-form chains:
SELECT * APPLY (identity, 'p_') APPLY toString -> toString(p_a) (analyzer)
-> toString(identity(a)) (legacy)
SELECT * APPLY (toString, 'f_') APPLY upper -> upper(f_a) (analyzer)
-> upper(toString(a)) (legacy)
Split the two roles:
- Terminal display name of this column keeps the accumulated prefix (f_a, q_p_a).
- The name a chained transformer observes for this node is the natural resolved
name for a freshly created node (toString(identity(a))), matching the legacy
path. Only a reused node (an identity lambda x -> x that resolves back to the
original matched node, which has no legacy equivalent) carries the accumulated
prefix forward, so a chained transformer still sees toString(p_a) there.
The overwrite of node_to_projection_name / resolved_expressions is now gated on
node-pointer reuse, so it no longer rewrites the cached name of a fresh function
node. The short-name accumulator for qualified scopes (f_a, not f_x.a) is kept.
Regression coverage in 04401 asserts the analyzer section matches the legacy
section for both function-form chains and the identity-lambda reuse case.
Pre-PR validation gate (click to expand)
Session id: cron:clickhouse-worker-slot-1:20260703-031100 |
…BAD_ARGUMENTS The default analyzer accepted a named transformer whose expression is a bare matcher/asterisk (APPLY (x -> *, 'prefix'), APPLY (x -> COLUMNS(...), 'prefix'), REPLACE (COLUMNS(...) AS a)): on a one-column input it renamed the reused column (f_a), and on a multi-column input it threw a generic UNSUPPORTED_METHOD. The legacy analyzer rejects all of these with BAD_ARGUMENTS (setAlias on a non-aliasable asterisk node). Reject them in resolveMatcher before the matcher is expanded, when the lambda body (for a prefixed APPLY) or the REPLACE replacement is itself a MatcherNode. A matcher nested inside a function (x -> tuple(*)) is not a bare matcher and stays allowed, matching the legacy path. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pre-PR validation gate (click to expand)
Session id: cron:clickhouse-worker-slot-2:20260703-044900 |
A named APPLY over untuple (SELECT * APPLY (x -> untuple(x), 'f_')) resolved to a ListNode of tupleElement calls, one per tuple field, and the matcher rejected it with a generic UNSUPPORTED_METHOD (size != 1). The legacy analyzer instead expands each element with the prefix applied to the short column name (f_a.1, f_a.id). Expand the list into sibling projection columns to match, but only when untuple is the terminal transformer (a transformer chained after untuple is rejected by both analyzers). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pre-PR validation gate (commit 247a715)Pre-PR validation gate (click to expand)
Session id: cron:clickhouse-worker-slot-2:20260703-063600 |
…y name A named untuple chained after an earlier transformer prefixed each output element from the original short column name (q_a.N), diverging from the legacy path which uses the accumulated display name of the node feeding untuple. Base the element names on apply_prefixed_projection_name so chained forms come out as q_identity(a).N / q_p_a.N (matching enable_analyzer=0), while a direct untuple stays f_a.N. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pre-PR validation gate (click to expand)
Session id: cron:clickhouse-worker-slot-0:20260703-090000 |
The untuple list-expansion special-case in QueryAnalyzer::resolveMatcher only ran for execute_apply_transformer, so a named REPLACE over untuple such as SELECT * REPLACE (untuple(a) AS a) FROM (SELECT (1, 2) AS a) fell through to the size != 1 UNSUPPORTED_METHOD throw under enable_analyzer=1, while enable_analyzer=0 accepted it and expanded to a.1, a.2 (ActionsVisitor::doUntuple aliases the untuple to the REPLACE target name). Relax the guard to also fire for execute_replace_transformer. For REPLACE the prefix is empty and the display-name accumulator holds the replaced column name, so the element names come out as the REPLACE target base (a.1, a.id), matching the old analyzer. The is_last_transformer gate is unchanged: a transformer chained after untuple is rejected by both analyzers. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pre-PR validation gate (click to expand)
Session id: cron:clickhouse-worker-slot-4:20260703-100700 |
… tuples The named-untuple expansion only ran when untuple resolved to a multi-element list (node_list_nodes_size != 1). A single-field tuple resolves to a one-element list, so it skipped the expansion and the generic single-node path dropped the field suffix: APPLY (untuple, 'f_') over Tuple(id UInt8) produced f_a instead of f_a.id, and REPLACE (untuple(a) AS a) produced a instead of a.id. The legacy transformer path (ActionsVisitor::doUntuple) keeps the field suffix for every tuple, including single-field ones (see 02890_untuple_column_names). Drop the size != 1 requirement; the isUntupleExpansion guard (every element is tupleElement(arg, const)) already restricts this to genuine untuple results, so the expansion now covers one-element tuples too. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pre-PR validation gate (click to expand)
Session id: cron:clickhouse-worker-slot-4:20260703-111800 |
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered: 119/131 (90.84%) · Uncovered code |
CI finish ledger — 72e4ca8Every failure below has an owner: a fixing PR (ours or external). Only No PR-caused failure. This PR's diff (analyzer column-transformer APPLY-prefix naming in Session id: cron:our-pr-ci-monitor:20260703-163000 |

Closes: #109214
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fixed a
LOGICAL_ERROR("Can't set alias of ... of QualifiedAsterisk", server abort in debug/sanitizer builds) when a column transformer that assigns a name (APPLYwith a name prefix, orREPLACE) is applied to an expression that expands to an asterisk orCOLUMNSmatcher, e.g.SELECT * APPLY (x -> t.*, 'p_'). Such queries now return a clearBAD_ARGUMENTSerror.Description
Found by the AST fuzzer (
AST fuzzer (amd_debug, targeted), STID1493-3ccb):report https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=109082&sha=62139e418dccb913277c31a483b2547ed4e230e7&name_0=PR&name_1=AST%20fuzzer%20%28amd_debug%2C%20targeted%29
Fuzzed query:
Root cause:
ASTColumnsApplyTransformer::transformandASTColumnsReplaceTransformer::transformcallsetAliason the transformer's result node. OnlyASTWithAliasnodes (identifiers, functions, literals, subqueries) support aliases. When theAPPLYlambda body (or theREPLACEreplacement) expands to anASTAsterisk/ASTQualifiedAsterisk/ASTColumnsMatcher, the defaultIAST::setAliasthrowsLOGICAL_ERROR, which aborts the server in debug/sanitizer builds. This is the old-analyzer path (enable_analyzer=0, viaTranslateQualifiedNamesVisitor); the analyzer resolves the lambda body viaresolveLambdaand already returns a clean exception.Fix: a small
setTransformerAliashelper checks the node is anASTWithAliasbefore setting the alias, otherwise throws a clearBAD_ARGUMENTS. This mirrors the existing precedent for the same class of error (asterisk with column aliases inCREATE VIEW, and theEXCEPTtransformer defensive handling).Reproducers that aborted before this change and now return
BAD_ARGUMENTS(all withenable_analyzer=0):