Support column matcher expansion for default value expressions and index expressions#105045
Support column matcher expansion for default value expressions and index expressions#105045niyue wants to merge 73 commits into
Conversation
6d0b6df to
308133e
Compare
Let's avoid adding it in this PR. We plan to make the ordinary tuple constructor return named tuples whenever possible. |
We plan to enable it by default. |
Allow and matchers in , , and expressions, and in skip index expressions. Matchers are expanded before validation so normal expression analysis still reports arity and type exceptions. Issue: ClickHouse#92266
Add `namedTuple` as a strict variant of `tuple` that always returns a named `Tuple` and reports an exception for duplicate or invalid argument names.
Extend the default-expression matcher stateless test to cover `asterisk_include_materialized_columns` and `asterisk_include_alias_columns` during default value evaluation.
Add a stateless test case where a `DEFAULT` expression with `*` expands into an indirect dependency cycle and reports `CYCLIC_ALIASES`.
…use `tuple` with a setting directly
Build the validation column snapshot with updated `default_desc.kind` values for `ADD COLUMN` and `MODIFY COLUMN`, so `*` and `COLUMNS` expansion sees the post-alter default kinds.
981800b to
2632f1b
Compare
Run the expanded default-cycle check before collecting identifiers from missing column defaults in `MergeTree` read dependency discovery. This keeps old parts with setting-dependent `DEFAULT` / `MATERIALIZED` cycles reporting `CYCLIC_ALIASES` instead of recursing until `TOO_DEEP_RECURSION`. Add a stateless regression test for reading a matcher-based `DEFAULT` column from an old part after adding a dependent `MATERIALIZED` column.
Reject setting-dependent cycles after expanding matcher-based default expressions in mutation paths before dependency discovery or mutation expression execution can use the expanded AST. Cover `MATERIALIZE COLUMN` and `CLEAR COLUMN` with `MATERIALIZED` columns whose `tuple(*)` expression becomes recursive when `asterisk_include_materialized_columns` is enabled.
Apply `FIRST` and `AFTER` positions to the validation snapshot used for stored default expressions. This keeps matcher expansion during `ALTER ADD COLUMN` and `ALTER MODIFY COLUMN` validation aligned with the schema order that `apply` will persist. Add a stateless test for order-sensitive `tuple(*)` defaults.
Run `validateNoCyclicAliasesAfterExpansion` after expanding matcher-based default expressions for column TTL actions and before building expression actions. Add a stateless test for a late `ALIAS` cycle after `OPTIMIZE TABLE`.
Replace the root lambda argument when expanding `APPLY` matchers and avoid capturing identifiers that are local to nested lambdas. Add a stateless test for default matcher expansion with lambda `APPLY`.
Run `validateNoCyclicAliasesAfterExpansion` before replacing nested alias columns in old-analyzer direct-clone paths. Add a stateless test for matcher-expanded `ALIAS` cycles through `MATERIALIZED` columns.
Collect required columns for expanded default expressions with `RequiredSourceColumnsVisitor` so lambda-local identifiers are not treated as storage dependencies. Add a stateless test for matcher-based defaults over old `MergeTree` parts with lambda-local names.
When replacing an `ALIAS` inside a lambda, recursively normalize the alias body with caller `private_aliases` cleared so table aliases are not captured by lambda parameters. Add a stateless test for persisted skip index expressions that reference alias columns inside lambda predicates.
Move lambda argument name extraction from `RequiredSourceColumnsMatcher::extractNamesFromLambda` to parser-level `getASTLambdaArgumentNames` so `Parsers` and `Interpreters` code paths reuse the same validation and extraction logic. Use the shared helper in column matcher `APPLY` expansion and existing lambda scope visitors.
The test expected `SELECT b` to raise `CYCLIC_ALIASES` after `OPTIMIZE` expired the column `b`, relying on a `b` -> `x` -> `b` cycle formed when `asterisk_include_alias_columns = 1` makes the matcher in `b`'s `DEFAULT` expand to include the alias column `x`. This worked under the new analyzer (referencing the table resolves every `ALIAS` column expression in the query context, expanding `x` and detecting the cycle), but failed under the old analyzer: `SELECT b` only requires the `DEFAULT` column `b`, so the old analyzer never expands the unreferenced alias `x`, and the read-time reconstruction of the expired column runs in the storage context (server defaults, alias columns excluded from `*`), so no cycle is formed and the query returns the value. Read the alias column `x` instead of `b`, with `optimize_respect_aliases = 1`, so the matcher expansion and the late cycle check run at query time under both analyzers. This mirrors the existing `04317_alias_matcher_late_cycle_check`. `OPTIMIZE` runs the `TTL` materialization in the background merge context (server defaults), which excludes alias columns from `*`, so it forms no cycle and must succeed; the cycle only manifests at query time. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Merged the latest Root cause: the test did
Fix: read the alias column For the record, |
…atIfIndexEstimator` This PR added a `ContextPtr context` parameter to `createMergeTreeSequentialSource` (needed to expand column matchers in default expressions during merges), but the call site in `WhatIfIndexEstimator.cpp` — a separate caller added on `master` — was not updated, breaking the build with "no matching function for call to 'createMergeTreeSequentialSource'" (requires 13 arguments, but 12 were provided). Pass the in-scope `context` (the same one already used for alter conversions and query limits a few lines below) as the final argument. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Merged the latest Root cause: this PR adds a Fix: pass the in-scope For the record, I checked the rest of the cross-cut against the freshly merged |
Column matchers (`*`, `COLUMNS(...)`) in `DEFAULT`/`MATERIALIZED`/`ALIAS`
expressions are stored unexpanded and re-expanded against the table's
`ColumnsDescription` every time the default is evaluated. With the default
`flatten_nested = 1`, `getColumnsDescription` flattens the stored metadata
(`res.flattenNested()`), so at insert time the matcher is expanded against the
flattened columns (`n.x`), while validation in
`validateColumnsDefaultsAndGetSampleBlock` was expanding it against the
un-flattened `columns_for_default_validation` (`n`).
That mismatch let `CREATE TABLE` persist a default that executes differently
from what was validated. For example
`CREATE TABLE t (n Nested(x UInt8), b UInt64 DEFAULT length(COLUMNS('^n$')))`
validated against `n` but, after flattening, expanded to zero columns at insert
time and threw `NUMBER_OF_ARGUMENTS_DOESNT_MATCH`.
Flatten `columns_for_default_validation` under the same condition as `res` so
matcher expansion during validation sees exactly the schema used at execution
time. Add a stateless regression test.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Addressed the AI Review Request changes finding (matcher defaults validated against a pre-flatten schema) in b71da69. Root cause. Column matchers in Fix. Flatten Test. Added Net diff vs the previous head is exactly these 3 files (+60). The AI Review thread is resolved; waiting on fresh CI. |
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered: 705/742 (95.01%) · Uncovered code |
|
Fixed the Root cause: this PR adds a Fix: pass the in-scope The only remaining red is |
|
📊 Cloud Performance Report ✅ AI verdict: This PR adds support for column matchers (*, COLUMNS(...)) and transformer modifiers in column DEFAULT/MATERIALIZED/ALIAS and index expressions, plus a refactor of lambda-argument-name extraction. All of that runs at DDL, INSERT-default, mutation, and alias-expansion time — paths only reached for tables that actually declare such columns. The four flagged ClickBench queries (Q4, Q15, Q28, Q34) are plain scan/aggregation SELECTs over tables with no alias or default-matcher columns, so the query-execution hot path they exercise is untouched. The reported improvements (ranging -5.7% to -19%) cannot plausibly be caused by this change and the base measurements were noticeably noisier than the source, so all four are downgraded to not-sure as run-to-run variance rather than real PR effects. clickbenchFlagged queries (4 of 43)
q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10) — the value the verdict is based on. tpch_adapted_1_official🟢 No significant changes Debug info
|

This PR closes #92266
Support column matchers in column
DEFAULT,ALIAS,MATERIALIZED, andEPHEMERALexpressions, and in data skipping index expressions. This allows expressions such as*,COLUMNS('...'),COLUMNS(a, b),EXCEPT,APPLY, andREPLACEto be expanded before expression validation and execution.The change also addsnamedTuplefunction to make matcher-expanded named tuple expressions ergonomic in tests and user queries.The tests cover these use cases, direct and indirect cyclic default-expression dependency detection, and nested matcher expansion.
Changelog category:
Changelog entry:
*andCOLUMNSin column default value expressions,DEFAULT,ALIAS,MATERIALIZED, andEPHEMERALexpressions, and in data skipping index expressions.Note
About the newly addednamedTuplefunction: I read the previous discussions in [1], [2], and [3], and my impression is that the existingenable_named_columns_in_function_tuplesetting is not very discoverable. A separate function name may make the intention clearer and the feature easier to use, especially in this use case. It also avoids changing the behavior of the existing tuple function, so it should not introduce compatibility issues. Please let me know if this direction is not desirable.[1] #63524
[2] #54921
[3] #54881
Note
Medium Risk
Touches core expression/DDL validation paths (defaults, aliases, ALTER, mutations, skip indexes), which can affect query planning and error behavior if matcher expansion or cycle detection is incorrect.
Overview
Enables column matchers (e.g.
*,COLUMNS(...)plusEXCEPT/APPLY/REPLACE) inside columnDEFAULT/MATERIALIZED/ALIAS/EPHEMERALexpressions by expanding matchers before validation/execution, honoringasterisk_include_*settings and rejecting qualified matchers.Updates DDL/default validation, alias expansion, read-order optimization, merge/SELECT paths, and mutation/materialize flows to use a shared
cloneAndExpandColumnDefaultExpressionhelper and adds cycle detection that accounts for matcher-expanded dependencies.Extends skip index parsing/analysis to normalize matcher and alias usage (including cyclic-alias detection) before
TreeRewriteranalysis, and adds docs + comprehensive stateless tests covering matcher expansion, errors, and ALTER/mutation/index scenarios.Reviewed by Cursor Bugbot for commit 48ab80f. Bugbot is set up for automated code reviews on this repo. Configure here.