niyue · 2026-05-15T15:10:34Z

This PR closes #92266

Support column matchers in column DEFAULT, ALIAS, MATERIALIZED, and EPHEMERAL expressions, and in data skipping index expressions. This allows expressions such as *, COLUMNS('...'), COLUMNS(a, b), EXCEPT, APPLY, and REPLACE to be expanded before expression validation and execution. ~~The change also adds namedTuple function to make matcher-expanded named tuple expressions ergonomic in tests and user queries.~~

The tests cover these use cases, direct and indirect cyclic default-expression dependency detection, and nested matcher expansion.

Changelog category:

New Feature

Changelog entry:

Support column matchers such as * and COLUMNS in column default value expressions (DEFAULT, ALIAS, MATERIALIZED, and EPHEMERAL) and in data skipping index expressions.

Note

About the newly added namedTuple function: I read the previous discussions in [1], [2], and [3], and my impression is that the existing enable_named_columns_in_function_tuple setting is not very discoverable. A separate function name may make the intention clearer and the feature easier to use, especially in this use case. It also avoids changing the behavior of the existing tuple function, so it should not introduce compatibility issues. Please let me know if this direction is not desirable.

[1] #63524
[2] #54921
[3] #54881

Note

Medium Risk
Touches core expression/DDL validation paths (defaults, aliases, ALTER, mutations, skip indexes), which can affect query planning and error behavior if matcher expansion or cycle detection is incorrect.

Overview
Enables column matchers (e.g. *, COLUMNS(...) plus EXCEPT/APPLY/REPLACE) inside column DEFAULT/MATERIALIZED/ALIAS/EPHEMERAL expressions by expanding matchers before validation/execution, honoring asterisk_include_* settings and rejecting qualified matchers.

Updates DDL/default validation, alias expansion, read-order optimization, merge/SELECT paths, and mutation/materialize flows to use a shared cloneAndExpandColumnDefaultExpression helper and adds cycle detection that accounts for matcher-expanded dependencies.

Extends skip index parsing/analysis to normalize matcher and alias usage (including cyclic-alias detection) before TreeRewriter analysis, and adds docs + comprehensive stateless tests covering matcher expansion, errors, and ALTER/mutation/index scenarios.

Reviewed by Cursor Bugbot for commit 48ab80f. Bugbot is set up for automated code reviews on this repo.

clickhouse-gh · 2026-05-15T23:32:31Z

alexey-milovidov · 2026-05-15T23:33:28Z

> The change also adds namedTuple function to make matcher-expanded named tuple expressions ergonomic in tests and user queries.

Let's avoid adding it in this PR. We plan to make the ordinary tuple constructor return named tuples whenever possible.

alexey-milovidov · 2026-05-15T23:33:56Z

> the existing enable_named_columns_in_function_tuple setting is not very discoverable

We plan to enable it by default.

Allow `*` and `COLUMNS` matchers in `DEFAULT`, `MATERIALIZED`, and `ALIAS` expressions, and in skip index expressions. Matchers are expanded before validation so normal expression analysis still reports arity and type exceptions. Issue: ClickHouse#92266

Add `namedTuple` as a strict variant of `tuple` that always returns a named `Tuple` and reports an exception for duplicate or invalid argument names.

Extend the default-expression matcher stateless test to cover `asterisk_include_materialized_columns` and `asterisk_include_alias_columns` during default value evaluation.

Add a stateless test case where a `DEFAULT` expression with `*` expands into an indirect dependency cycle and reports `CYCLIC_ALIASES`.

Build the validation column snapshot with updated `default_desc.kind` values for `ADD COLUMN` and `MODIFY COLUMN`, so `*` and `COLUMNS` expansion sees the post-alter default kinds.

Run the expanded default-cycle check before collecting identifiers from missing column defaults in `MergeTree` read dependency discovery. This keeps old parts with setting-dependent `DEFAULT` / `MATERIALIZED` cycles reporting `CYCLIC_ALIASES` instead of recursing until `TOO_DEEP_RECURSION`. Add a stateless regression test for reading a matcher-based `DEFAULT` column from an old part after adding a dependent `MATERIALIZED` column.

Reject setting-dependent cycles after expanding matcher-based default expressions in mutation paths before dependency discovery or mutation expression execution can use the expanded AST. Cover `MATERIALIZE COLUMN` and `CLEAR COLUMN` with `MATERIALIZED` columns whose `tuple(*)` expression becomes recursive when `asterisk_include_materialized_columns` is enabled.

Apply `FIRST` and `AFTER` positions to the validation snapshot used for stored default expressions. This keeps matcher expansion during `ALTER ADD COLUMN` and `ALTER MODIFY COLUMN` validation aligned with the schema order that `apply` will persist. Add a stateless test for order-sensitive `tuple(*)` defaults.
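To make the order sensitivity concrete, here is a minimal sketch of why the validation snapshot has to apply `FIRST` and `AFTER` before expanding `*`. The `Schema`, `Position`, `addColumn`, and `expandAsterisk` names are hypothetical and are not ClickHouse's actual classes or helpers. The sketch also assumes, for simplicity, that `*` skips the column whose expression is being expanded.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified model of an ordered column list (not ColumnsDescription).
using Schema = std::vector<std::string>;

// Where ADD COLUMN places the new column: FIRST, AFTER <name>, or at the end.
struct Position
{
    bool first = false;
    std::optional<std::string> after;
};

// Return a copy of the schema with `name` inserted at the requested position.
Schema addColumn(Schema schema, const std::string & name, const Position & pos)
{
    if (pos.first)
    {
        schema.insert(schema.begin(), name);
    }
    else if (pos.after)
    {
        auto it = std::find(schema.begin(), schema.end(), *pos.after);
        assert(it != schema.end() && "AFTER must name an existing column");
        schema.insert(std::next(it), name);
    }
    else
    {
        schema.push_back(name);
    }
    return schema;
}

// Expand `*` for the expression of column `self`: all other columns, in schema order.
// Skipping `self` is an assumption of this sketch, not a documented rule.
std::vector<std::string> expandAsterisk(const Schema & schema, const std::string & self)
{
    std::vector<std::string> result;
    for (const auto & column : schema)
        if (column != self)
            result.push_back(column);
    return result;
}
```

With an existing schema `{a, b, t}` where `t` has a `tuple(*)` default, adding `c` with `FIRST` makes the matcher in `t` expand to `c, a, b`, while appending `c` makes it expand to `a, b, c`. Validating against a snapshot that ignores the position would therefore check a different tuple than the one that gets stored.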

Run `validateNoCyclicAliasesAfterExpansion` after expanding matcher-based default expressions for column TTL actions and before building expression actions. Add a stateless test for a late `ALIAS` cycle after `OPTIMIZE TABLE`.

Replace the root lambda argument when expanding `APPLY` matchers and avoid capturing identifiers that are local to nested lambdas. Add a stateless test for default matcher expansion with lambda `APPLY`.

Run `validateNoCyclicAliasesAfterExpansion` before replacing nested alias columns in old-analyzer direct-clone paths. Add a stateless test for matcher-expanded `ALIAS` cycles through `MATERIALIZED` columns.

Collect required columns for expanded default expressions with `RequiredSourceColumnsVisitor` so lambda-local identifiers are not treated as storage dependencies. Add a stateless test for matcher-based defaults over old `MergeTree` parts with lambda-local names.
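The distinction between storage dependencies and lambda-local identifiers can be sketched on a toy AST. Everything below (`Node`, `makeIdentifier`, `makeFunction`, `makeLambda`, `requiredColumns`) is hypothetical and only illustrates the scoping idea. It is not the `RequiredSourceColumnsVisitor` implementation.

```cpp
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature AST; ClickHouse's real AST classes are different.
struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node
{
    enum class Kind { Identifier, Function, Lambda };

    Kind kind;
    std::string name;                     // identifier or function name
    std::vector<std::string> lambda_args; // only used for Kind::Lambda
    std::vector<NodePtr> children;        // function arguments or the lambda body
};

NodePtr makeIdentifier(std::string name)
{
    return std::make_shared<Node>(Node{Node::Kind::Identifier, std::move(name), {}, {}});
}

NodePtr makeFunction(std::string name, std::vector<NodePtr> args)
{
    return std::make_shared<Node>(Node{Node::Kind::Function, std::move(name), {}, std::move(args)});
}

NodePtr makeLambda(std::vector<std::string> args, NodePtr body)
{
    return std::make_shared<Node>(Node{Node::Kind::Lambda, "lambda", std::move(args), {std::move(body)}});
}

// Collect identifiers that are not bound by an enclosing lambda.
// `bound` is passed by value so names bound by a lambda go out of scope when the recursion unwinds.
void collectRequiredColumns(const NodePtr & node, std::set<std::string> bound, std::set<std::string> & required)
{
    if (node->kind == Node::Kind::Identifier)
    {
        if (bound.count(node->name) == 0)
            required.insert(node->name);
        return;
    }

    if (node->kind == Node::Kind::Lambda)
        bound.insert(node->lambda_args.begin(), node->lambda_args.end());

    for (const auto & child : node->children)
        collectRequiredColumns(child, bound, required);
}

std::set<std::string> requiredColumns(const NodePtr & root)
{
    std::set<std::string> required;
    collectRequiredColumns(root, {}, required);
    return required;
}
```

For an expression shaped like `arrayMap(x -> plus(x, a), arr)`, this yields `a` and `arr` but not `x`. A naive identifier scan would also report `x` and could then try to read a non-existent column `x` from an old part.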

When replacing an `ALIAS` inside a lambda, recursively normalize the alias body with caller `private_aliases` cleared so table aliases are not captured by lambda parameters. Add a stateless test for persisted skip index expressions that reference alias columns inside lambda predicates.

Move lambda argument name extraction from `RequiredSourceColumnsMatcher::extractNamesFromLambda` to parser-level `getASTLambdaArgumentNames` so `Parsers` and `Interpreters` code paths reuse the same validation and extraction logic. Use the shared helper in column matcher `APPLY` expansion and existing lambda scope visitors.

…olumn-matcher

The test expected `SELECT b` to raise `CYCLIC_ALIASES` after `OPTIMIZE` expired the column `b`, relying on a `b` -> `x` -> `b` cycle formed when `asterisk_include_alias_columns = 1` makes the matcher in `b`'s `DEFAULT` expand to include the alias column `x`. This worked under the new analyzer (referencing the table resolves every `ALIAS` column expression in the query context, expanding `x` and detecting the cycle), but failed under the old analyzer: `SELECT b` only requires the `DEFAULT` column `b`, so the old analyzer never expands the unreferenced alias `x`, and the read-time reconstruction of the expired column runs in the storage context (server defaults, alias columns excluded from `*`), so no cycle is formed and the query returns the value.

Read the alias column `x` instead of `b`, with `optimize_respect_aliases = 1`, so the matcher expansion and the late cycle check run at query time under both analyzers. This mirrors the existing `04317_alias_matcher_late_cycle_check`.

`OPTIMIZE` runs the `TTL` materialization in the background merge context (server defaults), which excludes alias columns from `*`, so it forms no cycle and must succeed; the cycle only manifests at query time.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

alexey-milovidov · 2026-06-29T14:03:10Z

Merged the latest master (clean, no conflicts) and fixed the one real CI failure: 04405_ttl_default_matcher_late_cycle was failing under the old analyzer (the amd_llvm_coverage stateless job).

Root cause: the test did SELECT b and expected CYCLIC_ALIASES. The b → x → b cycle only forms once asterisk_include_alias_columns = 1 makes the matcher in b's DEFAULT expand to include the alias column x.

- Under the new analyzer this is detected because referencing the table resolves every ALIAS column expression in the query context, so x is expanded and the cycle is found.
- Under the old analyzer, SELECT b only requires the DEFAULT column b, so the unreferenced alias x is never expanded, and the read-time reconstruction of the TTL-expired column runs in the storage context (server defaults, alias columns excluded from *). No cycle forms there, so the query returned a value.

Fix: read the alias column x (with optimize_respect_aliases = 1) instead of b, so the matcher expansion and the late-cycle check run at query time under both analyzers. This mirrors the existing, passing 04317_alias_matcher_late_cycle_check. OPTIMIZE runs the TTL materialization in the background merge context (server defaults, no alias columns), so it forms no cycle and must succeed; the cycle only manifests at query time.

For the record, validateNoCyclicAliasesAfterExpansion in TTLTransform is a defensive guard: the merge always expands the default under the background context (alias columns excluded by default), so it does not surface this particular cycle during the merge itself.
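The setting-dependent nature of the `b` → `x` → `b` cycle can be illustrated with a small model. This is only a sketch under stated assumptions: `Column`, `Settings`, `expandedDependencies`, and `hasCyclicDefaults` are hypothetical names, the `*` expansion rule is simplified (the column being defined is skipped, and `ALIAS`/`MATERIALIZED` columns are included only when the corresponding setting is on), and it is not how `validateNoCyclicAliasesAfterExpansion` is actually implemented.

```cpp
#include <map>
#include <string>
#include <vector>

enum class ColumnKind { Ordinary, Default, Materialized, Alias };

// Hypothetical, simplified column description.
struct Column
{
    std::string name;
    ColumnKind kind;
    bool uses_asterisk = false;             // the expression contains `*`
    std::vector<std::string> explicit_deps; // columns named directly in the expression
};

struct Settings
{
    bool asterisk_include_alias_columns = false;
    bool asterisk_include_materialized_columns = false;
};

// Dependencies of a column after `*` has been expanded under the given settings.
std::vector<std::string> expandedDependencies(const Column & column, const std::vector<Column> & table, const Settings & settings)
{
    std::vector<std::string> deps = column.explicit_deps;
    if (!column.uses_asterisk)
        return deps;

    for (const auto & other : table)
    {
        if (other.name == column.name)
            continue; // assumption of this sketch: the column being defined is skipped
        if (other.kind == ColumnKind::Alias && !settings.asterisk_include_alias_columns)
            continue;
        if (other.kind == ColumnKind::Materialized && !settings.asterisk_include_materialized_columns)
            continue;
        deps.push_back(other.name);
    }
    return deps;
}

enum class Mark { Unvisited, InProgress, Done };

// Depth-first search; reaching a node that is still in progress means a cycle.
bool visit(const std::string & name, const std::map<std::string, std::vector<std::string>> & graph, std::map<std::string, Mark> & marks)
{
    Mark & mark = marks[name]; // new entries start as Mark::Unvisited
    if (mark == Mark::InProgress)
        return true;
    if (mark == Mark::Done)
        return false;

    mark = Mark::InProgress;
    auto it = graph.find(name);
    if (it != graph.end())
        for (const auto & dep : it->second)
            if (visit(dep, graph, marks))
                return true;
    mark = Mark::Done;
    return false;
}

// Build the dependency graph from expanded expressions, then look for a cycle.
bool hasCyclicDefaults(const std::vector<Column> & table, const Settings & settings)
{
    std::map<std::string, std::vector<std::string>> graph;
    for (const auto & column : table)
        if (column.kind != ColumnKind::Ordinary)
            graph[column.name] = expandedDependencies(column, table, settings);

    std::map<std::string, Mark> marks;
    for (const auto & entry : graph)
        if (visit(entry.first, graph, marks))
            return true;
    return false;
}
```

For a table modelled as `a` (ordinary), `b DEFAULT f(*)`, and `x ALIAS b`, the graph is acyclic with default settings because `*` does not reach `x`. Once `asterisk_include_alias_columns` is enabled, `b` depends on `x` and `x` depends on `b`, so the check reports a cycle. That is why the same table can pass in a context that uses server defaults and fail in a query context.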

…atIfIndexEstimator`

This PR added a `ContextPtr context` parameter to `createMergeTreeSequentialSource` (needed to expand column matchers in default expressions during merges), but the call site in `WhatIfIndexEstimator.cpp`, a separate caller added on `master`, was not updated, breaking the build with "no matching function for call to 'createMergeTreeSequentialSource'" (requires 13 arguments, but 12 were provided).

Pass the in-scope `context` (the same one already used for alter conversions and query limits a few lines below) as the final argument.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

alexey-milovidov · 2026-06-30T06:32:06Z

Merged the latest master (clean, no conflicts — the branch had drifted ~320 commits) and fixed the one real CI failure: the Fast test and Build (arm_tidy) builds were failing to compile WhatIfIndexEstimator.cpp.

Root cause: this PR adds a ContextPtr context parameter to createMergeTreeSequentialSource (so column matchers in default expressions can be expanded during merges/reads). All call sites that existed when the PR was written were updated, but WhatIfIndexEstimator.cpp — a separate caller of createMergeTreeSequentialSource that landed on master — was not, so the build broke with:

WhatIfIndexEstimator.cpp:412:25: error: no matching function for call to 'createMergeTreeSequentialSource'
note: candidate function not viable: requires 13 arguments, but 12 were provided

Fix: pass the in-scope context (the same ContextPtr already used a few lines below for getAlterConversionsForPart and the query limits) as the final argument.

For the record, I checked the rest of the cross-cut against the freshly merged master: every other function this PR changed the signature of (getReadTaskColumnsForMerge, injectRequiredColumns, validateColumnsDefaults/validateColumnsDefaultsAndGetSampleBlock, performRequiredConversions, and the extractNamesFromLambda → getASTLambdaArgumentNames refactor) has all of its callers updated in the merged tree, and WhatIfIndexEstimator.cpp was the only stray caller. A fresh CI run is in progress on the updated head.

Column matchers (`*`, `COLUMNS(...)`) in `DEFAULT`/`MATERIALIZED`/`ALIAS` expressions are stored unexpanded and re-expanded against the table's `ColumnsDescription` every time the default is evaluated. With the default `flatten_nested = 1`, `getColumnsDescription` flattens the stored metadata (`res.flattenNested()`), so at insert time the matcher is expanded against the flattened columns (`n.x`), while validation in `validateColumnsDefaultsAndGetSampleBlock` was expanding it against the un-flattened `columns_for_default_validation` (`n`). That mismatch let `CREATE TABLE` persist a default that executes differently from what was validated.

For example `CREATE TABLE t (n Nested(x UInt8), b UInt64 DEFAULT length(COLUMNS('^n$')))` validated against `n` but, after flattening, expanded to zero columns at insert time and threw `NUMBER_OF_ARGUMENTS_DOESNT_MATCH`.

Flatten `columns_for_default_validation` under the same condition as `res` so matcher expansion during validation sees exactly the schema used at execution time. Add a stateless regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

alexey-milovidov · 2026-06-30T08:14:08Z

Addressed the AI Review "Request changes" finding (matcher defaults validated against a pre-flatten schema) in b71da69.

Root cause. Column matchers in DEFAULT/MATERIALIZED/ALIAS expressions are stored unexpanded and re-expanded against the table's ColumnsDescription on every evaluation. getColumnsDescription validates defaults (validateColumnsDefaultsAndGetSampleBlock) against columns_for_default_validation, which was built before res.flattenNested(). With flatten_nested = 1 the stored schema is flattened (n → n.x), so at insert time a matcher expands against the flattened columns while validation expanded against the un-flattened ones — letting CREATE TABLE persist a default that executes differently from what was validated.

Fix. Flatten columns_for_default_validation under the same condition as res, so validation-time matcher expansion sees exactly the execution-time schema. This rejects e.g. length(COLUMNS('^n$')) over a Nested column at DDL time (NUMBER_OF_ARGUMENTS_DOESNT_MATCH) and keeps valid flattened-subcolumn matchers (length(COLUMNS('x')) → length(n.x)) both validating and evaluating consistently.

Test. Added 04489_default_expr_matcher_flatten_nested covering the accepted DEFAULT/MATERIALIZED cases and the rejected case with flatten_nested = 1. Verified the matcher-expansion semantics (COLUMNS('x') → n.x; COLUMNS('^n$') → zero columns → code 42) against a master clickhouse-local; the C++ change reuses the existing ColumnsDescription::flattenNested already called a few lines below, so it only affects tables that contain Nested columns.
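The validation/execution mismatch comes down to which column names the `COLUMNS` pattern is matched against. Below is a minimal sketch, assuming partial-match semantics and using `std::regex` as a stand-in for the regular expression engine ClickHouse actually uses. `expandColumnsMatcher` is a hypothetical helper, not a function from the codebase.

```cpp
#include <regex>
#include <string>
#include <vector>

// Return the names that the pattern matches anywhere in the name, in schema order.
// std::regex (ECMAScript syntax) is a stand-in here; the real engine and its syntax differ.
std::vector<std::string> expandColumnsMatcher(const std::string & pattern, const std::vector<std::string> & names)
{
    const std::regex re(pattern);
    std::vector<std::string> matched;
    for (const auto & name : names)
        if (std::regex_search(name, re))
            matched.push_back(name);
    return matched;
}
```

Against the un-flattened names `{n, b}`, the pattern `^n$` selects `n`, so `length(COLUMNS('^n$'))` looks valid. Against the flattened names `{n.x, b}` it selects nothing, which is the zero-argument `length()` that failed at insert time. The pattern `x` selects `n.x` in the flattened schema, matching the accepted case described above.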

Net diff vs the previous head is exactly these 3 files (+60). The AI Review thread is resolved; waiting on fresh CI.

clickhouse-gh · 2026-06-30T11:58:57Z

LLVM Coverage Report

| Metric | Baseline | Current | Δ |
|---|---|---|---|
| Lines | 85.40% | 85.40% | +0.00% |
| Functions | 92.70% | 92.70% | +0.00% |
| Branches | 77.60% | 77.70% | +0.10% |

Changed C/C++ lines covered: 705/742 (95.01%)

alexey-milovidov · 2026-06-30T13:18:15Z

Fixed the CH Inc sync failure, which was red with a `build failed` status (a private-side build break not visible in the public CI).

Root cause: this PR adds a ContextPtr context parameter to getReadTaskColumns so that column matchers in default expressions can be expanded during reads. Every public call site was updated, but a caller that exists only in the internal repository still used the old 10-argument overload, so the internal build failed across all configurations:

StorageMergeTreeParts.cpp:342:31: error: no matching function for call to 'getReadTaskColumns'
MergeTreeBlockReadUtils.h:33:26: note: candidate function not viable: requires 11 arguments, but 10 were provided

Fix: pass the in-scope getContext() as the new argument on the sync branch, mirroring the public MergeTreeReadPoolBase caller (which passes getContext() in the same slot). This is the same class of cross-cut as the earlier WhatIfIndexEstimator.cpp build fix; I verified across the whole internal tree that this is the only caller of any function whose signature this PR changes that the public CI does not cover. A fresh internal CI run is in progress.

The only remaining red is Hung check failed, possible deadlock found (Stress test (arm_asan_ubsan)), which is the endemic AST-fuzzer hung-check flake tracked in #107941 — it reproduces on master directly and does not touch any code path this PR changes.

clickhouse-gh · 2026-06-30T15:43:15Z

📊 Cloud Performance Report

✅ AI verdict: no_change — no significant changes across 38 queries analysed

This PR adds support for column matchers (*, COLUMNS(...)) and transformer modifiers in column DEFAULT/MATERIALIZED/ALIAS and index expressions, plus a refactor of lambda-argument-name extraction. All of that runs at DDL, INSERT-default, mutation, and alias-expansion time — paths only reached for tables that actually declare such columns. The four flagged ClickBench queries (Q4, Q15, Q28, Q34) are plain scan/aggregation SELECTs over tables with no alias or default-matcher columns, so the query-execution hot path they exercise is untouched. The reported improvements (ranging -5.7% to -19%) cannot plausibly be caused by this change and the base measurements were noticeably noisier than the source, so all four are downgraded to not-sure as run-to-run variance rather than real PR effects.

clickbench

⚠️ 4 inconclusive

Flagged queries (4 of 43)

| | Query | Verdict | Baseline median (ms) | PR median (ms) | Change | q-value | Hint |
|---|---|---|---|---|---|---|---|
| ⚠️ | 4 | not_sure | 262 | 212 | -19.1% | <0.0001 | PR only touches DEFAULT/ALIAS/matcher expansion; Q4 scan path untouched, so the -19% is off-path variance |
| ⚠️ | 15 | not_sure | 245 | 201 | -18.0% | <0.0001 | Off-path: ClickBench tables have no ALIAS/DEFAULT matchers; Q15 execution can't be changed by this diff |
| ⚠️ | 28 | not_sure | 6671 | 6289 | -5.7% | <0.0001 | Off-path: aggregation query, no alias/default expansion involved; -5.7% not attributable to this PR |
| ⚠️ | 34 | not_sure | 1461 | 1377 | -5.8% | <0.0001 | Off-path: query exercises scan/agg, not column-matcher/default expansion; -5.8% is run-to-run variance |

q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10), which is the value the verdict is based on.

tpch_adapted_1_official

🟢 No significant changes

Debug info

StressHouse run: 347d2724-083e-468d-a5b8-94f3412c658b
MIRAI run: 3d7e0b0e-bf52-47f8-a258-02ef6bba5473
PR check IDs:
- clickbench_504125_1782832886
- clickbench_504132_1782832886
- clickbench_504142_1782832886
- tpch_adapted_1_official_504150_1782832887
- tpch_adapted_1_official_504163_1782832886
- tpch_adapted_1_official_504174_1782832886

alexey-milovidov · 2026-07-02T03:08:40Z

niyue changed the title ~~Feat/default expr column matcher~~ Add column matcher expansion for default value expressions May 15, 2026

niyue changed the title ~~Add column matcher expansion for default value expressions~~ Support column matcher expansion for default value expressions May 15, 2026

niyue changed the title ~~Support column matcher expansion for default value expressions~~ Support column matcher expansion for default value expressions and index expressions May 15, 2026

niyue mentioned this pull request May 15, 2026

Support * and column matchers inside default expressions for columns and indices #92266

Open

niyue force-pushed the feat/default-expr-column-matcher branch from 6d0b6df to 308133e May 15, 2026 15:46

alexey-milovidov self-assigned this May 15, 2026

alexey-milovidov added the can be tested Allows running workflows for external contributors label May 15, 2026

clickhouse-gh Bot added the pr-feature Pull request with new product feature label May 15, 2026

clickhouse-gh Bot reviewed May 15, 2026

Comment thread src/Storages/AlterCommands.cpp Outdated

clickhouse-gh Bot reviewed May 15, 2026

Comment thread src/Storages/IndicesDescription.cpp Outdated

clickhouse-gh Bot reviewed May 15, 2026

Comment thread src/Functions/tuple.cpp Outdated

clickhouse-gh Bot reviewed May 15, 2026

Comment thread docs/en/sql-reference/statements/create/table.md

niyue added 14 commits May 16, 2026 21:15

Support column matchers in default expressions

eca4ae0

Allow `*` and `COLUMNS` matchers in `DEFAULT`, `MATERIALIZED`, and `ALIAS` expressions, and in skip index expressions. Matchers are expanded before validation so normal expression analysis still reports arity and type exceptions. Issue: ClickHouse#92266

Add namedTuple function

1125dae

Add `namedTuple` as a strict variant of `tuple` that always returns a named `Tuple` and reports an exception for duplicate or invalid argument names.

Test asterisk settings in default expression matchers

75aab6c

Extend the default-expression matcher stateless test to cover `asterisk_include_materialized_columns` and `asterisk_include_alias_columns` during default value evaluation.

Test indirect cycle in default expression matchers

28d23c5

Add a stateless test case where a `DEFAULT` expression with `*` expands into an indirect dependency cycle and reports `CYCLIC_ALIASES`.

Test alias and materialized matcher self cycles

ecef88b

Test default expression matchers with EPHEMERAL columns

475530a

Test invalid regexp in default expression COLUMNS matcher

2b09770

Remove duplicate namedTuple checks from matcher test

d45f5ff

Fix default expression matcher test reference

9dcd835

Document default expression column matchers

31b2818

Remove the newly added namedTuple function since it is expected to …

8223e5c

…use `tuple` with a setting directly

Fix ALTER validation for default expression matchers

15acbde

Build the validation column snapshot with updated `default_desc.kind` values for `ADD COLUMN` and `MODIFY COLUMN`, so `*` and `COLUMNS` expansion sees the post-alter default kinds.

Expand index matchers after alias replacement

464aada

Add matchers to aspell dictionary

2632f1b

niyue force-pushed the feat/default-expr-column-matcher branch from 981800b to 2632f1b May 16, 2026 13:17

niyue added 2 commits June 21, 2026 17:11

Merge branch 'master' into feat/default-expr-column-matcher

9e7b790