Prune named-Tuple subfields whose values are all type-defaults at INSERT and merge #107409
Open
amosbird wants to merge 10 commits into
…fields

Each subfield of a named `Tuple` is stored as a separate stream named by the field path (e.g. `data.c2s.statistics.heros_statistics.damage.bin`), not by position. Adding a new subfield at any position only introduces new stream files; all preexisting subfield streams are name-stable and remain valid. Reading an old part for a newly-added subfield falls back to the type's default value via the existing `fillMissingColumns` infrastructure, and the existing CAST(named-Tuple -> named-Tuple-superset) handles whole-tuple reads element-wise by name.

Extend `isMetadataOnlyConversion` to recognize named-Tuple subfield additions (recursive, through Array / Nullable / Map / Tuple wrappers). For named Tuples, every old subfield must still be present by name with a metadata-only-compatible type, and any new subfields are allowed in any position. Removing/renaming subfields or changing an existing subfield's type in a non-metadata-only way still requires a full mutation. Map recursion is added in the same change to enable `Map(K, Tuple(a, b))` -> `Map(K, Tuple(a, b, c))` and is naturally useful on its own.

Impact: on a 2M-row / 1.3 GiB game-analytics table with a deeply nested `Tuple(Tuple(Array(Tuple(...))))` schema, the customer's `ALTER MODIFY COLUMN data Tuple(..., new_field Nullable(T))` drops from ~9s (full part rewrite) to ~12ms (metadata only), with no `Mutation` created and no part rewritten. Subsequent merges materialize the new subfield's stream files transparently.
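The by-name read path described above can be pictured with a small Python model. This is only a sketch under stated assumptions, not ClickHouse code: `cast_by_name` and `TYPE_DEFAULTS` are hypothetical names, a Tuple value is modeled as a flat dict, and only three leaf types are covered.

```python
from typing import Any, Dict, List, Tuple

# Type defaults for a few leaf types (illustrative subset, assumed for this sketch).
TYPE_DEFAULTS: Dict[str, Any] = {
    "UInt64": 0,
    "String": "",
    "Nullable(UInt64)": None,
}


def cast_by_name(old_value: Dict[str, Any],
                 new_fields: List[Tuple[str, str]]) -> Dict[str, Any]:
    """Rebuild a tuple value in the new field order.

    Fields present in the old value are copied by name; fields the old
    part never stored fall back to the type default of the new field.
    """
    return {
        name: old_value.get(name, TYPE_DEFAULTS[type_name])
        for name, type_name in new_fields
    }


# A row written before the ALTER, read back under the widened type.
old_row = {"a": 7, "b": "x"}
new_type = [("a", "UInt64"), ("new_field", "Nullable(UInt64)"), ("b", "String")]
widened = cast_by_name(old_row, new_type)
```

Because the lookup is by name, the position at which `new_field` is inserted does not affect how the existing fields are read.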
Run key, index, and projection safety checks for any `MODIFY_COLUMN` that changes the column's type, regardless of whether the change is metadata-only or requires a mutation. Previously these checks were gated behind `isRequireMutationStage`, so metadata-only conversions (e.g. adding a subfield to a named `Tuple` whose subcolumn appears in `ORDER BY`) bypassed them, leaving stale `primary.idx` / partition key bytes that no longer match the new tuple arity. This was the blocking AI review finding on ClickHouse#107305.

Add regression tests covering:
- a subcolumn of the modified Tuple appearing in `ORDER BY`
- a subcolumn appearing in `PARTITION BY`
- the whole Tuple column appearing in `ORDER BY` (`primary.idx` arity would mismatch; caught by `isSafeForKeyConversion`)
- sanity: a Tuple not in any key is still metadata-only ALTER-able
The previous attempt to validate key/index safety for metadata-only `MODIFY COLUMN` ran the checks for any type change, which broke five existing tests: `isSafeForKeyConversion` is stricter than `isMetadataOnlyConversion` (e.g. it rejects `DateTime` -> `UInt32` on a sorting-key column even though the binary representation is identical), and the column-conversion dry-run was incorrectly invoked on columns that became `ALIAS` after the alter. The actual hazard introduced by this PR is narrower: only named-Tuple subfield additions change the on-disk shape of a key column. Add `tupleAddsSubfieldsOnly(from, to)` that returns true iff the type change is a named-Tuple subfield addition (recursing through Array / Nullable / Map / nested Tuple) and gate the new key/index safety checks on it. Restore the original mutation-stage branch unchanged.
Two follow-ups to the previous `tupleAddsSubfieldsOnly`-gated check:

1. Nested Tuple additions where the outer field count does not change. The previous implementation walked through one wrapper at a time and only checked the field set at the first Tuple it reached. A change like `Tuple(a UInt64, sub Tuple(x UInt64)) -> Tuple(a UInt64, sub Tuple(x UInt64, y UInt64))` returned false because the outer field names match, so the key/index guard was skipped and a column whose subcolumn appears in `ORDER BY` could still have stale `primary.idx` bytes. Reimplement as a recursive walk: for every named-Tuple level, if a field is present in both old and new with the same type, continue; if the field's type itself changes, require it to be either a nested Tuple addition (recursion) or a metadata-only conversion (e.g. Enum widening, Date <-> UInt16). A nested Tuple addition flips `found_addition` so the outer "size didn't grow but inner did" case is detected. To call `isMetadataOnlyConversion` from the recursion, move the helper into the same anonymous namespace and keep a small free-function wrapper exposed for callers without a query context.

2. Explicit skip indexes were only rejected under `alter_column_secondary_index_mode` modes `THROW` / `COMPATIBILITY`. Under the default `REBUILD` mode the rebuild path is mutation-based: a metadata-only Tuple subfield addition never produces a `MutationCommand::READ_COLUMN`, so the rebuild logic in `MutationsInterpreter` cannot refresh the stale `skp_idx_*.idx` bytes. Reject explicit-skip-index columns unconditionally, with a clear error message asking the user to drop and re-create the index after the ALTER.

New test cases (added to 04319_named_tuple_metadata_only_alter):
- 17a/17b/17c cover nested Tuple additions with the parent whole-column, the subcolumn under the addition, and a sibling subcolumn each appearing in ORDER BY.
- 17d confirms a nested Tuple addition with no key dependency still succeeds as metadata-only.
- 18a/18b cover an explicit `set` skip index on the whole column, rejected under both the default and the `rebuild` mode.
Reorder-only changes like `Tuple(a UInt64, b UInt64) -> Tuple(b UInt64, a UInt64)` were treated as metadata-only by `isNamedTupleMetadataOnlyChange`, which only verified that every old name still existed in the new type. But `primary.idx` bytes and explicit skip-index bytes for an embedded `Tuple` value are serialized in the old field order, while the new type deserializes and compares in the new order, so the metadata-only path produced inconsistent on-disk vs in-memory ordering.

Tighten both `isNamedTupleMetadataOnlyChange` and the helper `tupleAddsSubfieldsOnly` to additionally require that the existing fields appear in `to` in the same relative order as in `from`. Insertions are still allowed at any position because they do not move existing fields relative to each other.

The Compact-part concern from the AI review is not a real issue: each part records its own `columns.txt` and `columns_substreams.txt`, so `SerializationTuple` deserializes by named substream offsets even when all substreams share `data.bin`. Adding a new subfield simply leaves the new substream absent from `columns_substreams.txt`, and `CAST` fills it with the type default during `performRequiredConversions`. Verified by running the existing test 04319 on a Compact part.

New tests:
- 19a covers a pure reorder triggering a mutation (not metadata-only).
- 19b covers insertion-preserving-order remaining metadata-only.
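The combined rule (existing fields keep their relative order, additions may appear at any nesting level) can be illustrated with a small Python model. This is a sketch, not the C++ code in the PR: `adds_subfields_only` and `_walk` are hypothetical names, a named Tuple is modeled as a list of `(name, type)` pairs, leaf types must match exactly (the real check also accepts metadata-only leaf conversions), and `Array` / `Nullable` / `Map` wrappers are left out.

```python
from typing import List, Tuple, Union

# A leaf type is a string such as "UInt64";
# a named Tuple is a list of (field name, field type) pairs.
TypeModel = Union[str, List[Tuple[str, "TypeModel"]]]


def _walk(old: TypeModel, new: TypeModel) -> Tuple[bool, bool]:
    """Return (compatible, found_addition) for the change old -> new."""
    if isinstance(old, str) or isinstance(new, str):
        # Simplification: leaves must be identical.
        return (old == new, False)

    found_addition = False
    pos = 0  # cursor into the new field list
    for name, old_type in old:
        # Fields skipped while scanning forward count as insertions.
        while pos < len(new) and new[pos][0] != name:
            found_addition = True
            pos += 1
        if pos == len(new):
            # The old field was removed, renamed, or moved before an earlier one.
            return (False, False)
        ok, inner_addition = _walk(old_type, new[pos][1])
        if not ok:
            return (False, False)
        found_addition = found_addition or inner_addition
        pos += 1
    if pos < len(new):
        found_addition = True  # trailing insertions
    return (True, found_addition)


def adds_subfields_only(old: TypeModel, new: TypeModel) -> bool:
    """True iff new is old plus at least one inserted subfield, order preserved."""
    ok, found_addition = _walk(old, new)
    return ok and found_addition
```

In this model a pure reorder fails because the forward scan consumes the moved field as an "insertion" and then cannot find it again, while a nested addition is reported even when the outer field list is unchanged.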
`generateRandom()` without an explicit structure argument does NOT consult the INSERT target schema in server mode — it infers a random schema each time (observed column counts: 54, 122, 134, ...), so the subsequent `INSERT INTO t SELECT * FROM generateRandom() LIMIT 100` intermittently fails with `NUMBER_OF_COLUMNS_DOESNT_MATCH`. Local mode happened to recover because its single-pass interpreter could push the target schema down, masking the bug. Case 15 only verifies that four metadata-only ALTERs add zero mutations and that the freshly-added subfields read back as defaults. The data content is irrelevant. Switch to a deterministic INSERT that writes 100 rows where every Nullable subfield is NULL, every Map is empty, and the inner Array is empty — matching the existing `.reference` line by line.
…s JSON-hint metadata-only conversions

`tupleAddsSubfieldsOnly` was calling `isMetadataOnlyConversion(..., nullptr)`. When the caller in `MergeTreeData::checkAlterIsPossible` analyzes an ALTER that combines a Tuple subfield addition with a JSON-hint-only change (gated by `allow_experimental_json_lazy_type_hints`), the JSON-hint branch inside `isMetadataOnlyConversion` requires a context to look up the setting. Without it, the helper returns false for the JSON change; `tupleAddsSubfieldsOnlyImpl` then concludes the inner ALTER is not metadata-only and returns false, so the key-safety rejection is bypassed. Meanwhile `isRequireMutationStage` (called with the real context) sees the JSON change as metadata-only and skips the mutation branch, so on a key/index column the ALTER would proceed without rewriting old `primary.idx` or skip-index bytes.

Pass `local_context` from `MergeTreeData::checkAlterIsPossible` through to the helper so its classification matches `isRequireMutationStage`.
…renced by an explicit skip index

`columns_in_explicit_indices` is populated via `index.expression->getRequiredColumns()`, which records the literal column reference. For an index over a tuple subcolumn (e.g. `INDEX idx t.sub TYPE set(0)`), the key is the subcolumn name `t.sub`, not the storage column `t`. The existing key-safety guard for metadata-only Tuple additions in `MergeTreeData::checkAlterIsPossible` only looked up the storage column name in `columns_in_explicit_indices`, so it missed indices keyed on a subcolumn, and the ALTER would proceed, leaving `skp_idx_*.idx` bytes serialized with the old tuple shape behind.

Walk the map for any key starting with `column_name + "."` so subcolumn-indexed `ALTER MODIFY COLUMN t ...` is rejected unconditionally, the same way as the whole-column-indexed case.
…id false-positive prefix match

The previous metadata-only Tuple-subfield guard checked whether any indexed column key starts with `column_name + "."`. That aliases a real subcolumn of the altered storage column (e.g. `t.sub` of column `t`) with an unrelated top-level column whose name happens to start with that prefix (e.g. column `data.deeper` when altering `data`). Resolve each indexed key through `tryGetColumnOrSubcolumn` and only reject when `getNameInStorage()` matches the column being altered.

Add Case 18d to cover the false-positive that the previous check produced.
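The difference between the prefix check and the resolved check can be shown with a toy Python model. It is a sketch only: `naive_guard`, `resolved_guard`, and `name_in_storage` are hypothetical names standing in for the C++ logic, the table is modeled as a dict from storage column name to its set of subcolumn paths, and the assumption that an exact storage-column match wins over subcolumn resolution is part of the model, not something the commit message states.

```python
from typing import Dict, Iterable, Optional, Set


def naive_guard(indexed_keys: Iterable[str], altered: str) -> bool:
    """Old behaviour: reject if any indexed key merely starts with 'altered.'."""
    return any(k == altered or k.startswith(altered + ".") for k in indexed_keys)


def name_in_storage(key: str, columns: Dict[str, Set[str]]) -> Optional[str]:
    """Resolve an indexed key to the storage column that owns it (model only)."""
    if key in columns:
        return key  # the key is itself a top-level storage column
    for storage_name, subcolumns in columns.items():
        prefix = storage_name + "."
        if key.startswith(prefix) and key[len(prefix):] in subcolumns:
            return storage_name  # the key is a subcolumn of this column
    return None


def resolved_guard(indexed_keys: Iterable[str], altered: str,
                   columns: Dict[str, Set[str]]) -> bool:
    """New behaviour: reject only if a key really belongs to the altered column."""
    return any(name_in_storage(k, columns) == altered for k in indexed_keys)


# "data.deeper" is an unrelated top-level column whose name contains a dot.
table = {"data": {"a"}, "data.deeper": set(), "t": {"sub"}}
```

With this table, an index on `data.deeper` trips `naive_guard` when altering `data` but not `resolved_guard`, while an index on `t.sub` is rejected by both when altering `t`.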
…ERT and merge

When all values of a named-Tuple subfield in a part are type-defaults, the writer omits that subfield's stream files and narrows the part's columns.txt Tuple type so it no longer mentions the subfield. Reads see the narrowed Tuple type and use `CAST(narrowed_tuple, full_tuple)` to materialize defaults, relying on the metadata-only ALTER work in ClickHouse#107305.

This optimization is most useful for `PARTITION BY` schemes where different partitions populate different subsets of a wide schema's subfields: the on-disk part keeps only the substreams whose subfield actually appears in that partition.

Approach
- Reuse the existing whole-column pruning path (`IMergedBlockOutputStream::removeEmptyColumnsFromPart` consuming `new_data_part->expired_columns`).
- Extend that path to accept dotted subfield names (`data.c2s.gold`) and narrow the column's Tuple type via the new `narrowDataTypeByExpiredSubstreams` helper in `DataTypes/Utils`.
- After the prune pass, keep `columns_substreams.txt` consistent with the on-disk files via the new `ColumnsSubstreams::removeSubstreams` helper.
- Preserve each kept subfield's `SerializationInfo` (its sparse / default kind and per-element `num_rows` / `num_defaults`) when narrowing the enclosing Tuple, via the new `narrowSerializationInfo` helper.
- INSERT: `MergeTreeDataWriter` traverses each named-Tuple column with the new `IColumn::hasOnlyTypeDefaults` to spot all-default subtrees and contributes their dotted paths to `expired_columns`.
- Merge (Sub-case A): `MergeTask::prepare` computes the union of leaf substreams across all source parts and marks any leaf absent from every source as expired in the merged part. This is monotonic: a merged part never re-materializes default values for a subfield that was consistently pruned in the inputs.
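The INSERT and merge steps in the list above can be sketched in Python. This is an illustrative model, not the PR's C++ implementation: `only_defaults`, `expired_paths`, and `expired_in_merged_part` are hypothetical names, a Tuple column is modeled as a nested dict whose leaves are lists of values, and the set of values treated as type defaults (`0`, `''`, `NULL`, empty array, empty map) is an assumption of the sketch.

```python
from typing import Any, Dict, List, Set


def is_type_default(value: Any) -> bool:
    """Model of a type default: 0, '', NULL, empty array, or empty map."""
    return value is None or value in (0, "") or value == [] or value == {}


def only_defaults(node: Any) -> bool:
    """True if every value under this subtree is a type default."""
    if isinstance(node, dict):  # nested named Tuple
        return all(only_defaults(child) for child in node.values())
    return all(is_type_default(v) for v in node)  # leaf stream


def expired_paths(prefix: str, subtree: Dict[str, Any]) -> List[str]:
    """INSERT side: dotted paths of the topmost all-default subtrees."""
    paths: List[str] = []
    for field, node in subtree.items():
        path = f"{prefix}.{field}"
        if only_defaults(node):
            paths.append(path)
        elif isinstance(node, dict):
            paths.extend(expired_paths(path, node))
    return paths


def expired_in_merged_part(schema_leaves: Set[str],
                           source_parts: List[Set[str]]) -> Set[str]:
    """Merge side: a leaf absent from every source part stays pruned."""
    present: Set[str] = set().union(*source_parts)
    return schema_leaves - present


# One inserted block for column `data`, three rows.
block = {
    "c2s": {"gold": [0, 0, 0], "name": ["a", "", ""]},
    "flag": [None, None, None],
}
pruned = expired_paths("data", block)
```

In the merge helper, a leaf that exists in even one source part is kept, which matches the monotonic behaviour the last bullet describes.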
Why top-level all-default columns are intentionally NOT pruned

If we erased a top-level Tuple column whose value is entirely default, the part would semantically lose that column ("missing column", equivalent to a column that was added by a later `ALTER ADD COLUMN`). A subsequent `ALTER MODIFY COLUMN ... DEFAULT <new_expr>` would then re-materialize the column with the NEW default expression on read, retroactively changing historical data. That is exactly the quirk tracked by ClickHouse#92475 (`ALTER MODIFY ... DEFAULT` rewriting old parts).

This PR sidesteps the problem by leaving top-level columns alone: subfield pruning only narrows the Tuple type of a column that still exists. The materialized 0 / '' / `[]` bytes of the kept columns pin the part's semantics; future `ALTER MODIFY ... DEFAULT` changes apply only to parts written after the ALTER, matching today's whole-column behavior.

Named-Tuple subfields have no per-subfield DEFAULT expression syntax (`Tuple(a Int64 DEFAULT 5)` is not a valid type), so pruning a subfield can only ever fall back to the language's type-default (0 / '' / NULL). This is also why the optimization composes cleanly with the per-column DEFAULT RFC in ClickHouse#92475 (comment 4334850399): subfield pruning operates entirely below the column boundary the RFC will redefine.

What is NOT touched
- Compact parts: early return preserved; pruning only fires for Wide parts.
- Patch parts: skipped (mirrors the existing whole-column behavior).
- Mutate path: not pruned; mutations preserve the existing schema.
- Top-level all-default columns: see note above.
- PR ClickHouse#98472's column-level `skip_empty_columns_on_insert` mechanism: only the `hasOnlyTypeDefaults` column primitives are lifted, none of its signalling layer (no `WITH_SKIPPED_COLUMNS` serialization version, no JSON `skipped_columns` field, no DEFAULT-expression interaction).

Gate
- `enable_tuple_subfield_pruning` (default true) gates the entire feature in `MergeTreeSettings`. The history entry is recorded under 26.6.

Compatibility
- No on-disk format change: parts written by this PR are readable by any server that has the metadata-only-ALTER work in ClickHouse#107305.

Tests
- `tests/queries/0_stateless/04320_tuple_subfield_pruning.sql` exercises 36 cases: flat / nested Tuple, Nullable wrap, Array(Tuple) (all-empty and non-empty), Map(K, Tuple), `LowCardinality(String)`, deep customer-like schema, `PARTITION BY` per-partition narrowing, setting OFF, Compact-part preservation, two-part merge variants (both pruned, one pruned, different subfields pruned), `INSERT SELECT` / async INSERT / materialized view, `ReplacingMergeTree` merge, vertical merge, `LWD`, `ALTER MODIFY ADD subfield + INSERT`, `ALTER UPDATE` mutation on narrowed part, multi-granule part, `DETACH / ATTACH PARTITION`, top-level column with a dot in its name, force-sparse + pruning interaction, subcolumn reads of pruned subfields, `CHECK TABLE` on a pruned part, and `bytes_on_disk` comparison.

### Documentation entry for user-facing changes

- [x] Documentation is not required.

### Changelog category (leave one):

- Improvement

### Changelog entry:

Automatically prune named-Tuple subfields whose values in a part are entirely type-defaults: the writer omits their stream files and records a narrowed Tuple type in `columns.txt`; reads materialize defaults via `CAST`. Gated by the new MergeTree-level setting `enable_tuple_subfield_pruning` (default on).
LLVM Coverage Report

Changed C/C++ lines covered by tests: 510/641 (79.56%). Lost baseline coverage (was covered on master, now uncovered in this PR): 45 line(s).

When all values of a named-`Tuple` subfield in a part are type-defaults, the writer omits that subfield's stream files and narrows the part's `columns.txt` `Tuple` type. Reads see the narrowed type and let ClickHouse's existing missing-substream / `fillMissingColumns` machinery fill in defaults, the same code path that handles columns added by a later `ALTER ADD COLUMN`. Independent of #107305.

Most useful for `PARTITION BY` schemes where different partitions populate different subsets of a wide schema's subfields: each part keeps only the streams whose subfield actually appears.

### Approach

- Reuse the existing whole-column pruning path (`IMergedBlockOutputStream::removeEmptyColumnsFromPart` consuming `new_data_part->expired_columns`); extend it to accept dotted subfield names (`data.c2s.gold`) and narrow the column's `Tuple` type via the new `narrowDataTypeByExpiredSubstreams` helper in `DataTypes/Utils`.
- Preserve each kept subfield's `SerializationInfo` (sparse/default kind, `num_rows`, `num_defaults`) when narrowing the enclosing `Tuple`, via the new `narrowSerializationInfo` helper. Without this, sparse-encoded streams are misread as default-encoded.
- Keep `columns_substreams.txt` consistent with on-disk files via the new `ColumnsSubstreams::removeSubstreams`.
- INSERT: `MergeTreeDataWriter` traverses each named-`Tuple` column with the new `IColumn::hasOnlyTypeDefaults` and contributes all-default subtree paths to `expired_columns`.
- Merge: `MergeTask::prepare` takes the union of leaf substreams across all source parts; any leaf absent from every source is expired in the merged part.

### Why top-level all-default columns are intentionally NOT pruned

Erasing a top-level column whose value is entirely default would turn it into a "missing column"; a subsequent `ALTER MODIFY COLUMN ... DEFAULT <new_expr>` would then re-materialize it with the NEW default, retroactively changing historical data. That is the quirk tracked by #92475.

This PR sidesteps the problem by leaving top-level columns alone; the materialized `0` / `''` / `[]` bytes pin the part's semantics. Subfield pruning is also forward-compatible with the per-column `DEFAULT` RFC in #92475 (comment), because named-`Tuple` subfields have no `DEFAULT` expression syntax (`Tuple(a Int64 DEFAULT 5)` is not a valid type), so pruning can only ever fall back to the language type-default.

### Not touched

Compact parts, patch parts, mutations, top-level all-default columns. The `hasOnlyTypeDefaults` primitives are lifted from #98472, but its signalling layer is not: there is no `WITH_SKIPPED_COLUMNS` serialization version, no JSON `skipped_columns` field, and no `DEFAULT`-expression interaction.

### Gate

`enable_tuple_subfield_pruning` (default `true`), recorded in `SettingsChangesHistory` under `26.6`.

### Compatibility

No on-disk format change; the narrowed `columns.txt` + missing stream files are read as a regular type-promotion (`Tuple(a, b)` part → `Tuple(a, b, c)` schema). Any version of ClickHouse can read parts written by this PR.

### Tests

`tests/queries/0_stateless/04320_tuple_subfield_pruning.sql`: 36 cases covering flat/nested `Tuple`, `Nullable` wrap, `Array(Tuple)` (all-empty and non-empty), `Map(K, Tuple)`, `LowCardinality(String)`, deep customer schema, `PARTITION BY` per-partition narrowing, setting OFF, Compact-part preservation, two-part merges (both pruned / one pruned / different subfields pruned), `INSERT SELECT`, async `INSERT`, materialized view, `ReplacingMergeTree`, vertical merge, lightweight `DELETE`, `ALTER MODIFY` adding subfield + `INSERT`, `ALTER UPDATE` mutation on narrowed part, multi-granule part, `DETACH` / `ATTACH PARTITION`, top-level column with dot in name, force-sparse + pruning interaction, subcolumn reads, `CHECK TABLE`, `bytes_on_disk` comparison.

### Documentation entry for user-facing changes

- [x] Documentation is not required.

### Changelog category (leave one):

- Improvement

### Changelog entry:

Automatically prune named-`Tuple` subfields whose values in a part are entirely type-defaults: the writer omits their stream files and records a narrowed `Tuple` type in `columns.txt`; reads materialize defaults via `CAST`. Gated by the new MergeTree-level setting `enable_tuple_subfield_pruning` (default on).