Prune named-Tuple subfields whose values are all type-defaults at INSERT and merge by amosbird · Pull Request #107409 · ClickHouse/ClickHouse · GitHub
Open
amosbird wants to merge 10 commits into ClickHouse:master from amosbird:tuple-subfield-pruning

amosbird commented Jun 13, 2026 (Collaborator)

Stacked on top of #107305 in this branch's git history; the diff against master therefore includes #107305's commits. The new code introduced by this PR is in commit e19d35ce34e. The two PRs are independent in scope and can be reviewed / merged in any order.

When all values of a named-Tuple subfield in a part are type-defaults, the writer omits that subfield's stream files and narrows the part's columns.txt Tuple type. Reads see the narrowed type and let ClickHouse's existing missing-substream / fillMissingColumns machinery fill in the defaults; this is the same code path that handles columns added by a later ALTER ADD COLUMN. Independent of #107305.
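The type-narrowing step can be pictured as a recursive filter over the Tuple's named fields. The sketch below uses a hypothetical toy type model (`ToyType`, `makeLeaf`, `makeTuple`, `narrowByExpiredPaths`), not ClickHouse's `IDataType` or the PR's `narrowDataTypeByExpiredSubstreams`; it only assumes that expired subfields are identified by dotted paths such as `data.c2s.gold`.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy type model: a leaf such as "Int64", or a named Tuple of
// (field name, field type) pairs. Not ClickHouse's IDataType.
struct ToyType
{
    std::string leaf_name;  // non-empty for leaf types, empty for named Tuples
    std::vector<std::pair<std::string, std::shared_ptr<ToyType>>> fields;
};
using ToyTypePtr = std::shared_ptr<ToyType>;

ToyTypePtr makeLeaf(std::string name)
{
    auto type = std::make_shared<ToyType>();
    type->leaf_name = std::move(name);
    return type;
}

ToyTypePtr makeTuple(std::vector<std::pair<std::string, ToyTypePtr>> fields)
{
    auto type = std::make_shared<ToyType>();
    type->fields = std::move(fields);
    return type;
}

// Returns a copy of `type` without the subfields whose dotted paths (rooted at
// the column name passed in `path`) appear in `expired`; the input is untouched.
// Assumption: an all-default nested Tuple is expired by its own path, so this
// sketch never needs to collapse an emptied Tuple.
ToyTypePtr narrowByExpiredPaths(const ToyTypePtr & type, const std::set<std::string> & expired, const std::string & path)
{
    if (!type->leaf_name.empty())
        return type;  // leaves are kept as-is
    std::vector<std::pair<std::string, ToyTypePtr>> kept;
    for (const auto & [name, field_type] : type->fields)
    {
        const std::string field_path = path + "." + name;
        if (expired.count(field_path) == 0)
            kept.emplace_back(name, narrowByExpiredPaths(field_type, expired, field_path));
    }
    return makeTuple(std::move(kept));
}

// Renders the toy type in a columns.txt-like form for inspection.
std::string toString(const ToyTypePtr & type)
{
    if (!type->leaf_name.empty())
        return type->leaf_name;
    std::string out = "Tuple(";
    for (std::size_t i = 0; i < type->fields.size(); ++i)
        out += (i ? ", " : "") + type->fields[i].first + " " + toString(type->fields[i].second);
    return out + ")";
}
```

Narrowing `data Tuple(a Int64, c2s Tuple(gold UInt32, hp UInt32))` by `data.c2s.gold` yields `Tuple(a Int64, c2s Tuple(hp UInt32))`, while the table-level type is left as it was.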

Most useful for PARTITION BY schemes where different partitions populate different subsets of a wide schema's subfields: each part keeps only the streams whose subfield actually appears.

Approach

  • Reuse the existing whole-column pruning path (IMergedBlockOutputStream::removeEmptyColumnsFromPart consuming new_data_part->expired_columns); extend it to accept dotted subfield names (data.c2s.gold) and narrow the column's Tuple type via the new narrowDataTypeByExpiredSubstreams helper in DataTypes/Utils.
  • Preserve each kept subfield's SerializationInfo (sparse/default kind, num_rows, num_defaults) when narrowing the enclosing Tuple, via the new narrowSerializationInfo helper. Without this, sparse-encoded streams are misread as default-encoded.
  • Keep columns_substreams.txt consistent with on-disk files via the new ColumnsSubstreams::removeSubstreams.
  • INSERT: MergeTreeDataWriter traverses each named-Tuple column with the new IColumn::hasOnlyTypeDefaults and contributes all-default subtree paths to expired_columns.
  • Merge (Sub-case A, monotonic): MergeTask::prepare takes the union of leaf substreams across all source parts; any leaf absent from every source is expired in the merged part.
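A minimal sketch of the INSERT-side detection, assuming a toy column model in which every leaf is an integer whose type-default is 0. `ToyColumn`, `hasOnlyTypeDefaults` and `collectAllDefaultPaths` here are illustrative stand-ins, not the PR's `IColumn::hasOnlyTypeDefaults`. The point it shows: the walk reports the largest all-default subtree and stops descending there, and it never reports the top-level column itself.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy column: a leaf holds integers (type-default assumed to be 0);
// a named Tuple holds parallel vectors of field names and child columns.
struct ToyColumn
{
    std::vector<long long> values;    // leaf data (unused for Tuples)
    std::vector<std::string> names;   // Tuple field names (empty for leaves)
    std::vector<ToyColumn> children;  // Tuple field columns (empty for leaves)

    bool isTuple() const { return !children.empty(); }
};

// True iff every value in the (sub)column equals the type-default.
bool hasOnlyTypeDefaults(const ToyColumn & column)
{
    if (!column.isTuple())
        return std::all_of(column.values.begin(), column.values.end(), [](long long v) { return v == 0; });
    return std::all_of(column.children.begin(), column.children.end(),
                       [](const ToyColumn & child) { return hasOnlyTypeDefaults(child); });
}

// Appends to `out` the dotted paths of maximal all-default subtrees below a
// top-level named-Tuple column named `path`.
void collectAllDefaultPaths(const ToyColumn & column, const std::string & path, std::vector<std::string> & out)
{
    for (std::size_t i = 0; i < column.children.size(); ++i)
    {
        const std::string child_path = path + "." + column.names[i];
        if (hasOnlyTypeDefaults(column.children[i]))
            out.push_back(child_path);  // expire the whole subtree; no need to descend
        else if (column.children[i].isTuple())
            collectAllDefaultPaths(column.children[i], child_path, out);
    }
}
```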

Why top-level all-default columns are intentionally NOT pruned

Erasing a top-level column whose values are entirely default would turn it into a "missing column": a subsequent ALTER MODIFY COLUMN ... DEFAULT <new_expr> would then re-materialize it with the NEW default, retroactively changing historical data. That is the quirk tracked by #92475.

This PR sidesteps the problem by leaving top-level columns alone; the materialized 0 / '' / [] bytes pin the part's semantics. Subfield pruning is also forward-compatible with the per-column DEFAULT RFC in #92475 (comment), because named-Tuple subfields have no DEFAULT expression syntax (Tuple(a Int64 DEFAULT 5) is not a valid type), so pruning can only ever fall back to the language type-default.

Not touched

Compact parts, patch parts, mutations, top-level all-default columns. The hasOnlyTypeDefaults primitives are lifted from #98472 but its signalling layer (no WITH_SKIPPED_COLUMNS serialization version, no JSON skipped_columns, no DEFAULT-expression interaction) is not.

Gate

enable_tuple_subfield_pruning (default true), recorded in SettingsChangesHistory under 26.6.

Compatibility

No on-disk format change; the narrowed columns.txt + missing stream files are read as a regular type-promotion (Tuple(a, b) part → Tuple(a, b, c) schema). Any version of ClickHouse can read parts written by this PR.
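The type-promotion read can be illustrated at row level: fields are matched by name, so a narrowed part's `Tuple(a, c)` lines up with a `Tuple(a, b, c)` schema and the absent `b` takes its type-default. `castTupleRowByName` is a hypothetical helper over plain integer vectors; the real read path works stream by stream through the missing-substream handling, not row by row.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Rebuilds one full-schema Tuple row from a row read out of a narrowed part,
// matching fields by name. Missing fields get the type-default (assumed 0).
std::vector<long long> castTupleRowByName(const std::vector<std::string> & part_fields,
                                          const std::vector<long long> & part_values,
                                          const std::vector<std::string> & table_fields)
{
    std::map<std::string, long long> by_name;
    for (std::size_t i = 0; i < part_fields.size(); ++i)
        by_name[part_fields[i]] = part_values[i];

    std::vector<long long> row;
    row.reserve(table_fields.size());
    for (const auto & name : table_fields)
    {
        const auto it = by_name.find(name);
        row.push_back(it == by_name.end() ? 0 : it->second);  // missing subfield -> default
    }
    return row;
}
```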

Tests

tests/queries/0_stateless/04320_tuple_subfield_pruning.sql — 36 cases covering flat/nested Tuple, Nullable wrap, Array(Tuple) (all-empty and non-empty), Map(K, Tuple), LowCardinality(String), deep customer schema, PARTITION BY per-partition narrowing, setting OFF, Compact-part preservation, two-part merges (both pruned / one pruned / different subfields pruned), INSERT SELECT, async INSERT, materialized view, ReplacingMergeTree, vertical merge, lightweight DELETE, ALTER MODIFY adding subfield + INSERT, ALTER UPDATE mutation on narrowed part, multi-granule part, DETACH/ATTACH PARTITION, top-level column with dot in name, force-sparse + pruning interaction, subcolumn reads, CHECK TABLE, bytes_on_disk comparison.

Documentation entry for user-facing changes

  • Documentation is not required.

Changelog category (leave one):

  • Improvement

Changelog entry:

Automatically prune named-Tuple subfields whose values in a part are entirely type-defaults: the writer omits their stream files and records a narrowed Tuple type in columns.txt; reads materialize defaults via CAST. Gated by the new MergeTree-level setting enable_tuple_subfield_pruning (default on).

amosbird added 6 commits June 12, 2026 14:17
…fields

Each subfield of a named `Tuple` is stored as a separate stream named by the
field path (e.g. `data.c2s.statistics.heros_statistics.damage.bin`), not by
position. Adding a new subfield at any position only introduces new stream
files; all preexisting subfield streams are name-stable and remain valid.
Reading an old part for a newly-added subfield falls back to the type's
default value via the existing `fillMissingColumns` infrastructure, and
the existing CAST(named-Tuple -> named-Tuple-superset) handles whole-tuple
reads element-wise by name.

Extend `isMetadataOnlyConversion` to recognize named-Tuple subfield
additions (recursive, through Array / Nullable / Map / Tuple wrappers).
For named Tuples, every old subfield must still be present by name with a
metadata-only-compatible type, and any new subfields are allowed in any
position. Removing/renaming subfields or changing an existing subfield's
type in a non-metadata-only way still requires a full mutation.

Map recursion is added in the same change to enable
`Map(K, Tuple(a, b))` -> `Map(K, Tuple(a, b, c))` and is naturally
useful on its own.

Impact: on a 2M-row / 1.3 GiB game-analytics table with a deeply nested
`Tuple(Tuple(Array(Tuple(...))))` schema, the customer's
`ALTER MODIFY COLUMN data Tuple(..., new_field Nullable(T))` drops from
~9s (full part rewrite) to ~12ms (metadata only), with no `Mutation`
created and no part rewritten. Subsequent merges materialize the new
subfield's stream files transparently.

Run key, index, and projection safety checks for any `MODIFY_COLUMN`
that changes the column's type, regardless of whether the change is
metadata-only or requires a mutation.

Previously these checks were gated behind `isRequireMutationStage`, so
metadata-only conversions (e.g. adding a subfield to a named `Tuple`
whose subcolumn appears in `ORDER BY`) bypassed them, leaving stale
`primary.idx` / partition key bytes that no longer match the new tuple
arity. This was the blocking AI review finding on ClickHouse#107305.

Add regression tests covering:
- a subcolumn of the modified Tuple appearing in `ORDER BY`
- a subcolumn appearing in `PARTITION BY`
- the whole Tuple column appearing in `ORDER BY` (`primary.idx`
  arity would mismatch — caught by `isSafeForKeyConversion`)
- sanity: a Tuple not in any key is still metadata-only ALTER-able

The previous attempt to validate key/index safety for metadata-only
`MODIFY COLUMN` ran the checks for any type change, which broke five
existing tests: `isSafeForKeyConversion` is stricter than
`isMetadataOnlyConversion` (e.g. it rejects `DateTime` -> `UInt32`
on a sorting-key column even though the binary representation is
identical), and the column-conversion dry-run was incorrectly invoked
on columns that became `ALIAS` after the alter.

The actual hazard introduced by this PR is narrower: only named-Tuple
subfield additions change the on-disk shape of a key column. Add
`tupleAddsSubfieldsOnly(from, to)` that returns true iff the type
change is a named-Tuple subfield addition (recursing through Array /
Nullable / Map / nested Tuple) and gate the new key/index safety
checks on it. Restore the original mutation-stage branch unchanged.

Two follow-ups to the previous `tupleAddsSubfieldsOnly`-gated check:

1. Nested Tuple additions where the outer field count does not change.
   The previous implementation walked through one wrapper at a time and
   only checked the field set at the first Tuple it reached. A change
   like `Tuple(a UInt64, sub Tuple(x UInt64)) -> Tuple(a UInt64, sub
   Tuple(x UInt64, y UInt64))` returned false because the outer field
   names match, so the key/index guard was skipped and a column whose
   subcolumn appears in `ORDER BY` could still have stale
   `primary.idx` bytes.

   Reimplement as a recursive walk: for every named-Tuple level, if a
   field is present in both old and new with the same type, continue;
   if the field's type itself changes, require it to be either a nested
   Tuple addition (recursion) or a metadata-only conversion (e.g. Enum
   widening, Date <-> UInt16). A nested Tuple addition flips
   `found_addition` so the outer "size didn't grow but inner did"
   case is detected. To call `isMetadataOnlyConversion` from the
   recursion, move the helper into the same anonymous namespace and
   keep a small free-function wrapper exposed for callers without a
   query context.

2. Explicit skip indexes were only rejected under
   `alter_column_secondary_index_mode` modes `THROW` / `COMPATIBILITY`.
   Under the default `REBUILD` mode the rebuild path is mutation-based:
   a metadata-only Tuple subfield addition never produces a
   `MutationCommand::READ_COLUMN`, so the rebuild logic in
   `MutationsInterpreter` cannot refresh the stale `skp_idx_*.idx`
   bytes. Reject explicit-skip-index columns unconditionally, with a
   clear error message asking the user to drop and re-create the index
   after the ALTER.
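The recursive walk from item 1 can be sketched as follows. `TType`, `leafType`, `tupleType`, `isMetadataOnlyLeafChange` and `addsSubfieldsOnlyImpl` are hypothetical names; the leaf predicate only accepts `Date <-> UInt16` as a stand-in for `isMetadataOnlyConversion`, Array / Nullable / Map wrappers are left out, and relative field order is not checked.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy type: a leaf ("UInt64", "Date", ...) or a named Tuple.
struct TType
{
    std::string leaf;  // non-empty for leaf types
    std::vector<std::pair<std::string, std::shared_ptr<TType>>> fields;
};
using TTypePtr = std::shared_ptr<TType>;

TTypePtr leafType(std::string name) { auto t = std::make_shared<TType>(); t->leaf = std::move(name); return t; }
TTypePtr tupleType(std::vector<std::pair<std::string, TTypePtr>> f) { auto t = std::make_shared<TType>(); t->fields = std::move(f); return t; }

bool sameType(const TTypePtr & a, const TTypePtr & b)
{
    if (a->leaf != b->leaf || a->fields.size() != b->fields.size())
        return false;
    for (std::size_t i = 0; i < a->fields.size(); ++i)
        if (a->fields[i].first != b->fields[i].first || !sameType(a->fields[i].second, b->fields[i].second))
            return false;
    return true;
}

// Assumed stand-in for isMetadataOnlyConversion on leaf types.
bool isMetadataOnlyLeafChange(const std::string & from, const std::string & to)
{
    return (from == "Date" && to == "UInt16") || (from == "UInt16" && to == "Date");
}

bool addsSubfieldsOnlyImpl(const TTypePtr & from, const TTypePtr & to, bool & found_addition)
{
    if (sameType(from, to))
        return true;
    if (!from->leaf.empty() && !to->leaf.empty())
        return isMetadataOnlyLeafChange(from->leaf, to->leaf);
    if (!from->leaf.empty() || !to->leaf.empty())
        return false;  // leaf <-> Tuple is never a subfield addition

    // Both sides are named Tuples: every old field must survive by name, and a
    // changed field must itself be an addition or a metadata-only change.
    for (const auto & old_field : from->fields)
    {
        const auto it = std::find_if(to->fields.begin(), to->fields.end(),
                                     [&old_field](const auto & field) { return field.first == old_field.first; });
        if (it == to->fields.end())
            return false;  // removed or renamed
        if (!addsSubfieldsOnlyImpl(old_field.second, it->second, found_addition))
            return false;
    }
    if (to->fields.size() > from->fields.size())
        found_addition = true;  // this level grew; nested levels can also set the flag
    return true;
}

bool tupleAddsSubfieldsOnly(const TTypePtr & from, const TTypePtr & to)
{
    bool found_addition = false;
    return addsSubfieldsOnlyImpl(from, to, found_addition) && found_addition;
}
```

The out-parameter is what catches the nested case: the outer level of `Tuple(a, sub Tuple(x)) -> Tuple(a, sub Tuple(x, y))` does not grow, but the recursion into `sub` sets `found_addition`.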

New test cases (added to 04319_named_tuple_metadata_only_alter):
- 17a/17b/17c cover nested Tuple additions with the parent
  whole-column, the subcolumn under the addition, and a sibling
  subcolumn each appearing in ORDER BY.
- 17d confirms a nested Tuple addition with no key dependency still
  succeeds as metadata-only.
- 18a/18b cover an explicit `set` skip index on the whole column,
  rejected under both the default and the `rebuild` mode.

Reorder-only changes like `Tuple(a UInt64, b UInt64) -> Tuple(b UInt64,
a UInt64)` were treated as metadata-only by
`isNamedTupleMetadataOnlyChange`, which only verified that every old name
still existed in the new type. But `primary.idx` bytes and explicit
skip-index bytes for an embedded `Tuple` value are serialized in the old
field order, while the new type deserializes and compares in the new
order, so the metadata-only path produced inconsistent on-disk vs
in-memory ordering.

Tighten both `isNamedTupleMetadataOnlyChange` and the helper
`tupleAddsSubfieldsOnly` to additionally require that the existing fields
appear in `to` in the same relative order as in `from`. Insertions are
still allowed at any position because they do not move existing fields
relative to each other.
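The relative-order requirement amounts to checking that the old field names form a subsequence of the new ones. A minimal sketch, assuming field names are unique within a named Tuple; `keepsFieldsInRelativeOrder` is a hypothetical helper, not the code in `isNamedTupleMetadataOnlyChange`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// True iff every field of `from` still exists in `to` and the surviving fields
// keep their relative order, i.e. `from` is a subsequence of `to`.
bool keepsFieldsInRelativeOrder(const std::vector<std::string> & from, const std::vector<std::string> & to)
{
    std::size_t j = 0;
    for (const auto & name : from)
    {
        while (j < to.size() && to[j] != name)
            ++j;  // skip fields that were inserted into `to`
        if (j == to.size())
            return false;  // removed, renamed, or moved in front of an earlier field
        ++j;  // continue searching after the matched position
    }
    return true;
}
```

A linear two-pointer scan is enough because insertions never need to be matched; only the old names have to be found, in order.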

The Compact-part concern from the AI review is not a real issue: each
part records its own `columns.txt` and `columns_substreams.txt`, so
`SerializationTuple` deserializes by named substream offsets even when
all substreams share `data.bin`. Adding a new subfield simply leaves the
new substream absent from `columns_substreams.txt`, and `CAST` fills it
with the type default during `performRequiredConversions`. Verified by
running the existing test 04319 on a Compact part.

New tests:
- 19a covers a pure reorder triggering a mutation (not metadata-only).
- 19b covers insertion-preserving-order remaining metadata-only.

`generateRandom()` without an explicit structure argument does NOT consult the
INSERT target schema in server mode — it infers a random schema each time
(observed column counts: 54, 122, 134, ...), so the subsequent
`INSERT INTO t SELECT * FROM generateRandom() LIMIT 100` intermittently fails
with `NUMBER_OF_COLUMNS_DOESNT_MATCH`. Local mode happened to recover because
its single-pass interpreter could push the target schema down, masking the bug.

Case 15 only verifies that four metadata-only ALTERs add zero mutations and
that the freshly-added subfields read back as defaults. The data content is
irrelevant. Switch to a deterministic INSERT that writes 100 rows where every
Nullable subfield is NULL, every Map is empty, and the inner Array is empty —
matching the existing `.reference` line by line.

clickhouse-gh Bot commented Jun 13, 2026

clickhouse-gh Bot added the pr-improvement label (Pull request with some product improvements) Jun 13, 2026
Comment thread src/Storages/AlterCommands.cpp Outdated
Comment thread src/Storages/MergeTree/MergeTreeDataWriter.cpp Outdated
amosbird (Collaborator Author)

@Fgrtue This is a narrower, safer special case of #98472 — restricted to named-Tuple subfields, which have no DEFAULT-expression syntax and therefore can't trigger the ALTER MODIFY ... DEFAULT quirk from #92475 (comment). Would appreciate your review.

…s JSON-hint metadata-only conversions

`tupleAddsSubfieldsOnly` was calling `isMetadataOnlyConversion(..., nullptr)`. When the
caller in `MergeTreeData::checkAlterIsPossible` analyzes an ALTER that combines a Tuple
subfield addition with a JSON-hint-only change (gated by
`allow_experimental_json_lazy_type_hints`), the JSON-hint branch inside
`isMetadataOnlyConversion` requires a context to look up the setting. Without it, the
helper returns false for the JSON change; `tupleAddsSubfieldsOnlyImpl` then concludes
the inner ALTER is not metadata-only and returns false, so the key-safety rejection
is bypassed. Meanwhile `isRequireMutationStage` (called with the real context) sees the
JSON change as metadata-only and skips the mutation branch, so on a key/index column the
ALTER would proceed without rewriting old `primary.idx` or skip-index bytes.

Pass `local_context` from `MergeTreeData::checkAlterIsPossible` through to the helper
so its classification matches `isRequireMutationStage`.
amosbird force-pushed the tuple-subfield-pruning branch from e19d35c to f8b3f03 June 13, 2026 17:01
Comment thread src/Storages/MergeTree/IMergedBlockOutputStream.cpp Outdated
Comment thread src/Core/SettingsChangesHistory.cpp
amosbird force-pushed the tuple-subfield-pruning branch from f8b3f03 to 08e7ffb June 13, 2026 18:00
Comment thread src/Columns/ColumnString.cpp Outdated
amosbird force-pushed the tuple-subfield-pruning branch from 08e7ffb to d0987f9 June 14, 2026 03:29
Comment thread src/Storages/MergeTree/MergeTask.cpp Outdated
Comment thread src/Storages/MergeTree/MergeTreeData.cpp
amosbird force-pushed the tuple-subfield-pruning branch from d0987f9 to 2562431 June 14, 2026 06:53
Comment thread src/Storages/MergeTree/MergeTask.cpp
…renced by an explicit skip index

`columns_in_explicit_indices` is populated via `index.expression->getRequiredColumns()`,
which records the literal column reference. For an index over a tuple subcolumn
(e.g. `INDEX idx t.sub TYPE set(0)`), the key is the subcolumn name `t.sub`, not the
storage column `t`. The existing key-safety guard for metadata-only Tuple additions
in `MergeTreeData::checkAlterIsPossible` only looked up the storage column name in
`columns_in_explicit_indices`, so it missed indices keyed on a subcolumn — and the
ALTER would proceed, leaving `skp_idx_*.idx` bytes serialized with the old tuple
shape behind.

Walk the map for any key starting with `column_name + "."` so subcolumn-indexed
`ALTER MODIFY COLUMN t ...` is rejected unconditionally, the same way as the
whole-column-indexed case.
amosbird force-pushed the tuple-subfield-pruning branch from 2562431 to 5f67bb9 June 14, 2026 07:50
Comment thread src/DataTypes/Utils.cpp
amosbird force-pushed the tuple-subfield-pruning branch from 5f67bb9 to 36cfddd June 14, 2026 08:15
Comment thread src/Columns/ColumnVector.cpp
amosbird force-pushed the tuple-subfield-pruning branch from 36cfddd to c503bd2 June 14, 2026 08:37
Comment thread src/Storages/MergeTree/MergeTreeSettings.cpp Outdated
Comment thread src/Storages/MergeTree/MergeTreeData.cpp Outdated
…id false-positive prefix match

The previous metadata-only Tuple-subfield guard checked whether any indexed column
key starts with `column_name + "."`. That aliases a real subcolumn of the altered
storage column (e.g. `t.sub` of column `t`) with an unrelated top-level column
whose name happens to start with that prefix (e.g. column `data.deeper` when
altering `data`).

Resolve each indexed key through `tryGetColumnOrSubcolumn` and only reject when
`getNameInStorage()` matches the column being altered. Add Case 18d to cover the
false-positive that the previous check produced.
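The difference between the two checks can be reproduced with a small resolver. The sketch assumes a flat set of storage column names and a hypothetical `resolveStorageName` that only approximates the `tryGetColumnOrSubcolumn` + `getNameInStorage()` resolution: it prefers an exact column match, then the longest dotted prefix that names a column, and does not verify that the remaining suffix is a real subcolumn.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical resolver: exact top-level column first, else longest dotted prefix.
std::optional<std::string> resolveStorageName(const std::string & key, const std::set<std::string> & storage_columns)
{
    if (storage_columns.count(key))
        return key;  // e.g. a top-level column literally named `data.deeper`
    for (auto pos = key.rfind('.'); pos != std::string::npos && pos > 0; pos = key.rfind('.', pos - 1))
        if (storage_columns.count(key.substr(0, pos)))
            return key.substr(0, pos);
    return std::nullopt;
}

// Prefix test: any indexed key starting with "<column>." counts as a hit.
bool indexedByPrefix(const std::string & column, const std::vector<std::string> & index_keys)
{
    for (const auto & key : index_keys)
        if (key.rfind(column + ".", 0) == 0)
            return true;
    return false;
}

// Resolution test: only keys whose storage column is `column` count.
bool indexedByResolution(const std::string & column, const std::vector<std::string> & index_keys,
                         const std::set<std::string> & storage_columns)
{
    for (const auto & key : index_keys)
    {
        const auto owner = resolveStorageName(key, storage_columns);
        if (owner && *owner == column)
            return true;
    }
    return false;
}
```

With columns `data`, `data.deeper` and `t`, an index on `data.deeper` trips the prefix test when altering `data`, while the resolution test correctly attributes it to the separate `data.deeper` column.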
amosbird force-pushed the tuple-subfield-pruning branch from c503bd2 to 57257f0 June 14, 2026 09:00
Comment thread src/Storages/MergeTree/IMergedBlockOutputStream.cpp
amosbird force-pushed the tuple-subfield-pruning branch 2 times, most recently from bf0219d to 1ffba07 June 14, 2026 09:42
Comment thread src/Storages/MergeTree/MergeTreeSettings.cpp Outdated
amosbird force-pushed the tuple-subfield-pruning branch from 1ffba07 to 095cbbf June 14, 2026 10:00
Comment thread src/Storages/MergeTree/MergeTreeDataWriter.cpp
Comment thread src/Storages/MergeTree/MergeTreeDataWriter.cpp
amosbird force-pushed the tuple-subfield-pruning branch from 095cbbf to 468e305 June 14, 2026 10:22
Comment thread src/Storages/MergeTree/MergeTask.cpp
Comment thread src/Storages/MergeTree/MergeTask.cpp
…ERT and merge

When all values of a named-Tuple subfield in a part are type-defaults, the
writer omits that subfield's stream files and narrows the part's columns.txt
Tuple type so it no longer mentions the subfield. Reads see the narrowed
Tuple type and use `CAST(narrowed_tuple, full_tuple)` to materialize defaults,
relying on the metadata-only ALTER work in ClickHouse#107305.

This optimization is most useful for `PARTITION BY` schemes where different
partitions populate different subsets of a wide schema's subfields: the
on-disk part keeps only the substreams whose subfield actually appears in
that partition.

Approach

- Reuse the existing whole-column pruning path
  (`IMergedBlockOutputStream::removeEmptyColumnsFromPart` consuming
  `new_data_part->expired_columns`).
- Extend that path to accept dotted subfield names (`data.c2s.gold`) and
  narrow the column's Tuple type via the new
  `narrowDataTypeByExpiredSubstreams` helper in `DataTypes/Utils`.
- After the prune pass, keep `columns_substreams.txt` consistent with the
  on-disk files via the new `ColumnsSubstreams::removeSubstreams` helper.
- Preserve each kept subfield's `SerializationInfo` (its sparse / default
  kind and per-element `num_rows` / `num_defaults`) when narrowing the
  enclosing Tuple, via the new `narrowSerializationInfo` helper.
- INSERT: `MergeTreeDataWriter` traverses each named-Tuple column with the
  new `IColumn::hasOnlyTypeDefaults` to spot all-default subtrees and
  contributes their dotted paths to `expired_columns`.
- Merge (Sub-case A): `MergeTask::prepare` computes the union of leaf
  substreams across all source parts and marks any leaf absent from every
  source as expired in the merged part. This is monotonic: a merged part
  never re-materializes default values for a subfield that was consistently
  pruned in the inputs.
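The Sub-case A rule reduces to a set union followed by a set difference. A minimal sketch, assuming each source part is summarized by the set of leaf substream paths it stores and that the merged part's full list of leaf paths is available as input; `expiredLeavesForMerge` is a hypothetical name, not a function in `MergeTask`.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// A leaf is expired in the merged part iff no source part stores it.
std::set<std::string> expiredLeavesForMerge(const std::vector<std::set<std::string>> & source_part_leaves,
                                            const std::set<std::string> & schema_leaves)
{
    std::set<std::string> present;
    for (const auto & part : source_part_leaves)
        present.insert(part.begin(), part.end());  // union across all sources

    std::set<std::string> expired;
    for (const auto & leaf : schema_leaves)
        if (present.count(leaf) == 0)
            expired.insert(leaf);
    return expired;
}
```

Because one source that stores a leaf is enough to keep it, the merged part's leaf set is always a superset of each input's, which is the monotonic property described above.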

Why top-level all-default columns are intentionally NOT pruned

If we erased a top-level Tuple column whose value is entirely default, the
part would semantically lose that column ("missing column" — equivalent to a
column that was added by a later `ALTER ADD COLUMN`). A subsequent
`ALTER MODIFY COLUMN ... DEFAULT <new_expr>` would then re-materialize the
column with the NEW default expression on read, retroactively changing
historical data. That is exactly the quirk tracked by ClickHouse#92475
(`ALTER MODIFY ... DEFAULT` rewriting old parts).

This PR sidesteps the problem by leaving top-level columns alone: subfield
pruning only narrows the Tuple type of a column that still exists. The
materialized 0 / '' / `[]` bytes of the kept columns pin the part's
semantics; future `ALTER MODIFY ... DEFAULT` changes apply only to parts
written after the ALTER, matching today's whole-column behavior.

Named-Tuple subfields have no per-subfield DEFAULT expression syntax
(`Tuple(a Int64 DEFAULT 5)` is not a valid type), so pruning a subfield can
only ever fall back to the language's type-default (0 / '' / NULL). This is
also why the optimization composes cleanly with the per-column DEFAULT RFC
in ClickHouse#92475 (comment 4334850399): subfield pruning operates entirely below
the column boundary the RFC will redefine.

What is NOT touched

- Compact parts: early return preserved; pruning only fires for Wide parts.
- Patch parts: skipped (mirrors the existing whole-column behavior).
- Mutate path: not pruned; mutations preserve the existing schema.
- Top-level all-default columns: see note above.
- ClickHouse#98472's column-level `skip_empty_columns_on_insert` mechanism: only
  the `hasOnlyTypeDefaults` column primitives are lifted, none of its
  signalling layer (no `WITH_SKIPPED_COLUMNS` serialization version, no
  JSON `skipped_columns` field, no DEFAULT-expression interaction).

Gate

- `enable_tuple_subfield_pruning` (default true) gates the entire feature in
  `MergeTreeSettings`. The history entry is recorded under 26.6.

Compatibility

- No on-disk format change: parts written by this PR are readable by any
  server that has the metadata-only-ALTER work in ClickHouse#107305.

Tests

- `tests/queries/0_stateless/04320_tuple_subfield_pruning.sql` exercises 36
  cases: flat / nested Tuple, Nullable wrap, Array(Tuple) (all-empty and
  non-empty), Map(K, Tuple), `LowCardinality(String)`, deep customer-like
  schema, `PARTITION BY` per-partition narrowing, setting OFF, Compact-part
  preservation, two-part merge variants (both pruned, one pruned, different
  subfields pruned), `INSERT SELECT` / async INSERT / materialized view,
  `ReplacingMergeTree` merge, vertical merge, lightweight `DELETE`,
  `ALTER MODIFY` adding a subfield + `INSERT`, `ALTER UPDATE` mutation on a
  narrowed part,
  multi-granule part, `DETACH / ATTACH PARTITION`, top-level column with a
  dot in its name, force-sparse + pruning interaction, subcolumn reads of
  pruned subfields, `CHECK TABLE` on a pruned part, and `bytes_on_disk`
  comparison.

### Documentation entry for user-facing changes

- [x] Documentation is not required.

### Changelog category (leave one):

- Improvement

### Changelog entry:

Automatically prune named-Tuple subfields whose values in a part are
entirely type-defaults: the writer omits their stream files and records a
narrowed Tuple type in `columns.txt`; reads materialize defaults via
`CAST`. Gated by the new MergeTree-level setting
`enable_tuple_subfield_pruning` (default on).
amosbird force-pushed the tuple-subfield-pruning branch from 468e305 to b3b6ce4 June 14, 2026 10:43
clickhouse-gh Bot commented Jun 14, 2026

LLVM Coverage Report

| Metric | Baseline | Current | Δ |
|---|---|---|---|
| Lines | 84.80% | 84.80% | +0.00% |
| Functions | 92.40% | 92.40% | +0.00% |
| Branches | 77.30% | 77.40% | +0.10% |

Changed lines: 510/641 changed C/C++ lines (79.56%) are covered by tests. Lost baseline coverage (covered on master, uncovered in this PR): 45 lines.
