Skip writing all-default columns during MergeTree INSERT by amosbird · Pull Request #98472 · ClickHouse/ClickHouse · GitHub

Skip writing all-default columns during MergeTree INSERT #98472

Open
amosbird wants to merge 1 commit into ClickHouse:master from amosbird:skip-empty-columns

Conversation

@amosbird

@amosbird amosbird commented Mar 2, 2026

Collaborator

Summary

During MergeTree INSERT, columns whose values are entirely type-defaults (e.g., all zeros for UInt64, all empty strings for String, all NULLs for Nullable) are detected and excluded from the part's column list before constructing MergedBlockOutputStream. This avoids writing unnecessary .bin files (Wide parts) or data streams (Compact parts), saving disk space for sparse-update workloads where most columns in each INSERT are left at their type's default. The optimization is opt-in via the MergeTree setting skip_empty_columns_on_insert (off by default). It additionally requires serialization_info_version to be set to with_missing_columns (the format version that records frozen defaults for missing columns), so that a cluster pinned to a lower version for compatibility never writes parts that older servers cannot read.

The block itself is passed intact to the writer, so skip indices, projections, primary index, and min-max index are all computed from the full data. Reading a part that lacks a column fills it with type-defaults automatically — the same mechanism used by ALTER TABLE ADD COLUMN on existing parts.

To keep reads stable, a structured missing_columns array is recorded in the part's serialization.json (a new WITH_MISSING_COLUMNS serialization-info version). Each entry carries the column name and a type_default marker. On read, fillMissingColumns consults this marker and fills a missing column with its type-default even if the column later gains a new DEFAULT expression, so that a subsequent ALTER MODIFY COLUMN ... DEFAULT does not retroactively change the values that were actually inserted. The marker is propagated through merges, mutations, and on-the-fly ALTER RENAME COLUMN, so the inserted type-defaults survive part-lifecycle operations.
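The read-side rule described above can be sketched as a small decision function. This is only an illustration under stated assumptions: `FillSource` and `chooseFillSource` are hypothetical names, not the PR's `fillMissingColumns`, and the set of frozen names stands in for the `missing_columns` entries of a part.

```cpp
#include <cassert>
#include <set>
#include <string>

// Where the reader takes values from when a column is absent from a part.
enum class FillSource { TypeDefault, DefaultExpression };

// Hypothetical helper (not part of ClickHouse).
// frozen_type_defaults: column names recorded in the part with a type_default marker.
// has_default_expression: whether the table's current metadata defines a DEFAULT for the column.
inline FillSource chooseFillSource(
    const std::string & column,
    const std::set<std::string> & frozen_type_defaults,
    bool has_default_expression)
{
    // A frozen marker wins over any DEFAULT expression added after the INSERT.
    if (frozen_type_defaults.count(column) > 0)
        return FillSource::TypeDefault;

    // Columns without a marker follow the current table metadata.
    return has_default_expression ? FillSource::DefaultExpression : FillSource::TypeDefault;
}
```

With this rule, a column skipped at INSERT time keeps reading back as its type-default after a later `ALTER MODIFY COLUMN ... DEFAULT`, while a column that is missing for another reason still uses the current expression.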

The MissingColumnInfo struct also reserves a DefaultKind::Expression variant for future use (issue #92475: ALTER MODIFY DEFAULT freezes old expression into parts). This PR only writes type_default markers; reading an expression marker throws CORRUPTED_DATA to fail closed until Phase 2 implements expression evaluation.

Related: #4968, #92475

On-disk format (serialization.json):

{
  "missing_columns": [
    { "name": "b", "default": "type_default" }
  ],
  "version": 2
}
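
As a rough sketch of the fail-closed behaviour for the `default` field shown above, the function below accepts only the `type_default` marker and rejects everything else. The names `DefaultKind` and `parseDefaultMarker` are illustrative assumptions, and a plain `std::runtime_error` stands in for the `CORRUPTED_DATA` exception mentioned in the description.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical mirror of the marker kinds; Expression is reserved and never accepted here.
enum class DefaultKind { TypeDefault, Expression };

// Maps the "default" field of a missing_columns entry to a kind.
// Any value other than "type_default" is rejected, so unknown markers fail closed.
inline DefaultKind parseDefaultMarker(const std::string & marker)
{
    if (marker == "type_default")
        return DefaultKind::TypeDefault;

    throw std::runtime_error("Unsupported missing-column default marker: " + marker);
}
```

Rejecting unknown markers, rather than silently treating them as type-defaults, keeps a reader from returning wrong values for a part written by a newer format it does not understand.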

Changes:

  • Add MergeTree setting skip_empty_columns_on_insert (default false).
  • Add IColumn::hasOnlyTypeDefaults with optimized overrides for ColumnVector/ColumnDecimal (memoryIsZero), ColumnString/ColumnArray (zero offsets), ColumnNullable (all-1 null map), ColumnSparse (no stored non-default values), and ColumnTuple (delegates to sub-columns).
  • Filter all-default columns in MergeTreeDataWriter::writeTempPartImpl (skipEmptyColumnsOnInsert). Columns with a DEFAULT/MATERIALIZED/ALIAS expression are never skipped. A column is skipped only when IDataType::getDefault() coincides with the column's zero representation (isDefaultAt(0) on a column filled via getDefault()), which correctly excludes types like Date32 (type-default 1900-01-01 ≠ memory-zero) and Enum (first declared value ≠ 0). Patch parts are excluded, and at least one physical column is always kept.
  • Gate the optimization on serialization_info_version >= with_missing_columns. The populating step is authoritative about the part format version, so SerializationInfoByName::getVersion never silently upgrades a part past the configured value (which would make older servers reject it during a rolling upgrade).
  • Record the missing columns in serialization.json via SerializationInfoByName (new WITH_MISSING_COLUMNS version). Only type_default markers are written; expression markers are rejected on read until Phase 2. The list is written in sorted order so that identical parts produce identical checksums on different replicas.
  • Propagate the missing-columns marker through the part lifecycle: through merges (MergeTask, for columns that end up absent from the merged part), through mutations (MutateTask::getColumnsForNewDataPart, including renames of missing columns), through compact-part renames (splitAndModifyMutationCommands), and through on-the-fly ALTER RENAME COLUMN on read (IMergeTreeReader::fillMissingColumns translates the requested name back through alter_conversions).
  • Add a stateless test with 46 cases covering: basic skip; all-default keeps one column; merges (horizontal and vertical); mutation; Nullable; key columns; DEFAULT expression not skipped; Array; Tuple; ColumnSparse source; compact parts; LowCardinality; stable values across ALTER ... DEFAULT; a non-zero-default Enum; marker across mutation/merge/rename after ALTER DEFAULT; version gate; DETACH/ATTACH TABLE; DETACH/ATTACH PARTITION; BACKUP/RESTORE; FREEZE; CHECK TABLE; MATERIALIZE COLUMN; CLEAR COLUMN; lightweight DELETE; chained mutations; INSERT SELECT; REPLACE PARTITION; ATTACH PARTITION FROM; ALTER ADD/DROP COLUMN; projections; FixedString; Map; mixed parts merge; Date/DateTime; pre-ADD-COLUMN + skip merge; type-changing mutation; compact-part rename+mutation; Date32 regression.
  • Add an integration test (test_skip_empty_columns) with 6 cases: replicated consistency; merge marker propagation across replicas; mixed-version version gate; backward-compat fallback for old parts; restart durability; REPLACE PARTITION across replicated tables.
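
The per-column checks listed above can be illustrated on simplified stand-ins for the column layouts. This is a sketch, not the ClickHouse classes: plain `std::vector` buffers replace `ColumnVector`, the string/array offsets array, and the null map, and the function names are hypothetical. It assumes offsets are cumulative end positions with no terminator bytes, as the description implies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Fixed-width integer data: all-default means every byte of the buffer is zero.
template <typename T>
bool vectorHasOnlyTypeDefaults(const std::vector<T> & data)
{
    static_assert(std::is_integral<T>::value, "sketch covers integer element types only");
    const auto * bytes = reinterpret_cast<const unsigned char *>(data.data());
    const std::size_t n = data.size() * sizeof(T);
    return std::all_of(bytes, bytes + n, [](unsigned char b) { return b == 0; });
}

// String/array offsets are non-decreasing, so a zero last offset
// means every value in the column is empty.
inline bool offsetsDescribeOnlyEmptyValues(const std::vector<std::uint64_t> & offsets)
{
    return offsets.empty() || offsets.back() == 0;
}

// Nullable: a null-map byte of 1 marks NULL, so all-NULL means every byte equals 1.
inline bool nullMapIsAllNull(const std::vector<std::uint8_t> & null_map)
{
    return std::all_of(null_map.begin(), null_map.end(), [](std::uint8_t b) { return b == 1; });
}
```

Each check is a single linear scan (or a constant-time test for offsets), which is why the overrides are cheaper than calling a per-row default check.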

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

MergeTree INSERT can now skip writing columns whose values are entirely type-defaults (zeros, empty strings, NULLs), saving disk space for sparse-update workloads. Enabled by the MergeTree setting skip_empty_columns_on_insert together with serialization_info_version = 'with_missing_columns'. Missing columns carry frozen defaults in serialization.json, so a later ALTER MODIFY COLUMN ... DEFAULT does not retroactively change the inserted values.

Documentation entry for user-facing changes

  • Documentation is not required (bug fix, no new user-facing feature)

@clickhouse-gh

clickhouse-gh Bot commented Mar 2, 2026

Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-feature Pull request with new product feature label Mar 2, 2026
Comment thread src/Storages/MergeTree/MergeTreeDataWriter.cpp Outdated
@amosbird amosbird force-pushed the skip-empty-columns branch from facf467 to 92d5f86 Compare March 19, 2026 09:31
@amosbird amosbird marked this pull request as ready for review March 19, 2026 23:54
@amosbird

Collaborator Author

Hi @Fgrtue, I noticed you’ve assigned yourself to #92475. Would you be interested in taking a look at this one as well?

@Fgrtue Fgrtue self-assigned this Mar 20, 2026
@Fgrtue

Fgrtue commented Mar 20, 2026

Contributor

@amosbird thanks for letting me know! I will indeed take a look.

Copilot AI left a comment

Contributor

Pull request overview

Adds an INSERT-time optimization for MergeTree to avoid writing data streams/files for columns that are entirely type-default within an inserted block, reducing disk usage for sparse-update patterns.

Changes:

  • Introduces skip_empty_columns_on_insert MergeTree setting and applies it in MergeTreeDataWriter::writeTempPartImpl by filtering all-default columns from the part’s written column list.
  • Adds IColumn::hasOnlyDefaults() (with implementations/overrides for several column types) to efficiently detect all-default columns.
  • Adds a new stateless test and reference output covering several correctness scenarios around missing columns.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/Storages/MergeTree/MergeTreeDataWriter.cpp | Filters all-default columns before constructing MergedBlockOutputStream; toggles reset_columns when filtering occurred. |
| src/Storages/MergeTree/MergeTreeSettings.cpp | Documents new MergeTree setting skip_empty_columns_on_insert. |
| src/Core/SettingsChangesHistory.cpp | Records the new setting in settings change history. |
| src/Columns/IColumn.h / src/Columns/IColumn.cpp | Adds and implements (via IColumnHelper) the new hasOnlyDefaults() API. |
| src/Columns/ColumnVector.h | Adds a fast-path hasOnlyDefaults() using memoryIsZero. |
| src/Columns/ColumnConst.h / ColumnFixedString.h / ColumnDecimal.h | Adds hasOnlyDefaults() overrides for common fixed-size representations. |
| src/Columns/ColumnLazy.h/.cpp, ColumnUnique.h, ColumnCompressed.h, ColumnBLOB.h, ColumnFunction.h, ColumnAggregateFunction.h, IColumnDummy.h | Adds required hasOnlyDefaults() overrides/implementations (some throw / return conservative defaults). |
| tests/queries/0_stateless/04006_skip_empty_columns_on_insert.sql | New stateless test cases for the optimization. |
| tests/queries/0_stateless/04006_skip_empty_columns_on_insert.reference | Expected output for the new stateless test. |

Comment thread tests/queries/0_stateless/04006_skip_empty_columns_on_insert.sql
Comment on lines +842 to +851
    if (!empty_columns.empty())
    {
        auto filtered = columns.eraseNames(empty_columns);
        if (!filtered.empty())
        {
            columns = std::move(filtered);
            has_empty_columns = true;
            for (const auto & name : empty_columns)
                infos.erase(name);
        }

Copilot AI Mar 20, 2026


The filtering logic won’t skip any columns when all columns in the block are defaults: columns.eraseNames(empty_columns) will return an empty list, and the if (!filtered.empty()) guard prevents applying the filter at all. This contradicts the feature’s intent (skip all-default columns) and keeps writing unnecessary files for fully-default inserts. Consider keeping at least one “anchor” column (e.g., first physical column or a key column) and removing the rest, and update the test case to assert the reduced on-disk columns set.
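
The "anchor column" idea suggested here can be sketched as follows. This is an illustrative helper with hypothetical names, not the PR's code: it iterates over the physical column list, drops the all-default names, and keeps the first physical column if everything would otherwise be removed. Which column serves as the anchor is an assumption of this sketch.

```cpp
#include <cassert>
#include <string>
#include <unordered_set>
#include <vector>

// Returns the names of the columns to write for a part.
// physical: the part's physical columns, in declaration order.
// all_default: names detected as entirely type-default in the inserted block.
inline std::vector<std::string> columnsToWrite(
    const std::vector<std::string> & physical,
    const std::unordered_set<std::string> & all_default)
{
    std::vector<std::string> kept;
    for (const auto & name : physical)
        if (all_default.count(name) == 0)
            kept.push_back(name);

    // A part needs at least one physical column, so keep an anchor
    // when every column turned out to be all-default.
    if (kept.empty() && !physical.empty())
        kept.push_back(physical.front());

    return kept;
}
```

Because the loop runs over the physical columns rather than over the block, temporary columns that exist only in the block can never be counted as skipped.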

Comment on lines +832 to +841
    const auto & columns_description = metadata_snapshot->getColumns();
    NameSet empty_columns;
    for (const auto & col : block)
    {
        auto col_default = columns_description.getDefault(col.name);
        if (col_default && col_default->expression)
            continue;
        if (col.column->hasOnlyDefaults())
            empty_columns.insert(col.name);
    }

Copilot AI Mar 20, 2026


empty_columns is collected by iterating over every column in block. At this point columns is getAllPhysical().filter(block.getNames()), while block can also contain temporary/subcolumns added for sorting key / skip-index expressions. Those non-physical columns can be marked “empty” even though they will never be written, which can cause has_empty_columns to become true without actually removing any physical columns and trigger reset_columns unnecessarily. Consider iterating over columns (and looking up each in block) instead of iterating over block.

Comment thread src/Storages/MergeTree/MergeTreeSettings.cpp Outdated
Comment thread src/Columns/ColumnVector.h Outdated
Comment on lines +321 to +324
    bool hasOnlyDefaults() const override
    {
        return memoryIsZero(data.data(), 0, data.size() * sizeof(T));
    }

Copilot AI Mar 20, 2026


ColumnVector<T>::hasOnlyDefaults() uses a raw memoryIsZero check, but isDefaultAt() uses data[n] == T{}. For floating-point types, -0.0 == 0.0 is true, so a column containing only -0.0 values is “all-default” per isDefaultAt() but will return false here (bytes are not all zero). This makes hasOnlyDefaults() inconsistent with the column’s own default semantics. Consider falling back to an isDefaultAt() loop for floating-point T (or otherwise aligning the definition).
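
The discrepancy described in this comment can be reproduced in isolation. The sketch below assumes IEEE 754 doubles and uses two hypothetical helpers: one compares by value, as `isDefaultAt()` is said to do, and one scans the raw bytes, as a memory-zero check would.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Value comparison: negative zero compares equal to positive zero.
inline bool isDefaultByValue(double x)
{
    return x == 0.0;
}

// Byte comparison: true only when every byte of the representation is zero.
inline bool isDefaultByBytes(double x)
{
    unsigned char bytes[sizeof(double)];
    std::memcpy(bytes, &x, sizeof(double));
    for (std::size_t i = 0; i < sizeof(double); ++i)
        if (bytes[i] != 0)
            return false;
    return true;
}
```

For `-0.0` the sign bit is set, so the two definitions disagree: the value check reports a default while the byte check does not. A byte-level check is therefore the stricter of the two and never classifies a non-default value as default.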


@Fgrtue Fgrtue left a comment

Contributor

I would suggest adding custom hasOnlyDefaults() implementations for the following columns:

  1. ColumnString: if I understand correctly, we could just check that the offsets are all 0.
  2. ColumnNullable: memoryIsByte could probably be used on the null map.
  3. ColumnArray: as with ColumnString, we could likely check that all offsets are equal to 0.
  4. ColumnSparse: it seems that checking only the offsets would be enough.
  5. ColumnTuple: at the moment we call isDefaultAt() N×M times regardless of whether the columns inside the tuple have an optimized hasOnlyDefaults() of their own. If we rewrite hasOnlyDefaults() to simply propagate the call to the columns stored in the tuple, we might get some performance improvement.

Comment thread src/Columns/IColumn.h Outdated
Comment thread src/Storages/MergeTree/MergeTreeDataWriter.cpp Outdated
@amosbird amosbird force-pushed the skip-empty-columns branch from 08e6a6d to ec0cdad Compare March 23, 2026 17:43
Comment thread src/Storages/MergeTree/MergeTreeSettings.cpp
@amosbird

Collaborator Author

@Fgrtue It seems the CH Inc sync requires manual resolution.

@Fgrtue

Fgrtue commented Mar 25, 2026

Contributor

@amosbird should be done. I wanted to take a second quick look at the PR today, I will update you on the results. Just to make sure, that's the final version so far, right?

@amosbird

amosbird commented Mar 25, 2026

Collaborator Author

> Just to make sure, that's the final version so far, right?

Yes. (I mistakenly configured Copilot to force push, which appears to have overridden the existing reviews. Sorry about that.)

@Fgrtue

Fgrtue commented Mar 25, 2026

Contributor

@amosbird, did you have a chance to see my previous review suggestion about adding optimized hasOnlyTypeDefaults() for ColumnString, ColumnNullable, ColumnArray, ColumnSparse, and ColumnTuple? Do you think it would make sense to add them?

@amosbird

Collaborator Author

Thanks for the tips! I've added optimized hasOnlyTypeDefaults for all five types in the latest push:

  • ColumnString: memoryIsZero on the offsets array (all empty strings ⟹ all offsets are zero)
  • ColumnNullable: memoryIsByte(..., 1) on the null map (all NULL ⟹ all bytes are 1)
  • ColumnArray: memoryIsZero on the offsets array (all empty arrays ⟹ all offsets are zero)
  • ColumnSparse: checks whether offsets is empty (no non-default values stored)
  • ColumnTuple: delegates to each sub-column's hasOnlyTypeDefaults with early exit

Test cases 5, 8, 9, and 10 exercise ColumnNullable, ColumnArray, ColumnTuple, and ColumnSparse respectively. ColumnString is covered by the existing cases (1, 3, 4).

  (data.supportsTransactions() && context->getCurrentTransaction()) ? context->getCurrentTransaction()->tid : Tx::PrehistoricTID,
  block.bytes(),
- /*reset_columns=*/ false,
+ /*reset_columns=*/ has_empty_columns,

Contributor

I am trying to understand whether we need to set reset_columns to true when there are empty columns.

So far I have found that reset_columns is used in three places:

  • MergedBlockOutputStream.cpp:40
  • MergedBlockOutputStream.cpp:221
  • MergedBlockOutputStream.cpp:442

In two of them (lines 40 and 442) it seems that we won't get any new information in infos. I have not fully verified the third one (line 221) yet (I will), but at first glance reset_columns does not seem to affect that part either.

Could you please check whether it is needed, and why?

Collaborator Author

You are right — reset_columns = true is not needed here. I traced through all three sites:

  1. Constructor (IMergedBlockOutputStream.cpp:40-41): Initializes new_serialization_infos = SerializationInfoByName(columns_list, info_settings) — but note info_settings has choose_kind = false (line 32), while we already computed the real serialization info with choose_kind = true at MergeTreeDataWriter.cpp:875 and set it via new_data_part->setColumns(columns, infos, ...) at line 893.

  2. writeImpl (MergedBlockOutputStream.cpp:442-443): new_serialization_infos.add(block) — accumulates stats from the written block, but this is the same block we already ran infos.add(block) on at line 882. So it just recomputes equivalent statistics.

  3. finalizePartAsync (MergedBlockOutputStream.cpp:221-231): This does three things:

    • serialization_infos.replaceData(new_serialization_infos) — replaces only the data member (not kind_stack) with equivalent stats from step 2.
    • removeEmptyColumnsFromPart(new_part, part_columns, new_part->expired_columns, ...) — this is a no-op because expired_columns is empty. It is only populated in the merge path (MergeTask.cpp:610-626) for TTL-expired columns, never during INSERT.
    • new_part->setColumns(part_columns, serialization_infos, ...) — redundant, since we already called setColumns with the correct filtered columns and infos at line 893.

So indeed the entire reset_columns block is a no-op in our INSERT path. I will change it to false.

@Fgrtue Fgrtue left a comment

Contributor

The tests are good. I would suggest adding the following test cases:

  1. Compact parts (i.e. use min_bytes_for_wide_part != 0, min_rows_for_wide_part = 0) for a) skipping a column and b) merging parts.
  2. Could we also add a check for a LowCardinality column, as this is a fairly common use case?
  3. Regarding the merge test, what do you think about testing both types of merges, vertical and horizontal?

Comment thread src/Columns/ColumnSparse.cpp Outdated

bool ColumnSparse::hasOnlyTypeDefaults() const
{
    return _size == 0 || getOffsetsData().empty();

Contributor

I am thinking of the case where a sparse column consists only of a non-default value (for example 5). It seems to me that we will not distinguish it from an all-default column, since the check looks only at the offsets and a sparse column holding just one (any) value has none.

Moreover, the generic version IColumnHelper<Derived, Parent>::hasOnlyTypeDefaults() seems to give a wrong result as well.

Even though this does not currently lead to data corruption (i.e. returning type defaults instead of the DEFAULT element at values[0]), it still seems to be an incorrect implementation of this function.

If my reasoning is right, we could fix this by checking that the element at index 0 of values is itself the type default, i.e. values->isDefaultAt(0).

Collaborator Author

Good catch! Fixed: added && values->isDefaultAt(0) so we verify the actual stored default value, not just the absence of offsets. The generic IColumnHelper::hasOnlyTypeDefaults() fallback also gives wrong results for ColumnSparse (since isDefaultAt(n) just checks getValueIndex(n) == 0), but the specialized override now handles this correctly.
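
A minimal sketch of the corrected check, on a simplified stand-in for a sparse column. `SparseColumnSketch` is hypothetical and uses `int64_t` values; it assumes, as the thread describes, that `values[0]` holds the shared default and `offsets` lists the rows with non-default values. How the real override combines the three conditions is not shown in the thread, so the grouping below is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Simplified stand-in for a sparse column, not ClickHouse's ColumnSparse.
struct SparseColumnSketch
{
    std::vector<std::int64_t> values;   // values[0] is the shared default value
    std::vector<std::size_t> offsets;   // row positions of the non-default values
    std::size_t size = 0;               // total number of rows

    bool hasOnlyTypeDefaults() const
    {
        if (size == 0)
            return true;
        // No stored non-default rows AND the shared default is the type default (0 here).
        return offsets.empty() && !values.empty() && values[0] == 0;
    }
};
```

Checking `values[0]` covers the case raised in the review: a column whose shared default is 5 has empty offsets too, but it is not all type-defaults.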

Comment thread src/Core/SettingsChangesHistory.cpp Outdated
{
    addSettingsChanges(merge_tree_settings_changes_history, "26.4",
    {
        {"skip_empty_columns_on_insert", false, false, "New setting to skip writing all-default columns on INSERT"},

Contributor

It would probably be more accurate to say:

Suggested change
- {"skip_empty_columns_on_insert", false, false, "New setting to skip writing all-default columns on INSERT"},
+ {"skip_empty_columns_on_insert", false, false, "New setting to skip writing all type default columns on INSERT"},

Collaborator Author

Done.

ORDER BY column;

SELECT 'case7_data';
SELECT key, a, b FROM t_skip_empty_default_expr ORDER BY key;

Contributor

Should we SELECT key, a, b, c here? Or is this intentional?

Collaborator Author

Added c to the SELECT. The reference now shows 1 5 0 50, confirming the MATERIALIZED expression a * 10 is correctly evaluated.

@amosbird

Collaborator Author

Added 4 new test cases addressing the review:

  • Case 11: compact parts (skip + merge, with min_bytes_for_wide_part = 1000000000)
  • Case 12: LowCardinality(String) (generic isDefaultAt through dictionary)
  • Case 13: vertical merge (enable_vertical_merge_algorithm = 1)
  • Case 14: horizontal merge (enable_vertical_merge_algorithm = 0)

Comment thread src/Columns/ColumnString.h Outdated
@alexey-milovidov

Member

The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix.

@alexey-milovidov

Member

The "Can't adjust last granule" error in CI is a known issue. The fix is in #101641.

@CurtizJ CurtizJ left a comment

Member

This introduces an inconsistency: if a user inserts a column with all-zero values and then changes its default expression, the values returned on read will change as well. The same inconsistency already exists for columns added via ADD COLUMN whose default expression is later modified, so this is not a new problem, but it may be worth avoiding in this case.

Maybe we can store a marker in serialization_infos.json that records whether the column was physically written, so the reader can fill in defaults correctly?

Comment thread src/DataTypes/Serializations/SerializationInfo.cpp Outdated
@amosbird

Collaborator Author

> Maybe we can store a marker in serialization_infos.json that records whether the column was physically written, so the reader can fill in defaults correctly?

@CurtizJ This might conflict with the proposal in #92475

amosbird added a commit to amosbird/ClickHouse that referenced this pull request Jun 13, 2026
…ERT and merge

When all values of a named-Tuple subfield in a part are type-defaults, the
writer omits that subfield's stream files and narrows the part's columns.txt
Tuple type so it no longer mentions the subfield. Reads see the narrowed
Tuple type and use `CAST(narrowed_tuple, full_tuple)` to materialize defaults,
relying on the metadata-only ALTER work in ClickHouse#107305.

This optimization is most useful for `PARTITION BY` schemes where different
partitions populate different subsets of a wide schema's subfields: the
on-disk part keeps only the substreams whose subfield actually appears in
that partition.

Approach

- Reuse the existing whole-column pruning path
  (`IMergedBlockOutputStream::removeEmptyColumnsFromPart` consuming
  `new_data_part->expired_columns`).
- Extend that path to accept dotted subfield names (`data.c2s.gold`) and
  narrow the column's Tuple type via the new
  `narrowDataTypeByExpiredSubstreams` helper in `DataTypes/Utils`.
- After the prune pass, keep `columns_substreams.txt` consistent with the
  on-disk files via the new `ColumnsSubstreams::removeSubstreams` helper.
- Preserve each kept subfield's `SerializationInfo` (its sparse / default
  kind and per-element `num_rows` / `num_defaults`) when narrowing the
  enclosing Tuple, via the new `narrowSerializationInfo` helper.
- INSERT: `MergeTreeDataWriter` traverses each named-Tuple column with the
  new `IColumn::hasOnlyTypeDefaults` to spot all-default subtrees and
  contributes their dotted paths to `expired_columns`.
- Merge (Sub-case A): `MergeTask::prepare` computes the union of leaf
  substreams across all source parts and marks any leaf absent from every
  source as expired in the merged part. This is monotonic: a merged part
  never re-materializes default values for a subfield that was consistently
  pruned in the inputs.
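
The merge rule in the last bullet can be sketched with plain sets. This is an illustration with hypothetical names, not `MergeTask::prepare`: it takes the leaf substreams of the full schema and the leaves stored by each source part, and returns the leaves that no source part stores.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// schema_leaves: dotted leaf paths of the full table schema (e.g. "data.c2s.gold").
// source_parts: for each source part, the leaf paths it physically stores.
// Returns the leaves absent from every source, which the merged part can treat as expired.
inline std::set<std::string> expiredLeaves(
    const std::set<std::string> & schema_leaves,
    const std::vector<std::set<std::string>> & source_parts)
{
    // Union of everything any source part stores.
    std::set<std::string> present;
    for (const auto & part : source_parts)
        present.insert(part.begin(), part.end());

    std::set<std::string> expired;
    for (const auto & leaf : schema_leaves)
        if (present.count(leaf) == 0)
            expired.insert(leaf);
    return expired;
}
```

A leaf stored by even one source part stays in the merged part, so only subfields pruned in all inputs remain pruned.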

Why top-level all-default columns are intentionally NOT pruned

If we erased a top-level Tuple column whose value is entirely default, the
part would semantically lose that column ("missing column" — equivalent to a
column that was added by a later `ALTER ADD COLUMN`). A subsequent
`ALTER MODIFY COLUMN ... DEFAULT <new_expr>` would then re-materialize the
column with the NEW default expression on read, retroactively changing
historical data. That is exactly the quirk tracked by ClickHouse#92475
(`ALTER MODIFY ... DEFAULT` rewriting old parts).

This PR sidesteps the problem by leaving top-level columns alone: subfield
pruning only narrows the Tuple type of a column that still exists. The
materialized 0 / '' / `[]` bytes of the kept columns pin the part's
semantics; future `ALTER MODIFY ... DEFAULT` changes apply only to parts
written after the ALTER, matching today's whole-column behavior.

Named-Tuple subfields have no per-subfield DEFAULT expression syntax
(`Tuple(a Int64 DEFAULT 5)` is not a valid type), so pruning a subfield can
only ever fall back to the language's type-default (0 / '' / NULL). This is
also why the optimization composes cleanly with the per-column DEFAULT RFC
in ClickHouse#92475 (comment 4334850399): subfield pruning operates entirely below
the column boundary the RFC will redefine.

What is NOT touched

- Compact parts: early return preserved; pruning only fires for Wide parts.
- Patch parts: skipped (mirrors the existing whole-column behavior).
- Mutate path: not pruned; mutations preserve the existing schema.
- Top-level all-default columns: see note above.
- `PR ClickHouse#98472`'s column-level `skip_empty_columns_on_insert` mechanism: only
  the `hasOnlyTypeDefaults` column primitives are lifted, none of its
  signalling layer (no `WITH_SKIPPED_COLUMNS` serialization version, no
  JSON `skipped_columns` field, no DEFAULT-expression interaction).

Gate

- `enable_tuple_subfield_pruning` (default true) gates the entire feature in
  `MergeTreeSettings`. The history entry is recorded under 26.6.

Compatibility

- No on-disk format change: parts written by this PR are readable by any
  server that has the metadata-only-ALTER work in ClickHouse#107305.

Tests

- `tests/queries/0_stateless/04320_tuple_subfield_pruning.sql` exercises 36
  cases: flat / nested Tuple, Nullable wrap, Array(Tuple) (all-empty and
  non-empty), Map(K, Tuple), `LowCardinality(String)`, deep customer-like
  schema, `PARTITION BY` per-partition narrowing, setting OFF, Compact-part
  preservation, two-part merge variants (both pruned, one pruned, different
  subfields pruned), `INSERT SELECT` / async INSERT / materialized view,
  `ReplacingMergeTree` merge, vertical merge, `LWD`, `ALTER MODIFY ADD
  subfield + INSERT`, `ALTER UPDATE` mutation on narrowed part,
  multi-granule part, `DETACH / ATTACH PARTITION`, top-level column with a
  dot in its name, force-sparse + pruning interaction, subcolumn reads of
  pruned subfields, `CHECK TABLE` on a pruned part, and `bytes_on_disk`
  comparison.

### Documentation entry for user-facing changes

- [x] Documentation is not required.

### Changelog category (leave one):

- Improvement

### Changelog entry:

Automatically prune named-Tuple subfields whose values in a part are
entirely type-defaults: the writer omits their stream files and records a
narrowed Tuple type in `columns.txt`; reads materialize defaults via
`CAST`. Gated by the new MergeTree-level setting
`enable_tuple_subfield_pruning` (default on).
amosbird added a commit to amosbird/ClickHouse that referenced this pull request Jun 14, 2026
…ERT and merge

When all values of a named-Tuple subfield in a part are type-defaults, the
writer omits that subfield's stream files and narrows the part's columns.txt
Tuple type so it no longer mentions the subfield. Reads see the narrowed
Tuple type and use `CAST(narrowed_tuple, full_tuple)` to materialize defaults,
relying on the metadata-only ALTER work in ClickHouse#107305.

This optimization is most useful for `PARTITION BY` schemes where different
partitions populate different subsets of a wide schema's subfields: the
on-disk part keeps only the substreams whose subfield actually appears in
that partition.

Approach

- Reuse the existing whole-column pruning path
  (`IMergedBlockOutputStream::removeEmptyColumnsFromPart` consuming
  `new_data_part->expired_columns`).
- Extend that path to accept dotted subfield names (`data.c2s.gold`) and
  narrow the column's Tuple type via the new
  `narrowDataTypeByExpiredSubstreams` helper in `DataTypes/Utils`.
- After the prune pass, keep `columns_substreams.txt` consistent with the
  on-disk files via the new `ColumnsSubstreams::removeSubstreams` helper.
- Preserve each kept subfield's `SerializationInfo` (its sparse / default
  kind and per-element `num_rows` / `num_defaults`) when narrowing the
  enclosing Tuple, via the new `narrowSerializationInfo` helper.
- INSERT: `MergeTreeDataWriter` traverses each named-Tuple column with the
  new `IColumn::hasOnlyTypeDefaults` to spot all-default subtrees and
  contributes their dotted paths to `expired_columns`.
- Merge (Sub-case A): `MergeTask::prepare` computes the union of leaf
  substreams across all source parts and marks any leaf absent from every
  source as expired in the merged part. This is monotonic: a merged part
  never re-materializes default values for a subfield that was consistently
  pruned in the inputs.

Why top-level all-default columns are intentionally NOT pruned

If we erased a top-level Tuple column whose value is entirely default, the
part would semantically lose that column ("missing column" — equivalent to a
column that was added by a later `ALTER ADD COLUMN`). A subsequent
`ALTER MODIFY COLUMN ... DEFAULT <new_expr>` would then re-materialize the
column with the NEW default expression on read, retroactively changing
historical data. That is exactly the quirk tracked by ClickHouse#92475
(`ALTER MODIFY ... DEFAULT` rewriting old parts).

This PR sidesteps the problem by leaving top-level columns alone: subfield
pruning only narrows the Tuple type of a column that still exists. The
materialized 0 / '' / `[]` bytes of the kept columns pin the part's
semantics; future `ALTER MODIFY ... DEFAULT` changes apply only to parts
written after the ALTER, matching today's whole-column behavior.

Named-Tuple subfields have no per-subfield DEFAULT expression syntax
(`Tuple(a Int64 DEFAULT 5)` is not a valid type), so pruning a subfield can
only ever fall back to the language's type-default (0 / '' / NULL). This is
also why the optimization composes cleanly with the per-column DEFAULT RFC
in ClickHouse#92475 (comment 4334850399): subfield pruning operates entirely below
the column boundary the RFC will redefine.

What is NOT touched

- Compact parts: early return preserved; pruning only fires for Wide parts.
- Patch parts: skipped (mirrors the existing whole-column behavior).
- Mutate path: not pruned; mutations preserve the existing schema.
- Top-level all-default columns: see note above.
- PR ClickHouse#98472's column-level `skip_empty_columns_on_insert` mechanism: only
  the `hasOnlyTypeDefaults` column primitives are lifted, none of its
  signalling layer (no `WITH_MISSING_COLUMNS` serialization version, no
  JSON `missing_columns` field, no DEFAULT-expression interaction).

Gate

- `enable_tuple_subfield_pruning` (default true) gates the entire feature in
  `MergeTreeSettings`. The history entry is recorded under 26.6.

Compatibility

- No on-disk format change: parts written by this PR are readable by any
  server that has the metadata-only-ALTER work in ClickHouse#107305.

Tests

- `tests/queries/0_stateless/04320_tuple_subfield_pruning.sql` exercises 36
  cases: flat / nested Tuple, Nullable wrap, Array(Tuple) (all-empty and
  non-empty), Map(K, Tuple), `LowCardinality(String)`, deep customer-like
  schema, `PARTITION BY` per-partition narrowing, setting OFF, Compact-part
  preservation, two-part merge variants (both pruned, one pruned, different
  subfields pruned), `INSERT SELECT` / async INSERT / materialized view,
  `ReplacingMergeTree` merge, vertical merge, `LWD`, `ALTER MODIFY ADD
  subfield + INSERT`, `ALTER UPDATE` mutation on narrowed part,
  multi-granule part, `DETACH / ATTACH PARTITION`, top-level column with a
  dot in its name, force-sparse + pruning interaction, subcolumn reads of
  pruned subfields, `CHECK TABLE` on a pruned part, and `bytes_on_disk`
  comparison.

### Documentation entry for user-facing changes

- [x] Documentation is not required.

### Changelog category (leave one):

- Improvement

### Changelog entry:

Automatically prune named-Tuple subfields whose values in a part are
entirely type-defaults: the writer omits their stream files and records a
narrowed Tuple type in `columns.txt`; reads materialize defaults via
`CAST`. Gated by the new MergeTree-level setting
`enable_tuple_subfield_pruning` (default on).
amosbird added a commit to amosbird/ClickHouse that referenced this pull request Jun 14, 2026
…ERT and merge

When all values of a named-Tuple subfield in a part are type-defaults, the
writer omits that subfield's stream files and narrows the part's columns.txt
Tuple type so it no longer mentions the subfield. Reads see the narrowed
Tuple type and use `CAST(narrowed_tuple, full_tuple)` to materialize defaults,
relying on the metadata-only ALTER work in ClickHouse#107305.

This optimization is most useful for `PARTITION BY` schemes where different
partitions populate different subsets of a wide schema's subfields: the
on-disk part keeps only the substreams whose subfield actually appears in
that partition.

Approach

- Reuse the existing whole-column pruning path
  (`IMergedBlockOutputStream::removeEmptyColumnsFromPart` consuming
  `new_data_part->expired_columns`).
- Extend that path to accept dotted subfield names (`data.c2s.gold`) and
  narrow the column's Tuple type via the new
  `narrowDataTypeByExpiredSubstreams` helper in `DataTypes/Utils`.
- After the prune pass, keep `columns_substreams.txt` consistent with the
  on-disk files via the new `ColumnsSubstreams::removeSubstreams` helper.
- Preserve each kept subfield's `SerializationInfo` (its sparse / default
  kind and per-element `num_rows` / `num_defaults`) when narrowing the
  enclosing Tuple, via the new `narrowSerializationInfo` helper.
- INSERT: `MergeTreeDataWriter` traverses each named-Tuple column with the
  new `IColumn::hasOnlyTypeDefaults` to spot all-default subtrees and
  contributes their dotted paths to `expired_columns`.
- Merge (Sub-case A): `MergeTask::prepare` computes the union of leaf
  substreams across all source parts and marks any leaf absent from every
  source as expired in the merged part. This is monotonic: a merged part
  never re-materializes default values for a subfield that was consistently
  pruned in the inputs.

Why top-level all-default columns are intentionally NOT pruned

If we erased a top-level Tuple column whose value is entirely default, the
part would semantically lose that column ("missing column" — equivalent to a
column that was added by a later `ALTER ADD COLUMN`). A subsequent
`ALTER MODIFY COLUMN ... DEFAULT <new_expr>` would then re-materialize the
column with the NEW default expression on read, retroactively changing
historical data. That is exactly the quirk tracked by ClickHouse#92475
(`ALTER MODIFY ... DEFAULT` rewriting old parts).

This PR sidesteps the problem by leaving top-level columns alone: subfield
pruning only narrows the Tuple type of a column that still exists. The
materialized 0 / '' / `[]` bytes of the kept columns pin the part's
semantics; future `ALTER MODIFY ... DEFAULT` changes apply only to parts
written after the ALTER, matching today's whole-column behavior.
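
The difference between an erased column and a materialized one can be shown with a small model. This is only a sketch under stated assumptions: `PartColumn` and `readColumn` are hypothetical, the column is Int64, and "missing column" handling is reduced to filling with whatever DEFAULT the table has at read time.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Toy model of one column in one part: either the values are stored on disk,
// or the column is missing and must be synthesized on read.
struct PartColumn
{
    std::optional<std::vector<std::int64_t>> stored;
    std::size_t rows = 0;
};

// Missing columns are filled from the table's *current* DEFAULT at read time;
// stored columns return exactly the bytes that were written.
std::vector<std::int64_t> readColumn(const PartColumn & col, std::int64_t current_default)
{
    if (col.stored)
        return *col.stored;
    return std::vector<std::int64_t>(col.rows, current_default);
}
```

With a current default of 5, a part that kept its zeros still reads zeros, while a part whose column was erased reads 5 for every row. That retroactive change is what leaving top-level columns alone avoids.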

Named-Tuple subfields have no per-subfield DEFAULT expression syntax
(`Tuple(a Int64 DEFAULT 5)` is not a valid type), so pruning a subfield can
only ever fall back to the language's type-default (0 / '' / NULL). This is
also why the optimization composes cleanly with the per-column DEFAULT RFC
in ClickHouse#92475 (comment 4334850399): subfield pruning operates entirely below
the column boundary the RFC will redefine.

What is NOT touched

- Compact parts: early return preserved; pruning only fires for Wide parts.
- Patch parts: skipped (mirrors the existing whole-column behavior).
- Mutate path: not pruned; mutations preserve the existing schema.
- Top-level all-default columns: see note above.
- The column-level `skip_empty_columns_on_insert` mechanism of PR ClickHouse#98472:
  only the `hasOnlyTypeDefaults` column primitives are reused, none of its
  signalling layer (no `WITH_SKIPPED_COLUMNS` serialization version, no
  JSON `skipped_columns` field, no DEFAULT-expression interaction).

Gate

- `enable_tuple_subfield_pruning` (default true) gates the entire feature in
  `MergeTreeSettings`. The history entry is recorded under 26.6.
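
Taken together, the exclusions and the gate amount to a single predicate. The sketch below is an assumption-laden summary, not the actual control flow: `PartType`, `WriteKind`, `PruneContext`, and `shouldPruneSubfields` are hypothetical names, and only the setting name `enable_tuple_subfield_pruning` comes from the text.

```cpp
// Hypothetical, simplified view of the conditions listed above.
enum class PartType { Wide, Compact };
enum class WriteKind { Insert, Merge, Mutation };

struct PruneContext
{
    PartType part_type;
    bool is_patch_part;
    WriteKind kind;
    bool enable_tuple_subfield_pruning;  // the MergeTree-level setting
};

// Subfield pruning fires only for Wide, non-patch parts written by INSERT or
// merge, and only when the setting is enabled.
bool shouldPruneSubfields(const PruneContext & ctx)
{
    if (!ctx.enable_tuple_subfield_pruning)
        return false;
    if (ctx.part_type != PartType::Wide)
        return false;
    if (ctx.is_patch_part)
        return false;
    if (ctx.kind == WriteKind::Mutation)
        return false;
    return true;
}
```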

Compatibility

- No on-disk format change: parts written by this PR are readable by any
  server that has the metadata-only-ALTER work in ClickHouse#107305.

Tests

- `tests/queries/0_stateless/04320_tuple_subfield_pruning.sql` exercises 36
  cases: flat / nested Tuple, Nullable wrap, Array(Tuple) (all-empty and
  non-empty), Map(K, Tuple), `LowCardinality(String)`, deep customer-like
  schema, `PARTITION BY` per-partition narrowing, setting OFF, Compact-part
  preservation, two-part merge variants (both pruned, one pruned, different
  subfields pruned), `INSERT SELECT` / async INSERT / materialized view,
  `ReplacingMergeTree` merge, vertical merge, lightweight delete (`LWD`),
  `ALTER MODIFY ADD subfield + INSERT`, `ALTER UPDATE` mutation on narrowed part,
  multi-granule part, `DETACH / ATTACH PARTITION`, top-level column with a
  dot in its name, force-sparse + pruning interaction, subcolumn reads of
  pruned subfields, `CHECK TABLE` on a pruned part, and `bytes_on_disk`
  comparison.

### Documentation entry for user-facing changes

- [x] Documentation is not required.

### Changelog category (leave one):

- Improvement

### Changelog entry:

Automatically prune named-Tuple subfields whose values in a part are
entirely type-defaults: the writer omits their stream files and records a
narrowed Tuple type in `columns.txt`; reads materialize defaults via
`CAST`. Gated by the new MergeTree-level setting
`enable_tuple_subfield_pruning` (default on).
pull Bot pushed a commit to Mu-L/ClickHouse that referenced this pull request Jun 15, 2026
… parent DB is missing

`DataLakeConfiguration::getCatalog` (introduced by ClickHouse#100334) looked up the parent
database in `DatabaseCatalog` and threw `LOGICAL_ERROR` ("Database X not found")
when `tryGetDatabase` returned `nullptr`. That assertion is wrong: a missing
database here is a transient runtime state, not a logical-invariant violation.

Concretely it can fire during async metadata loading after a server restart
(`AsyncLoader::worker` -> `DatabaseOrdinary::loadTableFromMetadata` ->
`createStorageObjectStorage` -> `getCatalog`) when an unrelated table-load
job in the same database has just thrown (for instance because of
`cannot_allocate_thread_fault_injection_probability`) and the database has
been detached as a result. Stress tests with thread-allocation fault
injection have been hitting this LOGICAL_ERROR sporadically (`STID 2377-2a78`):
3 distinct, unrelated PRs over 90 days (PR ClickHouse#98472 on 2026-04-09,
PR ClickHouse#100958 on 2026-04-12, PR ClickHouse#102804 on 2026-04-30), none of
which touch this code or its callers.

Production stack from PR ClickHouse#102804 stress-test (amd_debug):

```
2026.04.30 05:17:39.895829 [ 6955 ] AsyncLoader::worker: Code: 439.
DB::Exception: Cannot schedule a task: fault injected (...): Cannot attach
table `test_1`.`test_max_size_drop` from metadata file ...
2026.04.30 05:17:40.099425 [ 6998 ] {} <Fatal> : Logical error:
'Database test_1 not found'.
[stack: DataLakeConfiguration::getCatalog -> createStorageObjectStorage
        -> registerStorageIceberg -> StorageFactory::get
        -> createTableFromAST -> DatabaseOrdinary::loadTableFromMetadata
        -> AsyncLoader::worker]
```

Fix: combine the two null-checks. `dynamic_pointer_cast` already returns
`nullptr` for a null input, so the function naturally returns `nullptr`
both for "DB not registered" and "DB is not `DataLakeCatalog`" - the same
response either way. This matches the behaviour of `getCatalog` before
ClickHouse#100334, restores backward compatibility for `Iceberg` engine tables
hosted in regular `Atomic`/`Ordinary` databases, and removes the
spurious LOGICAL_ERROR signal from stress-test reports without changing
behaviour for the supported `DataLakeCatalog` -> `Iceberg` path.
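
The combined null-check relies on standard `std::dynamic_pointer_cast` behaviour, which a minimal sketch can show. The class names below are hypothetical stand-ins for the database hierarchy and `asDataLakeCatalog` is an illustrative helper, not the actual `getCatalog` code.

```cpp
#include <memory>

// Hypothetical stand-ins for the database class hierarchy.
struct IDatabase
{
    virtual ~IDatabase() = default;
};
struct DatabaseAtomic : IDatabase {};
struct DatabaseDataLakeCatalog : IDatabase {};

// One combined check: the result is null both when `db` itself is null
// (database not registered) and when it is not a DataLake catalog database.
std::shared_ptr<DatabaseDataLakeCatalog> asDataLakeCatalog(const std::shared_ptr<IDatabase> & db)
{
    return std::dynamic_pointer_cast<DatabaseDataLakeCatalog>(db);
}
```

A caller then needs only one `if (!catalog)` branch, which covers the missing database and the non-catalog database alike.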

Local verification (debug build):
- Compiles, server starts.
- `CREATE TABLE iceberg_t ENGINE = IcebergLocal(...)` inside a regular
  `Atomic` database succeeds, DETACH/ATTACH database cycle succeeds,
  server restart with `async_load_databases=1` reloads the table without
  LOGICAL_ERROR.

Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=102804&sha=40e4eba7d14b8588106464e81b911e8de7a45dc6&name_0=PR&name_1=Stress%20test%20%28amd_debug%29

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread src/Storages/MergeTree/MergeTask.cpp Outdated
continue;
/// Column is materialized by this mutation (present in updated_header),
/// so it is written in full and is no longer skipped.
if (updated_header.has(new_name))

Contributor

A skipped column still represents real inserted values, so preserving its marker through every mutation that leaves it out of `updated_header` is unsafe for type-changing `ALTER MODIFY COLUMN`.

Concrete trace: insert `b UInt64 = 0` so `b` is skipped, then run `ALTER TABLE ... MODIFY COLUMN b Nullable(UInt64)`. `splitAndModifyMutationCommands` skips the `READ_COLUMN` command because the source part has no physical `b` file, so `updated_header` does not contain `b` here and the marker is preserved. Later reads synthesize the current type default, `NULL`, but a normal type mutation of the stored value `0` should produce `0` as `Nullable(UInt64)`.

Please either materialize skipped columns for type-changing `READ_COLUMN` mutations, or only preserve the marker when the old type-default converted to the new type is provably equal to the new type default. A regression test with `UInt64` -> `Nullable(UInt64)` would catch this.
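
The mismatch in this trace can be reproduced with `std::optional` standing in for `Nullable(UInt64)`. This is a sketch under that assumption only; the function names are hypothetical and do not correspond to anything in the mutation code.

```cpp
#include <cstdint>
#include <optional>

// std::optional stands in for Nullable(UInt64); nullopt plays the role of NULL.
using NullableUInt64 = std::optional<std::uint64_t>;

// What a real type-changing mutation does to a stored value: convert it.
NullableUInt64 convertToNullable(std::uint64_t stored)
{
    return stored;
}

// What missing-column handling does: synthesize the new type's default (NULL).
NullableUInt64 typeDefaultOfNullable()
{
    return std::nullopt;
}

// Reads column b after the type change, for a part where b was skipped at
// INSERT because every value was 0.
NullableUInt64 readAfterTypeChange(bool materialize_on_mutation)
{
    const std::uint64_t inserted = 0;        // the value that was actually inserted
    if (materialize_on_mutation)
        return convertToNullable(inserted);  // converted value: 0, not NULL
    return typeDefaultOfNullable();          // marker preserved: reads NULL
}
```

Only the materializing path returns a non-null 0, which is why a regression test on this type change distinguishes the two behaviours.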

Member

The on-the-fly window is now guarded, but the fully-materialized type change can't be fixed by dropping the marker: both the marker and normal missing-column handling yield the new type's default (e.g. NULL), while the correct value is convert(0, new_type) = 0. Only materializing the skipped column produces it. Root cause: the type-changing READ_COLUMN is filtered out at MutateTask.cpp:559 for physically-absent (skipped) columns, so it never reaches getColumnsForNewDataPart. Left as a design decision (force materialization vs. record skip-time type/value) — see #98472 (comment)

Comment thread src/Storages/MergeTree/MergeTreeSettings.cpp Outdated
Comment thread src/Storages/MergeTree/IMergeTreeReader.cpp Outdated
Comment thread src/Storages/MergeTree/MergeTask.cpp Outdated
Comment thread src/Storages/MergeTree/MutateTask.cpp Outdated
/// unrelated mutation silently dropped the marker.
{
NameSet new_skipped_columns;
for (const auto & name : serialization_infos.getSkippedColumns())

Contributor

`CLEAR COLUMN` needs to forget a skipped column too. Today a skipped column is absent from `part_columns`, so the `DROP_COLUMN` command with `clear = true` is ignored before it reaches the interpreter, and this loop then preserves the old skipped marker because `updated_header` does not contain the column.

Concrete trace: insert `b = 0` so `b` is skipped, then `ALTER TABLE ... MODIFY COLUMN b UInt64 DEFAULT 999`, then `ALTER TABLE ... CLEAR COLUMN b`. `CLEAR COLUMN` should remove the stored value and make the row read as the current default `999`, but preserving the marker keeps returning the inserted type-default `0`.

Please treat `DROP_COLUMN` with `clear = true` as affecting skipped columns even when they have no physical files: either drop the marker so the current default is evaluated, or materialize the cleared value explicitly.
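
The trace can be modelled with a two-field state. This is a hypothetical sketch: `SkippedColumn`, `readValue`, and `clearColumn` are illustrative names, and `clearColumn` implements the first of the two requested options (dropping the marker).

```cpp
#include <cstdint>

// Toy state of column b in one part, where b has no physical files.
struct SkippedColumn
{
    bool has_marker;                // part recorded "all values were type-defaults"
    std::uint64_t current_default;  // table-level DEFAULT at read time
};

// With the marker, reads return the inserted type-default (0); without it,
// the column is an ordinary missing column and the current DEFAULT applies.
std::uint64_t readValue(const SkippedColumn & col)
{
    return col.has_marker ? 0 : col.current_default;
}

// CLEAR COLUMN as requested in the review: forget the marker even though
// there are no files to delete.
SkippedColumn clearColumn(SkippedColumn col)
{
    col.has_marker = false;
    return col;
}
```

Before the clear the read yields 0; after it the read yields the current default, 999 in the trace above.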

Member

On-the-fly CLEAR COLUMN is now handled on read via isColumnDropped (66cd944). The fully-materialized case still needs the clear DROP_COLUMN to reach getColumnsForNewDataPart (it is filtered at MutateTask.cpp:566 for skipped columns), which is the same materialization decision as the type-change blocker — see #98472 (comment)

@alexey-milovidov

Member

Pushed 61c63f35469..cc3a0df89c6 (merged current master, resolving the MutateTask::getColumnsForNewDataPart conflict by keeping both the skipped-columns marker block and master's materialize_updated_column_serialization_infos block) and addressed part of the AI review:

  • Blocker 2 (IMergeTreeReader.cpp) + on-the-fly part of Blocker 4 (CLEAR COLUMN): fillMissingColumns now skips the marker for any column reported by AlterConversions::isColumnDropped (this includes CLEAR COLUMN, which is a DROP_COLUMN with clear). So while a DROP/CLEAR is still pending (on-the-fly), a re-added b ... DEFAULT 999 reads 999 and a cleared b reads its current default instead of the stored type-default.
  • Blocker 3 (MergeTask.cpp): the merge no longer carries the marker for source columns reported dropped by the part's AlterConversions, so a DROP + merge + re-ADD ... DEFAULT no longer resurrects a stale marker.
  • Major (MergeTreeSettings.cpp): documented with_skipped_columns under serialization_info_version and fixed the string_serialization_version wording (with_types or newer).
  • PR metadata: body now says 21 test cases (incl. rename-then-merge) and uses Related: https://github.com/ClickHouse/ClickHouse/issues/4968.

These guards are scoped to the isColumnDropped case only, so they cannot change the existing 21 test outcomes.
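
The read-path guard described in the first bullet can be modelled roughly as follows. This is an illustrative sketch, not the IMergeTreeReader code: `PartModel`, `readMissingColumn`, and the `dropped_on_the_fly` set (standing in for what AlterConversions::isColumnDropped reports) are hypothetical names, and the column is assumed to be UInt64.

```cpp
#include <cstdint>
#include <set>
#include <string>

// Hypothetical stand-in for a data part: only the skipped-column markers matter here.
struct PartModel
{
    std::set<std::string> missing_columns; // markers recorded in serialization.json
};

// Value a reader produces for a UInt64 column that has no physical data in the part.
std::uint64_t readMissingColumn(const PartModel & part,
                                const std::string & name,
                                const std::set<std::string> & dropped_on_the_fly,
                                std::uint64_t current_default)
{
    const bool has_marker = part.missing_columns.count(name) > 0;
    const bool dropped = dropped_on_the_fly.count(name) > 0;

    // Trust the marker only while no DROP/CLEAR is pending for the column.
    if (has_marker && !dropped)
        return 0; // frozen type-default of UInt64

    // Otherwise fall through to normal missing-column handling.
    return current_default;
}
```

Because the guard only changes the outcome when the column is reported as dropped, parts without a pending DROP/CLEAR read exactly as before.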

Still open — needs a design decision (Blocker 1, and the fully-materialized case of Blocker 4):

The root cause is in MutateTask command splitting: for a column physically absent from the part (a skipped column), the type-changing READ_COLUMN is dropped at MutateTask.cpp:559 (part_columns.has(...)) and CLEAR's DROP_COLUMN at MutateTask.cpp:566, so neither reaches getColumnsForNewDataPart. Once the mutation materializes, the marker survives and the read is wrong.

For Blocker 1 this cannot be fixed by dropping the marker: after skipped b UInt64 = 0 then MODIFY COLUMN b Nullable(UInt64), both the marker path and normal missing-column handling produce the new type's default (NULL), whereas the correct value is convert(0, Nullable(UInt64)) = 0. Only materializing the skipped column at type-change time yields the right value (the eager rewrite reads b as old-type 0 from the source marker and converts it). So the choice is essentially: force materialization of a skipped column on a type-changing/clearing mutation, or record the skip-time type/value in the marker. I left this for you rather than guess, since it touches the mutation interpreter path.
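
To make the difference concrete, here is a small model of the two outcomes, using `std::optional` as a stand-in for Nullable(UInt64). The names `NullableUInt64`, `newTypeDefault`, and `convertFrozenValue` are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>
#include <optional>

// Stand-in for Nullable(UInt64): an empty optional plays the role of NULL.
using NullableUInt64 = std::optional<std::uint64_t>;

// What both the marker path and normal missing-column handling yield after
// the type change: the default of the new type, which is NULL.
NullableUInt64 newTypeDefault()
{
    return std::nullopt;
}

// What materialization at type-change time yields: the old-type value is
// read first (0 for a skipped UInt64 column) and then converted.
NullableUInt64 convertFrozenValue(std::uint64_t frozen_old_value)
{
    return NullableUInt64{frozen_old_value};
}
```

Here `convertFrozenValue(0)` holds a non-NULL 0, while `newTypeDefault()` is NULL, which is the mismatch described above.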

I could not run a full local build (this checkout's master is ahead of my main worktree by 11 contrib submodules plus new libpng/xsimd), so the four changed files were verified with clang -fsyntax-only; CI runs the full build and the stateless suite. I also did not add the on-the-fly-window regressions, because forcing a deterministic pending-mutation state needs controlled materialization — happy to add them once the materialization direction above is decided.

Comment thread src/Storages/MergeTree/MutateTask.cpp Outdated
continue;
/// Column is materialized by this mutation (present in updated_header),
/// so it is written in full and is no longer skipped.
if (updated_header.has(new_name))

Contributor


A skipped column still represents real inserted values, so MATERIALIZE COLUMN must not treat the missing physical file as "value was never stored". Today splitAndModifyMutationCommands admits MATERIALIZE_COLUMN when the part lacks column files. For a skipped b, this makes MutationsInterpreter compute the current DEFAULT expression, updated_header.has(b) is true here, and this block drops the skipped marker. A concrete trace: insert b UInt64 = 0 with skipping enabled, ALTER MODIFY COLUMN b UInt64 DEFAULT 999, then ALTER TABLE ... MATERIALIZE COLUMN b. Before materialization, reads return the inserted 0, but the mutation writes 999, violating the existing MATERIALIZE COLUMN contract that past values for DEFAULT columns are not overwritten. Please either keep skipped columns out of this materialization path or materialize the value read through the skipped marker, and add a regression test for this sequence.

Comment thread src/Storages/MergeTree/MergeTask.cpp
Comment thread src/DataTypes/Serializations/SerializationInfo.cpp Outdated
Comment thread src/Storages/MergeTree/MergeTreeDataWriter.cpp
/// ... DEFAULT 999, the newly added `b` must read 999, not the
/// frozen default. Fall through to normal missing-column handling
/// (which evaluates the DEFAULT expression) in that case.
if (alter_conversions->isColumnDropped(name_in_part))

Contributor


This drop guard checks the old physical name after the rename mapping, but AlterConversions records DROP COLUMN under the current name. For a part with missing_columns = ['b'], pending RENAME COLUMN b TO c, then DROP COLUMN c, name_in_part becomes b, isColumnDropped("b") is false, and the stale marker is trusted. After ADD COLUMN c UInt64 DEFAULT 999, the new c can read as the old inserted type-default 0. MergeTask has the same ordering before it translates the marker to the current name. Please check the dropped state both before and after rename, or normalize missing-marker names through the full rename/drop chain before preserving or trusting them; add a rename -> drop -> add regression.
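
One way to picture the suggested normalization is to walk the pending alters in order and track the marker under its current name. This is a sketch under assumptions: `PendingAlter` and `resolveMarkerName` are hypothetical and do not mirror the real AlterConversions interface.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical pending on-the-fly alter, in the order it was issued.
struct PendingAlter
{
    enum class Kind { Rename, Drop } kind;
    std::string name;     // column being renamed or dropped (its name at that point)
    std::string new_name; // used only for Rename
};

// Follow a marker through the rename/drop chain.
// Returns the current column name the marker applies to,
// or std::nullopt if the column was dropped along the way.
std::optional<std::string> resolveMarkerName(std::string marker_name,
                                             const std::vector<PendingAlter> & alters)
{
    for (const auto & alter : alters)
    {
        if (alter.name != marker_name)
            continue; // this alter touches a different column
        if (alter.kind == PendingAlter::Kind::Drop)
            return std::nullopt; // marker is stale from here on
        marker_name = alter.new_name; // keep tracking under the new name
    }
    return marker_name;
}
```

For a marker on `b` with pending RENAME COLUMN b TO c followed by DROP COLUMN c, the walk reaches the drop under the name `c` and reports the marker as stale, so a later ADD COLUMN c would not inherit it.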

Comment thread tests/integration/test_skip_empty_columns/test.py Outdated
@clickhouse-gh

clickhouse-gh Bot commented Jun 30, 2026

Contributor

LLVM Coverage Report

| Metric | Baseline | Current | Δ |
|---|---|---|---|
| Lines | 85.40% | 85.50% | +0.10% |
| Functions | 92.70% | 92.70% | +0.00% |
| Branches | 77.60% | 77.70% | +0.10% |

Changed C/C++ lines covered: 275/321 (85.67%)

When skip_empty_columns_on_insert is enabled and serialization_info_version
is set to 'with_missing_columns', columns whose values are entirely
type-defaults (zeros, empty strings, NULLs) are omitted from MergeTree
parts at INSERT time. A structured 'missing_columns' marker in
serialization.json records the frozen default for each omitted column,
so a later ALTER MODIFY COLUMN ... DEFAULT does not retroactively change
the inserted values.

Key components:
- IColumn::hasOnlyTypeDefaults() with optimized overrides (memoryIsZero,
  offsets check, null map check, etc.)
- skipEmptyColumnsOnInsert() in MergeTreeDataWriter filters columns using
  IDataType::getDefault() to match the read-path reconstruction
- SerializationInfoByName::MissingColumnInfo struct with TypeDefault and
  Expression (reserved for Phase 2) variants
- Read path (fillMissingColumns) consults the marker and fills type-default
  instead of evaluating the current DEFAULT expression
- Marker propagation through merges, mutations, and ALTER RENAME COLUMN
- Version gate: WITH_MISSING_COLUMNS = 2 in serialization_info_version

Fixes:
- Date32 type-default mismatch (getDefault() vs insertDefaultInto())
- Compact-part rename tracking for missing columns
- Expression markers rejected on read (fail closed until Phase 2)

Tests:
- 83 stateless test labels across 3 files (types, mutations, lifecycle)
- 6 integration tests (replication, mixed-version, restart, partitions)
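
The per-type checks named under Key components (zeroed memory, offsets check, null map check) might look roughly like the following. This is a simplified model, not the IColumn overrides from the commit: the function names are hypothetical, columns are plain `std::vector`s, and the string layout is assumed to be end offsets without terminator bytes.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-width numeric column: only type-defaults iff every value is zero.
bool numericHasOnlyDefaults(const std::vector<std::uint64_t> & values)
{
    return std::all_of(values.begin(), values.end(),
                       [](std::uint64_t v) { return v == 0; });
}

// String column modelled as end offsets into one byte buffer (assumption:
// no terminator bytes). All strings are empty iff the last end offset is 0.
bool stringHasOnlyDefaults(const std::vector<std::size_t> & end_offsets)
{
    return end_offsets.empty() || end_offsets.back() == 0;
}

// Nullable column: the type-default is NULL, so every null-map byte must be set.
bool nullableHasOnlyDefaults(const std::vector<std::uint8_t> & null_map)
{
    return std::all_of(null_map.begin(), null_map.end(),
                       [](std::uint8_t is_null) { return is_null != 0; });
}
```

The string check only needs the final offset because offsets are non-decreasing, which is what makes such an override cheaper than comparing row by row.
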
@amosbird amosbird force-pushed the skip-empty-columns branch from 0aa7d4a to fc4513a Compare July 3, 2026 11:33

Labels

pr-feature Pull request with new product feature

6 participants