Fix CoalescingMergeTree crash on INSERT with nested Map columns by groeneai · Pull Request #106100 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix CoalescingMergeTree crash on INSERT with nested Map columns#106100

Closed
groeneai wants to merge 2 commits into
ClickHouse:masterfrom
groeneai:groeneai/fix-101937-coalescing-nested-map
Closed

Fix CoalescingMergeTree crash on INSERT with nested Map columns#106100
groeneai wants to merge 2 commits into
ClickHouse:masterfrom
groeneai:groeneai/fix-101937-coalescing-nested-map

Conversation

@groeneai

Copy link
Copy Markdown
Contributor

CoalescingMergeTree SIGSEGVed on INSERT into a table with a nested column whose parent table name ends with Map (e.g. testMap.key, legacy_features_Map.id). In debug builds this surfaced as an intrusive_ptr px != 0 assertion; in arm_asan_ubsan it surfaced as STID 1003-326e (UndefinedBehaviorSanitizer: member call on null pointer of type 'DB::IColumn') inside Chunk::checkNumRowsIsConsistent.

CoalescingMergeTree runs SummingSortedAlgorithm with aggregate_all_columns = true. The dedicated branch for that case in defineColumns (src/Processors/Merges/Algorithms/SummingSortedAlgorithm.cpp) wrapped its bookkeeping in if (map_name == column.name || !endsWith(map_name, \"Map\")) and then fell through to an unconditional continue. For any column whose nested table name ends with Map, the inner branch was skipped while the continue still fired, so the column was added to neither columns_to_aggregate nor column_numbers_not_to_aggregate. postprocessChunk then constructed res_columns of size num_result_columns and left those slots null, and Chunk::setColumns -> Chunk::checkNumRowsIsConsistent dereferenced the null ColumnPtr.

CoalescingMergeTree performs per-column last_value coalescing, so nested Map arrays must be coalesced element-wise like any other array column. The legacy nestedName.suffix routing into discovered_maps for sumMap-by-key merging is a SummingMergeTree concept that does not apply here, so the Map-suffix exclusion is dropped from the aggregate_all_columns = true branch. SummingMergeTree code paths are untouched.

Closes #101937.

CI report (STID 1003-326e on AST fuzzer (arm_asan_ubsan) for PR #104563): https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104563&sha=c2e0f0fe3b92d48b5a47fa90555f7d4aca54c553&name_0=PR&name_1=AST%20fuzzer%20%28arm_asan_ubsan%29

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix server crash on INSERT / OPTIMIZE into a CoalescingMergeTree table that has a nested column whose parent table name ends with Map (e.g. testMap.key Array(...)). The column was silently dropped from the merging metadata in defineColumns, producing a null ColumnPtr inside Chunk::checkNumRowsIsConsistent.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@groeneai

Copy link
Copy Markdown
Contributor Author

@groeneai

Copy link
Copy Markdown
Contributor Author

cc @scanhex12 @CurtizJ — could you review this? Fixes a crash in CoalescingMergeTree on INSERT with nested Map-suffix columns (issue #101937, also fuzzer STID 1003-326e). The Map-suffix exclusion in the aggregate_all_columns=true branch of defineColumns was dropping columns from both bookkeeping lists, leaving a null ColumnPtr in postprocessChunk.

@scanhex12 scanhex12 self-assigned this May 29, 2026
@scanhex12 scanhex12 added the can be tested Allows running workflows for external contributors label May 29, 2026
@clickhouse-gh

clickhouse-gh Bot commented May 29, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [870a861]

Summary:

job_name test_name status info comment
Stateless tests (arm_asan_ubsan, azure, parallel) FAIL
03836_distributed_index_analysis_skip_index_expression FAIL cidb
Stateless tests (arm_asan_ubsan, azure, sequential) FAIL
02535_max_parallel_replicas_custom_key_rmt ERROR cidb
Server died FAIL cidb
AddressSanitizer (STID: 1288-3bd5) FAIL cidb

AI Review

Summary

This PR fixes the CoalescingMergeTree null ColumnPtr failure for nested Map array columns and now correctly handles the explicit CoalescingMergeTree(testMap) case for array subcolumns. One correctness issue remains in the same contract: the scalar branch of defineColumns still does not match column_names_to_sum by Nested::extractTableName, so explicit parent selection can silently leave dotted scalar subcolumns uncoalesced.

Findings
  • ❌ Blocker: [src/Processors/Merges/Algorithms/SummingSortedAlgorithm.cpp:377] MergingParams::check accepts CoalescingMergeTree(testMap) by comparing the requested name with Nested::extractTableName(name_and_type.name), and the new array/tuple branch mirrors that at lines 284-286. The scalar branch still checks only column.name, so a table like `testMap.key` Array(UInt32), `testMap.count` UInt64 with ENGINE = CoalescingMergeTree(testMap) will coalesce testMap.key but route testMap.count to column_numbers_not_to_aggregate. After merging two rows, the selected nested parent can produce a mixed result: array values from the last row and scalar values copied from the first row.
    Suggested fix: extract the physical-or-nested-parent predicate and use it in both branches, then add the mixed explicit-parent regression.
const bool matches_columns_to_sum
    = isInNames(column.name, column_names_to_sum)
    || isInNames(Nested::extractTableName(column.name), column_names_to_sum);

...

if (column_names_to_sum.empty() || matches_columns_to_sum)
Tests
  • ⚠️ Add a regression for ENGINE = CoalescingMergeTree(testMap) with both an array subcolumn and a scalar dotted subcolumn under testMap, then verify FINAL and OPTIMIZE TABLE ... FINAL return the coalesced scalar value from the latest row.
Final Verdict

Status: ❌ Block

Minimum required action: apply the same Nested::extractTableName matching to scalar columns selected through an explicit nested parent name, and cover that mixed shape in the stateless test.

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label May 29, 2026
Comment thread src/Processors/Merges/Algorithms/SummingSortedAlgorithm.cpp Outdated
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate (fix-up e49f85f05f1f)

a) Deterministic repro? Yes. CoalescingMergeTree(testMap) over testMap.key Array(UInt32)/testMap.value Array(UInt32) plus two inserts with the same ORDER BY key reproduces the stale-data bug 100% on the master-side binary (with the first commit b6ada8e3 but without e49f85f05f1f).

b) Root cause explained? MergeTreeData::MergingParams::check (src/Storages/MergeTree/MergeTreeData.cpp:1612-1623) validates columns_to_sum against Nested::extractTableName(name_and_type.name), so for CoalescingMergeTree(testMap) the stored entry is testMap (the nested table name). SummingSortedAlgorithm::defineColumns was comparing requested names against the physical names testMap.key/testMap.value only; both subcolumns therefore routed to column_numbers_not_to_aggregate and finishGroup emitted the first-row values instead of last_value.

c) Fix matches root cause? Match column_names_to_sum against both the physical name and Nested::extractTableName(column.name) in the same branch, mirroring the validation in MergingParams::check. No new helpers, no behavior change for the bare CoalescingMergeTree constructor.

d) Test intent preserved? New tests added? Existing three regression cases in 04299_101937_coalescing_merge_tree_nested_map still exercise the original issue #101937 crash path under bare CoalescingMergeTree. Added a fourth case for CoalescingMergeTree(testMap) which fails deterministically on master before this commit and passes after.

e) Both directions demonstrated?

  • Without e49f85f05f1f: explicit - FINAL 1 [1] [10] 100 (stale first-row).
  • With e49f85f05f1f: explicit - FINAL 1 [2] [20] 100 (last_value, matches reference).
    other_col keeps the first-row value 100 because it is not listed in the engine argument, matching the existing CoalescingMergeTree(version) semantics in test 03409.

f) Fix is general? The companion path at line 367 (aggregate_all_columns=false, i.e. SummingMergeTree) only sees non-tuple non-array columns; array subcolumns reach the discovered_maps path which does not consult column_names_to_sum, so the analogous bug does not exist there. MergeTask.cpp already mirrors the Nested::extractTableName convention. No symmetric implementations missed.

Regression suite (against the rebuilt binary): 02346_text_index_coalescingmergetree, 03409_coalescing_merge_tree, 03594_coalescing_merge_tree_segfault, 03652_coalescing_merge_tree_fix_empty_tuple, 00146_summing_merge_tree_nested_map, 00148_summing_merge_tree_nested_map_multiple_values, 00327_summing_composite_nested, 04043_summingmergetree_nullable_bool, 03566_one_row_summing_merge_tree, 00084_summing_merge_tree, 04299_101937_coalescing_merge_tree_nested_map all PASS.

session: cron:clickhouse-ci-task-worker:20260529-124500

/// emit the first-row values instead of applying `last_value`.
const bool matches_columns_to_sum
= isInNames(column.name, column_names_to_sum)
|| isInNames(Nested::extractTableName(column.name), column_names_to_sum);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MergingParams::check validates CoalescingMergeTree(testMap) against Nested::extractTableName for every column, but the same parent-name match is only applied in this array/tuple branch. A dotted scalar subcolumn under the selected parent still reaches the later isInNames(column.name, column_names_to_sum) check, so ENGINE = CoalescingMergeTree(testMap) over `testMap.key` Array(UInt32), `testMap.count` UInt64 coalesces testMap.key but puts testMap.count into column_numbers_not_to_aggregate; after two rows the array comes from the last row while testMap.count is copied from the first row. Please use the same physical-or-Nested::extractTableName predicate for the scalar branch too, and add that mixed explicit-parent regression.

groeneai and others added 2 commits June 10, 2026 22:03
`CoalescingMergeTree` delegates to `SummingSortedAlgorithm` with
`aggregate_all_columns = true`. The dedicated branch for that case in
`defineColumns` (`src/Processors/Merges/Algorithms/SummingSortedAlgorithm.cpp:267`)
wrapped its work in `if (map_name == column.name || !endsWith(map_name, "Map"))`
and then fell through to an unconditional `continue`. For any nested column
whose parent table name ends with `Map` (e.g. `testMap.key`,
`legacy_features_Map.id`), the inner branch was skipped while the `continue`
still fired, so the column was added to neither `columns_to_aggregate` nor
`column_numbers_not_to_aggregate`.

`postprocessChunk` then constructed `res_columns` of size `num_result_columns`
and left those slots `nullptr`. `Chunk::setColumns` -> `Chunk::checkNumRowsIsConsistent`
dereferenced the null `ColumnPtr`, producing a SIGSEGV in release builds, a
`px != 0` assertion in debug builds, and an UBSan `member call on null pointer`
report in `arm_asan_ubsan` (STID 1003-326e on the AST fuzzer for PR ClickHouse#104281 and
PR ClickHouse#104563).

CoalescingMergeTree performs per-column `last_value` coalescing, so nested
`Map` arrays must be coalesced element-wise like any other array column. The
legacy `nestedName.suffix` routing into `discovered_maps` for `sumMap`-by-key
merging is a SummingMergeTree concept that does not apply here, so the
Map-suffix exclusion is dropped from the `aggregate_all_columns = true` branch.
SummingMergeTree code paths are untouched.

Closes ClickHouse#101937
CI report: https://s3.amazonaws.com/clickhouse-test-reports/PRs/104563/c2e0f0fe3b92d48b5a47fa90555f7d4aca54c553/ast_fuzzer_arm_asan_ubsan/

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…Map)

`MergeTreeData::MergingParams::check` validates the engine argument of
`CoalescingMergeTree(testMap)` against `Nested::extractTableName(name_and_type.name)`,
so the stored `columns_to_sum` entry is the nested table name (`testMap`),
not the physical subcolumn names (`testMap.key`, `testMap.value`).

`SummingSortedAlgorithm::defineColumns` was comparing the requested names
against the physical column name only. Both subcolumns therefore fell through
to `column_numbers_not_to_aggregate`, and `finishGroup` silently emitted the
first-row values instead of applying `last_value` for the documented `columns`
argument.

Match `column_names_to_sum` against both the physical name and the nested
table name, mirroring `MergingParams::check`.

Extends the regression test for issue ClickHouse#101937 with a `CoalescingMergeTree(testMap)`
case that fails without this change.
@groeneai groeneai force-pushed the groeneai/fix-101937-coalescing-nested-map branch from e49f85f to 870a861 Compare June 10, 2026 22:31
@groeneai

Copy link
Copy Markdown
Contributor Author

Rebased onto current master (a0996327ac85) to refresh the 12-day-old CI run. The branch was 3214 commits behind; the only master change to SummingSortedAlgorithm.cpp in that window was cosmetic (clang-tidy member-init), so the rebase was clean and the fix is unchanged.

Re-verified bidirectionally against current master (26.6.1.1, debug build):

  • Without fix: INSERT into a CoalescingMergeTree with a nested Map column aborts with Assertion 'px != 0' failed (null ColumnPtr deref in Chunk::checkNumRowsIsConsistent).
  • With fix: same workload returns last_value correctly, exit 0.
  • Regression test 04299_101937_coalescing_merge_tree_nested_map matches its reference.
  • No regressions in siblings 03594_coalescing_merge_tree_segfault, 03652_coalescing_merge_tree_fix_empty_tuple, 00146_summing_merge_tree_nested_map, 00327_summing_composite_nested.

The prior run's Bugfix validation (functional tests) ended in ERROR, which is the expected outcome for a crash-reproduction regression test: the test crashes the server on master HEAD, so the runner cannot finish cleanly. The inverter preserves ERROR as inconclusive (per #105789) rather than reporting "Failed to reproduce the bug".

Session: cron:clickhouse-worker-slot-11:20260610-214900

@clickhouse-gh

clickhouse-gh Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.70% 84.70% +0.00%
Functions 92.30% 92.30% +0.00%
Branches 77.30% 77.30% +0.00%

Changed lines: Changed C/C++ lines covered by tests: 23/26 (88.46%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

@groeneai

Copy link
Copy Markdown
Contributor Author

CI summary for HEAD 870a861142110e — all red marks are non-PR-caused, the PR's own checks are green (scanhex12 already approved):

CH Inc sync red is the internal CH mirror sync, expected on fork PRs.

@groeneai

Copy link
Copy Markdown
Contributor Author

@groeneai groeneai closed this Jun 12, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

CoalescingMergeTree crashes (SIGSEGV) on INSERT with nested Map columns due to missing column in defineColumns

2 participants