Fix CoalescingMergeTree crash on INSERT with nested Map columns#106100
Fix CoalescingMergeTree crash on INSERT with nested Map columns#106100groeneai wants to merge 2 commits into
Conversation
|
cc @scanhex12 @CurtizJ — could you review this? Fixes a crash in CoalescingMergeTree on INSERT with nested Map-suffix columns (issue #101937, also fuzzer STID 1003-326e). The Map-suffix exclusion in the aggregate_all_columns=true branch of defineColumns was dropping columns from both bookkeeping lists, leaving a null ColumnPtr in postprocessChunk. |
|
Workflow [PR], commit [870a861] Summary: ❌
AI ReviewSummaryThis PR fixes the Findings
const bool matches_columns_to_sum
= isInNames(column.name, column_names_to_sum)
|| isInNames(Nested::extractTableName(column.name), column_names_to_sum);
...
if (column_names_to_sum.empty() || matches_columns_to_sum)Tests
Final VerdictStatus: ❌ Block Minimum required action: apply the same |
Pre-PR validation gate (fix-up
|
| /// emit the first-row values instead of applying `last_value`. | ||
| const bool matches_columns_to_sum | ||
| = isInNames(column.name, column_names_to_sum) | ||
| || isInNames(Nested::extractTableName(column.name), column_names_to_sum); |
There was a problem hiding this comment.
MergingParams::check validates CoalescingMergeTree(testMap) against Nested::extractTableName for every column, but the same parent-name match is only applied in this array/tuple branch. A dotted scalar subcolumn under the selected parent still reaches the later isInNames(column.name, column_names_to_sum) check, so ENGINE = CoalescingMergeTree(testMap) over `testMap.key` Array(UInt32), `testMap.count` UInt64 coalesces testMap.key but puts testMap.count into column_numbers_not_to_aggregate; after two rows the array comes from the last row while testMap.count is copied from the first row. Please use the same physical-or-Nested::extractTableName predicate for the scalar branch too, and add that mixed explicit-parent regression.
`CoalescingMergeTree` delegates to `SummingSortedAlgorithm` with `aggregate_all_columns = true`. The dedicated branch for that case in `defineColumns` (`src/Processors/Merges/Algorithms/SummingSortedAlgorithm.cpp:267`) wrapped its work in `if (map_name == column.name || !endsWith(map_name, "Map"))` and then fell through to an unconditional `continue`. For any nested column whose parent table name ends with `Map` (e.g. `testMap.key`, `legacy_features_Map.id`), the inner branch was skipped while the `continue` still fired, so the column was added to neither `columns_to_aggregate` nor `column_numbers_not_to_aggregate`. `postprocessChunk` then constructed `res_columns` of size `num_result_columns` and left those slots `nullptr`. `Chunk::setColumns` -> `Chunk::checkNumRowsIsConsistent` dereferenced the null `ColumnPtr`, producing a SIGSEGV in release builds, a `px != 0` assertion in debug builds, and an UBSan `member call on null pointer` report in `arm_asan_ubsan` (STID 1003-326e on the AST fuzzer for PR ClickHouse#104281 and PR ClickHouse#104563). CoalescingMergeTree performs per-column `last_value` coalescing, so nested `Map` arrays must be coalesced element-wise like any other array column. The legacy `nestedName.suffix` routing into `discovered_maps` for `sumMap`-by-key merging is a SummingMergeTree concept that does not apply here, so the Map-suffix exclusion is dropped from the `aggregate_all_columns = true` branch. SummingMergeTree code paths are untouched. Closes ClickHouse#101937 CI report: https://s3.amazonaws.com/clickhouse-test-reports/PRs/104563/c2e0f0fe3b92d48b5a47fa90555f7d4aca54c553/ast_fuzzer_arm_asan_ubsan/ Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…Map) `MergeTreeData::MergingParams::check` validates the engine argument of `CoalescingMergeTree(testMap)` against `Nested::extractTableName(name_and_type.name)`, so the stored `columns_to_sum` entry is the nested table name (`testMap`), not the physical subcolumn names (`testMap.key`, `testMap.value`). `SummingSortedAlgorithm::defineColumns` was comparing the requested names against the physical column name only. Both subcolumns therefore fell through to `column_numbers_not_to_aggregate`, and `finishGroup` silently emitted the first-row values instead of applying `last_value` for the documented `columns` argument. Match `column_names_to_sum` against both the physical name and the nested table name, mirroring `MergingParams::check`. Extends the regression test for issue ClickHouse#101937 with a `CoalescingMergeTree(testMap)` case that fails without this change.
e49f85f to
870a861
Compare
|
Rebased onto current master ( Re-verified bidirectionally against current master (
The prior run's Session: cron:clickhouse-worker-slot-11:20260610-214900 |
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 23/26 (88.46%) | Lost baseline coverage: none · Uncovered code |
|
CI summary for HEAD
|

CoalescingMergeTreeSIGSEGVed on INSERT into a table with a nested column whose parent table name ends withMap(e.g.testMap.key,legacy_features_Map.id). In debug builds this surfaced as anintrusive_ptrpx != 0assertion; inarm_asan_ubsanit surfaced as STID 1003-326e (UndefinedBehaviorSanitizer: member call on null pointer of type 'DB::IColumn') insideChunk::checkNumRowsIsConsistent.CoalescingMergeTreerunsSummingSortedAlgorithmwithaggregate_all_columns = true. The dedicated branch for that case indefineColumns(src/Processors/Merges/Algorithms/SummingSortedAlgorithm.cpp) wrapped its bookkeeping inif (map_name == column.name || !endsWith(map_name, \"Map\"))and then fell through to an unconditionalcontinue. For any column whose nested table name ends withMap, the inner branch was skipped while thecontinuestill fired, so the column was added to neithercolumns_to_aggregatenorcolumn_numbers_not_to_aggregate.postprocessChunkthen constructedres_columnsof sizenum_result_columnsand left those slots null, andChunk::setColumns->Chunk::checkNumRowsIsConsistentdereferenced the nullColumnPtr.CoalescingMergeTreeperforms per-columnlast_valuecoalescing, so nestedMaparrays must be coalesced element-wise like any other array column. The legacynestedName.suffixrouting intodiscovered_mapsforsumMap-by-key merging is aSummingMergeTreeconcept that does not apply here, so the Map-suffix exclusion is dropped from theaggregate_all_columns = truebranch.SummingMergeTreecode paths are untouched.Closes #101937.
CI report (STID 1003-326e on
AST fuzzer (arm_asan_ubsan)for PR #104563): https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104563&sha=c2e0f0fe3b92d48b5a47fa90555f7d4aca54c553&name_0=PR&name_1=AST%20fuzzer%20%28arm_asan_ubsan%29Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix server crash on
INSERT/OPTIMIZEinto aCoalescingMergeTreetable that has a nested column whose parent table name ends withMap(e.g.testMap.key Array(...)). The column was silently dropped from the merging metadata indefineColumns, producing a nullColumnPtrinsideChunk::checkNumRowsIsConsistent.Documentation entry for user-facing changes