{{ message }}
Fix uniqExact concurrent-merge crash with GROUPING SETS/ROLLUP/CUBE#108928
Merged
Algunenano merged 8 commits intoJul 2, 2026
Merged
Conversation
The const getTwoLevelSet mutated the shared source set via doDeepCopyIfNeeded; concurrent merges of the same state corrupted it. Do the copy-on-write only on the destination's mutable path. Closes ClickHouse#108912
Contributor
Algunenano
commented
Jun 30, 2026
| -- Each (a, b, c) group accumulates > 100000 distinct values, so the uniqExact sets become two-level. | ||
| INSERT INTO st_04489 SELECT number % 4, number % 3, number % 2, uniqExactState(number) FROM numbers_mt(4000000) GROUP BY 1, 2, 3; | ||
|
|
||
| -- Repeat a few times: the race is timing dependent but used to crash almost immediately. |
Member
Author
There was a problem hiding this comment.
Note that even though it was timing dependent in release mode, TSAN showed the race every time running this test. That's why I didn't add a race condition test (well, I added it and then removed it because it wasn't needed)
Contributor
|
We have two-level enable/disable/trigger settings for group by, but not for uniqExactHash optimization, which looks a bit lacking. (it would allow to have W/A for when this more "complex" optimization is getting triggered) |
nickitat
approved these changes
Jun 30, 2026
1 task
Correctness is checked in other tests, we only want to fix the race conditions / crashes
The 4M-row insert made the test exceed the 180s per-run limit in the amd_debug flaky check. 1.5M rows keeps each of the 12 groups at 125000 distinct values, still above the 100000 uniqExact auto-two-level threshold, so the concurrent two-level merge path is still exercised. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
LLVM Coverage ReportChanged lines: Changed C/C++ lines covered: 5/5 (100.00%) · Uncovered code |
Merged
via the queue into
ClickHouse:master
with commit Jul 2, 2026
34e2497
347 of 348 checks passed
This was referenced Jul 2, 2026
clickhouse-gh Bot
added a commit
that referenced
this pull request
Jul 2, 2026
Backport #108928 to 26.6: Fix uniqExact concurrent-merge crash with GROUPING SETS/ROLLUP/CUBE
Algunenano
added a commit
that referenced
this pull request
Jul 2, 2026
Backport #108928 to 26.5: Fix uniqExact concurrent-merge crash with GROUPING SETS/ROLLUP/CUBE
Algunenano
added a commit
that referenced
this pull request
Jul 2, 2026
Backport #108928 to 26.4: Fix uniqExact concurrent-merge crash with GROUPING SETS/ROLLUP/CUBE
Algunenano
added a commit
that referenced
this pull request
Jul 2, 2026
Backport #108928 to 26.3: Fix uniqExact concurrent-merge crash with GROUPING SETS/ROLLUP/CUBE
Algunenano
added a commit
that referenced
this pull request
Jul 2, 2026
Backport #108928 to 25.8: Fix uniqExact concurrent-merge crash with GROUPING SETS/ROLLUP/CUBE
This was referenced Jul 3, 2026
groeneai
added a commit
to groeneai/ClickHouse
that referenced
this pull request
Jul 4, 2026
STID 1464-31ed. The two-level pointee of a UniqExactSet is shared across several UniqExactSet objects: the merge fast path (two_level_set = other.getTwoLevelSet()) aliases one pointee into an empty destination, and parallelizeMergeMulti pre-fetches raw TwoLevelSet* to every state and reads those buckets across threads. The parallel merge of ROLLUP/CUBE/GROUPING SETS states thus reads one pointee on one thread while another thread merges into a state that owns the same pointee. doDeepCopyIfNeeded forked (copied the pointee before mutating) only when shared_ptr::use_count() > 1. use_count() is not a synchronization primitive: a sibling holder being destroyed drops the count 2 -> 1 with no happens-before against the pointee's memory, so one state can observe use_count == 1 and write the pointee's buckets in place while another thread is still reading the same pointee. That is a data race (TSan reports it; issue ClickHouse#108912 was a segfault from the same in-place write). PR ClickHouse#108928 moved the copy-on-write from the read-intent getTwoLevelSet() to the destination path but kept the racy use_count() guard, so the race persisted. Attach a monotonic is_shared atomic flag to the pointee (SharedTwoLevelSet wrapper). getTwoLevelSet() sets it (release) before the pointee escapes; doDeepCopyIfNeeded() forks when it is set (acquire) instead of consulting use_count(). A pointee that another thread can reach is therefore never mutated in place; a freshly forked copy is solely owned and unshared, so the common single-consumer case keeps mutating in place (preserving ClickHouse#108928's copy elision). Reproduced under TSan with a unit test that merges into and reads one shared two-level pointee from two threads: races on the pointee buckets before the fix (WRITE emplaceNonZeroImpl HashTable.h <- UniqExactSet::merge; READ HashTable::size), TSan-clean after. Existing UniqExactParallelMerge gtests and 04489_uniq_exact_grouping_sets_concurrent_merge stay green, and parallel vs single-thread GROUPING SETS results are identical. Related: ClickHouse#108912 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
groeneai
added a commit
to groeneai/ClickHouse
that referenced
this pull request
Jul 4, 2026
STID 1464-31ed. The two-level pointee of a UniqExactSet is shared across several UniqExactSet objects: the merge fast path (two_level_set = other.getTwoLevelSet()) aliases one pointee into an empty destination, and parallelizeMergeMulti pre-fetches raw TwoLevelSet* to every state and reads those buckets across threads. The parallel merge of ROLLUP/CUBE/GROUPING SETS states thus reads one pointee on one thread while another thread merges into a state that owns the same pointee. doDeepCopyIfNeeded forked (copied the pointee before mutating) only when shared_ptr::use_count() > 1. use_count() is not a synchronization primitive: a sibling holder being destroyed drops the count 2 -> 1 with no happens-before against the pointee's memory, so one state can observe use_count == 1 and write the pointee's buckets in place while another thread is still reading the same pointee. That is a data race (TSan reports it; issue ClickHouse#108912 was a segfault from the same in-place write). PR ClickHouse#108928 moved the copy-on-write from the read-intent getTwoLevelSet() to the destination path but kept the racy use_count() guard, so the race persisted. Attach a monotonic is_shared atomic flag to the pointee (SharedTwoLevelSet wrapper). getTwoLevelSet() sets it (release) before the pointee escapes; doDeepCopyIfNeeded() forks when it is set (acquire) instead of consulting use_count(). A pointee that another thread can reach is therefore never mutated in place; a freshly forked copy is solely owned and unshared, so the common single-consumer case keeps mutating in place (preserving ClickHouse#108928's copy elision). Regression test (gtest UniqExactSharedPointeeCoW): a direct, deterministic test of the exact lifetime transition of STID 1464-31ed. A pointee P escapes to a second UniqExactSet via the merge fast path (use_count 2), the sibling holder is dropped (use_count 2 -> 1), then a merge mutates the surviving state. The test asserts the pointee address changes, i.e. the mutation forks P instead of writing the escaped instance in place. It fails before this change (in-place mutation, same address) and passes after it (fork, new address). A concurrent reader cannot form a valid fails-before/passes-after gate: any holder that keeps P alive for the reader also bumps use_count so the buggy guard forks too, while dropping it to force use_count == 1 makes the fix free P under the reader. So the fork-vs-in-place decision, the necessary and sufficient cause of the race, is asserted directly. Existing UniqExactParallelMerge gtests and 04489_uniq_exact_grouping_sets_concurrent_merge stay green, and parallel vs single-thread GROUPING SETS results are identical. Related: ClickHouse#108912
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Closes: #108912
uniqExact'sgetTwoLevelSet()isconstbut performed a copy-on-write of the source set (doDeepCopyIfNeeded) on a path that is read concurrently. WithGROUPING SETS/ROLLUP/CUBEthe same state is merged into several destinations from multiple threads, so concurrent calls mutated the shared source'stwo_level_setpointer without synchronization, corrupting it and crashing with a segmentation fault.The fix moves the copy-on-write to the destination's mutable accessor (
asTwoLevelChecked), which runs on a thread-local accumulator and forks before any in-place mutation.getTwoLevelSet()becomes genuinely read-only, so an already-two-level source is now shared across threads viashared_ptrinstead of being copied. This removes copies rather than adding them: the old read path deep-copied the source the first time it was shared, whereas the new code shares it; the necessary copy is deferred to the destination side and bounded by the number of mutated destinations, not the thread count.mutableis dropped fromtwo_level_setso the compiler enforces that aconstset cannot be modified.Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix a segmentation fault when merging
uniqExactaggregate states withGROUPING SETS,ROLLUPorCUBEandmax_threads > 1.Version info
26.7.1.42326.6.2.22,26.5.6.11,26.4.5.96,26.3.17.18,25.8.26.11