{{ message }}
Cherry pick #103536 to 26.3: Fix heap-use-after-free in executeAggregateMultiply self-merge#105215
Merged
robot-ch-test-poll4 merged 6 commits intoMay 18, 2026
Merged
Conversation
The exponentiation-by-squaring loop in `FunctionBinaryArithmetic::
executeAggregateMultiply` squared the running aggregate state with
function->merge(vec_from[i], vec_from[i], arena.get());
passing the same state as both source and destination. For aggregate
functions whose `merge` implementation appends the source's internal
storage into the destination (`QuantileExact`, `groupArray`,
`groupUniqArray`, `windowFunnel`, `sequenceMatch`, etc.), the
destination reallocates its `PODArray` mid-merge and the source
iterators (referencing the same, now-freed buffer) are read by the
trailing `memcpy` in `PODArray::insert_assume_reserved` -> heap-use-
after-free. The hazard was already documented at
`src/Common/PODArray.h:482`: "Do not insert into the array a piece of
itself. Because with the resize, the iterators on themselves can be
invalidated."
Fix: copy `vec_from` into an independent column before each squaring
step and merge the copy into `vec_from`, so source and destination no
longer alias.
Reproduction (Address Sanitizer): caught by master fuzzer
(`Stress test (experimental, serverfuzz, arm_asan_ubsan, s3)`,
STID 0988-40af, sha 20f4b50):
SELECT finalizeAggregation(state * 4)
FROM (SELECT quantilesExactState(0.5)(number) AS state FROM numbers(50));
This deterministically triggers the UAF on the previous code and
passes with the fix.
CI report:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=20f4b50bb190f87122f53eee1f99887ec9d8d771&name_0=MasterCI&name_1=Stress%20test%20%28experimental%2C%20serverfuzz%2C%20arm_asan_ubsan%2C%20s3%29
This PR's `executeAggregateMultiply` rework copies the aggregate state into an independent column before each squaring step, so source and destination no longer alias on entry to `merge`. For aggregate functions whose `merge` reallocates and then reads the source (e.g. `quantilesExact`, `groupArray`, `groupUniqArray`, `windowFunnel`, `sequenceMatch`), this fixes the heap-use-after-free. `ReservoirSampler::merge` (used by `medianState` / `quantileState`) had been guarded since #79031 with an early-return when `this == &b`. That guard is unreachable from `executeAggregateMultiply` now: the squaring step never feeds the same state as both arguments. Squaring becomes a real merge instead of a no-op, so `state * N` for a ReservoirSampler- backed aggregate now actually accumulates `N` copies of the input rather than collapsing to a single copy. `03406_reservoir_sample_self_merging` was added in #79031 with reference values that match the no-op semantics: 10 * medianState(numbers(10)) -> 4.5 (100 inserts <= 8192, exact) 10 * medianState(numbers(1000)) -> 499.5 (was no-op; same as state) 10 * medianState(numbers(100000)) -> 50501.5 (already over reservoir) Under the corrected semantics the middle case crosses the 8192-sample reservoir cap (10 x 1000 = 10000 inserts) and the deterministic seed yields `495` for the median. The first and third cases are unchanged: the small one stays under the cap, the large one was already saturated. The test still verifies the absence of UAF / `LOGICAL_ERROR` on `state * N` for ReservoirSampler-backed aggregates - it's a regression test for #78299 - and only the boundary value moves. CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=103543&sha=f7dd195af45870446b0c48654753c337438e90b2&name_0=PR
Master AST fuzzer hit `Segmentation fault (STID: 0988-2212)` on `AST fuzzer (amd_msan)` for `groupArrayMovingSum` on PR #100394 and on `AST fuzzer (arm_asan_ubsan)` / `amd_debug` for PRs #101033, #101062, #104531 — same bug class as STID 0988-40af / 0988-3351. The fuzzer mutated a query into `groupArrayMovingSumState(...) * N`, which goes through `FunctionBinaryArithmetic::executeAggregateMultiply`. The exponentiation-by-squaring loop called `function->merge(vec_from[i], vec_from[i], arena.get())`, and `MovingImpl::merge` then did `cur_elems.value.insert(rhs_elems.value.begin(), rhs_elems.value.end(), arena)` where `rhs_elems` aliased `cur_elems.value`, so the source iterators referenced the buffer that `insertPrepare` was about to reallocate. The fix in this PR (copy `vec_from` into an independent column before each squaring step) eliminates the self-alias at the call site, so `MovingImpl::merge` is never invoked with `place == rhs` from `executeAggregateMultiply` anymore. The new test exercises every `MovingImpl` instantiation reachable from `groupArrayMovingSum` / `groupArrayMovingAvg`: - integer accumulator, - decimal accumulator (the variant flagged on PR #101062), - windowed parameterisation (`LimitNumElements::value = true`), - the `agg_state_is_const` path through `arrayJoin([state, state])`. Verified locally on Debug build that pre-fix the squaring path aborts with the PODArray `assertNotIntersects` assertion (Server `<Fatal>`), while `state * 1` (no squaring branch) returns `50` — matching the single-state baseline used to derive the reference values. CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100394&sha=7309f23dc7ccee494b41416c8fff18c02d0697a7&name_0=PR&name_1=AST%20fuzzer%20%28amd_msan%29
`executeAggregateMultiply` previously constructed `column_temp` once outside the squaring loop and reset it between iterations with `popBack(column_temp->size())`. `ColumnAggregateFunction::popBack` destroys the popped aggregate states but does not release the column's arena memory (the `Arenas` list is monotonic and is only freed when the column itself is destroyed). Each squaring iteration's `insertFrom` therefore allocated new state memory in the same arena, causing the temporary footprint to grow with the multiplier instead of staying bounded by one copy of the input column. Construct `column_temp` inside the squaring branch so the column (and its arena) is destroyed at the end of each iteration. Peak temporary footprint is now bounded by `size * state_size`. Caught by the clickhouse-gh[bot] inline review on #103536 .
…merge-uaf-0988 Fix heap-use-after-free in `executeAggregateMultiply` self-merge
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Original pull-request #103536
Do not merge this PR manually
This pull-request is a first step of an automated backporting.
It contains changes similar to calling
git cherry-picklocally.If you intend to continue backporting the changes, then resolve all conflicts if any.
Otherwise, if you do not want to backport them, then just close this pull-request.
The check results does not matter at this step - you can safely ignore them.
Troubleshooting
If the conflicts were resolved in a wrong way
If this cherry-pick PR is completely screwed by a wrong conflicts resolution, and you want to recreate it:
pr-cherrypicklabel from the PRYou also need to check the Original pull-request for
pr-backports-createdlabel, and delete if it's presented thereThe PR source
The PR is created in the CI job