Cherry pick #103536 to 26.2: Fix heap-use-after-free in `executeAggregateMultiply` self-merge by robot-ch-test-poll4 · Pull Request #105213 · ClickHouse/ClickHouse · GitHub
Skip to content

Cherry pick #103536 to 26.2: Fix heap-use-after-free in executeAggregateMultiply self-merge#105213

Merged
robot-ch-test-poll4 merged 6 commits into
backport/26.2/103536from
cherrypick/26.2/103536
May 18, 2026
Merged

Cherry pick #103536 to 26.2: Fix heap-use-after-free in executeAggregateMultiply self-merge#105213
robot-ch-test-poll4 merged 6 commits into
backport/26.2/103536from
cherrypick/26.2/103536

Conversation

@robot-ch-test-poll4

Copy link
Copy Markdown
Contributor

Original pull-request #103536

Do not merge this PR manually

This pull-request is a first step of an automated backporting.
It contains changes similar to calling git cherry-pick locally.
If you intend to continue backporting the changes, then resolve all conflicts if any.
Otherwise, if you do not want to backport them, then just close this pull-request.

The check results does not matter at this step - you can safely ignore them.

Troubleshooting

If the conflicts were resolved in a wrong way

If this cherry-pick PR is completely screwed by a wrong conflicts resolution, and you want to recreate it:

  • delete the pr-cherrypick label from the PR
  • delete this branch from the repository

You also need to check the Original pull-request for pr-backports-created label, and delete if it's presented there

The PR source

The PR is created in the CI job

groeneai and others added 6 commits April 25, 2026 15:58
The exponentiation-by-squaring loop in `FunctionBinaryArithmetic::
executeAggregateMultiply` squared the running aggregate state with

    function->merge(vec_from[i], vec_from[i], arena.get());

passing the same state as both source and destination. For aggregate
functions whose `merge` implementation appends the source's internal
storage into the destination (`QuantileExact`, `groupArray`,
`groupUniqArray`, `windowFunnel`, `sequenceMatch`, etc.), the
destination reallocates its `PODArray` mid-merge and the source
iterators (referencing the same, now-freed buffer) are read by the
trailing `memcpy` in `PODArray::insert_assume_reserved` -> heap-use-
after-free. The hazard was already documented at
`src/Common/PODArray.h:482`: "Do not insert into the array a piece of
itself. Because with the resize, the iterators on themselves can be
invalidated."

Fix: copy `vec_from` into an independent column before each squaring
step and merge the copy into `vec_from`, so source and destination no
longer alias.

Reproduction (Address Sanitizer): caught by master fuzzer
(`Stress test (experimental, serverfuzz, arm_asan_ubsan, s3)`,
STID 0988-40af, sha 20f4b50):

    SELECT finalizeAggregation(state * 4)
    FROM (SELECT quantilesExactState(0.5)(number) AS state FROM numbers(50));

This deterministically triggers the UAF on the previous code and
passes with the fix.

CI report:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=20f4b50bb190f87122f53eee1f99887ec9d8d771&name_0=MasterCI&name_1=Stress%20test%20%28experimental%2C%20serverfuzz%2C%20arm_asan_ubsan%2C%20s3%29
This PR's `executeAggregateMultiply` rework copies the aggregate state
into an independent column before each squaring step, so source and
destination no longer alias on entry to `merge`. For aggregate functions
whose `merge` reallocates and then reads the source (e.g. `quantilesExact`,
`groupArray`, `groupUniqArray`, `windowFunnel`, `sequenceMatch`), this
fixes the heap-use-after-free.

`ReservoirSampler::merge` (used by `medianState` / `quantileState`) had
been guarded since #79031 with an early-return when `this == &b`. That
guard is unreachable from `executeAggregateMultiply` now: the squaring
step never feeds the same state as both arguments. Squaring becomes a
real merge instead of a no-op, so `state * N` for a ReservoirSampler-
backed aggregate now actually accumulates `N` copies of the input rather
than collapsing to a single copy.

`03406_reservoir_sample_self_merging` was added in #79031 with reference
values that match the no-op semantics:

  10 * medianState(numbers(10))      -> 4.5      (100 inserts <= 8192, exact)
  10 * medianState(numbers(1000))    -> 499.5    (was no-op; same as state)
  10 * medianState(numbers(100000))  -> 50501.5  (already over reservoir)

Under the corrected semantics the middle case crosses the 8192-sample
reservoir cap (10 x 1000 = 10000 inserts) and the deterministic seed
yields `495` for the median. The first and third cases are unchanged:
the small one stays under the cap, the large one was already saturated.
The test still verifies the absence of UAF / `LOGICAL_ERROR` on
`state * N` for ReservoirSampler-backed aggregates - it's a regression
test for #78299 - and only the boundary value moves.

CI report:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=103543&sha=f7dd195af45870446b0c48654753c337438e90b2&name_0=PR
Master AST fuzzer hit `Segmentation fault (STID: 0988-2212)` on
`AST fuzzer (amd_msan)` for `groupArrayMovingSum` on PR #100394
and on `AST fuzzer (arm_asan_ubsan)` / `amd_debug` for PRs #101033,
#101062, #104531 — same bug class as STID 0988-40af / 0988-3351.
The fuzzer mutated a query into `groupArrayMovingSumState(...) * N`,
which goes through `FunctionBinaryArithmetic::executeAggregateMultiply`.
The exponentiation-by-squaring loop called
`function->merge(vec_from[i], vec_from[i], arena.get())`, and
`MovingImpl::merge` then did
`cur_elems.value.insert(rhs_elems.value.begin(), rhs_elems.value.end(), arena)`
where `rhs_elems` aliased `cur_elems.value`, so the source iterators
referenced the buffer that `insertPrepare` was about to reallocate.

The fix in this PR (copy `vec_from` into an independent column before
each squaring step) eliminates the self-alias at the call site, so
`MovingImpl::merge` is never invoked with `place == rhs` from
`executeAggregateMultiply` anymore.

The new test exercises every `MovingImpl` instantiation reachable from
`groupArrayMovingSum` / `groupArrayMovingAvg`:
  - integer accumulator,
  - decimal accumulator (the variant flagged on PR #101062),
  - windowed parameterisation (`LimitNumElements::value = true`),
  - the `agg_state_is_const` path through `arrayJoin([state, state])`.

Verified locally on Debug build that pre-fix the squaring path aborts
with the PODArray `assertNotIntersects` assertion (Server `<Fatal>`),
while `state * 1` (no squaring branch) returns `50` — matching the
single-state baseline used to derive the reference values.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100394&sha=7309f23dc7ccee494b41416c8fff18c02d0697a7&name_0=PR&name_1=AST%20fuzzer%20%28amd_msan%29
`executeAggregateMultiply` previously constructed `column_temp` once
outside the squaring loop and reset it between iterations with
`popBack(column_temp->size())`. `ColumnAggregateFunction::popBack`
destroys the popped aggregate states but does not release the column's
arena memory (the `Arenas` list is monotonic and is only freed when
the column itself is destroyed). Each squaring iteration's
`insertFrom` therefore allocated new state memory in the same arena,
causing the temporary footprint to grow with the multiplier instead of
staying bounded by one copy of the input column.

Construct `column_temp` inside the squaring branch so the column
(and its arena) is destroyed at the end of each iteration. Peak
temporary footprint is now bounded by `size * state_size`.

Caught by the clickhouse-gh[bot] inline review on
#103536 .
…merge-uaf-0988

Fix heap-use-after-free in `executeAggregateMultiply` self-merge
@robot-ch-test-poll4 robot-ch-test-poll4 added pr-cherrypick Cherry-pick of merge-commit before backporting. Do not use manually - automated use only! do not test disable testing on pull request pr-bugfix Pull request with bugfix, not backported by default labels May 18, 2026
@robot-ch-test-poll4 robot-ch-test-poll4 merged commit 1459b88 into backport/26.2/103536 May 18, 2026
@robot-ch-test-poll4 robot-ch-test-poll4 deleted the cherrypick/26.2/103536 branch May 18, 2026 11:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do not test disable testing on pull request pr-bugfix Pull request with bugfix, not backported by default pr-cherrypick Cherry-pick of merge-commit before backporting. Do not use manually - automated use only!

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants