iframe-proxy

4ertus2 · 2026-03-27T22:12:57Z

Changelog category (leave one):

Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Slightly reduced memory over-allocation in partial_merge JOIN.

Version info

Merged into: 26.5.1.636

clickhouse-gh · 2026-03-31T13:36:26Z

clickhouse-gh · 2026-03-31T13:39:39Z

    if (queue.isValid())
    {
+        size_t size_to_reserve = 0;
+        for (auto & chunk : chunks)


This adds an O(chunks.size()) scan on every mergeBatchImpl call. MergeSorter::read may call this many times (one output chunk per call), so the new scan can become O(num_output_chunks * num_input_chunks) overhead in a hot path.

Could we avoid re-scanning all chunks here? A simple approach is to saturate while summing (break once size_to_reserve >= max_merged_block_size) or cache the total row count once in the sorter state.

This is still live on current 10deef16: mergeBatchImpl recomputes size_to_reserve by scanning all chunks on every output block. Because chunks is not reduced by queue.next, this repeats identical work and adds O(num_chunks * num_output_blocks) overhead in a hot path.

Please either precompute the capped reserve size once in MergeSorter state, or make the sum saturating and stop as soon as size_to_reserve >= max_merged_block_size.

clickhouse-gh · 2026-04-05T05:51:46Z

    if (queue.isValid())
    {
+        size_t size_to_reserve = 0;
+        for (auto & chunk : chunks)


Changelog category is performance improvement, which requires a non-empty Changelog entry in the PR template. Please add a user-facing one-liner describing the concrete performance impact.

This is still unresolved in the current PR body: Changelog category is performance improvement, but Changelog entry is missing.

For this category, please add a concrete user-facing one-liner describing what workload gets faster and in what scenario.

alexey-milovidov · 2026-04-07T00:27:26Z

The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix.

alexey-milovidov · 2026-04-10T02:28:26Z

The flaky check failure is fixed in #102148, let's update the branch.

clickhouse-gh · 2026-05-13T20:11:06Z

LLVM Coverage Report

Changed lines: 100.00% (7/7) · Uncovered code

Full report · Diff report

…rf regression) PR ClickHouse#100963 changed the per-output-block reserve in MergeSorter::mergeBatchImpl from a constant to a sum over all input chunks (capped at max_merged_block_size). The output block is already bounded by max_merged_block_size, so for the common case (many input chunks, each around max_merged_block_size rows) the sum saturates the cap and reserves the same amount, but the summation walks the entire chunks vector on every output block in front of the hot k-way merge loop. This shows up as a measurable slowdown on the jit_sort perf test (ARM Neoverse-V2, query 12: ORDER BY with LIMIT, around +18% client_time). Compute the estimate in O(1) instead. The k-way merge draws from all chunks, so size the reserve as chunks[0].getNumRows() * chunks.size(), capped at max_merged_block_size. This is one multiply with no per-block loop, reserves closer to the actual merged output for the many-chunks case, and is bounded by max_merged_block_size exactly as the sum form was. For the regressing case both saturate the cap, recovering the time. reserve() is a capacity hint only, so this does not affect results. Reported and validated on ARM by egor-click in ClickHouse#106599. O(1) estimate form suggested by 4ertus2 in review. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…rf regression) PR ClickHouse#100963 changed the per-output-block reserve in MergeSorter::mergeBatchImpl from a constant to a sum over all input chunks (capped at max_merged_block_size). The output block is already bounded by max_merged_block_size, so for the common case (many input chunks, each around max_merged_block_size rows) the sum saturates the cap and reserves the same amount, but the summation walks the entire chunks vector on every output block in front of the hot k-way merge loop. This shows up as a measurable slowdown on the jit_sort perf test (ARM Neoverse-V2, query 12: ORDER BY with LIMIT, around +18% client_time). Compute the same estimate in O(1). chunks is not cleared until the whole merge finishes, so the sum of input rows is constant across every output block; sum it once in the constructor (which already iterates the chunks) into total_input_rows, and reserve min(total_input_rows, max_merged_block_size) per block. This is the exact value the sum form reserved, computed once instead of per block, so it neither under-reserves (as a bare chunks[0] estimate would for many small chunks) nor over-reserves for uneven chunks (as a chunks[0] * chunks.size() product would). For the regressing case it saturates the cap, recovering the time. reserve() is a capacity hint only, so this does not affect results. Reported and validated on ARM by egor-click in ClickHouse#106599. O(1) estimate suggested by 4ertus2 in review; bounding it by total input rows (instead of the product) suggested by clickhouse-gh in review. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…f regression) PR ClickHouse#100963 replaced the O(1) reservation min(chunks[0].getNumRows(), max_merged_block_size) with a sum over all chunks (capped at max_merged_block_size) computed on every output block, in front of the hot k-way merge loop. The output block is already capped at max_merged_block_size, and in the regressing jit_sort/12 case the input chunks are each about max_merged_block_size rows, so the loop produced the same reservation while adding per-block work. egor-click measured +18.73% on Neoverse-V2 (issue ClickHouse#106599). This restores the original O(1) form. chunks[0] is a single input chunk, so the value never exceeds the rows the merge can emit (no over-reservation), and there is no walk over chunks. reserve() is a capacity hint only, so the merge-sort output is byte-identical regardless of the reserved size; for uneven chunks the columns grow as needed. egor-click validated this form recovers the regression on Neoverse-V2 (about -13.96% vs affected), whereas reserving the saturated total-row sum did not. Closes: ClickHouse#106599

fix reserve in SortingTransform MergeSorter

c24e65e

nikitamikhaylov added the can be tested Allows running workflows for external contributors label Mar 31, 2026

clickhouse-gh Bot added the pr-performance Pull request with some performance improvements label Mar 31, 2026

nikitamikhaylov approved these changes Mar 31, 2026

View reviewed changes

nikitamikhaylov self-assigned this Mar 31, 2026

clickhouse-gh Bot reviewed Mar 31, 2026

View reviewed changes

Merge remote-tracking branch 'origin/master' into fix-reserve-100963

887543a

clickhouse-gh Bot reviewed Apr 5, 2026

View reviewed changes

alexey-milovidov added 2 commits April 6, 2026 17:49

Merge branch 'master' into fix-reserve

a6de532

Merge remote-tracking branch 'origin/master' into tmp-update-pr-100963

1e6de05

alexey-milovidov and others added 3 commits April 10, 2026 06:05

Merge remote-tracking branch 'origin/master' into HEAD

79edee7

Merge branch 'master' into fix-reserve

a7ea7e1

Merge branch 'master' into fix-reserve

10deef1

vdimir approved these changes May 13, 2026

View reviewed changes

vdimir enabled auto-merge May 13, 2026 17:18

vdimir added this pull request to the merge queue May 13, 2026

Merged via the queue into ClickHouse:master with commit d8dd2c3 May 13, 2026
167 checks passed

robot-ch-test-poll3 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 13, 2026

This was referenced Jun 22, 2026

Restore O(1) reserve in MergeSorter::mergeBatchImpl (fix ORDER BY perf regression) #108179

Open

ARM performance regression: jit_sort query 12 is slower #106599

Open

Sunbelt Computer Software

PL/B Language Development and Support

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf: fix reserve in SortingTransform MergeSorter#100963

perf: fix reserve in SortingTransform MergeSorter#100963
vdimir merged 7 commits into
ClickHouse:masterfrom
4ertus2:fix-reserve

4ertus2 commented Mar 27, 2026 •

edited by robot-clickhouse

Loading

Uh oh!

clickhouse-gh Bot commented Mar 31, 2026 •

edited

Loading

Uh oh!

clickhouse-gh Bot Mar 31, 2026

Uh oh!

clickhouse-gh Bot May 13, 2026

Uh oh!

clickhouse-gh Bot Apr 5, 2026

Uh oh!

clickhouse-gh Bot May 13, 2026

Uh oh!

alexey-milovidov commented Apr 7, 2026

Uh oh!

alexey-milovidov commented Apr 10, 2026

Uh oh!

clickhouse-gh Bot commented May 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Sunbelt Computer Software

PL/B Language Development and Support

Uh oh!

Conversation

4ertus2 commented Mar 27, 2026 • edited by robot-clickhouse Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Version info

Uh oh!

clickhouse-gh Bot commented Mar 31, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

PR Metadata

Findings

Final Verdict

Uh oh!

clickhouse-gh Bot Mar 31, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh Bot Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

clickhouse-gh Bot May 13, 2026

Choose a reason for hiding this comment

Uh oh!

alexey-milovidov commented Apr 7, 2026

Uh oh!

alexey-milovidov commented Apr 10, 2026

Uh oh!

clickhouse-gh Bot commented May 13, 2026

LLVM Coverage Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

4ertus2 commented Mar 27, 2026 •

edited by robot-clickhouse

Loading

clickhouse-gh Bot commented Mar 31, 2026 •

edited

Loading