nihalzp · 2026-05-16T15:21:16Z

LIMIT BY queries on MergeTree tables currently call pipeline.resize(1) to merge every upstream stream into one and then attach a single LimitByTransform, so the hash table sees the entire input on one thread regardless of how many partitions the data is spread across. This PR runs LIMIT BY per partition in parallel when the partition expression is a deterministic function of the LIMIT BY columns. Under that condition no LIMIT BY group can span two partitions, so each partition can be limited independently.

CREATE TABLE t (a UInt64, b UInt64) ENGINE = MergeTree ORDER BY tuple() PARTITION BY a % 8;
INSERT INTO t SELECT number, number FROM numbers(10000000);

EXPLAIN PIPELINE
SELECT a FROM t WHERE b > 1 LIMIT 10 BY a
SETTINGS allow_limit_by_partitions_independently = 1;

    ┌─explain──────────────────────────────────────┐
 1. │ (Expression)                                 │
 2. │ ExpressionTransform × 8                      │
 3. │   (LimitBy)                                  │
 4. │   LimitByTransform × 8                       │ <- now × 8 (was × 1)
 5. │     (Expression)                             │
 6. │     ExpressionTransform × 8                  │
 7. │       (Expression)                           │
 8. │       ExpressionTransform × 8                │
 9. │         (ReadFromMergeTree)                  │
10. │         Resize 7 → 1                         │
11. │           MergeTreeSelect(...                │
    └──────────────────────────────────────────────┘
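
The equivalence behind this plan can be checked with a small standalone sketch. It is not ClickHouse code: `limitBy`, `limitByPerPartition` and `countPerKey` are hypothetical helpers that model a single LimitByTransform and the per-partition variant for a `PARTITION BY a % N` table, and the partitions are processed sequentially here rather than in parallel streams.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <unordered_map>
#include <vector>

// Hypothetical model of one LimitByTransform: keep at most `limit` rows
// per distinct key, preserving input order.
std::vector<uint64_t> limitBy(const std::vector<uint64_t> & rows, size_t limit)
{
    std::unordered_map<uint64_t, size_t> seen;
    std::vector<uint64_t> out;
    for (uint64_t a : rows)
        if (seen[a]++ < limit)
            out.push_back(a);
    return out;
}

// Hypothetical per-partition variant: route each row to stream `a % partitions`
// (a deterministic function of the LIMIT BY key), limit every stream with its
// own hash table, then concatenate the results.
std::vector<uint64_t> limitByPerPartition(const std::vector<uint64_t> & rows, size_t limit, uint64_t partitions)
{
    assert(partitions > 0);
    std::vector<std::vector<uint64_t>> streams(partitions);
    for (uint64_t a : rows)
        streams[a % partitions].push_back(a);

    std::vector<uint64_t> out;
    for (const auto & stream : streams)
    {
        std::vector<uint64_t> limited = limitBy(stream, limit);
        out.insert(out.end(), limited.begin(), limited.end());
    }
    return out;
}

// Row count per key, used to compare results regardless of row order.
std::map<uint64_t, size_t> countPerKey(const std::vector<uint64_t> & rows)
{
    std::map<uint64_t, size_t> counts;
    for (uint64_t a : rows)
        ++counts[a];
    return counts;
}
```

Because every row with the same `a` lands in the same stream, both functions keep the same number of rows per key; only the order of the output rows may differ.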

Changelog category (leave one):

Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Speed up LIMIT BY queries on partitioned MergeTree tables by running LIMIT BY inside each partition's stream in parallel, instead of merging all streams into one before applying the limit. This applies when the partition expression is a deterministic function of the LIMIT BY columns, so no LIMIT BY group can span two partitions. Controlled by the new setting allow_limit_by_partitions_independently (enabled by default).

Version info

Merged into: 26.6.1.125

clickhouse-gh · 2026-05-16T15:21:52Z

alexey-milovidov

Ok. Related question - do you know why allow_aggregate_partitions_independently is not true by default? If not, I will send a PR enabling it. And the second question - do we have the same for DISTINCT? If not, let's add, it's even more important than for LIMIT BY.

nihalzp · 2026-05-16T15:32:40Z

> Ok. Related question - do you know why allow_aggregate_partitions_independently is not true by default?

Yes. Actually, a few days ago I asked @nickitat the same question. The issue is that for skewed data this can slow down queries significantly.
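
One plausible reading of the skew concern, stated as an assumption rather than something confirmed in this thread: if each partition is handled by its own stream, the largest partition bounds the running time. The sketch below uses a hypothetical `skewFactor` helper (not a ClickHouse function) to express that as the ratio between the largest partition and an even share of the rows.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical metric: size of the largest partition divided by the size every
// partition would have if rows were spread evenly. 1.0 means perfectly balanced;
// larger values mean one stream does proportionally more work than the rest.
double skewFactor(const std::vector<uint64_t> & rows_per_partition)
{
    assert(!rows_per_partition.empty());
    const uint64_t total = std::accumulate(rows_per_partition.begin(), rows_per_partition.end(), uint64_t{0});
    if (total == 0)
        return 1.0;
    const uint64_t largest = *std::max_element(rows_per_partition.begin(), rows_per_partition.end());
    const double even_share = static_cast<double>(total) / static_cast<double>(rows_per_partition.size());
    return static_cast<double>(largest) / even_share;
}
```

For four partitions holding 600, 200, 0 and 0 rows the even share is 200 rows, so the factor is 3: under this model the busiest stream handles three times the rows it would with a balanced split.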

nihalzp · 2026-05-16T15:33:46Z

> And the second question - do we have the same for DISTINCT? If not, let's add, it's even more important than for LIMIT BY.

We do not have it for DISTINCT. But it is already planned and we can reuse most of the code from this PR.

nihalzp · 2026-05-16T16:32:28Z

+            if (optimization_settings.aggregate_partitions_independently)
+                optimizeAggregationPerPartition(frame_node, nodes, optimization_settings);
+
+            if (optimization_settings.limit_by_partitions_independently)
+                optimizeLimitByPerPartition(frame_node, nodes, optimization_settings);


These optimizations previously ran in the first pass. That is risky, because a projection optimization can replace the source after the flags have been set; the streams then no longer carry disjoint partitions, while the transform still expects them to. The current placement is correct because the query plan has stabilized by this point.
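
The ordering hazard described in this comment can be illustrated with a toy model. Everything below is hypothetical (`ToyPlan`, `enablePerPartition`, `replaceSourceWithProjection`, `runPasses`); it does not reproduce ClickHouse's QueryPlan API and only shows why setting the flag before a source-replacing pass can leave the plan inconsistent.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Toy stand-in for a query plan; field names are illustrative only.
struct ToyPlan
{
    bool source_emits_disjoint_partitions = true;       // original MergeTree read
    bool transform_expects_disjoint_partitions = false; // set by the per-partition optimization
};

using Pass = std::function<void(ToyPlan &)>;

// Stand-in for a projection optimization that swaps the source step.
void replaceSourceWithProjection(ToyPlan & plan)
{
    plan.source_emits_disjoint_partitions = false;
}

// Stand-in for the per-partition optimization: it only fires if the source
// qualifies at the moment the pass runs.
void enablePerPartition(ToyPlan & plan)
{
    if (plan.source_emits_disjoint_partitions)
        plan.transform_expects_disjoint_partitions = true;
}

// The invariant the transform relies on.
bool isConsistent(const ToyPlan & plan)
{
    return !plan.transform_expects_disjoint_partitions || plan.source_emits_disjoint_partitions;
}

// Apply the passes in the given order to a fresh plan.
ToyPlan runPasses(const std::vector<Pass> & passes)
{
    ToyPlan plan;
    for (const auto & pass : passes)
        pass(plan);
    return plan;
}
```

Running `enablePerPartition` before `replaceSourceWithProjection` leaves the flag set on a source that no longer qualifies; running it afterwards simply skips the optimization, and the invariant holds.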

alexey-milovidov · 2026-05-17T18:41:20Z

This was fixed by #105146. Let's update the branch.

clickhouse-gh · 2026-05-25T02:07:13Z

LLVM Coverage Report

Changed lines: 82.47% (127/154) | lost baseline coverage: 1 line(s) · Uncovered code

The test (added in ClickHouse#105126) has ~30 scenarios that each do INSERT + EXPLAIN, and the MSan slowdown pushes it past the per-test timeout. It is consistently flaky on `Stateless tests (amd_msan, ...)` while passing on every other sanitizer/build flavor. Adding `no-msan` since the test exercises pipeline planning, not memory correctness, so MSan adds little signal. CI report (first failing master commit, sha 69102a31ed94): https://d1k2gkhrlfqv31.cloudfront.net/clickhouse-test-reports-private/json.html?REF=master&sha=69102a31ed94&name_0=MasterCI&name_1=Stateless%20tests%20%28amd_msan%2C%20meta%20in%20keeper%2C%20s3%20storage%2C%20parallel%2C%202%2F2%29

Speed up `LIMIT BY` by running it independently per `MergeTree` partition

nihalzp added 7 commits May 16, 2026 10:15

Add test

6a6b251

Show output_each_partition_through_separate_port in actions

09c1997

Add perf test

5c63b1d

Add explanation of the tags

0f40b64

Add setting allow_limit_by_partitions_independently

56a0171

Integrate the setting into optimizations

686ceeb

Request parallel reading for LIMIT BY

0a0fd75

clickhouse-gh Bot added the pr-performance Pull request with some performance improvements label May 16, 2026

alexey-milovidov approved these changes May 16, 2026

alexey-milovidov self-assigned this May 16, 2026

clickhouse-gh Bot reviewed May 16, 2026

Comment thread src/Processors/QueryPlan/Optimizations/useDataParallelAggregation.cpp

Add to ignore experimental flag

e380617

nihalzp added 4 commits May 16, 2026 15:55

Merge branch 'master' into partitioned-limit-by

e88387e

Fix build

4bee1d1

Add regression test

413b8a0

Move per partition optimizations with in order optimizations

5d07134

nihalzp commented May 16, 2026

clickhouse-gh Bot reviewed May 16, 2026

Comment thread tests/queries/0_stateless/04218_limit_by_partitions_independently.sql

nihalzp added 5 commits May 17, 2026 06:46

Merge branch 'master' into partitioned-limit-by

2a8cedf

Rerun CI

2569032

Merge branch 'master' into partitioned-limit-by

6a40dca

Make the test run faster

b82a828

Increase test coverage

bd2b3af

alexey-milovidov mentioned this pull request May 17, 2026

Stop the bleeding in function_prop_fuzzer #105146

Merged

1 task

Merge branch 'master' into partitioned-limit-by

62ae401

groeneai mentioned this pull request May 18, 2026

Fix MSan use-of-uninitialized-value in UTF-8 case-insensitive StringSearcher #105223

Merged

1 task

groeneai mentioned this pull request May 18, 2026

Fix exception on IN tuple() against Distributed sharded table #104966

Merged

1 task

nihalzp added 5 commits May 19, 2026 20:33

Merge branch 'master' into partitioned-limit-by

ff02c56

Fix test

c97585f

Merge branch 'master' into partitioned-limit-by

1839803

Merge branch 'master' into partitioned-limit-by

bb920e7

Move settings to 26.6

e4f6aaa

nihalzp added this pull request to the merge queue May 25, 2026

Merged via the queue into ClickHouse:master with commit ab6b5ff May 25, 2026
166 of 167 checks passed

nihalzp deleted the partitioned-limit-by branch May 25, 2026 06:23

robot-clickhouse-ci-1 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 25, 2026

fm4v mentioned this pull request May 26, 2026

Disable 04218_limit_by_partitions_independently under MSan (test timeout) #105842

Merged

DavidHe-2008 pushed a commit to DavidHe-2008/ClickHouse that referenced this pull request Jun 1, 2026

Merge pull request ClickHouse#105126 from nihalzp/partitioned-limit-by

f962ebb

Speed up `LIMIT BY` by running it independently per `MergeTree` partition

nihalzp mentioned this pull request Jun 23, 2026

Speed up DISTINCT by running it independently per MergeTree partition #108326

Open

Speed up `LIMIT BY` by running it independently per `MergeTree` partition #105126

nihalzp merged 23 commits into ClickHouse:master from nihalzp:partitioned-limit-by
