nihalzp · 2026-06-04T18:04:14Z

This PR optimizes the case where we have a sorted IColumn and want to find the end row of a run of equal keys in that column. In many places we do a plain linear scan; in some places we do a binary search, but with duplicated implementations. This PR unifies the utility as an IColumn method, so that a single optimized binary-search implementation serves all use cases.
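To make the idea concrete, here is a minimal sketch of a galloping (exponential) probe followed by a binary search over a plain `std::vector`. It is not the code from this PR: the function name `findEqualRangeEnd` and its signature are hypothetical, and it assumes only that equal values are contiguous because the data is sorted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

/// Hypothetical helper for illustration, not the ClickHouse implementation.
/// Returns the first index in (begin, end] whose value differs from data[begin],
/// assuming data[begin, end) is sorted so that equal values are contiguous.
template <typename T>
size_t findEqualRangeEnd(const std::vector<T> & data, size_t begin, size_t end)
{
    assert(begin < end && end <= data.size());
    const T & key = data[begin];

    size_t lo = begin; // invariant: data[lo] == key
    size_t hi = end;   // invariant: hi == end or data[hi] != key

    // Galloping phase: probe lo + 1, lo + 2, lo + 4, ... until a different value appears.
    size_t step = 1;
    while (step < end - lo)
    {
        size_t probe = lo + step;
        if (data[probe] == key)
        {
            lo = probe;
            step *= 2;
        }
        else
        {
            hi = probe;
            break;
        }
    }

    // Binary search phase: narrow (lo, hi] down to the first differing row.
    while (hi - lo > 1)
    {
        size_t mid = lo + (hi - lo) / 2;
        if (data[mid] == key)
            lo = mid;
        else
            hi = mid;
    }
    return hi;
}
```

Short runs finish after one or two probes, while long runs cost a number of comparisons logarithmic in the run length instead of one comparison per row.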

Changelog category (leave one):

Performance Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Speed up operators that scan a sorted stream for runs of equal key values — DISTINCT in order, LIMIT BY in order, negative LIMIT BY in order, full_sorting_merge and partial_merge joins.
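As an illustration of how such an operator can use run boundaries, the sketch below applies a `LIMIT n BY key` style filter to one chunk sorted ascending by key. The function `limitByInOrder` is hypothetical, and `std::upper_bound` stands in for the equal-range search; that substitution is valid only for ascending order and is not how the PR implements it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

/// Hypothetical sketch of "LIMIT n BY key" over a chunk already sorted ascending by key.
/// Returns the indices of the rows that are kept.
std::vector<size_t> limitByInOrder(const std::vector<int> & keys, size_t limit)
{
    assert(std::is_sorted(keys.begin(), keys.end()));

    std::vector<size_t> kept;
    size_t row = 0;
    while (row < keys.size())
    {
        // End of the run of rows equal to keys[row], found by binary search
        // rather than by comparing neighbouring rows one at a time.
        auto run_end_it = std::upper_bound(
            keys.begin() + static_cast<std::ptrdiff_t>(row), keys.end(), keys[row]);
        size_t run_end = static_cast<size_t>(run_end_it - keys.begin());

        // Keep at most `limit` rows from this run.
        size_t take = std::min(limit, run_end - row);
        for (size_t i = 0; i < take; ++i)
            kept.push_back(row + i);

        row = run_end; // jump straight to the next key
    }
    return kept;
}
```

The gain comes from the jump at the end of the loop: rows of a long run beyond the limit are skipped without being inspected individually.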

clickhouse-gh · 2026-06-04T18:34:16Z

clickhouse-gh · 2026-06-04T21:36:30Z

📊 Cloud Performance Report

✅ AI verdict: no_change — no significant changes across 38 queries analysed

This PR adds a galloping plus binary-search fast path for finding equal-value runs in sorted columns (getEqualRangeEndAssumeSorted), wired only into sort-merge join equal-length detection and the LIMIT BY / negative-LIMIT-BY sorted-stream transforms. ClickBench queries use neither merge joins nor LIMIT BY, so the flagged improvements on Q4, Q28, and Q32 cannot be attributed to this change and read as run-to-run variance across the two builds; they have been downgraded. Notably the tpch_adapted_1_official suite, which is where join and ordering changes would actually surface, showed no per-query shifts, reinforcing that the ClickBench deltas are noise rather than a real effect of this PR.

clickbench

⚠️ 3 inconclusive

Flagged queries (3 of 43)

| | Query | Verdict | Baseline median (ms) | PR median (ms) | Change | q-value | Hint |
|---|---|---|---|---|---|---|---|
| ⚠️ | 4 | not_sure | 261 | 229 | -12.3% | <0.0001 | ClickBench has no LIMIT BY and no merge joins; this PR's equal-range fast path isn't exercised. Off-path, run-to-run var |
| ⚠️ | 28 | not_sure | 6973 | 6613 | -5.2% | 0.0006 | Off-path: PR only touches sort-merge join and LIMIT BY equal-range scanning, which this ClickBench query doesn't run. Va |
| ⚠️ | 32 | not_sure | 1343 | 1269 | -5.5% | <0.0001 | Already not_sure; noisy (high variance, the two tests disagree). PR's merge-join/LIMIT BY change is off-path for ClickBe |

q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10), the value the verdict is based on.
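For readers unfamiliar with the term, a BH-FDR adjusted p-value can be computed with the standard Benjamini-Hochberg procedure, sketched below. This shows the textbook method only; the function name `benjaminiHochberg` is hypothetical, and whether MIRAI computes its q-values exactly this way is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

/// Benjamini-Hochberg adjusted p-values (q-values), returned in the input order.
/// Textbook procedure; not taken from the MIRAI tool.
std::vector<double> benjaminiHochberg(const std::vector<double> & p)
{
    const size_t m = p.size();

    // Indices of p sorted by ascending p-value.
    std::vector<size_t> order(m);
    std::iota(order.begin(), order.end(), size_t{0});
    std::sort(order.begin(), order.end(), [&](size_t a, size_t b) { return p[a] < p[b]; });

    std::vector<double> q(m);
    double running_min = 1.0;
    // Walk from the largest p-value down; q = min over higher ranks of p * m / rank.
    for (size_t rank = m; rank > 0; --rank)
    {
        size_t idx = order[rank - 1];
        assert(p[idx] >= 0.0 && p[idx] <= 1.0);
        double scaled = p[idx] * static_cast<double>(m) / static_cast<double>(rank);
        running_min = std::min(running_min, scaled);
        q[idx] = running_min;
    }
    return q;
}
```

A query would then be flagged when its q-value falls below the threshold, 0.10 by default according to the note above.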

tpch_adapted_1_official

🟢 No significant changes

Debug info

StressHouse run: 22fc6d18-9782-4df6-9a06-0ef9338b2c8a
MIRAI run: b802e90d-8b37-474a-899d-dee612136e7e
PR check IDs:
- clickbench_39534_1782392618
- clickbench_39548_1782392618
- clickbench_39554_1782392618
- tpch_adapted_1_official_39561_1782392618
- tpch_adapted_1_official_39572_1782392618
- tpch_adapted_1_official_39582_1782392618

Algunenano

@nihalzp It seems #106502 (comment) and #106502 (comment) are still unaddressed.

You mentioned "Okay, I will keep it safe and not apply it for Float LC." but didn't change it AFAICT.

ColumnLowCardinality::getEqualRangeEndAssumeSorted would need to deal with floats and nullable(float). After that, uniqueInsertCanonicalFloat is unused and should be removed.
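The float concern can be illustrated with signed zero, which the later commit "Make 0.0 and -0.0 the same in ColumnUnique" also points at. The sketch below assumes, for illustration only, a dictionary that deduplicates values by bit pattern; under that assumption two rows that compare equal can carry different dictionary indices, so an equal-range search over indices would split the run. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

/// Returns the raw IEEE-754 bit pattern of a double.
inline uint64_t bitsOf(double x)
{
    uint64_t bits;
    static_assert(sizeof(bits) == sizeof(x));
    std::memcpy(&bits, &x, sizeof(bits));
    return bits;
}

/// True when two values compare equal yet would be separate entries in a
/// dictionary keyed by bit pattern (assumed behaviour, for illustration).
inline bool equalButDistinctBits(double a, double b)
{
    return a == b && bitsOf(a) != bitsOf(b);
}

/// Small self-check: 0.0 and -0.0 are one value to operator== but two bit patterns.
inline void checkSignedZero()
{
    assert(equalButDistinctBits(0.0, -0.0));
    assert(!equalButDistinctBits(1.0, 1.0));
}
```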

nihalzp · 2026-06-23T13:18:50Z

> You mentioned "Okay, I will keep it safe and not apply it for Float LC." but didn't change it AFAICT.

I requested the review before I made that comment and have not gotten a chance to work on it yet. I will work on it soon.

clickhouse-gh · 2026-06-25T01:40:58Z

LLVM Coverage Report

Changed lines: Changed C/C++ lines covered by tests: 482/490 (98.37%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 5 line(s) · Uncovered code

Full report · Diff report

nihalzp added 17 commits June 3, 2026 20:19

Add performance test

d227f1c

Add test to make sure limit by is not collation aware

19e1c9e

Optimize negative limit by with binary search via `getEqualRangeEndAssumeSorted`

59697e5

Optimize merge join with binary search via `getEqualRangeEndAssumeSorted`

8b2aaca

Optimize fill with binary search via getEqualRangeEndAssumeSorted

88dbde3

Optimize distinct with binary search via getEqualRangeEndAssumeSorted

e96ce32

Add multi column overloads

ebb14be

Add findEqualRangeEndAssumeSorted

2831106

Add IColumn::getEqualRangeEndAssumeSorted()

9d850ec

Add specialised overload for ColumnVector

837b285

Add specialised overload for ColumnString

b0a10d8

Add specialised overload for ColumnNullable

e2d7456

Add specialised overload for ColumnLowCardinality

b0a4c5c

Add specialised overload for ColumnFixedString

3f29db8

Add specialised overload for ColumnDecimal

c1b950f

Merge branch 'master' into optimize-equal-run-end-search

543e43a

Fix build

9e5375d

clickhouse-gh Bot added the pr-performance Pull request with some performance improvements label Jun 4, 2026

nihalzp added 2 commits June 4, 2026 19:05

Add unit test for the method

ecb4b27

Skip fast test for collation test

5a35c56

nihalzp added 4 commits June 5, 2026 11:16

Merge branch 'master' into optimize-equal-run-end-search

70a0a35

Fix ubsan issue

83e5f45

Optimize for high cardinality case

01b625c

Optimize for LIMIT BY transform

1e3d039

nihalzp changed the title ~~Speed up DISTINCT in order, sort-merge joins, ORDER BY ... WITH FILL and LIMIT BY~~ Speed up DISTINCT in order, sort-merge joins, ORDER BY ... WITH FILL, LIMIT BY and negative LIMIT BY Jun 5, 2026

clickhouse-gh Bot reviewed Jun 5, 2026

Comment thread tests/performance/full_sorting_merge_join.xml

nihalzp mentioned this pull request Jun 5, 2026

Optimize LIMIT BY #103349

Merged

1 task

Merge branch 'master' into optimize-equal-run-end-search

cea80bf

clickhouse-gh Bot reviewed Jun 13, 2026

Comment thread src/Processors/Transforms/NegativeLimitByTransform.cpp Outdated

nihalzp added 3 commits June 13, 2026 12:15

Make LIMIT BY transforms sort desc aware

d371a6b

Pass sort description to LIMIT BY steps

71a9e3c

Add test

1bce367

clickhouse-gh Bot reviewed Jun 13, 2026

Comment thread src/Columns/ColumnLowCardinality.cpp

nihalzp added 3 commits June 20, 2026 11:58

Merge branch 'master' into optimize-equal-run-end-search

a50f5f9

Make 0.0 and -0.0 the same in ColumnUnique

4f35e74

Add test for the fix

cedab0f

clickhouse-gh Bot reviewed Jun 20, 2026

Comment thread src/Columns/ColumnUnique.h Outdated

nihalzp added 3 commits June 20, 2026 13:09

Remove redundant ref count increases

05778b2

Do defensive fix

544cb36

Fix build

9c2467c

nihalzp requested a review from Algunenano June 22, 2026 09:07

Algunenano requested changes Jun 23, 2026

nihalzp added 4 commits June 24, 2026 20:31

Merge branch 'master' into optimize-equal-run-end-search

fcab620

Revert -0.0 and 0.0 normalization for LC

d7277c5

Skip the optimization for unsafe types

c90c40d

Add tests

fef062a

Algunenano approved these changes Jun 25, 2026

nihalzp added this pull request to the merge queue Jun 25, 2026

Merged via the queue into ClickHouse:master with commit 9f7b790 Jun 25, 2026
328 of 330 checks passed

nihalzp deleted the optimize-equal-run-end-search branch June 25, 2026 16:16

robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 25, 2026

groeneai mentioned this pull request Jun 26, 2026

Fix logical error in correlated subquery decorrelation with set operations #108554

Open

alexey-milovidov mentioned this pull request Jun 27, 2026

Refuse to materialize columns used in sort key expressions #99647

Open

3 tasks

groeneai mentioned this pull request Jun 29, 2026

Speed up ZSTD decompression on AArch64 (small-offset overlapping copies) #108049

Merged

alexey-milovidov mentioned this pull request Jun 30, 2026

Add GradualResizeProcessor to limit effective parallelism for GROUP BY on small data volumes #99495

Draft

Speed up `DISTINCT` in order, sort-merge joins, `LIMIT BY` and negative `LIMIT BY` #106502

nihalzp merged 47 commits into ClickHouse:master from nihalzp:optimize-equal-run-end-search
