iframe-proxy

xuir268 · 2026-03-21T23:07:39Z

Changelog category (leave one):

New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Add the SHUFFLE clause for SELECT queries, including optimized SHUFFLE LIMIT execution, query plan support, tests, and documentation.

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

CLAassistant · 2026-03-21T23:13:10Z

xuir268 · 2026-03-21T23:32:51Z

Hi @alexey-milovidov , if you have time, please take a look. if need anything more, please do let me know

clickhouse-gh · 2026-03-21T23:43:31Z

Workflow [PR], commit [f473327]

Switch `SHUFFLE LIMIT` to bounded reservoir sampling, add the experimental `allow_experimental_shuffle_query` setting, update docs/tests, and fix style issues from the CI report. CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100305&sha=latest&name_0=PR&name_1=Style+check

Remove the global alias restriction for `SHUFFLE` so disabled mode keeps legacy alias parsing, and update regression tests to use unambiguous clause forms. PR comment: ClickHouse#100305

Apply the optimization only for positive limits, add to , normalize , and add a stateless regression test for . PR comment: ClickHouse#100305

Add to the first runnable example so it works under default settings. PR comment: ClickHouse#100305

Keep alias-behavior checks in and move the expected query into a dedicated error-only test so does not fail the whole file with return code . Fast test report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100305&sha=0095e031faec3babd928dbe81c9b6c436a116255&name_0=PR&name_1=Fast%20test&name_2=Tests

- Remove trailing whitespace on line 633 of `CommonParsers.h` - Move `-- { serverError SUPPORT_IS_DISABLED }` inline with the semicolon in `04054_shuffle_disabled_error.sql` so the test runner correctly associates the expected error with the query CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100305&sha=12f6e74bde27f51c8223a68221c984c4581c3ce1&name_0=PR&name_1=Fast%20test

When `allow_experimental_shuffle_query = 1` but `allow_experimental_analyzer = 0`, `TreeRewriter` previously passed the guard and let the query proceed. The old planner has no `ShuffleStep`, so `SHUFFLE` was silently ignored and non-randomized results were returned — a correctness bug. Fix: after the `SUPPORT_IS_DISABLED` gate, always throw `NOT_IMPLEMENTED` in the old planner path, matching the pattern used for `LIMIT BY ALL`. Add `04055_shuffle_old_interpreter` to cover the setting-enabled / old-planner case.

xuir268 · 2026-04-22T17:11:12Z

Hi @alexey-milovidov @shankar-iyer is anything more is left to review

rschu1ze · 2026-04-22T18:48:43Z

@xuir268 Expect delay please, people out on a business trip at the moment.

Explain `SHUFFLE` primarily as result row order randomization, make `AS SHUFFLE` alias usage explicit, and describe `SHUFFLE LIMIT n` as the random-subset form without overemphasizing sampling terminology. PR comment: ClickHouse#100305

xuir268 · 2026-05-02T18:54:54Z

Hi @alexey-milovidov @shankar-iyer is anything more is left to review

shankar-iyer · 2026-05-04T06:19:07Z

Hi @alexey-milovidov @shankar-iyer is anything more is left to review

I will catch up on this PR this week.

xuir268 · 2026-05-04T17:41:33Z

Is anything more left @rschu1ze

rschu1ze · 2026-05-04T18:22:16Z

Is anything more left @rschu1ze

#100535 (comment)

rschu1ze · 2026-05-04T20:42:56Z

I really appreciate your work @xuir268 but I will need to close this PR for these reasons (sorry):

the functionality of result order shuffling can already be achieved today using ORDER BY rand() [LIMIT n] with different (in some cases faster, in some cases slower) performance characteristics,
in practice, result row shuffling is rarely used,
the SHUFFLE syntax is not SQL standard (unlike ORDER BY rand()) and also not supported by other databases as far as I see, i.e. it would be ClickHouse-only.

For this niche advantage, it seems not worth to add and maintain many hundreds LOC of code going forward.

xuir268 · 2026-05-05T07:45:39Z

@rschu1ze @shankar-iyer , Thanks for the review and for clarifying the final decision.

I want to be direct about one process issue: this outcome is frustrating because SHUFFLE had been mentioned in the roadmap, and the earlier review discussion focused on implementation details, tests, benchmarks, and documentation rather than on whether the feature itself should exist at all. If the conclusion is that a ClickHouse-specific SHUFFLE clause is not worth the long-term maintenance cost because ORDER BY rand() [LIMIT n] already covers the user-visible functionality, that would have been important to state much earlier.

I have to be honest: I do not think the review process was handled well here, because the product-level objection appears only after substantial implementation, benchmarking, testing, and documentation work had already been done.

I would appreciate it if future feature reviews call out that kind of product-level objection as early as possible. That would save a lot of unnecessary implementation and review time on both sides.

If some of the execution-side work here is still useful independently, for example around improving large-n random-row selection behavior, I would also appreciate guidance on whether that should be pursued in a different form.

rschu1ze · 2026-05-05T12:29:54Z

@xuir268 Thanks for the feedback, I understand the frustration. I could actually not find SHUFFLE on the 2024, the 2025 or the 2026 roadmaps. DId I miss something?

If the conclusion is that a ClickHouse-specific SHUFFLE clause is not worth the long-term maintenance cost because ORDER BY rand() [LIMIT n] already covers the user-visible functionality, that would have been important to state much earlier.

Fair enought and sorry again. This PR didn't have an associated linked ticket initially and I somehow missed its motivation and context in the beginning.

If some of the execution-side work here is still useful independently

I'll check back if this makes sense, thanks!

xuir268 · 2026-05-05T13:14:35Z

@rschu1ze here it is #87836 and can you also assign some tickets so I can look into more other issues or from roadmap , this will be highly appreciated

rschu1ze · 2026-05-05T22:41:09Z

#87836 is the list of intern topics. These are technically not part of the roadmap because of their experimental nature.

can you also assign some tickets so I can look into more other issues or from roadmap , this will be highly appreciated

Feel free to pick any open ticket, we are experimenting with AI and getting lots of automatic bug reports (e.g.clickgapai)

alexey-milovidov · 2026-05-11T13:59:20Z

Shuffle is obviously more performant than ORDER BY rand() (a different algorithm), and we will use it to automatically optimize ORDER BY rand().

xuir268 · 2026-05-11T20:42:55Z

@shankar-iyer you can review the MR

shankar-iyer · 2026-05-12T05:04:27Z

Shuffle is obviously more performant than ORDER BY rand() (a different algorithm), and we will use it to automatically optimize ORDER BY rand().

@shankar-iyer you can review the MR

My concern is still the memory required for a large LIMIT. I am compiling the PR and will give it a spin.

shankar-iyer · 2026-05-12T16:19:19Z

I took the PR on my VM (48 CPU) and ran on a not too big dataset (100M rows).

select * from test_100m order by rand() limit 10 format null;
0 rows in set. Elapsed: 0.040 sec. Processed 100.00 million rows, 1.00 GB (2.53 billion rows/s., 25.29 GB/s.)
Peak memory usage: 2.29 MiB.


select * from test_100m shuffle limit 10 format null;
0 rows in set. Elapsed: 183.244 sec. Processed 100.00 million rows, 6.90 GB (545.72 thousand rows/s., 37.65 MB/s.)
Peak memory usage: 182.40 KiB.


select * from test_100m order by rand() limit 1000 format null
0 rows in set. Elapsed: 0.074 sec. Processed 100.06 million rows, 1.00 GB (1.35 billion rows/s., 13.54 GB/s.)
Peak memory usage: 7.04 MiB.

select * from test_100m shuffle limit 1000 format null;
0 rows in set. Elapsed: 4.518 sec. Processed 100.00 million rows, 7.00 GB (22.13 million rows/s., 1.55 GB/s.)
Peak memory usage: 254.91 KiB.


select * from test_100m order by rand() limit 1000000 format null
0 rows in set. Elapsed: 1.070 sec. Processed 147.42 million rows, 4.32 GB (137.74 million rows/s., 4.03 GB/s.)
Peak memory usage: 1.48 GiB.

 select * from test_100m shuffle limit 1000000 format null;
0 rows in set. Elapsed: 2.457 sec. Processed 100.00 million rows, 7.00 GB (40.70 million rows/s., 2.85 GB/s.)
Peak memory usage: 10.23 GiB.


select * from test_100m order by rand() limit 10000000 format null
0 rows in set. Elapsed: 4.663 sec. Processed 199.88 million rows, 7.99 GB (42.86 million rows/s., 1.71 GB/s.)
Peak memory usage: 2.66 GiB.

select * from test_100m shuffle limit 10000000 format null;
0 rows in set. Elapsed: 7.790 sec. Processed 100.00 million rows, 7.00 GB (12.84 million rows/s., 898.54 MB/s.)
Peak memory usage: 15.38 GiB.


select * from test_100m order by rand() limit 80000000 format null
0 rows in set. Elapsed: 34.417 sec. Processed 200.00 million rows, 8.00 GB (5.81 million rows/s., 232.44 MB/s.)
Peak memory usage: 13.08 GiB.

select * from test_100m shuffle limit 80000000 format null;
0 rows in set. Elapsed: 23.199 sec. Processed 100.00 million rows, 7.00 GB (4.31 million rows/s., 301.73 MB/s.)
Peak memory usage: 44.08 GiB.

At 80M limit, lazy materialization performs too many scattered reads and maybe also duplicate granule reads and hence starts slowing down. But note that SHUFFLE's memory usage is a function of the LIMIT + number of columns - essentially ruling out large dataset sampling.

There is a simple optimization possible in LazyMaterializingTransform - we don't really need to sort the final chunk of 10M or 80M rand() values. The purpose of the feature is to return a random sample of 80M rows from the dataset and that is achieved. That change can reduce the scattered reads (we just sort by _part_offset)

select * from test_100m order by rand() limit 10000000 format null settings query_plan_lazy_materialization_unordered_output = 1
0 rows in set. Elapsed: 3.264 sec. Processed 199.88 million rows, 7.99 GB (61.23 million rows/s., 2.45 GB/s.)
Peak memory usage: 2.57 GiB.

select * from test_100m order by rand() limit 80000000 format null settings query_plan_lazy_materialization_unordered_output = 1
0 rows in set. Elapsed: 18.799 sec. Processed 200.00 million rows, 8.00 GB (10.64 million rows/s., 425.55 MB/s.)
Peak memory usage: 12.10 GiB.

Can you reiterate again when the SHUFFLE implementation in this PR would perform better? Structurally, I feel difficult for a single pass algorithm to perform allround like 2-pass lazy materialization.

xuir268 · 2026-05-18T08:03:30Z

@shankar-iyer I updated the SHUFFLE LIMIT implementation to use the lazy materialization path instead of the row-reservoir ShuffleStep.

Implementation-wise, SHUFFLE LIMIT n is planned through an ORDER BY rand() LIMIT n sampling path, but the SortingStep is marked as allowing unordered output. Then LazyMaterializingTransform sorts the selected row ids by _part_offset for the lazy read and skips restoring the final random-value order. So the returned rows are still a random sample, but SHUFFLE LIMIT n does not guarantee random output order. Plain SHUFFLE without LIMIT still uses ShuffleStep.

Local 10M-row MergeTree check, 3 selected columns, max_threads = 16, FORMAT Null:

query	elapsed	peak memory
`ORDER BY rand() LIMIT 1000`	0.311s	6.50 MiB
`SHUFFLE LIMIT 1000`	0.280s	969.60 KiB
`ORDER BY rand() LIMIT 100000`	0.409s	103.34 MiB
`SHUFFLE LIMIT 100000`	0.312s	107.86 MiB
`ORDER BY rand() LIMIT 1000000`	0.550s	235.92 MiB
`SHUFFLE LIMIT 1000000`	0.533s	242.94 MiB
`ORDER BY rand() LIMIT 8000000`	2.005s	809.41 MiB
`SHUFFLE LIMIT 8000000`	1.936s	799.29 MiB

So with the updated semantics, SHUFFLE LIMIT is useful when the user wants a random sample and does not require the sample itself to be returned in random order. If the final random ordering is required, then ORDER BY rand() LIMIT n is still the correct semantic match.

shankar-iyer · 2026-05-19T13:59:08Z

@xuir268 Checking..

shankar-iyer · 2026-05-19T16:14:15Z

@shankar-iyer I updated the SHUFFLE LIMIT implementation to use the lazy materialization path instead of the row-reservoir ShuffleStep.

Implementation-wise, SHUFFLE LIMIT n is planned through an ORDER BY rand() LIMIT n sampling path, but the SortingStep is marked as allowing unordered output. Then LazyMaterializingTransform sorts the selected row ids by _part_offset for the lazy read and skips restoring the final random-value order. So the returned rows are still a random sample, but SHUFFLE LIMIT n does not guarantee random output order. Plain SHUFFLE without LIMIT still uses ShuffleStep.

Local 10M-row MergeTree check, 3 selected columns, max_threads = 16, FORMAT Null:

query elapsed peak memory
ORDER BY rand() LIMIT 1000 0.311s 6.50 MiB
SHUFFLE LIMIT 1000 0.280s 969.60 KiB
ORDER BY rand() LIMIT 100000 0.409s 103.34 MiB
SHUFFLE LIMIT 100000 0.312s 107.86 MiB
ORDER BY rand() LIMIT 1000000 0.550s 235.92 MiB
SHUFFLE LIMIT 1000000 0.533s 242.94 MiB
ORDER BY rand() LIMIT 8000000 2.005s 809.41 MiB
SHUFFLE LIMIT 8000000 1.936s 799.29 MiB
So with the updated semantics, SHUFFLE LIMIT is useful when the user wants a random sample and does not require the sample itself to be returned in random order. If the final random ordering is required, then ORDER BY rand() LIMIT n is still the correct semantic match.

Thanks for working through the iterations on this — the design exploration has been useful and our last 2 conversations have made the actual win clear, and I think the scope should shrink to match.

Looking at the current state: for SHUFFLE LIMIT n, the planner now rewrites the tree to ORDER BY rand() LIMIT n and the heavy lifting (heap-based top-n selection) is done entirely by the existing sorting infrastructure.

So the real optimization the work has surfaced is narrow but genuine: letting the final MergingSortedTransform skip the last sort plus the planner pass that detects ORDER BY rand() LIMIT n and sets it. That gives essentially all the value with none of the other surface area.

Given how many revisions we've gone through here, I think the cleanest path forward is to close this PR and open a focused replacement PR that does just:

Pattern-match ORDER BY rand() LIMIT n in the planner.
Set SortingStep::allow_unordered_output for that case, so the final merge can skip work (expensive for large LIMITs)
[Needs research] The current placement of SHUFFLE in the SQL grammer conflicts with table alias and is not an option. You could explore tying the SHUFFLE with LIMIT -> SELECT * FROM <...> WHERE <...> LIMIT n SHUFFLE. This would mean dropping support for SHUFFLE without LIMIT - I believe that is fine. The planner can rewrite SHUFFLE LIMIT n to ORDER BY rand() LIMIT n.

xuir268 · 2026-05-19T16:19:55Z

@shankar-iyer okay , opening dedicating MR for this focused replacement PR that does just:

Pattern-match ORDER BY rand() LIMIT n in the planner.
Set SortingStep::allow_unordered_output for that case, so the final merge can skip work (expensive for large LIMITs)
[Needs research] The current placement of SHUFFLE in the SQL grammer conflicts with table alias and is not an option. You could explore tying the SHUFFLE with LIMIT -> SELECT * FROM <...> WHERE <...> LIMIT n SHUFFLE. This would mean dropping support for SHUFFLE without LIMIT - I believe that is fine. The planner can rewrite SHUFFLE LIMIT n to ORDER BY rand() LIMIT n.

xuir268 · 2026-05-24T21:07:55Z

Add clause support, tests, and docs

c18bd8e

alexey-milovidov added the can be tested Allows running workflows for external contributors label Mar 21, 2026

clickhouse-gh Bot added the pr-feature Pull request with new product feature label Mar 21, 2026

clickhouse-gh Bot reviewed Mar 21, 2026

View reviewed changes

Comment thread src/Parsers/ParserSelectQuery.cpp

clickhouse-gh Bot reviewed Mar 21, 2026

View reviewed changes

Comment thread src/Processors/Transforms/ShuffleTransform.cpp

shankar-iyer self-assigned this Mar 22, 2026