Do not lift arrayJoin above SortingStep (issue #82279)#104558
Do not lift arrayJoin above SortingStep (issue #82279)#104558alexey-milovidov merged 19 commits into
arrayJoin above SortingStep (issue #82279)#104558Conversation
The `tryExecuteFunctionsAfterSorting` optimization in `liftUpFunctions.cpp` splits the post-sort expression into a "needed for sorting" part (computes the sort keys, stays below `SortingStep`) and an "unneeded for sorting" part (lifted above `SortingStep`). It is unsound to lift `arrayJoin` because `arrayJoin` can change the number of rows: with a `LIMIT` pushed down to the `SortingStep`, the input rows are truncated *before* `arrayJoin` expansion, silently dropping rows that should have been produced.
Skip the optimization when the "unneeded for sorting" part contains `arrayJoin`.
Repro from the issue:
CREATE TABLE m(id UInt32) ENGINE = Memory AS SELECT number FROM numbers(10);
CREATE TABLE l(id UInt32, a Array(UInt16)) ENGINE = Memory AS SELECT 4, [1,2,3,4,5];
SELECT id, arrayJoin(a) FROM m LEFT JOIN l ON l.id = m.id ORDER BY id LIMIT 3;
-- before: 0 rows; after: 3 rows.
Closes ClickHouse#82279.
A companion bug in the lazy-materialization optimization is addressed in ClickHouse#101644 by @vdimir.
|
cc @vdimir @novikd — could you review this? It fixes a correctness bug where the |
|
Workflow [PR], commit [fab2347] Summary: ✅
AI ReviewSummaryThis PR closes the PR Metadata
Findings💡 Nits
Final VerdictStatus: ✅ Approve No code changes requested. Refreshing the PR metadata before merge would make the changelog and related-link automation match the final scope. |
The previous unconditional guard in `tryExecuteFunctionsAfterSorting` blocked the lift even when no `LIMIT` was pushed into the `SortingStep`. That caused 100% reproducible CI regressions in tests that rely on the lift for determinism — notably `01592_long_window_functions1`, `03254_timeseries_instant_value_aggregate_functions`, and others that `ORDER BY` a column produced by `arrayJoin`. When such a column is in `ORDER BY`, the planner emits a `SortDescription` that omits it because it is not yet materialized below the sort. The lift compensates: it pushes the `arrayJoin` above `SortingStep` so the sort runs on un-expanded rows, and `arrayJoin` then produces each group's rows in array-element order above the sort. Without the lift, `arrayJoin` runs first, the sort sees expanded rows but does not key on the post-`arrayJoin` column, and the within-group order is undefined → non-deterministic results. `LIMIT` was the only configuration where the lift was unsound for the issue ClickHouse#82279 case — the `LIMIT` would truncate input rows before `arrayJoin` expansion, silently dropping rows. Limit the guard to that exact case (`sorting_step->getLimit() != 0`) so the deterministic-order behavior is preserved everywhere else. Verified locally: - Issue ClickHouse#82279 repro (LIMIT 3) still returns 3 rows. - 01592 arrays query returns the expected hash deterministically (50 randomized runs). - 03254 passes 50 randomized runs. - 04220 regression test passes 50 randomized runs.
|
Pushed Root cause of the CI regressionsThe previous unconditional SELECT brand_id, rack_id, arrayJoin(arraySlice(arraySort(groupArray(quantity)), 1, 2)) AS quantity
FROM stack GROUP BY brand_id, rack_id
ORDER BY brand_id, rack_id, quantity(from Without the lift:
This explains the 100% reproducible failures on Why
|
The narrow lift guard in liftUpFunctions.cpp only blocked tryExecuteFunctionsAfterSorting from moving arrayJoin above SortingStep. Two other optimizations were still rewriting the plan in ways that re-introduced the LIMIT-before-arrayJoin bug: 1. optimizeLazyMaterialization2 split the ExpressionStep below the sort into a main half (applied before LIMIT) and a lazy half (applied after JoinLazyColumnsStep). With arrayJoin in the lazy half, every LIMIT-truncated source row was re-expanded after the LIMIT, producing thousands of extra rows. (Triggered on the MergeTree path when optimize_read_in_order=0, which the CI randomizer hit.) 2. tryOptimizeTopK installs a dynamic prewhere filter that learns the sort-by threshold from rows the SortingStep has consumed. With arrayJoin between the read and the sort, those rows are already expanded and the source-row threshold the filter applies is wrong; the filter can stabilize on the wrong value and let the wrong source rows through. Both paths now bail out conservatively whenever an Expression or Filter step below the SortingStep contains arrayJoin, matching the condition used in tryExecuteFunctionsAfterSorting. Also pin optimize_read_in_order=1 in the failing query of 01533_multiple_nested so the test's row-order assertion is stable across both paths (the test pre-dates this fix and relied on sort being implicitly stable for rows that share the same id). Loosen the EXPLAIN assertion in 04220_arrayjoin_after_sort_with_limit to only verify the '[lifted up part]' marker is present, since the new and old analyzers describe the lifted ExpressionStep differently. CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104558 Issue: ClickHouse#82279
|
[2026-05-11 09:30 UTC] Second fix-up pushed as commit `c1fb77526dd` to address the regressions from `d35023d`. The previous narrowed guard (in `tryExecuteFunctionsAfterSorting`) only covered one of three optimization passes that can move `arrayJoin` past a `LIMIT`. Local repro with `optimize_read_in_order=0` showed two sibling paths still produced the issue:
Both passes now bail out whenever an Expression/Filter below the sort contains `arrayJoin`, matching the condition already used in `liftUpFunctions.cpp`. Test fixes:
Pre-PR validation gate (re-verified):
Build: amd_debug local; `SELECT version() = 26.5.1.1 @ c1fb775`
CI re-running on the new commit. Session: `cron:clickhouse-ci-task-worker:20260511-084500` |
…lickHouse#101608) Pulled in from ClickHouse#101644 (@vdimir / Vladimir Cherkasov) and consolidated here so a single test file covers both lazy-materialization-only repros and the broader `arrayJoin`-below-sort-with-`LIMIT` repros that this PR adds for issue ClickHouse#82279. The second query uses `optimize_read_in_order = 0` and `query_plan_execute_functions_after_sorting = 0` to disable the `liftUpFunctions` guard added in this PR, which isolates the `optimizeLazyMaterialization` guard as the only thing keeping the result correct. This makes the test sensitive to a regression in either the lazy-materialization fix or the lift-up fix.
|
Pulled in @vdimir's regression test from #101644 as a new file Detailed comparison of #104558 vs #101644 on #101644 (comment). |
|
@groeneai merge with master |
…arrayjoin-limit-lift-up
…g master PR ClickHouse#101644 was merged on 2026-05-16 (commit 40adaa6) and added `tests/queries/0_stateless/04077_lazy_materialization_array_join.{sql,reference}` with the same test queries that I had pulled in as `04221_lazy_materialization_array_join_nested.{sql,reference}` (commit bdbdfe7). The `.reference` blobs are byte-identical (hash c5ba9f5); the `.sql` files differ only by the explanatory comments and `as e` vs `AS e` cosmetics. The queries themselves are the same, so the test ran twice after the merge with no added coverage. Drop the 04221 copy and rely on master's 04077 going forward.
|
Merged Two notes on the merge:
PR diff vs master is now 6 files / +131 / -2. Per @PedroTadim's clarification — leaving auto-merge off and waiting on @al13n321's review before enabling. |
Ok, now I get it |
The new topKThroughJoin optimization (added after our previous fixes landed) pushes Sort+Limit below a JOIN's preserved side. Its soundness breaks when arrayJoin sits between Sort and Join: pre-limiting the preserved side to n rows can yield fewer (or zero) final rows once arrayJoin expands. Bail out when an ExpressionStep DAG between Sort and Join contains arrayJoin, mirroring the guards we already have in liftUpFunctions.cpp, optimizeLazyMaterialization.cpp, and optimizeTopK.cpp.
|
Fast test failure was the original Commit Local: Session: |
|
Pre-PR validation gate (for commit a) Deterministic repro? ✓ b) Root cause explained? ✓ c) Fix matches root cause? ✓ Same d) Test intent preserved? ✓ Existing e) Both directions demonstrated? ✓ Pre-fix Fast test failure shows LIMIT 3/1/2/4 → 0; post-fix returns 3/1/2/4. Local: 50/50 with f) Fix is general? ✓ All four optimization passes that can move |
…arrayjoin-limit-lift-up
`pushLimitByIntoSort` attaches a per-stream `LimitByTransform` to a `SortingStep` and runs after `tryExecuteFunctionsAfterSorting`. When `arrayJoin` has already been lifted above the sort, the per-stream hint truncates input rows BEFORE `arrayJoin` expansion, silently dropping rows that should have produced output. The existing guard in `liftUpFunctions.cpp` checks `getLimit()` but not the LIMIT BY hint that this pass installs later. Add the same `hasArrayJoin()` guard used in the four sibling passes (`tryExecuteFunctionsAfterSorting`, `optimizeLazyMaterialization2`, `tryOptimizeTopK`, `tryTopKThroughJoin`).
|
Pre-PR validation gate for a) Deterministic repro? Yes. With a 4-part MergeTree (so the read pipeline has multiple streams) and b) Root cause explained? c) Fix matches root cause? Yes. The guard short-circuits d) Test intent preserved? Yes. The new test ( e) Both directions demonstrated? Yes. Pre-fix: 2 rows. Post-fix: 4 rows. The previously merged regression tests for f) Fix is general, not narrow? Yes. This is the fifth row-reducing sort pass to receive the guard; the four prior passes were each addressed in earlier commits on this PR. The guard sits at the same structural position as the sibling guards (one Session: cron:clickhouse-ci-task-worker:20260529-191500 |
…ision After merging master, three other tests now share the `04301` prefix (`04301_clickhouse_local_query_id`, `04301_deltalake_cdf_ffi_panic_guard`, `04301_virtual_row_per_block_wide_partial_granule`). Move the LIMIT BY regression test to the next free number. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Picked this up via Merged Addressed @PedroTadim / Local verification (built The two CI failures are unrelated to this PR:
@groeneai, the |
…kHouse#82279) Two further row-truncating paths still let `arrayJoin` ExpressionStep be sandwiched between an outer `LIMIT` and the source, so the source receives a row cap and `arrayJoin` then expands fewer rows than requested (or zero when leading arrays are empty). The previous PR commits guarded `liftUpFunctions`, `optimizeLazyMaterialization`, `optimizeTopK`, `topKThroughJoin`, and `pushLimitByIntoSort` (each Sort-based limit path). The remaining paths covered here are: 1. `optimizePrimaryKeyConditionAndLimit` walks from a `SourceStepWithFilterBase` through `ExpressionStep`s and calls `setLimit` on the source when it sees the outer `LimitStep`. With an `arrayJoin` ExpressionStep on the walk, the propagation is unsound. Composing filters through `arrayJoin` expressions in the same walk is unsound for the same reason. Stop walking when an ExpressionStep with `arrayJoin` is encountered. 2. `numbersLikeUtils::shouldPushdownLimit` is the construction-time path that sets the source limit on `ReadFromSystemNumbersStep` from the AST's `LIMIT n` directly, before the query-plan optimizer runs. With `arrayJoin` in the SELECT clause, that direct pushdown is unsound. Reject pushdown when the SELECT clause contains an `arrayJoin` function. Bot's exact repro (no sort, no join, just `LIMIT` over `arrayJoin`): SELECT arrayJoin(if(number < 3, [], [number])) FROM numbers() LIMIT 3; -- before: 0 rows -- after: 3, 4, 5 Regression test `04320_arrayjoin_no_sort_limit_pushdown` covers both analyzer paths and a `MergeTree` source. Closes: ClickHouse#82279
Pre-PR validation gate
Session id: cron:clickhouse-ci-task-worker:20260605-183300 |
|
Thanks for picking this up @alexey-milovidov — I addressed bot's no-sort On the Session id: cron:clickhouse-ci-task-worker:20260605-183300 |
…ouse#82279) The construction-time `numbersLikeUtils::shouldPushdownLimit` only inspected `query.select()` for `arrayJoin` functions. The clause form `... ARRAY JOIN expr` is stored separately on `ASTSelectQuery::arrayJoinExpressionList`, so it slipped past the guard. The source `numbers()` step then received a hard `LIMIT n` cap before the `ARRAY JOIN` expansion ran, and rows whose arrays were empty in the truncated prefix produced zero output. `SELECT number FROM numbers(100) ARRAY JOIN if(number < 3, [], [number]) AS x LIMIT 3` now returns `3, 4, 5` instead of zero rows, matching the existing guard pattern already used in `InterpreterSelectQuery.cpp:1653` for the same set of pushdown preconditions. Adds `04321_array_join_clause_limit_pushdown` covering the bot-supplied repro on both analyzers and with `query_plan_enable_optimizations = 0` (proves the fix is at AST-construction time, not in a query-plan optimizer pass). Closes: ClickHouse#82279 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pre-PR validation gatePer
Session id: cron:clickhouse-ci-task-worker:20260605-202100 |
CI ledger — covered SHA
|
| # | Check | Test | Classification | Disposition |
|---|---|---|---|---|
| 1 | Build (arm_tidy) |
Build ClickHouse |
trunk regression | Pre-existing trunk break introduced by PR #106102 (merged 2026-06-05T17:43Z by @ scanhex12). Two cppcoreguidelines-init-variables errors at src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp:731,733. CIDB shows 21+ unrelated PRs and master itself hit by the same error in the 3-hour window 18:04Z–21:25Z. Not in our diff. Tracked separately. |
| 2 | Mergeable Check |
— | cascade | Aggregator status only. Driven by (1). Will clear once arm_tidy is fixed upstream. |
| 3 | PR |
— | cascade | Aggregator status only. Same as above. |
| 4 | Bugfix validation (functional tests) |
01533_multiple_nested |
known PR-specific quirk | Per-PR-specific check that runs a bugfix regression test against master HEAD without the PR. Our PR extends 01533_multiple_nested with the regression cases for the issue this PR closes (#82279). The check expects the new assertions to fail on master HEAD (without our fix) — they do, and so the check itself reports FAIL by design. Cross-PR query confirms this 01533_multiple_nested-on-Bugfix-validation pattern is unique to this PR (7 occurrences, all on #104558). Not actionable. |
No PR-caused failures. Every other check is pass or skipping.
Pre-PR validation gate for 214ff5480948 was posted earlier (#104558 (comment)) — all six items remain answered.
Session: cron:our-pr-ci-monitor:20260605-213000
|
@alexey-milovidov re the Root cause: Fix: #106096 "Do not mark move operations that allocate through memory-tracking containers as noexcept" (Algunenano), merged 2026-06-01 13:11 UTC. Removes Post-fix verification (CIDB, since the 06-01 13:11 merge):
So the family is resolved on Session: cron:clickhouse-worker-slot-25:20260611-011400 |
…arrayjoin-limit-lift-up
…arrayjoin-limit-lift-up
|
Picked this up via Merged The sole CI failure was unrelated and is now resolved by the merge. Master changed two of the files this PR also touches — Verification (freshly built
Note: No unresolved review threads. |
…lickHouse#82279) A top-level WITH alias whose expression is arrayJoin is normalized into the referencing clauses before shouldPushdownLimit inspects the AST, so the existing query.select() check already rejects pushdown. Add cases that lock this in, including an unreferenced WITH alias that must still push down. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
This does not reproduce on the current HEAD. The bot's exact query returns the correct A column-form Proof the source is not capped, via Also note that checking Rather than a speculative change for a case that does not reproduce, I added regression coverage in |
…kHouse#82279) maxBlockSizeByLimit (old analyzer) and mainQueryNodeBlockSizeByLimit (new analyzer) compute query_info.trivial_limit, which hard source consumers (ReadFromLoopStep::generate, system.zeros, generateRandom) treat as a hard cap on generated input rows, before the outer arrayJoin expansion runs. Neither function excluded arrayJoin, so 'SELECT arrayJoin(if(number < 3, [], [number])) FROM loop(numbers(100)) LIMIT 3' capped the source to 3 input rows (all empty arrays), returning 0 rows instead of 3, 4, 5. This is the source-construction sibling of the existing numbersLikeUtils::shouldPushdownLimit guard, on a different mechanism that runs during planning (so it reproduces even with query_plan_enable_optimizations=0) and feeds every trivial_limit hard consumer at once. Both functions now return 0 (disabling the cap) when the SELECT projection contains arrayJoin or an ARRAY JOIN clause. Extends the no-sort regression test 04320 with loop() coverage on both analyzer paths, with optimizations disabled, with OFFSET, with the ARRAY JOIN clause, and no-regression cases proving the optimization still fires without arrayJoin. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pre-PR validation gateFor fix-up commit
Session id: cron:clickhouse-worker-slot-1:20260613-151300 |
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 80/87 (91.95%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 1 line(s) · Uncovered code |
|
CI fully finished on HEAD CI ledger, covered SHA
No PR-caused failures. 149 checks pass, 0 pending. The 06-05 arm_tidy Iceberg trunk regression and the arm_darwin macOS infra flake from earlier SHAs are not present on this HEAD. Awaiting @al13n321 review. |

The
tryExecuteFunctionsAfterSortingoptimization insrc/Processors/QueryPlan/Optimizations/liftUpFunctions.cppsplits the post-sort expression into a "needed for sorting" part (computes the sort keys, stays belowSortingStep) and an "unneeded for sorting" part (lifted aboveSortingStep). It is unsound to liftarrayJoinbecausearrayJoincan change the number of rows: with aLIMITpushed down to theSortingStep, the input rows are truncated beforearrayJoinexpansion, silently dropping rows that should have been produced.This PR adds a
hasArrayJoincheck on the "unneeded for sorting" part and skips the optimization in that case.Repro from the issue
Why it manifested
EXPLAINfor the LIMIT 3 query showed the liftedARRAY JOIN:The
JOINproduces 10 rows (one perm.id), onlyid = 4has a non-empty array. WithORDER BY id LIMIT 3, the innerSorting+Limit 3keepsid = 0, 1, 2(all with empty arrays). The liftedarrayJointhen produces 0 output rows.MergeTree storage happens to mask the bug via
optimize_read_in_order(the sort can read input in-order and skip the lift). Settingoptimize_read_in_order = 0reproduces the same 0-row result on MergeTree.A companion bug in the lazy-materialization optimization (#101608) is addressed in #101644 by @vdimir.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fixed a bug where queries combining
arrayJoinwithORDER BY ... LIMITafter aJOINcould silently return zero rows. The query plan optimization that lifts function evaluation above theSortingStepno longer applies when the lifted expression containsarrayJoin, sincearrayJoincan change the number of rows. Closes #82279.Documentation entry for user-facing changes
Closes #82279.
Closes #101644