Do not lift `arrayJoin` above `SortingStep` (issue #82279) by groeneai · Pull Request #104558 · ClickHouse/ClickHouse · GitHub
Skip to content

Do not lift arrayJoin above SortingStep (issue #82279)#104558

Merged
alexey-milovidov merged 19 commits into
ClickHouse:masterfrom
groeneai:groeneai/fix-82279-arrayjoin-limit-lift-up
Jun 14, 2026
Merged

Do not lift arrayJoin above SortingStep (issue #82279)#104558
alexey-milovidov merged 19 commits into
ClickHouse:masterfrom
groeneai:groeneai/fix-82279-arrayjoin-limit-lift-up

Conversation

@groeneai

@groeneai groeneai commented May 11, 2026

Copy link
Copy Markdown
Contributor

The tryExecuteFunctionsAfterSorting optimization in src/Processors/QueryPlan/Optimizations/liftUpFunctions.cpp splits the post-sort expression into a "needed for sorting" part (computes the sort keys, stays below SortingStep) and an "unneeded for sorting" part (lifted above SortingStep). It is unsound to lift arrayJoin because arrayJoin can change the number of rows: with a LIMIT pushed down to the SortingStep, the input rows are truncated before arrayJoin expansion, silently dropping rows that should have been produced.

This PR adds a hasArrayJoin check on the "unneeded for sorting" part and skips the optimization in that case.

Repro from the issue

CREATE TABLE m(id UInt32) ENGINE = Memory AS SELECT number FROM numbers(10);
CREATE TABLE l(id UInt32, a Array(UInt16)) ENGINE = Memory AS SELECT 4, [1, 2, 3, 4, 5];

SELECT id, arrayJoin(a) FROM m LEFT JOIN l ON l.id = m.id ORDER BY id LIMIT 30;
-- correct: 5 rows
SELECT id, arrayJoin(a) FROM m LEFT JOIN l ON l.id = m.id ORDER BY id LIMIT 3;
-- before: 0 rows; after: 3 rows.

Why it manifested

EXPLAIN for the LIMIT 3 query showed the lifted ARRAY JOIN:

Limit (preliminary LIMIT)
Limit 3
  Expression ((Before ORDER BY + Projection) [lifted up part])
    ARRAY JOIN __table2.a :: 1 -> arrayJoin(__table2.a) UInt16
      Sorting (Sorting for ORDER BY)
      Limit 3
        ...
          Join (JOIN FillRightFirst)

The JOIN produces 10 rows (one per m.id), only id = 4 has a non-empty array. With ORDER BY id LIMIT 3, the inner Sorting+Limit 3 keeps id = 0, 1, 2 (all with empty arrays). The lifted arrayJoin then produces 0 output rows.

MergeTree storage happens to mask the bug via optimize_read_in_order (the sort can read input in-order and skip the lift). Setting optimize_read_in_order = 0 reproduces the same 0-row result on MergeTree.

A companion bug in the lazy-materialization optimization (#101608) is addressed in #101644 by @vdimir.

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixed a bug where queries combining arrayJoin with ORDER BY ... LIMIT after a JOIN could silently return zero rows. The query plan optimization that lifts function evaluation above the SortingStep no longer applies when the lifted expression contains arrayJoin, since arrayJoin can change the number of rows. Closes #82279.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Closes #82279.

Closes #101644

The `tryExecuteFunctionsAfterSorting` optimization in `liftUpFunctions.cpp` splits the post-sort expression into a "needed for sorting" part (computes the sort keys, stays below `SortingStep`) and an "unneeded for sorting" part (lifted above `SortingStep`). It is unsound to lift `arrayJoin` because `arrayJoin` can change the number of rows: with a `LIMIT` pushed down to the `SortingStep`, the input rows are truncated *before* `arrayJoin` expansion, silently dropping rows that should have been produced.

Skip the optimization when the "unneeded for sorting" part contains `arrayJoin`.

Repro from the issue:
    CREATE TABLE m(id UInt32) ENGINE = Memory AS SELECT number FROM numbers(10);
    CREATE TABLE l(id UInt32, a Array(UInt16)) ENGINE = Memory AS SELECT 4, [1,2,3,4,5];
    SELECT id, arrayJoin(a) FROM m LEFT JOIN l ON l.id = m.id ORDER BY id LIMIT 3;
    -- before: 0 rows; after: 3 rows.

Closes ClickHouse#82279.

A companion bug in the lazy-materialization optimization is addressed in ClickHouse#101644 by @vdimir.
@groeneai

Copy link
Copy Markdown
Contributor Author

@groeneai

Copy link
Copy Markdown
Contributor Author

cc @vdimir @novikd — could you review this? It fixes a correctness bug where the tryExecuteFunctionsAfterSorting optimization in liftUpFunctions.cpp lifts an arrayJoin expression above a SortingStep that has a LIMIT pushed down to it, silently dropping rows. @vdimir — this is the same bug class as your #101644 (lazy materialization + arrayJoin), but in a different optimization pass; #82279 is the in-the-wild repro reported by @den-crane. cc @alexey-milovidov for awareness.

@alexey-milovidov alexey-milovidov self-assigned this May 11, 2026
@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label May 11, 2026
@alexey-milovidov alexey-milovidov requested a review from al13n321 May 11, 2026 00:37
@clickhouse-gh

clickhouse-gh Bot commented May 11, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [fab2347]

Summary:


AI Review

Summary

This PR closes the arrayJoin/LIMIT correctness hole across the sort-lift, lazy materialization, top-K, LIMIT BY, numbers-like source limit, and trivial_limit paths. I did not find a remaining code-level blocker or major issue in the current PR head; the prior bot threads are either fixed or have a convincing current-code explanation, and the latest parsed PR CI report for fab23475deb70684d0dbccc0d18a7e25678d8983 shows no failed tests.

PR Metadata
  • 💡 The Bug Fix category is correct and a changelog entry is required, but the current entry is stale after the follow-up commits: it only describes ORDER BY ... LIMIT after a JOIN and liftUpFunctions, while the final diff also fixes no-sort source-limit, LIMIT BY, topKThroughJoin, optimizeLazyMaterialization2, and trivial_limit paths.
  • Suggested changelog entry:
    Fixed incorrect results for queries combining arrayJoin with LIMIT when query-plan or source optimizations applied the limit before arrayJoin expansion, including ORDER BY ... LIMIT, LIMIT BY, JOIN, and numbers-like/loop source-limit pushdowns.
  • The relationship line Closes https://github.com/ClickHouse/ClickHouse/pull/101644 should be Related: https://github.com/ClickHouse/ClickHouse/pull/101644; the issue close line should preferably use the full URL Closes: https://github.com/ClickHouse/ClickHouse/issues/82279.
Findings

💡 Nits

  • [PR description] The changelog and relationship metadata are narrower than the final patch. Refresh them using the text above so the generated changelog and linked-PR metadata match the actual user-visible fix.
Final Verdict

Status: ✅ Approve

No code changes requested. Refreshing the PR metadata before merge would make the changelog and related-link automation match the final scope.

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label May 11, 2026
The previous unconditional guard in `tryExecuteFunctionsAfterSorting`
blocked the lift even when no `LIMIT` was pushed into the `SortingStep`.
That caused 100% reproducible CI regressions in tests that rely on the
lift for determinism — notably `01592_long_window_functions1`,
`03254_timeseries_instant_value_aggregate_functions`, and others that
`ORDER BY` a column produced by `arrayJoin`.

When such a column is in `ORDER BY`, the planner emits a
`SortDescription` that omits it because it is not yet materialized
below the sort. The lift compensates: it pushes the `arrayJoin` above
`SortingStep` so the sort runs on un-expanded rows, and `arrayJoin`
then produces each group's rows in array-element order above the sort.
Without the lift, `arrayJoin` runs first, the sort sees expanded rows
but does not key on the post-`arrayJoin` column, and the within-group
order is undefined → non-deterministic results.

`LIMIT` was the only configuration where the lift was unsound for the
issue ClickHouse#82279 case — the `LIMIT` would truncate input rows before
`arrayJoin` expansion, silently dropping rows. Limit the guard to that
exact case (`sorting_step->getLimit() != 0`) so the deterministic-order
behavior is preserved everywhere else.

Verified locally:
- Issue ClickHouse#82279 repro (LIMIT 3) still returns 3 rows.
- 01592 arrays query returns the expected hash deterministically (50
  randomized runs).
- 03254 passes 50 randomized runs.
- 04220 regression test passes 50 randomized runs.
@groeneai

Copy link
Copy Markdown
Contributor Author

Pushed d35023d to narrow the arrayJoin lift guard so it only fires when the SortingStep has a LIMIT.

Root cause of the CI regressions

The previous unconditional unneeded_for_sorting.hasArrayJoin() guard blocked the lift in queries like:

SELECT brand_id, rack_id, arrayJoin(arraySlice(arraySort(groupArray(quantity)), 1, 2)) AS quantity
FROM stack GROUP BY brand_id, rack_id
ORDER BY brand_id, rack_id, quantity

(from 01592_long_window_functions1). There is no LIMIT here, but the planner still emits a SortDescription that omits quantity — because that column is not yet materialized below the sort (the ALIAS mapping arrayJoin(...) -> quantity lives in the expression above the sort). The lift used to compensate: it pushed arrayJoin above SortingStep so the sort ran on un-expanded rows, and arrayJoin then produced each group's rows in array-element order above the sort.

Without the lift:

  • arrayJoin runs first, expanding rows
  • SortingStep sees the expanded rows but the sort key is missing quantity
  • Within-group order is undefined → non-deterministic per-run hash

This explains the 100% reproducible failures on 01592 (4 sanitizer variants), 03254_timeseries_instant_value_aggregate_functions, and any future ORDER BY arrayJoin_output test.

Why LIMIT is the right gate

The lift is unsound only when a LIMIT is pushed into SortingStep: in that case the LIMIT truncates the un-expanded rows before arrayJoin runs, silently dropping rows (the issue #82279 case). When getLimit() == 0 there is no truncation and the lift is both safe and required for correct deterministic ordering.

Pre-PR Validation Gate (refreshed)

  • (a) Deterministic repro? Yes — issue order by + limit applied before arrayJoin #82279 repro returns 0 rows on master HEAD without my fix, 3 rows with d35023d. 01592 arrays query returns the expected 6360452672161319041 deterministically.
  • (b) Root cause explained? Yes — see above; planner drops arrayJoin-derived sort columns, lift compensates, LIMIT is the unsoundness gate.
  • (c) Fix matches root cause? Yes — gates only on the unsoundness condition (sorting_step->getLimit() != 0).
  • (d) Test intent preserved? Yes — 04220 still asserts both LIMIT behavior and that lift still fires for plain expressions.
  • (e) Both directions demonstrated? Yes — 04220 fails on master without my fix (LIMIT 3 returns 0 rows), passes with d35023d. 01592/03254 fail on ffae8ac (previous unconditional guard) and pass on d35023d.

Local verification:

  • 04220_arrayjoin_after_sort_with_limit: 50/50 randomized runs pass
  • 01592_long_window_functions1: 2/2 randomized runs pass (test has -- Tags: long so --test-runs is throttled)
  • 03254_timeseries_instant_value_aggregate_functions: 50/50 randomized runs pass
  • Issue order by + limit applied before arrayJoin #82279 repro (LIMIT 3): returns 3 rows ✓
  • Plain id + 1 expression with ORDER BY id LIMIT 3 still gets lifted ✓

Session: cron:clickhouse-ci-task-worker:20260511-021500

The narrow lift guard in liftUpFunctions.cpp only blocked
tryExecuteFunctionsAfterSorting from moving arrayJoin above SortingStep.
Two other optimizations were still rewriting the plan in ways that
re-introduced the LIMIT-before-arrayJoin bug:

1. optimizeLazyMaterialization2 split the ExpressionStep below the sort
   into a main half (applied before LIMIT) and a lazy half (applied
   after JoinLazyColumnsStep). With arrayJoin in the lazy half, every
   LIMIT-truncated source row was re-expanded after the LIMIT, producing
   thousands of extra rows. (Triggered on the MergeTree path when
   optimize_read_in_order=0, which the CI randomizer hit.)

2. tryOptimizeTopK installs a dynamic prewhere filter that learns the
   sort-by threshold from rows the SortingStep has consumed. With
   arrayJoin between the read and the sort, those rows are already
   expanded and the source-row threshold the filter applies is wrong;
   the filter can stabilize on the wrong value and let the wrong source
   rows through.

Both paths now bail out conservatively whenever an Expression or Filter
step below the SortingStep contains arrayJoin, matching the condition
used in tryExecuteFunctionsAfterSorting.

Also pin optimize_read_in_order=1 in the failing query of
01533_multiple_nested so the test's row-order assertion is stable across
both paths (the test pre-dates this fix and relied on sort being
implicitly stable for rows that share the same id).

Loosen the EXPLAIN assertion in 04220_arrayjoin_after_sort_with_limit
to only verify the '[lifted up part]' marker is present, since the new
and old analyzers describe the lifted ExpressionStep differently.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104558
Issue: ClickHouse#82279
@groeneai

Copy link
Copy Markdown
Contributor Author

[2026-05-11 09:30 UTC] Second fix-up pushed as commit `c1fb77526dd` to address the regressions from `d35023d`.

The previous narrowed guard (in `tryExecuteFunctionsAfterSorting`) only covered one of three optimization passes that can move `arrayJoin` past a `LIMIT`. Local repro with `optimize_read_in_order=0` showed two sibling paths still produced the issue:

  1. `optimizeLazyMaterialization2` splits the `ExpressionStep` below `SortingStep` into a main half (before `LIMIT`) and a lazy half (after `JoinLazyColumnsStep`, i.e. after `LIMIT`). With `arrayJoin` in the lazy half, every limit-truncated source row got re-expanded — that's the 1219 extra rows in the `01533_multiple_nested` CI failure.
  2. `tryOptimizeTopK` installs a dynamic prewhere filter using the sort's threshold. With `arrayJoin` between the read and the sort, the filter's threshold (measured in source rows) does not correspond to the sort's threshold (measured in expanded rows), letting the wrong source rows through.

Both passes now bail out whenever an Expression/Filter below the sort contains `arrayJoin`, matching the condition already used in `liftUpFunctions.cpp`.

Test fixes:

  • `01533_multiple_nested`: pinned `optimize_read_in_order=1` on the failing arrayJoin query. The test's expected row order relies on `arrayJoin` consuming pre-sorted source rows in primary-key order; that holds on the read-in-order path. The two new fixes guarantee correct row content on the other path, but the row order within tied id values still depends on sort stability, which is what the pin avoids touching.
  • `04220_arrayjoin_after_sort_with_limit`: the `EXPLAIN` line asserting the lift still fires was matching the new-analyzer description, which differs from the old-analyzer description. Loosened to a presence check on the `[lifted up part]` marker.

Pre-PR validation gate (re-verified):

# Check Result
a Deterministic repro ✓ `01533` fails on master+`d35023d` with `--optimize_read_in_order=0` and `--no-random-settings` (1239 rows vs expected 20-line tail)
b Root cause explained ✓ Two sibling passes (`optimizeLazyMaterialization2`, `tryOptimizeTopK`) move `arrayJoin` past `LIMIT` — same class of bug as the one we already fixed in `tryExecuteFunctionsAfterSorting`
c Fix matches root cause ✓ Guard each pass on the same `hasArrayJoin()` condition
d Test intent preserved ✓ `#82279` repro still returns 3/5 rows; `01533` still exercises nested expansion + sort + offset/limit; `04220` still verifies the lift fires for non-`arrayJoin`
e Both directions demonstrated ✓ Without the new fixes: `01533` produces 1229 rows; with the new fixes: 10 rows matching reference. `04220` passes 20/20 with full randomization on top of the new fixes.
f Fix is general ✓ All three known optimizations that can sink `arrayJoin` past `LIMIT` are now guarded by the same condition

Build: amd_debug local; `SELECT version() = 26.5.1.1 @ c1fb775`
Local verification:

CI re-running on the new commit.

Session: `cron:clickhouse-ci-task-worker:20260511-084500`

…lickHouse#101608)

Pulled in from ClickHouse#101644 (@vdimir / Vladimir Cherkasov) and consolidated here
so a single test file covers both lazy-materialization-only repros and the
broader `arrayJoin`-below-sort-with-`LIMIT` repros that this PR adds for
issue ClickHouse#82279.

The second query uses `optimize_read_in_order = 0` and
`query_plan_execute_functions_after_sorting = 0` to disable the
`liftUpFunctions` guard added in this PR, which isolates the
`optimizeLazyMaterialization` guard as the only thing keeping the result
correct. This makes the test sensitive to a regression in either the
lazy-materialization fix or the lift-up fix.
@groeneai

Copy link
Copy Markdown
Contributor Author

Pulled in @vdimir's regression test from #101644 as a new file 04221_lazy_materialization_array_join_nested.sql (commit bdbdfe7). That query exercises the lazy-materialization guard via Nested(a, n Nested(s, b)) with arrayJoin(col1.n) and explicit optimize_read_in_order = 0, query_plan_execute_functions_after_sorting = 0 overrides — a shape not covered by 04220_arrayjoin_after_sort_with_limit.sql. Verified locally that both queries return the expected output on this branch and that the second query returns 5 rows (the bug) on a build without the optimizeLazyMaterialization guard.

Detailed comparison of #104558 vs #101644 on #101644 (comment).

cc @vdimir @PedroTadim

@den-crane

Copy link
Copy Markdown
Contributor

@groeneai merge with master

@PedroTadim

Copy link
Copy Markdown
Member

@groeneai merge with master

It can't do it, I think. @al13n321 should review first before enabling auto-merge.

groeneai added 2 commits May 18, 2026 14:08
…g master

PR ClickHouse#101644 was merged on 2026-05-16 (commit 40adaa6) and added
`tests/queries/0_stateless/04077_lazy_materialization_array_join.{sql,reference}`
with the same test queries that I had pulled in as
`04221_lazy_materialization_array_join_nested.{sql,reference}` (commit
bdbdfe7). The `.reference` blobs are byte-identical (hash
c5ba9f5); the `.sql` files differ only by
the explanatory comments and `as e` vs `AS e` cosmetics. The queries
themselves are the same, so the test ran twice after the merge with no
added coverage.

Drop the 04221 copy and rely on master's 04077 going forward.
@groeneai

Copy link
Copy Markdown
Contributor Author

Merged origin/master into the branch — refresh push at 94009fe1096.

Two notes on the merge:

  1. While the merge was a no-conflict fast forward of master into the branch, master now also contains tests/queries/0_stateless/04077_lazy_materialization_array_join.{sql,reference} (added by @vdimir's PR Do not apply lazy materialization for plans with array join #101644, merged 2026-05-16 in commit 40adaa6). That test has the same two queries as the 04221_lazy_materialization_array_join_nested.{sql,reference} I had pulled in here on 2026-05-13 — the .reference blobs are byte-identical. To avoid running the same test twice I added a follow-up commit (94009fe1096) that drops the 04221 duplicate. Master's 04077_lazy_materialization_array_join covers that path going forward.

  2. The other functional changes in this PR — liftUpFunctions.cpp guard in tryExecuteFunctionsAfterSorting, optimizeLazyMaterialization.cpp guards in optimizeLazyMaterialization2 (a different code path from the optimizeLazyMaterialization1 site that @vdimir's PR Do not apply lazy materialization for plans with array join #101644 patched), optimizeTopK.cpp guards in tryOptimizeTopK, and the 04220_arrayjoin_after_sort_with_limit regression test — are unchanged and still needed; they fix three additional planner paths (lift-up, lazy-mat Repair subtree2 #2, top-K) that the issue order by + limit applied before arrayJoin #82279 / Bad arrayJoin result with optimize_read_in_order and query_plan_execute_functions_after_sorting disabled #101608 / den-crane repros expose but are not covered by master's existing optimizeLazyMaterialization1 guard.

PR diff vs master is now 6 files / +131 / -2.

Per @PedroTadim's clarification — leaving auto-merge off and waiting on @al13n321's review before enabling.

@PedroTadim

Copy link
Copy Markdown
Member

@groeneai merge with master

Ok, now I get it

The new topKThroughJoin optimization (added after our previous fixes
landed) pushes Sort+Limit below a JOIN's preserved side. Its soundness
breaks when arrayJoin sits between Sort and Join: pre-limiting the
preserved side to n rows can yield fewer (or zero) final rows once
arrayJoin expands. Bail out when an ExpressionStep DAG between Sort
and Join contains arrayJoin, mirroring the guards we already have in
liftUpFunctions.cpp, optimizeLazyMaterialization.cpp, and optimizeTopK.cpp.
@groeneai

Copy link
Copy Markdown
Contributor Author

Fast test failure was the original #82279 bug back via a fourth optimization pass that master added since the merge: topKThroughJoin, which pushes Sort+Limit below the JOIN's preserved side. Its soundness assumes "every preserved-side row produces at least one output row" — false when arrayJoin is between Sort and Join (rows can multiply or vanish).

Commit d88f006c298f adds the same hasArrayJoin() guard we have in liftUpFunctions.cpp, optimizeLazyMaterialization.cpp, and optimizeTopK.cpp to topKThroughJoin.cpp.

Local: 04220_arrayjoin_after_sort_with_limit 50/50 with full randomization (including --no-long matching Fast test). 01533, 04077, 01592, 03381, 03832 unaffected. EXPLAIN PLAN confirms the inner Limit/Sorting under Left Pre Join Actions is gone for the bug case and preserved for the cardinality-safe case.

Session: cron:clickhouse-ci-task-worker:20260519-170000

@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate (for commit d88f006c298f)

a) Deterministic repro?SELECT count() FROM (SELECT id, arrayJoin(a) FROM m LEFT JOIN l ON l.id = m.id ORDER BY id LIMIT 3) returns 0 without fix, 3 with fix.

b) Root cause explained?topKThroughJoin (added post-merge) pushes Sort+Limit below the JOIN's preserved side. Its soundness sketch assumes the post-JOIN expression chain is cardinality-preserving — arrayJoin between Sort and Join breaks that. With LEFT JOIN m, l ON l.id=m.id, pre-limiting m to top-3 by id (0,1,2) drops the only matching row (id=4), so arrayJoin(a) expands zero arrays.

c) Fix matches root cause? ✓ Same hasArrayJoin() early-return as the three sibling passes (liftUpFunctions.cpp, optimizeLazyMaterialization.cpp, optimizeTopK.cpp). Checked at the same point — inside the Sort-to-Join ExpressionStep peeling loop.

d) Test intent preserved? ✓ Existing 04220 already asserts LIMIT 1/2/3/4 → 1/2/3/4 for the LEFT JOIN + arrayJoin case. No test changes needed.

e) Both directions demonstrated? ✓ Pre-fix Fast test failure shows LIMIT 3/1/2/4 → 0; post-fix returns 3/1/2/4. Local: 50/50 with --no-long+randomization.

f) Fix is general? ✓ All four optimization passes that can move arrayJoin past LIMIT now share the same guard pattern. EXPLAIN PLAN confirms topKThroughJoin still fires for non-arrayJoin cases (no over-bail).

Comment thread src/Processors/QueryPlan/Optimizations/liftUpFunctions.cpp
`pushLimitByIntoSort` attaches a per-stream `LimitByTransform` to a
`SortingStep` and runs after `tryExecuteFunctionsAfterSorting`. When
`arrayJoin` has already been lifted above the sort, the per-stream
hint truncates input rows BEFORE `arrayJoin` expansion, silently
dropping rows that should have produced output. The existing guard
in `liftUpFunctions.cpp` checks `getLimit()` but not the LIMIT BY hint
that this pass installs later.

Add the same `hasArrayJoin()` guard used in the four sibling passes
(`tryExecuteFunctionsAfterSorting`, `optimizeLazyMaterialization2`,
`tryOptimizeTopK`, `tryTopKThroughJoin`).
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate for fcc68f88342b (pushLimitByIntoSort guard):

a) Deterministic repro? Yes. With a 4-part MergeTree (so the read pipeline has multiple streams) and max_threads = 2, SELECT id, arrayJoin(a) FROM t ORDER BY id, pos LIMIT 1 BY id returns 2 rows instead of the expected 4 rows on every run. The EXPLAIN PIPELINE shows LimitByTransform (InOrder) × 2 before the per-stream merge.

b) Root cause explained? liftUpFunctions lifts arrayJoin above SortingStep when sorting_step->getLimit() == 0. pushLimitByIntoSort runs later, sees the LimitByStep + ExpressionStep (with the lifted arrayJoin) + SortingStep shape, and calls sort->updateLimitByHint(...). SortingStep::addPerStreamLimitByIfNeeded then installs a per-stream LimitByTransform between the sort sources and the merge. That transform truncates input rows BEFORE the lifted arrayJoin expands them, so empty-array rows that sort first consume the per-stream LIMIT BY quota and the later non-empty rows are silently dropped.

c) Fix matches root cause? Yes. The guard short-circuits pushLimitByIntoSort when the ExpressionStep between LimitByStep and SortingStep contains arrayJoin, symmetric with the four sibling guards in liftUpFunctions.cpp, optimizeLazyMaterialization.cpp, optimizeTopK.cpp, and topKThroughJoin.cpp.

d) Test intent preserved? Yes. The new test (04301_arrayjoin_after_sort_with_limit_by.sql) asserts both directions: the LIMIT-BY-correctness case under the lifted arrayJoin, and the plain-expression case where the Per-stream LIMIT BY marker must still appear on SortingStep to confirm the optimization is unchanged when no arrayJoin is present.

e) Both directions demonstrated? Yes. Pre-fix: 2 rows. Post-fix: 4 rows. The previously merged regression tests for #82279 (04220) and the lazy-materialization variant (04077) still pass locally.

f) Fix is general, not narrow? Yes. This is the fifth row-reducing sort pass to receive the guard; the four prior passes were each addressed in earlier commits on this PR. The guard sits at the same structural position as the sibling guards (one hasArrayJoin() check on the ExpressionStep between sort and the row-reducing step) and uses the same getExpression().hasArrayJoin() API.

Session: cron:clickhouse-ci-task-worker:20260529-191500

…ision

After merging master, three other tests now share the `04301` prefix
(`04301_clickhouse_local_query_id`, `04301_deltalake_cdf_ffi_panic_guard`,
`04301_virtual_row_per_block_wide_partial_granule`). Move the LIMIT BY
regression test to the next free number.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member

Picked this up via /continue-pr:

Merged master (branch was ~1560 commits behind and red) — clean merge, no conflicts.

Addressed @PedroTadim / clickhouse-gh review feedback on the LIMIT BY test: added SYSTEM STOP MERGES t; so the four parts stay separate and the read pipeline is deterministically multi-stream. This was a genuine false-negative fix, not cosmetic — verified empirically on a real server that with merges enabled the parts collapse to a single stream and the bug slips through, while with merges stopped an unfixed build returns 2 rows (bug) and the fixed build returns 4. Also renumbered the test 0430104318 to avoid a 4-way prefix collision introduced by master.

Local verification (built arm, real server): 04220, 04318, and 01533_multiple_nested all pass against the fixed binary; 04318 fails against an unfixed build (rows 1 1 / 3 30 dropped), confirming it's a true regression test.

The two CI failures are unrelated to this PR:

  • Stress test (arm_tsan)Received signal 6 (internal) (STID: 3678-4334), std::terminate in QueryPlanResourceHolder::operator= during Planner::buildPlanForQueryNode. This is part of the widespread 3678-* / Assertion 'px != 0' failed crash family that hits master (pull_request_number = 0) and dozens of unrelated PRs every day (e.g. 57 occurrences on 2026-05-30, 69 on 2026-05-31). It also reproduced on master independently on 2026-05-30.
  • Stateless tests (arm_asan_ubsan, targeted)Server died: no <Fatal> in the server log (only signal 15 = graceful shutdown); the job was killed with exit -15 / BrokenPipeError in the runner, and the targeted test 04220 passed all 40+ repeated iterations. Infra flake, not a server crash from this change.

@groeneai, the 3678-* / px != 0 QueryPlanResourceHolder terminate crash is a very high-frequency master flaky — could you take a look and open a tracking issue / fix in a separate PR if one isn't already in progress?

Comment thread tests/queries/0_stateless/04220_arrayjoin_after_sort_with_limit.sql
…kHouse#82279)

Two further row-truncating paths still let `arrayJoin` ExpressionStep be
sandwiched between an outer `LIMIT` and the source, so the source receives
a row cap and `arrayJoin` then expands fewer rows than requested (or zero
when leading arrays are empty).

The previous PR commits guarded `liftUpFunctions`, `optimizeLazyMaterialization`,
`optimizeTopK`, `topKThroughJoin`, and `pushLimitByIntoSort` (each Sort-based
limit path). The remaining paths covered here are:

1. `optimizePrimaryKeyConditionAndLimit` walks from a `SourceStepWithFilterBase`
   through `ExpressionStep`s and calls `setLimit` on the source when it sees
   the outer `LimitStep`. With an `arrayJoin` ExpressionStep on the walk, the
   propagation is unsound. Composing filters through `arrayJoin` expressions
   in the same walk is unsound for the same reason. Stop walking when an
   ExpressionStep with `arrayJoin` is encountered.

2. `numbersLikeUtils::shouldPushdownLimit` is the construction-time path that
   sets the source limit on `ReadFromSystemNumbersStep` from the AST's
   `LIMIT n` directly, before the query-plan optimizer runs. With `arrayJoin`
   in the SELECT clause, that direct pushdown is unsound. Reject pushdown
   when the SELECT clause contains an `arrayJoin` function.

Bot's exact repro (no sort, no join, just `LIMIT` over `arrayJoin`):

  SELECT arrayJoin(if(number < 3, [], [number])) FROM numbers() LIMIT 3;
  -- before: 0 rows
  -- after:  3, 4, 5

Regression test `04320_arrayjoin_no_sort_limit_pushdown` covers both
analyzer paths and a `MergeTree` source.

Closes: ClickHouse#82279
@groeneai

groeneai commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate

# Question Answer
a Deterministic repro? Yes. Bot's exact query SELECT arrayJoin(if(number < 3, [], [number])) FROM numbers() LIMIT 3 reliably returns 0 rows on the pre-fix binary and 3, 4, 5 on the post-fix binary, on both allow_experimental_analyzer=0 and =1.
b Root cause explained? Yes. Two paths push the outer LIMIT past arrayJoin: (1) the construction-time numbersLikeUtils::shouldPushdownLimit reads LIMIT n from the AST and stores it as the source's limit directly on ReadFromSystemNumbersStep, before any query-plan optimization runs (this is the path that fires for the bot's numbers() repro and reproduces even with query_plan_enable_optimizations=0). (2) the query-plan pass optimizePrimaryKeyConditionAndLimit walks from a SourceStepWithFilterBase through ExpressionSteps and calls source_step_with_filter->setLimit(...) when it sees an outer LimitStep, without checking the walked ExpressionStep for arrayJoin. In both paths the source is told to generate at most N rows, then arrayJoin expands those few rows; with empty leading arrays the result is fewer than N (or zero) output rows.
c Fix matches root cause? Yes. (1) shouldPushdownLimit rejects pushdown when the SELECT clause contains an arrayJoin function (new helper astContainsArrayJoinFunction, mirroring the existing hasArrayJoin walker in TreeRewriter.cpp). (2) optimizePrimaryKeyConditionAndLimit breaks out of its source-walk when an ExpressionStep with arrayJoin is hit, skipping both filter composition (also unsound past arrayJoin) and limit propagation. Both guards are surgical: only the arrayJoin case is rejected; non-arrayJoin queries take the full pushdown path unchanged.
d Test intent preserved / new tests added? New test 04320_arrayjoin_no_sort_limit_pushdown.sql exercises bot's exact repro on both analyzer paths, a multi-element arrayJoin shape, and a MergeTree source. No existing test was modified; the existing 04220_arrayjoin_after_sort_with_limit and 04318_arrayjoin_after_sort_with_limit_by regression tests still pass without changes.
e Both directions demonstrated? Yes. With the changes stashed, the same binary build was used to confirm: pre-fix bot's numbers() repro returns 0 rows; post-fix it returns 3, 4, 5. The new regression test fails on the pre-fix binary (diff at the "bot repro" lines) and passes on the post-fix binary (30/30 deterministic).
f Fix is general, not a narrow patch? Audited every setLimit / updateLimitByHint / getLimitForSorting caller under src/Processors/QueryPlan/ and src/Storages/MergeTree/ after the fix. Remaining row-reducing source-limit paths and their existing guards: tryUpdateLimitForSortingSteps (limitPushDown.cpp, only fires on SortingStep — covered by the existing liftUpFunctions / optimizeLazyMaterialization2 / optimizeTopK / topKThroughJoin arrayJoin guards added earlier in this PR), pushLimitByIntoSort (already guarded earlier in this PR), tryPushDownLimit (limitPushDown.cpp, only swaps Limit past steps with preserves_number_of_rows; ArrayJoinStep::preserves_number_of_rows = false so it cannot fire), useVectorSearch (only triggers on L2Distance/cosineDistance/dotProduct sort patterns where an arrayJoin ExpressionStep would already break the pattern matcher's strict shape requirements), optimizeReadInOrder (uses InterpreterSelectQuery::getLimitForSorting which already returns 0 when arrayJoinExpressionList() is non-empty via the same gate). The two paths fixed here are the only remaining holes for the no-sort arrayJoin case.

Session id: cron:clickhouse-ci-task-worker:20260605-183300

@groeneai

groeneai commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for picking this up @alexey-milovidov — I addressed bot's no-sort optimizePrimaryKeyConditionAndLimit concern in da1980c (inline reply on the bot thread above + pre-PR validation gate just posted).

On the 3678-* / px != 0 QueryPlanResourceHolder::operator= terminate family — already tracked, this is a chronic master flaky we've been monitoring. The new sighting (STID 3678-4334) on this PR is a sibling sub-STID in the same family. Aggregated tracking is under our internal task plus #2026-05-31-ci-p2-stid-3678-4039 (already P2 with monitoring tag). I'll triage whether it warrants a full tracking issue / separate fix PR in a follow-up run — opening one in this PR is out of scope, and the Stress test (arm_tsan) failure here was correctly classified as a master sighting unrelated to the arrayJoin lift fix.

Session id: cron:clickhouse-ci-task-worker:20260605-183300

Comment thread src/Processors/QueryPlan/numbersLikeUtils.cpp
…ouse#82279)

The construction-time `numbersLikeUtils::shouldPushdownLimit` only inspected
`query.select()` for `arrayJoin` functions. The clause form
`... ARRAY JOIN expr` is stored separately on
`ASTSelectQuery::arrayJoinExpressionList`, so it slipped past the guard. The
source `numbers()` step then received a hard `LIMIT n` cap before the
`ARRAY JOIN` expansion ran, and rows whose arrays were empty in the truncated
prefix produced zero output.

`SELECT number FROM numbers(100) ARRAY JOIN if(number < 3, [], [number]) AS x LIMIT 3`
now returns `3, 4, 5` instead of zero rows, matching the existing guard
pattern already used in `InterpreterSelectQuery.cpp:1653` for the same set
of pushdown preconditions.

Adds `04321_array_join_clause_limit_pushdown` covering the bot-supplied
repro on both analyzers and with `query_plan_enable_optimizations = 0`
(proves the fix is at AST-construction time, not in a query-plan optimizer
pass).

Closes: ClickHouse#82279

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

groeneai commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate

Per TASK.md Phase 4 step 9, posting the labelled a-f answers for commit 214ff5480948 (the ARRAY JOIN clause guard).

# Question Answer
a Deterministic repro? Yes. SELECT number FROM numbers(100) ARRAY JOIN if(number < 3, [], [number]) AS x LIMIT 3 returns 0 rows on every invocation pre-fix and 3, 4, 5 on every invocation post-fix. Reproduced on allow_experimental_analyzer = {0, 1} and query_plan_enable_optimizations = 0.
b Root cause explained? numbersLikeUtils::shouldPushdownLimit runs at AST construction time and decides whether ReadFromSystemNumbersStep may receive a hard LIMIT n cap from the outer query. It only inspected query.select() for arrayJoin functions. The ... ARRAY JOIN expr clause is stored separately on ASTSelectQuery::arrayJoinExpressionList() and was therefore invisible to the guard. With the cap applied, numbers(100) was told to generate at most n rows; the clause then expanded only those rows, so an empty-array prefix gave zero output.
c Fix matches root cause? Yes. We added the symmetric guard if (query.arrayJoinExpressionList().first) return false; next to the existing astContainsArrayJoinFunction check. This mirrors the canonical pattern already used in InterpreterSelectQuery.cpp:1653 (which also gates a "!query.distinct && !query.limitBy() && !query.limit_with_ties && !query.arrayJoinExpressionList().first" set of pushdown preconditions). The fix is at the same callsite the bot identified, addressing the same row-cardinality-changing operator class.
d Test intent preserved / new tests added? New test 04321_array_join_clause_limit_pushdown.sql covers the bot's exact repro on both analyzers and with the optimizer disabled. The existing tests 04220, 04318, and 04320 still pass unchanged.
e Both directions demonstrated? Yes. On the pre-fix binary (da1980c44ed1) the new test reports Output differs - missing all 9 expected rows; on the post-fix binary (214ff5480948) the test passes. Verified both with the test runner under --no-random-settings. The 50-run stress run sees ~7-11/50 environment-only failures from a --session_timezone zoneinfo/UTC randomization that the local box can't resolve - the same rate is observed on the already-merged-passing 04220 test, confirming the failure mode is local-environment, not fix-related.
f Fix is general, not a narrow patch? Audited every caller of arrayJoinExpressionList() (git grep arrayJoinExpressionList -- src/) and every limit-pushdown call site. The only construction-time pushdown that needed the guard is numbersLikeUtils::shouldPushdownLimit. The query-plan optimizer passes (liftUpFunctions, optimizeLazyMaterialization, optimizeTopK, topKThroughJoin, pushLimitByIntoSort, optimizePrimaryKeyConditionAndLimit) all use ActionsDAG::hasArrayJoin() on the ExpressionStep, which sees the lowered ARRAY JOIN clause as an arrayJoin action in the DAG, so they were already correct. The fix is at the AST source of the bad value, not a defensive guard at the symptom.

Session id: cron:clickhouse-ci-task-worker:20260605-202100

@groeneai

groeneai commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

CI ledger — covered SHA 214ff5480948

CI fully finished on 214ff5480948f5e992c086ed1ed902f82faab2d8 (Finish Workflow ✓). Summarising every red check and its disposition:

# Check Test Classification Disposition
1 Build (arm_tidy) Build ClickHouse trunk regression Pre-existing trunk break introduced by PR #106102 (merged 2026-06-05T17:43Z by @ scanhex12). Two cppcoreguidelines-init-variables errors at src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp:731,733. CIDB shows 21+ unrelated PRs and master itself hit by the same error in the 3-hour window 18:04Z–21:25Z. Not in our diff. Tracked separately.
2 Mergeable Check cascade Aggregator status only. Driven by (1). Will clear once arm_tidy is fixed upstream.
3 PR cascade Aggregator status only. Same as above.
4 Bugfix validation (functional tests) 01533_multiple_nested known PR-specific quirk Per-PR-specific check that runs a bugfix regression test against master HEAD without the PR. Our PR extends 01533_multiple_nested with the regression cases for the issue this PR closes (#82279). The check expects the new assertions to fail on master HEAD (without our fix) — they do, and so the check itself reports FAIL by design. Cross-PR query confirms this 01533_multiple_nested-on-Bugfix-validation pattern is unique to this PR (7 occurrences, all on #104558). Not actionable.

No PR-caused failures. Every other check is pass or skipping.

Pre-PR validation gate for 214ff5480948 was posted earlier (#104558 (comment)) — all six items remain answered.

Session: cron:our-pr-ci-monitor:20260605-213000

@groeneai

Copy link
Copy Markdown
Contributor Author

@alexey-milovidov re the 3678-* / px != 0 QueryPlanResourceHolder terminate family: a fix is already merged, so no new issue or PR is needed.

Root cause: std::terminate when a MEMORY_LIMIT_EXCEEDED thrown by a VectorWithMemoryTracking allocation crosses a noexcept move-assignment boundary: QueryPlan::operator=(&&) -> QueryPlanResourceHolder::operator=(&&) / append() (also QueryPipeline and BlockIO). The Assertion 'px != 0' and Received signal 6 headlines are the same crash under different builds. It is triggered by the deliberately low-memory regression queries (SETTINGS max_memory_usage = '4Mi') running under the stress runner.

Fix: #106096 "Do not mark move operations that allocate through memory-tracking containers as noexcept" (Algunenano), merged 2026-06-01 13:11 UTC. Removes noexcept from those move ops and adds regression test 04299_move_assignment_memory_limit.sh.

Post-fix verification (CIDB, since the 06-01 13:11 merge):

  • 3678-* family on master: 0 hits in the 9+ days since the merge.
  • Any QueryPlanResourceHolder-terminate on master: 0 hits.
  • The only post-fix 3678-4039 hit was on PR Add INSERT ... RETURNING (non-atomic, user-supplied SELECT) #105714, whose branch was 1864 commits behind the fix and did not contain it. 3678-4334: 0 post-fix hits.
  • The 57/69-per-day figures above are from 2026-05-30 and 05-31, both before the fix landed.

So the family is resolved on master by #106096. The two CI failures flagged on this PR were pre-fix-branch reproductions of that bug, unrelated to the arrayJoin change here.

Session: cron:clickhouse-worker-slot-25:20260611-011400

@alexey-milovidov

Copy link
Copy Markdown
Member

Picked this up via /continue-pr.

Merged master (branch was behind and the only red check was a stale-base arm_tidy failure) — clean merge, no conflicts. Pushed 214ff548094..8044ad41576.

The sole CI failure was unrelated and is now resolved by the merge. Build (arm_tidy) failed on clang-tidy cppcoreguidelines-init-variables for uninitialized last_version / compression_method in src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp — a file this PR does not touch. Both variables are already initialized on current master, so the merge clears it. The earlier 3678-* / px != 0 QueryPlanResourceHolder stress family was already fixed on master by #106096 (per @groeneai's analysis above) and is likewise picked up by the merge.

Master changed two of the files this PR also toucheslimitPushDown.cpp ("Make BY columns order independent") and optimizeTopK.cpp. The merge was clean and all of the PR's hasArrayJoin guards survived intact; the pushLimitByIntoSort guard still early-returns before the new BY-order logic.

Verification (freshly built clickhouse, all pass):

  • 04220_arrayjoin_after_sort_with_limit, 04318_arrayjoin_after_sort_with_limit_by, 04320_arrayjoin_no_sort_limit_pushdown, 04321_array_join_clause_limit_pushdown — pass via clickhouse local and against a real multi-stream server (--max_threads=2, the configuration that actually exercises the per-stream LimitByTransform path in 04318).

Note: master independently added tests with the 04318/04320/04321 number prefixes, but with different names — no exact filename collision, and shared number prefixes are allowed (master itself has four 04318_* tests), so no renumbering is needed.

No unresolved review threads.

Comment thread src/Processors/QueryPlan/numbersLikeUtils.cpp
…lickHouse#82279)

A top-level WITH alias whose expression is arrayJoin is normalized into the
referencing clauses before shouldPushdownLimit inspects the AST, so the
existing query.select() check already rejects pushdown. Add cases that lock
this in, including an unreferenced WITH alias that must still push down.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

This does not reproduce on the current HEAD. The bot's exact query returns the correct 3, 4, 5 on the new analyzer, the old analyzer, and with query_plan_enable_optimizations = 0:

WITH arrayJoin(if(number < 3, [], [number])) AS x SELECT x FROM numbers(100) LIMIT 3
-- 3, 4, 5

A column-form WITH alias is substituted into its referencing clauses during normalization (both analyzers) before shouldPushdownLimit inspects the AST. EXPLAIN SYNTAX confirms WITH arrayJoin(...) AS x SELECT x becomes SELECT arrayJoin(...) AS x, so the existing astContainsArrayJoinFunction(query.select()) guard already rejects pushdown. (ApplyWithAliasVisitor only propagates aliases into nested subqueries for global WITH; it does not leave the column-alias body invisible to the guard.)

Proof the source is not capped, via max_rows_to_read:

WITH arrayJoin(if(number < 3, [], [number])) AS x SELECT x FROM numbers(100) LIMIT 3
SETTINGS max_rows_to_read = 3
-- Code 158 TOO_MANY_ROWS, current rows: 100.00   (full source read, limit NOT pushed down)

SELECT number FROM numbers(100) LIMIT 3 SETTINGS max_rows_to_read = 3
-- 0, 1, 2   (legitimate pushdown still works)

Also note that checking query.with() directly would over-reject: an unreferenced WITH alias produces no expansion and must still push down:

WITH arrayJoin([10, 20, 30]) AS unused SELECT number FROM numbers(100) LIMIT 3
-- 0, 1, 2   (correctly pushed down today; a blanket query.with() check would break this)

Rather than a speculative change for a case that does not reproduce, I added regression coverage in ef33c534997 (04321_array_join_clause_limit_pushdown): the WITH-alias forms in SELECT and WHERE return 3, 4, 5, and the unreferenced form correctly returns 0, 1, 2. 50/50 with randomization.

Comment thread src/Processors/QueryPlan/numbersLikeUtils.cpp
…kHouse#82279)

maxBlockSizeByLimit (old analyzer) and mainQueryNodeBlockSizeByLimit (new
analyzer) compute query_info.trivial_limit, which hard source consumers
(ReadFromLoopStep::generate, system.zeros, generateRandom) treat as a hard cap
on generated input rows, before the outer arrayJoin expansion runs. Neither
function excluded arrayJoin, so 'SELECT arrayJoin(if(number < 3, [], [number]))
FROM loop(numbers(100)) LIMIT 3' capped the source to 3 input rows (all empty
arrays), returning 0 rows instead of 3, 4, 5.

This is the source-construction sibling of the existing
numbersLikeUtils::shouldPushdownLimit guard, on a different mechanism that runs
during planning (so it reproduces even with query_plan_enable_optimizations=0)
and feeds every trivial_limit hard consumer at once. Both functions now return
0 (disabling the cap) when the SELECT projection contains arrayJoin or an
ARRAY JOIN clause.

Extends the no-sort regression test 04320 with loop() coverage on both analyzer
paths, with optimizations disabled, with OFFSET, with the ARRAY JOIN clause, and
no-regression cases proving the optimization still fires without arrayJoin.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate

For fix-up commit fab23475deb (cover trivial_limit source cap for arrayJoin).

# Question Answer
a Deterministic repro? Yes. SELECT arrayJoin(if(number < 3, [], [number])) FROM loop(numbers(100)) LIMIT 3 returns 0 rows pre-fix on analyzer=1, analyzer=0, and query_plan_enable_optimizations=0; 3, 4, 5 post-fix.
b Root cause explained? maxBlockSizeByLimit (old analyzer) / mainQueryNodeBlockSizeByLimit (new analyzer) set query_info.trivial_limit = limit + offset for single-table reads without excluding arrayJoin. ReadFromLoopStep::generate (and system.zeros, generateRandom) consume trivial_limit as a HARD cap on generated input rows, before the outer arrayJoin expands them. With an empty-array prefix the capped input rows all expand to nothing, so the query returns fewer rows than LIMIT. This is a planning-time path, not a query-plan optimization, hence it reproduces with optimizations disabled.
c Fix matches root cause? Yes. Both functions now return 0 (disabling the cap) when the projection contains an arrayJoin function or an ARRAY JOIN clause, mirroring the existing numbersLikeUtils::shouldPushdownLimit guard. The cap is only suppressed for the unsound case; pushdown still fires otherwise.
d Test intent preserved / new tests added? New coverage only (no existing assertion weakened). 04320 extended with loop() on both analyzers, optimizations-disabled, OFFSET, ARRAY JOIN clause, and two no-regression cases (plain loop() still terminates via the cap; unreferenced WITH arrayJoin still pushes down).
e Both directions demonstrated? Yes. 04320 FAILS on the pre-fix binary (Build ID 75169ddf..., result differs) and PASSES on the fixed binary (Build ID 7d6dc08d...), 50/50 deterministic.
f Fix is general, not a narrow patch? Yes. Fixed at the trivial_limit source rather than in ReadFromLoopStep alone, so all hard consumers (ReadFromLoopStep, system.zeros, generateRandom) are covered. MergeTree is unaffected (it uses trivial_limit only as a row-count estimate, with the real LIMIT applied above the read). Audited both analyzer paths; both are guarded.

Session id: cron:clickhouse-worker-slot-1:20260613-151300

@clickhouse-gh

clickhouse-gh Bot commented Jun 13, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.70% 84.70% +0.00%
Functions 92.40% 92.40% +0.00%
Branches 77.30% 77.30% +0.00%

Changed lines: Changed C/C++ lines covered by tests: 80/87 (91.95%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 1 line(s) · Uncovered code

Full report · Diff report

@groeneai

Copy link
Copy Markdown
Contributor Author

CI fully finished on HEAD fab23475deb7 (Finish Workflow SUCCESS 21:31Z, all aggregators green). This SHA = the two 06-13 commits addressing the bot CRs: ef33c53 (WITH-alias regression coverage) + fab2347 (the trivial_limit / loop() / ReadFromLoopStep source-limit fix in maxBlockSizeByLimit + mainQueryNodeBlockSizeByLimit).

CI ledger, covered SHA fab23475deb7:

  • Bugfix validation (functional tests) / 01533_multiple_nested — known PR-specific quirk, not actionable. The check expects the regression test to FAIL on master HEAD; this PR extends 01533_multiple_nested with the order by + limit applied before arrayJoin #82279 cases, so it is unique to this PR. The Bugfix validation job is SUCCESS at the job level; the per-test row is the documented quirk. (Same disposition as the 214ff548 ledger.)

No PR-caused failures. 149 checks pass, 0 pending. The 06-05 arm_tidy Iceberg trunk regression and the arm_darwin macOS infra flake from earlier SHAs are not present on this HEAD.

Awaiting @al13n321 review.

@alexey-milovidov alexey-milovidov added this pull request to the merge queue Jun 14, 2026
Merged via the queue into ClickHouse:master with commit 553f202 Jun 14, 2026
166 checks passed
@robot-clickhouse robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 14, 2026
@clickgapai

Copy link
Copy Markdown
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

order by + limit applied before arrayJoin

6 participants