Cherry pick #105041 to 26.3: Fix LOGICAL_ERROR for reused materialized CTE filtered by IN over another materialized CTE by robot-ch-test-poll4 · Pull Request #106637 · ClickHouse/ClickHouse · GitHub
Skip to content

Cherry pick #105041 to 26.3: Fix LOGICAL_ERROR for reused materialized CTE filtered by IN over another materialized CTE#106637

Closed
robot-ch-test-poll4 wants to merge 35 commits into
backport/26.3/105041from
cherrypick/26.3/105041
Closed

Cherry pick #105041 to 26.3: Fix LOGICAL_ERROR for reused materialized CTE filtered by IN over another materialized CTE#106637
robot-ch-test-poll4 wants to merge 35 commits into
backport/26.3/105041from
cherrypick/26.3/105041

Conversation

@robot-ch-test-poll4

Copy link
Copy Markdown
Contributor

Original pull-request #105041

Do not merge this PR manually

This pull-request is a first step of an automated backporting.
It contains changes similar to calling git cherry-pick locally.
If you intend to continue backporting the changes, then resolve all conflicts if any.
Otherwise, if you do not want to backport them, then just close this pull-request.

The check results does not matter at this step - you can safely ignore them.

Troubleshooting

If the conflicts were resolved in a wrong way

If this cherry-pick PR is completely screwed by a wrong conflicts resolution, and you want to recreate it:

  • delete the pr-cherrypick label from the PR
  • delete this branch from the repository

You also need to check the Original pull-request for pr-backports-created label, and delete if it's presented there

The PR source

The PR is created in the CI job

novikd and others added 30 commits May 15, 2026 16:08
The optimization step is moved one level up, into a dedicated first
traversal of resolveMaterializingCTEs introduced in the next commit.
This lets the recursive buildSetInplace / buildOrderedSetInplace path
win the inner safety-net DelayedMaterializingCTEsStep's claim before
the outer step attempts its own claim.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The first traversal optimizes every materialized CTE's pre-built plan
and reports whether any DelayedMaterializingCTEsStep exists in the tree.
The second traversal -- skipped entirely when the first saw none --
replaces DelayedMaterializingCTEsStep nodes with MaterializingCTEsStep,
attaching plans for any CTEs not already claimed inplace.

Why this fixes the LOGICAL_ERROR
--------------------------------
Today makePlansForCTEs did three things atomically: it exchange'd
is_materialization_planned, it called materialized_cte->plan->optimize,
and it std::moved the plan into the synthesized MaterializingCTEsStep.
The outer DelayedMaterializingCTEsStep[ct] in the main plan won the
exchange before descending into [rs]; by the time rs.plan->optimize
fired buildSetInplace via KeyCondition or VirtualColumnUtils, the inner
safety-net DelayedMaterializingCTEsStep inside the IN-subquery's plan
found ct.is_materialization_planned already true and degraded to a
pass-through. The synchronously executed pipeline read ct.storage with
is_built == false and threw.

With two traversals, the optimize calls happen first across the entire
tree, so the recursive buildSetInplace path gets to run
makePlansForCTEs on the inner safety-net step before the outer step's
claim is attempted. The outer step then sees the CTE already claimed
and degrades to a harmless pass-through.

For queries with no materialized CTEs at all (the common case), the
first traversal walks the tree without doing any work, reports no
DelayedMaterializingCTEsStep, and returns -- keeping the total work at
one walk just like before.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er CTE

Covers the LOGICAL_ERROR path that fired when the outer
DelayedMaterializingCTEsStep claimed the inner CTE before buildSetInplace
had a chance to materialize it synchronously for primary-key /
virtual-column analysis. Three query shapes: ANY LEFT JOIN on PK,
cross-join on PK, ANY LEFT JOIN with the IN filter on a non-PK column
(exercising the VirtualColumnUtils path). EXPLAIN is also checked
because the bug fired during plan optimization -- the test only
asserts that EXPLAIN produced at least one row so the reference file
is not pinned to a specific plan shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… Expressions

The two-traversal refactor in resolveMaterializingCTEs shifts the
timing of `materialized_cte->plan->optimize` such that
mergeExpressions now successfully collapses two adjacent
ExpressionSteps in the materialized CTE's local Union branch
(Project names + Change column names | Projection + Change column
names to column identifiers).

Functional results are unchanged (count() = 3 twice). The merged
ExpressionStep performs identical work in a single transform.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the two-traversal split, makePlansForCTEs no longer calls
plan->optimize -- that happens in DelayedMaterializingCTEsStep::optimizePlans
during the first traversal. Drop the now-unused QueryPlanOptimizationSettings
argument from the public method and from the local
addPlansForMaterializingCTEs helper.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A small utility to splice out a top-level `DelayedMaterializingCTEsStep`
from a plan, replacing it with its single original child. Used by the
upcoming change to `DelayedCreatingSetsStep::makePlansForSets` to strip
the safety-net materialization step from set plans being attached for
runtime execution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a set's plan is being attached for runtime construction (via
DelayedCreatingSetsStep → CreatingSetsStep), the safety-net
DelayedMaterializingCTEsStep planted by forceMaterializeCTE is no longer
needed: at runtime the outer MaterializingCTEsStep in the main plan
materializes the CTE before the CreatingSetsStep's pipeline runs.

The strip is non-recursive on purpose. Nested set subquery plans
(reached via DelayedCreatingSetsStep inside this plan) may still be
consumed by buildSetInplace / buildOrderedSetInplace during the
subsequent plan->optimize() and need their safety-nets to claim.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous order claimed materialized-CTE plans at the outer level
during resolveMaterializingCTEs before addStepsToBuildSets had a chance
to trigger synchronous buildSetInplace / buildOrderedSetInplace via the
recursive plan->optimize. When that synchronous inplace path then ran
on a deeply nested IN-subquery filtered by a MergeTree primary key, the
safety-net DelayedMaterializingCTEsStep inside the inplace pipeline saw
the CTE's plan already moved out and degraded to a pass-through,
producing LOGICAL_ERROR while reading the still-empty StorageMemory.

Swapping the order lets the inplace path claim the CTE via the safety-net
first; resolveMaterializingCTEs then materializes only the CTEs that the
inplace path did not already handle, with main's outer step degrading to
a pass-through in those cases.

Closes #102320.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sted IN

Issue #102320: the CTE `a` is referenced once at top level and once inside
a WHERE id IN (... WHERE id IN (...)) chain that hits a MergeTree primary
key. The PK pruning on the outer IN-subquery triggers buildOrderedSetInplace
for the inner IN-subquery, which used to read an empty StorageMemory because
main's resolveMaterializingCTEs had already claimed the CTE before
addStepsToBuildSets fired the inplace path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… CreatingSetStep wrapper

Two fixes for cases the initial single-step strip missed:

1. `addBuildSubqueriesForMaterializedCTEsIfNeeded` stacks one
   `DelayedMaterializingCTEsStep` per CTE dependency level on top of a
   plan. An IN-subquery that transitively references a chain of CTEs
   gets multiple safety-net steps stacked, so stripping just the
   topmost one leaves the next one in place. The remaining safety-net
   claims its CTE in the wrong scope when the recursive `plan->optimize`
   below runs `resolveMaterializingCTEs`, attaching that CTE's plan
   inside the set subquery's `CreatingSetsStep` instead of at the outer
   `MaterializingCTEsStep` in the main plan. The fix loops while the
   top of the chain is `DelayedMaterializingCTEsStep`.

2. `DelayedCreatingSetsStep::makePlansForSets` calls
   `FutureSetFromSubquery::build` first, which wraps the source plan
   with `CreatingSetStep`. That makes the plan root a `CreatingSetStep`
   with the safety-net chain as its child, not at the root itself. The
   helper now peeks past one non-Delayed wrapper to reach the chain.

Without (1), the third query in `03821_materialized_cte_with_sets`
(`WITH b AS MATERIALIZED (SELECT uid FROM a, a), a AS MATERIALIZED ...
SELECT count() FROM (SELECT * FROM b WHERE uid in b) ... LEFT SEMI JOIN
a ...`) regressed because the main IN-subquery's safety-net stack was
`Delayed[a] → Delayed[b] → body`: stripping only the top left `Delayed[b]`
in place, and the recursive `plan->optimize` then claimed `b` inside the
IN-subquery's `CreatingSetsStep`, leaving the join's right side reading
`a.storage` before the outer step had a chance to materialize it.

Without (2), the helper was a no-op for any plan reached via
`makePlansForSets` (since `build` always wraps with `CreatingSetStep`),
which is the only call site for it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lets DelayedMaterializingCTEsStep::optimizePlans dedup the
plan->optimize(...) call across multiple DelayedMaterializingCTEsStep
instances that share the same MaterializedCTE pointer (e.g. UNION
branches plus the UNION-level step). Logically parallel to the existing
is_materialization_planned flag, but for the optimize stage rather than
the claim/move stage.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before this commit, `DelayedMaterializingCTEsStep::optimizePlans` (pass 1
of `resolveMaterializingCTEs`) called `cte->plan->optimize(...)` once per
Delayed step that referenced the CTE. For shapes where multiple Delayed
steps share one `MaterializedCTE` pointer -- e.g. UNION/INTERSECT/EXCEPT
branches plus the UNION-level step, all planted by
`addBuildSubqueriesForMaterializedCTEsIfNeeded` -- the same plan was
optimized N times. Optimize is idempotent, so this was a CPU regression
rather than a correctness bug, but it scales with the number of UNION
branches and the optimization itself walks the CTE plan tree (and
recursively triggers no-op `buildSetInplace` calls on already-built
sets).

Claim `is_plan_optimized` atomically before invoking optimize, mirroring
the existing `is_materialization_planned` claim in `makePlansForCTEs`.

Addresses review comment on PR #105041:
#105041 (comment)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…al shape

The dedup commit (3a0728b + 6c08747) reduces redundant
plan->optimize(...) invocations across DelayedMaterializingCTEsStep
instances that share a MaterializedCTE pointer. With the dedup in
place, mergeExpressions no longer fires the second time during pass 1
of resolveMaterializingCTEs and the two adjacent ExpressionSteps in
the local Union branch stay unmerged -- matching the pre-PR shape this
test originally expected.

Reverts the cosmetic update introduced in 5b2a827 (which adjusted
the reference for the merged shape that pass-1's redundant optimize
calls used to produce). Functional results (count() = 3 twice) are
unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per the project rule "When mentioning logical errors, say 'exception'
instead of 'crash', because they don't crash the server in the release
build." Pure comment edit, no behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1. Start subquery execution only after the first call to MaterializingCTETransform::work
2. Use std::promise for reader and writer synchronisation
Enumerates every path that creates a MemorySource reading a materialized
CTE's StorageMemory and shows each is downstream of a DelayedPortsProcessor
gated on the corresponding MaterializingCTETransform. Pre-requisite for
removing the build_promise/build_future runtime synchronisation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Set with memory_order_release in MaterializingCTETransform::generate after
executor->finish() returns. Not yet read by any consumer - this is the first
of three additive commits that introduce a fail-fast assertion against
planner wiring bugs before removing the build_promise/build_future
synchronisation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `DelayedPortsProcessor` inserted by `MaterializingCTEsStep::updatePipeline`
already gates every reader's `MemorySource` on the corresponding
`MaterializingCTETransform` finishing, so by the time `MemorySource::generate`
runs the writer must already have set `is_built` to `true`. If the
assertion ever fires, the planner failed to wire the gate - surface a
precise `LOGICAL_ERROR` rather than blocking on `build_future.get` or reading
from a half-populated `StorageMemory`.

The `build_future.get` wait is kept for one more commit as defence in
depth, to be removed once the full test suite has run cleanly against the
new assertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The `DelayedPortsProcessor` gate inserted by `MaterializingCTEsStep` already
ensures the writer has finished before any reader's `MemorySource` is
scheduled, so the runtime wait is redundant. The `is_built` assertion
added in the previous commit holds across the full materialized-CTE test
bucket, confirming every reader path is gated.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reader/writer ordering for materialized CTEs is already enforced by the
`DelayedPortsProcessor` inserted by `MaterializingCTEsStep::updatePipeline`
(via `addPipelineBefore`) at the scheduler level: back-pressure from its
un-`setNeeded()` delayed inputs cascades up to every reader's
`MemorySource`, so `generate` is not invoked before the writer's
`MaterializingCTETransform` has finished. The runtime `build_future.get`
inside `ReadFromMemoryStorageStep` (already removed earlier in the
branch) duplicated this guarantee with a thread-blocking wait; the
matching `build_promise.set_*` sites raised `Exception(LOGICAL_ERROR, ...)`
on legitimate non-execution paths like `EXPLAIN PIPELINE` / `EXPLAIN
ESTIMATE`, which `abortOnFailedAssertion` in the `Exception` ctor turned
into a process abort under debug and sanitizer builds (PR #105041 AST
fuzzer).

Replaces the `std::promise<void>` / `std::shared_future<void>` pair in
`MaterializedCTE` with a single `std::atomic_bool is_built`, set with
`memory_order_release` from `MaterializingCTETransform::generate` after
`executor->finish()` and read with `memory_order_acquire` by the
fail-fast assertion in `ReadFromMemoryStorageStep` (added earlier in
the branch). Reimplements `MaterializedCTE::isBuilt` to load the atomic
so external callers in `Planner.cpp` and `RemoteQueryExecutor.cpp` need
no migration. Removes `MaterializingCTETransform::onCancel`, the `work`
try/catch wrapper, the destructor fallback, and the `build_promise.set_value`
call - every site whose sole purpose was to fulfil the promise.
Cancellation propagates through `PipelineExecutor`'s normal cancel path;
writer exceptions propagate to the executor and surface as the query's
terminal error.

Preserves the lazy `init` split introduced in `d60f24195e3` (the
constructor no longer eagerly opens the temp-table write pipeline; the
first `work` call invokes `init`) and the `executor->cancel` cleanup in
the destructor.

Adds a regression test
(`04260_explain_pipeline_materialized_cte.sql`) covering
`EXPLAIN PIPELINE` / `EXPLAIN PIPELINE graph=1` / `EXPLAIN ESTIMATE`
over a `SELECT` with `MATERIALIZED` CTEs, plus the `IN (subquery)`
shape that exercises the `buildOrderedSetInplace` safety-net path.

Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=105041&sha=latest&name_0=PR

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a one-line comment explaining why `executor->cancel()` is wrapped in a
log-and-swallow `try/catch` in the destructor. Pre-refactor this was implicit
("the destructor still has work to do below"); post-refactor (after the
promise/future removal) the destructor's body is solely the cancel, so the
swallowing pattern is the entire reason for the `try/catch`. Spelling it
out prevents a future reader from suspecting a missing fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fixes a regression introduced by removing the `build_future.get()`
wait in `MemorySource::generate` (commit c9e0a9f): a materialized
CTE referenced from a UNION inside an IN-subquery against a non-PK
column tripped the `is_built` fail-fast at
`ReadFromMemoryStorageStep.cpp:86` because
`removeTopLevelDelayedMaterializingCTEsStep` stripped only the
union-level safety-net `DelayedMaterializingCTEsStep`, leaving the
per-branch ones in the sub-plan tree. The recursive `plan->optimize`
inside `DelayedCreatingSetsStep::makePlansForSets` then let branch
1's per-branch step win `is_materialization_planned.exchange(true)`,
parking the writer inside that branch and leaving branch 2's reader
and the outer plan's `MaterializingCTEsStep` ungated.

Renames the helper to `removeAllDelayedMaterializingCTEsStep` and
rewrites it as a full tree walk that replaces every
`DelayedMaterializingCTEsStep` node anywhere in the sub-plan with
its single child. Nested `DelayedCreatingSetsStep` source plans
(held in `subqueries`, not in the immediate tree) keep their
safety-nets for their own `buildSetInplace` consumers.

Adds tests/queries/0_stateless/04295_04278_materialized_cte_in_union_runtime_set.{sql,reference}
with two pinned shapes: a minimal `WHERE b IN (SELECT FROM ct UNION
ALL SELECT FROM ct)` against a `MergeTree ORDER BY a` table where
`b` is non-PK (so the inplace path does not fire), plus the verbatim
AST-fuzzer query that originally surfaced the failure.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=105041&sha=latest&name_0=PR
PR: #105041

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
novikd and others added 5 commits May 28, 2026 21:10
Addresses review feedback on PR #105959 (comment r3319962590):
the `is_built.store(true, release)` introduced in 5b78bc7 dropped
the runtime backstop for the `MaterializedCTE` single-writer
invariant. The plan-time guard
`is_materialization_planned.exchange(true)` in
`DelayedMaterializingCTEsStep::makePlansForCTEs` is the primary
enforcement, but a future planner regression that wires two
`MaterializingCTETransform`s for the same `MaterializedCTE` would
now silently double-write into the shared `StorageMemory` and return
duplicated CTE rows instead of throwing - exactly the class of
planner topology bug the prior commit on this branch
(9335a9c, "Drop all DelayedMaterializingCTEStep's in the Set
plan") had to fix once.

Replaces the plain release-store with
`exchange(true, std::memory_order_release)` + conditional throw, so
the publish ordering is preserved (still pairs with the acquire-load
in `ReadFromMemoryStorageStep.cpp:86`) and the double-write is
detected at runtime. Cost is one atomic CAS instead of one atomic
store on a path that runs once per CTE materialization.

Review comment: #105959 (comment)
PR: #105959

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A query with materialized CTEs whose runtime sets read another materialized
CTE could throw `Reading from materialized CTE '...' before its materialization
completed - DelayedPortsProcessor gate is missing in the query plan`.

Root cause: when building a `CreatingSetsTransform` / `MaterializingCTETransform`,
the source pipeline's totals/extremes were dropped via a childless `NullSink`.
`ExecutingGraph::initializeExecution` seeds every childless node, so seeding that
`NullSink` cascaded `setNeeded` up to the set-builder that reads the not-yet-built
CTE, activating it before the writer ran and tripping the fail-fast check in
`MemorySource::generate`.

Replace the childless `NullSink` drop with a new `DroppingTransform` that consumes
all of the source pipeline's output ports (data plus totals and extremes), forwards
the data streams and discards totals and extremes. Because it keeps its data outputs
connected, it is not a childless node, so `initializeExecution` does not seed it and
the set-builder is only activated through the normal gated data path.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fix LOGICAL_ERROR for reused materialized CTE filtered by IN over another materialized CTE
@robot-ch-test-poll4 robot-ch-test-poll4 added pr-cherrypick Cherry-pick of merge-commit before backporting. Do not use manually - automated use only! do not test disable testing on pull request pr-critical-bugfix labels Jun 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do not test disable testing on pull request pr-cherrypick Cherry-pick of merge-commit before backporting. Do not use manually - automated use only! pr-critical-bugfix

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants