Fix LOGICAL_ERROR for reused materialized CTE filtered by IN over another materialized CTE#105041
Conversation
The optimization step is moved one level up, into a dedicated first traversal of resolveMaterializingCTEs introduced in the next commit. This lets the recursive buildSetInplace / buildOrderedSetInplace path win the inner safety-net DelayedMaterializingCTEsStep's claim before the outer step attempts its own claim. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The first traversal optimizes every materialized CTE's pre-built plan and reports whether any DelayedMaterializingCTEsStep exists in the tree. The second traversal -- skipped entirely when the first saw none -- replaces DelayedMaterializingCTEsStep nodes with MaterializingCTEsStep, attaching plans for any CTEs not already claimed inplace. Why this fixes the LOGICAL_ERROR -------------------------------- Today makePlansForCTEs did three things atomically: it exchange'd is_materialization_planned, it called materialized_cte->plan->optimize, and it std::moved the plan into the synthesized MaterializingCTEsStep. The outer DelayedMaterializingCTEsStep[ct] in the main plan won the exchange before descending into [rs]; by the time rs.plan->optimize fired buildSetInplace via KeyCondition or VirtualColumnUtils, the inner safety-net DelayedMaterializingCTEsStep inside the IN-subquery's plan found ct.is_materialization_planned already true and degraded to a pass-through. The synchronously executed pipeline read ct.storage with is_built == false and threw. With two traversals, the optimize calls happen first across the entire tree, so the recursive buildSetInplace path gets to run makePlansForCTEs on the inner safety-net step before the outer step's claim is attempted. The outer step then sees the CTE already claimed and degrades to a harmless pass-through. For queries with no materialized CTEs at all (the common case), the first traversal walks the tree without doing any work, reports no DelayedMaterializingCTEsStep, and returns -- keeping the total work at one walk just like before. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…er CTE Covers the LOGICAL_ERROR path that fired when the outer DelayedMaterializingCTEsStep claimed the inner CTE before buildSetInplace had a chance to materialize it synchronously for primary-key / virtual-column analysis. Three query shapes: ANY LEFT JOIN on PK, cross-join on PK, ANY LEFT JOIN with the IN filter on a non-PK column (exercising the VirtualColumnUtils path). EXPLAIN is also checked because the bug fired during plan optimization -- the test only asserts that EXPLAIN produced at least one row so the reference file is not pinned to a specific plan shape. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… Expressions The two-traversal refactor in resolveMaterializingCTEs shifts the timing of `materialized_cte->plan->optimize` such that mergeExpressions now successfully collapses two adjacent ExpressionSteps in the materialized CTE's local Union branch (Project names + Change column names | Projection + Change column names to column identifiers). Functional results are unchanged (count() = 3 twice). The merged ExpressionStep performs identical work in a single transform. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the two-traversal split, makePlansForCTEs no longer calls plan->optimize -- that happens in DelayedMaterializingCTEsStep::optimizePlans during the first traversal. Drop the now-unused QueryPlanOptimizationSettings argument from the public method and from the local addPlansForMaterializingCTEs helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Just for posterity -- other (probably bad) attempts to fix this by me: |
This comment was marked as resolved.
This comment was marked as resolved.
A small utility to splice out a top-level `DelayedMaterializingCTEsStep` from a plan, replacing it with its single original child. Used by the upcoming change to `DelayedCreatingSetsStep::makePlansForSets` to strip the safety-net materialization step from set plans being attached for runtime execution. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When a set's plan is being attached for runtime construction (via DelayedCreatingSetsStep → CreatingSetsStep), the safety-net DelayedMaterializingCTEsStep planted by forceMaterializeCTE is no longer needed: at runtime the outer MaterializingCTEsStep in the main plan materializes the CTE before the CreatingSetsStep's pipeline runs. The strip is non-recursive on purpose. Nested set subquery plans (reached via DelayedCreatingSetsStep inside this plan) may still be consumed by buildSetInplace / buildOrderedSetInplace during the subsequent plan->optimize() and need their safety-nets to claim. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous order claimed materialized-CTE plans at the outer level during resolveMaterializingCTEs before addStepsToBuildSets had a chance to trigger synchronous buildSetInplace / buildOrderedSetInplace via the recursive plan->optimize. When that synchronous inplace path then ran on a deeply nested IN-subquery filtered by a MergeTree primary key, the safety-net DelayedMaterializingCTEsStep inside the inplace pipeline saw the CTE's plan already moved out and degraded to a pass-through, producing LOGICAL_ERROR while reading the still-empty StorageMemory. Swapping the order lets the inplace path claim the CTE via the safety-net first; resolveMaterializingCTEs then materializes only the CTEs that the inplace path did not already handle, with main's outer step degrading to a pass-through in those cases. Closes #102320. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…sted IN Issue #102320: the CTE `a` is referenced once at top level and once inside a WHERE id IN (... WHERE id IN (...)) chain that hits a MergeTree primary key. The PK pruning on the outer IN-subquery triggers buildOrderedSetInplace for the inner IN-subquery, which used to read an empty StorageMemory because main's resolveMaterializingCTEs had already claimed the CTE before addStepsToBuildSets fired the inplace path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… CreatingSetStep wrapper Two fixes for cases the initial single-step strip missed: 1. `addBuildSubqueriesForMaterializedCTEsIfNeeded` stacks one `DelayedMaterializingCTEsStep` per CTE dependency level on top of a plan. An IN-subquery that transitively references a chain of CTEs gets multiple safety-net steps stacked, so stripping just the topmost one leaves the next one in place. The remaining safety-net claims its CTE in the wrong scope when the recursive `plan->optimize` below runs `resolveMaterializingCTEs`, attaching that CTE's plan inside the set subquery's `CreatingSetsStep` instead of at the outer `MaterializingCTEsStep` in the main plan. The fix loops while the top of the chain is `DelayedMaterializingCTEsStep`. 2. `DelayedCreatingSetsStep::makePlansForSets` calls `FutureSetFromSubquery::build` first, which wraps the source plan with `CreatingSetStep`. That makes the plan root a `CreatingSetStep` with the safety-net chain as its child, not at the root itself. The helper now peeks past one non-Delayed wrapper to reach the chain. Without (1), the third query in `03821_materialized_cte_with_sets` (`WITH b AS MATERIALIZED (SELECT uid FROM a, a), a AS MATERIALIZED ... SELECT count() FROM (SELECT * FROM b WHERE uid in b) ... LEFT SEMI JOIN a ...`) regressed because the main IN-subquery's safety-net stack was `Delayed[a] → Delayed[b] → body`: stripping only the top left `Delayed[b]` in place, and the recursive `plan->optimize` then claimed `b` inside the IN-subquery's `CreatingSetsStep`, leaving the join's right side reading `a.storage` before the outer step had a chance to materialize it. Without (2), the helper was a no-op for any plan reached via `makePlansForSets` (since `build` always wraps with `CreatingSetStep`), which is the only call site for it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lets DelayedMaterializingCTEsStep::optimizePlans dedup the plan->optimize(...) call across multiple DelayedMaterializingCTEsStep instances that share the same MaterializedCTE pointer (e.g. UNION branches plus the UNION-level step). Logically parallel to the existing is_materialization_planned flag, but for the optimize stage rather than the claim/move stage. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Before this commit, `DelayedMaterializingCTEsStep::optimizePlans` (pass 1 of `resolveMaterializingCTEs`) called `cte->plan->optimize(...)` once per Delayed step that referenced the CTE. For shapes where multiple Delayed steps share one `MaterializedCTE` pointer -- e.g. UNION/INTERSECT/EXCEPT branches plus the UNION-level step, all planted by `addBuildSubqueriesForMaterializedCTEsIfNeeded` -- the same plan was optimized N times. Optimize is idempotent, so this was a CPU regression rather than a correctness bug, but it scales with the number of UNION branches and the optimization itself walks the CTE plan tree (and recursively triggers no-op `buildSetInplace` calls on already-built sets). Claim `is_plan_optimized` atomically before invoking optimize, mirroring the existing `is_materialization_planned` claim in `makePlansForCTEs`. Addresses review comment on PR #105041: #105041 (comment) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…al shape The dedup commit (3a0728b + 6c08747) reduces redundant plan->optimize(...) invocations across DelayedMaterializingCTEsStep instances that share a MaterializedCTE pointer. With the dedup in place, mergeExpressions no longer fires the second time during pass 1 of resolveMaterializingCTEs and the two adjacent ExpressionSteps in the local Union branch stay unmerged -- matching the pre-PR shape this test originally expected. Reverts the cosmetic update introduced in 5b2a827 (which adjusted the reference for the merged shape that pass-1's redundant optimize calls used to produce). Functional results (count() = 3 twice) are unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per the project rule "When mentioning logical errors, say 'exception' instead of 'crash', because they don't crash the server in the release build." Pure comment edit, no behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
The fix seems to be working; the only thing is that Bugfix validation (functional tests) is somehow broken (it seems it does not handle LOGICAL_ERROR without this fix properly). |
|
Bugfix validation check was broken in #104081. Details |
| /// `subqueries`, not as children of any node in this tree) are | ||
| /// untouched — they keep their safety-nets for their own | ||
| /// `buildSetInplace` / `buildOrderedSetInplace` consumers. | ||
| removeAllDelayedMaterializingCTEsStep(*plan); |
There was a problem hiding this comment.
This unconditional strip assumes the set plan is being attached under a SELECT plan that still has an outer DelayedMaterializingCTEsStep for every referenced CTE. That is not true for the other DelayedCreatingSetsStep callers: MutationsInterpreter::buildSubqueryPlansForSetsAndAdd builds IN-subquery plans with forceMaterializeCTE, but the mutation plan never adds an outer materialized-CTE wrapper.
For a shape like ALTER TABLE ... UPDATE ... WHERE x IN (WITH c AS MATERIALIZED (...) SELECT ... FROM c), this removes the only DelayedMaterializingCTEsStep; plan->optimize then attaches a runtime set plan whose ReadFromMemoryStorageStep has no writer/gate and will hit the new is_built fail-fast (or read an empty StorageMemory without it). Please only strip these safety nets when the caller has an enclosing materialization step, or teach the mutation/non-SELECT callers to add that outer CTE materialization before DelayedCreatingSetsStep.
|
|
||
| bool has_totals = pipe.getTotalsPort() != nullptr; | ||
| bool has_extremes = pipe.getExtremesPort() != nullptr; | ||
| auto dropping = std::make_shared<DroppingTransform>(header, pipe.numOutputPorts(), has_totals, has_extremes); |
There was a problem hiding this comment.
DroppingTransform is constructed with the main data header for every input, including totals/extremes. That loses the old dropTotals behavior, which created a NullSink from each aux port's own header. Pipe::addTransform explicitly permits totals to have a different header from the data stream because TotalsHavingTransform can output finalized totals while data is not finalized.
In that valid shape, connect(*totals_port, *totals_in) will now throw while trying to discard totals, so a set/CTE subquery with such totals cannot be materialized. Please give DroppingTransform separate headers for data, totals, and extremes (or build the aux inputs from pipe.getTotalsPort()->getSharedHeader() / pipe.getExtremesPort()->getSharedHeader()) while keeping only the data outputs on the main header.
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 176/201 (87.56%) | Lost baseline coverage: none · Uncovered code |
Cherry pick #105041 to 26.4: Fix LOGICAL_ERROR for reused materialized CTE filtered by IN over another materialized CTE
…TE filtered by IN over another materialized CTE
Cherry pick #105041 to 26.5: Fix LOGICAL_ERROR for reused materialized CTE filtered by IN over another materialized CTE
…TE filtered by IN over another materialized CTE

Motivation
Two related shapes of
LOGICAL_ERROR: Reading from materialized CTE 'X' before it has been materializedtriggered byenable_materialized_cte.Issue #101940:
Issue #102320:
EXPLAINon either query also threw — the bug fires during planoptimization, not just at execution time.
Root cause
The Planner installs two layers of
DelayedMaterializingCTEsStep:added by
addBuildSubqueriesForMaterializedCTEsIfNeededso the materializedCTEs run before the main pipeline at runtime.
forceMaterializeCTEso the synchronous inplace path(
FutureSetFromSubquery::buildSetInplace/buildOrderedSetInplace,reached via
KeyCondition::tryPrepareSetIndexForInorVirtualColumnUtils::buildSetsForDAG) can materialize a referenced CTEduring plan optimization.
makePlansForCTEsdid the atomic claim, theoptimizecall, and thestd::moveofmaterialized_cte->planin one step, andresolveMaterializingCTEsran beforeaddStepsToBuildSets. So in both bugshapes the outer step claimed the CTE and moved its plan out before the
inner safety-net inside the inplace pipeline could win the race —
the safety-net pass-throughs read an empty
StorageMemory.Fix
Three coordinated changes:
Split
resolveMaterializingCTEsinto two traversals. Pass 1(
DelayedMaterializingCTEsStep::optimizePlans) just optimizes each CTE'spre-built plan; pass 2 (the existing
addPlansForMaterializingCTEswalk)claims and attaches.
makePlansForCTEsno longer optimizes, only claimsand
std::moves. With this alone, the failing query from SEGV / LOGICAL_ERROR when materialized CTE is used in IN subquery of another materialized CTE referenced in a join #101940 isfixed because the safety-net inside
rs.plan->optimizeclaimsctduring pass 1, before main's pass 2 attempts the outer claim.
Run
addStepsToBuildSetsbeforeresolveMaterializingCTEsinQueryPlan::optimize. When the inplace path fires fromaddStepsToBuildSetson the main plan (LOGICAL_ERROR with materialized CTE #102320's shape, where the INtarget is a MergeTree column and the PK pruning fires during the
recursive
outer_IN.plan->optimize), the safety-net inside the nestedIN-subquery is now reached before the outer claim runs, again winning
the race.
Strip safety-net
DelayedMaterializingCTEsStepchains from set plansin
DelayedCreatingSetsStep::makePlansForSets. When a set plan isbeing attached for runtime construction (vs synchronous inplace
execution), its safety-nets are redundant — at runtime the outer
MaterializingCTEsStepmaterializes the CTE before theCreatingSetsStep's pipeline runs. Stripping the whole stacked chainprevents the recursive
plan->optimizefrom claiming the CTE in thewrong scope when the safety-net chain has multiple levels (one per
transitive CTE dependency). The strip handles
CreatingSetStepwrappingthe chain (added by
FutureSetFromSubquery::build).Test plan
04227_materialized_cte_reused_with_in_subquery.sqlcovers SEGV / LOGICAL_ERROR when materialized CTE is used in IN subquery of another materialized CTE referenced in a join #101940 inANY-LEFT-JOIN-on-PK, cross-join, non-PK IN (
VirtualColumnUtilspath),and EXPLAIN.
04228_materialized_cte_nested_in_subquery_pk.sqlcovers LOGICAL_ERROR with materialized CTE #102320 (CTEreferenced through a PK-pruned nested IN).
0380*/0392*/0400*/04036_…distributed_race/04041_…parallel_replicas/ etc.).04041_materialized_cte_parallel_replicas.referenceis updated: withthe new traversal timing,
mergeExpressionsnow merges two adjacentExpressionSteps in the CTE's local Union branch into one. Functionalresults unchanged.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fixed two
LOGICAL_ERROR: Reading from materialized CTE 'X' before it has been materializedshapes thrown by queries withenable_materialized_cte: (1) a reused materialized CTE filtered byIN (subquery)over another materialized CTE, and (2) a materialized CTE referenced both directly and inside aWHERE ... IN (...)filter that hits a MergeTree primary key through a nested IN-subquery.EXPLAINon the same queries was affected too because the bugs fired during plan optimization. Closes #101940, closes #102320.Documentation entry for user-facing changes
Version info
26.6.1.454