Fix Block structure mismatch in UnionStep/IntersectOrExceptStep when a branch constant-folds#107719
Conversation
|
cc @KochetovNicolai @davenger - const-fold half split out of #107041 per review. Makes the plan-time UnionStep/IntersectOrExceptStep header check tolerate a Const branch meeting a full one via getLeastSuperColumn, matching what updatePipeline already does at execution time. |
|
Workflow [PR], commit [cfe8828] Summary: ❌
AI ReviewSummary
Final Verdict
|
|
The fix looks too complex. Filter-push-down can fold the constant, but this should not break the logic. |
2441ef7 to
4493fad
Compare
|
@KochetovNicolai Simplified, thanks. Pushed The check now keeps Filter-push-down folding a constant in one branch is handled by that single rule: the column becomes full in the common header. No-divergence plans return the front header unchanged. Verified on a debug build: |
4493fad to
ffd0383
Compare
|
Good catch, confirmed. Fixed by keeping the check strict except for top-level
Verified: 04327 passes (20/20 with randomization); const/full UNION/INTERSECT/EXCEPT drop the const to full; all-const-agree keeps the front header unchanged; a top-level type difference is still rejected. Pushed in ffd0383. |
…a branch constant-folds A UNION/INTERSECT/EXCEPT whose branches differ only in their WHERE predicate threw `Block structure mismatch in UnionStep stream` (Code 49), aborting debug/sanitizer servers, when one branch's predicate constant-folded. Set-operation branches are optimized independently. Filter push-down through a join in one branch can constant-fold a projection column, leaving that branch's output column Const while the sibling branch stays full. The plan-time header check in UnionStep/IntersectOrExceptStep::updateOutputHeader used strict assertBlocksHaveEqualStructure and rejected the divergent constness, even though updatePipeline already tolerates exactly this at execution time. The check stays strict except for top-level Const. Each branch is compared to the front branch with only its top-level Const stripped (via convertToFullColumnIfConst, which leaves Sparse and Replicated wrappers untouched), then validated with strict assertBlocksHaveEqualStructure. Different types and divergent Sparse/Replicated representations remain rejected at plan time. The common header follows the getLeastSuperColumn contract: a column stays Const only when every branch is Const with the same value, otherwise it is materialized. Comparing only constness (not the value) would keep the front branch's Const when two branches fold to different constants, after which makeConvertingActions fails trying to convert one branch to the other branch's Const value. liftUpUnion assumed the union forwards each branch unchanged and rebuilt the UnionStep forcing every branch input header to the lifted step's full output. When checkHeaders materialized a divergent Const, the lifted union then declared a full input header for a branch still exposing a Const, moving the mismatch into the optimizer's own header validation. It now skips the rewrite when the union output differs from any branch input header. Closes ClickHouse#106956 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ffd0383 to
cfe8828
Compare
|
Fixed in cfe8828. Took route (b): the common-header build now keeps a column While validating the all- Repro that exercises the execution path (both branches keep rows, same column folds to SELECT a, c FROM (
SELECT t2.a AS a, (t1.b = t2.b) AS c
FROM (SELECT 1 AS a, 2 AS b) AS t1 INNER JOIN (SELECT 1 AS a, 2 AS b) AS t2 ON t1.a = t2.a WHERE t1.b = t2.b
UNION ALL
SELECT t2.a AS a, NOT (t1.b = t2.b) AS c
FROM (SELECT 1 AS a, 2 AS b) AS t1 INNER JOIN (SELECT 1 AS a, 2 AS b) AS t2 ON t1.a = t2.a WHERE t1.b = t2.b
) ORDER BY a, c;On master this aborts at the original session: cron:clickhouse-worker-slot-3:20260617-191100 |
|
@KochetovNicolai CI surfaced a regression from the current commit ( Repro (old analyzer only; pre-existing test SET enable_analyzer = 0;
SELECT c >= 0 FROM (SELECT randConstant() AS c FROM remote('127.0.0.{1,2}', numbers_mt(1)));
-- Code: 44. Cannot convert column `c` because it is non constant in source stream but must be constant in result.Root cause: The same-value rule should stay (dropping it brings back a silent wrong result for Two fix points, both with caveats:
Do you have a preference, or a third option I'm missing? |
|
@groeneai The failure to |
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 61/61 (100.00%) | Lost baseline coverage: none · Uncovered code |
|
@Algunenano Yes, that one is the known PR-caused regression from this commit ( Mechanism: The same-value materialize rule should stay (dropping it reintroduces a silent wrong result for Do you have a preference between the two? I lean to (2) as the more localized change, but it is your and KochetovNicolai's call. |
@KochetovNicolai Can you please have a look? I think breaking the old analyzer isn't ok, and adding more code there seems like an ok thing to avoid adding complexity to the code that should remain in the future. So I'd prefer option 2. WDTY? |

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fixed a
Block structure mismatch in UnionStep streamlogical error (server abort on debug/sanitizer builds,Code: 49on release builds) that occurred when sibling branches of aUNION/INTERSECT/EXCEPTdiffered only in theirWHEREpredicate and one branch's predicate constant-folded to aConstcolumn.Description
Reported in #106956. Split out of #107041 at the reviewer's request (the sparse half stays in #107041, this PR covers the constant-folding half only).
Reproduces deterministically with default settings, no tables needed:
Set-operation branches are optimized independently. When one branch's predicate constant-folds (here
NOT (t1.b = t2.b)over constant join inputs), filter push-down through the join leaves that branch's projection columnConstwhile the sibling keeps a full column. The plan-time header check inUnionStep::updateOutputHeader/IntersectOrExceptStep::updateOutputHeaderused strictassertBlocksHaveEqualStructureand rejected the divergent constness, even though the execution-time path (updatePipeline) already tolerates exactly this viamakeConvertingActions.The fix makes the plan-time check tolerant the same way: it requires matching column names and types but allows a
Constto meet a full column, and computes the common header withgetLeastSuperColumn(the primitive the planner already uses to build the union header) so a column staysConstonly when every branch agrees and is widened to a full column otherwise. When no branch constness diverges the front header is returned unchanged, so plans for ordinary set operations are byte-identical to before (no new plan nodes, no EXPLAIN changes).Added regression test
04327_union_branch_const_header_mismatch(coveringUNION ALL,UNION DISTINCT,SELECT DISTINCT, an explicitCAST(..., 'Nullable(UInt8)')projection,INTERSECT, andEXCEPT).