Fix logical error in disjunction filter push-down through JOIN with type-changing USING key#107407
Conversation
|
cc @vdimir @davenger — could you review? This fixes a logical error ( |
|
Workflow [PR], commit [6f21d80] Summary: ✅
AI ReviewSummaryThis PR fixes two PR Metadata
Findings
Final VerdictStatus: Minimum required action: update the changelog category and entry to cover both the disjunction-pushdown exception and the analyzer |
…ype-changing USING key The disjunction (partial predicate) push-down path built a FilterStep directly against the JOIN input header without the type-fixup that the main push-down path applies via fix_predicate_for_join_logical_step. When a USING key is widened by the JOIN (e.g. UInt32 vs Nullable(UInt32)), a pushed predicate over that column kept the JOIN-output result type while its arguments were re-evaluated against the narrower input type, tripping the result-type check in executeActionForPartialResult: "Unexpected return type from equals. Expected Nullable(UInt8). Got UInt8". Restrict the disjunction push-down to columns whose type is stable across the JOIN boundary via a require_stable_types flag on get_available_columns_for_filter. Partial push-down is an opportunistic broadening pre-filter (the full filter still runs above the JOIN), so skipping type-changing columns is safe; push-down for stable-type columns is unchanged. Found by the AST fuzzer (arm_asan_ubsan, STID 3262-3b6a). Same class as issue ClickHouse#89802. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
c04071a to
03a72c1
Compare
Pre-PR validation gate
Session id: cron:clickhouse-worker-slot-2:20260613-220600 |
scope.join_using_columns is a list of raw pointers to a stack-local map (join_using_column_name_to_column_node) that tryResolveIdentifierFromJoin pushes while recursing into one JOIN side and pops afterwards. The push/pop wrapped the recursive tryResolveIdentifierFromJoinTreeNode call without an exception guard, so when that recursion threw the pop was skipped. The throw is reachable from normal analysis: the if/multiIf constant-fold special cases in resolveFunction.cpp resolve statically-dead branches inside a try/catch and swallow the exception. If a dead branch references an unknown identifier (UNKNOWN_IDENTIFIER), the throw unwound past the pop, the stack-local map was destroyed, and its pointer was left dangling in scope.join_using_columns. A later tryBindIdentifierToJoinUsingColumn iterated the list and dereferenced the dangling pointer, segfaulting (NULL pointer read) in the analyzer. Make the pop run on every exit path with SCOPE_EXIT, and balance it against a captured flag so a push is always matched by exactly one pop. Found by the AST fuzzer (amd_debug, targeted, STID 5737-6ff0) on PR ClickHouse#107407, which mutated the new 04330 JOIN USING / EXPLAIN seed query into a form with a dead if/multiIf branch referencing a non-existent qualifier. Pre-existing bug, independent of the disjunction push-down change in this PR. Regression query added to 04330_join_disjunctions_pushdown_using_type_mismatch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pre-PR validation gate (analyzer NULL deref fix)
Session id: cron:clickhouse-worker-slot-0:20260613-234100 |
| ORDER BY l.x | ||
| FORMAT TSV; | ||
|
|
||
| -- Analyzer identifier-resolution crash (NULL deref) reachable from this query family. |
There was a problem hiding this comment.
This PR now includes a second user-visible fix covered by this regression block: a SIGSEGV in analyzer identifier resolution for JOIN ... USING. The PR metadata is still categorized as Bug Fix and the changelog entry only describes the disjunction-pushdown exception, so the release note will miss the critical analyzer fix. Please switch the category to Critical Bug Fix (crash, data loss, RBAC) and include both fixes in the changelog entry, e.g.:
Fix an exception in disjunction filter push-down over type-widened JOIN ... USING keys, and fix a SIGSEGV in analyzer identifier resolution for JOIN ... USING queries with statically-dead if/multiIf branches that reference unknown identifiers.
There was a problem hiding this comment.
Updated the changelog entry to cover both fixes (the disjunction push-down exception and the analyzer SIGSEGV in JOIN ... USING resolution).
On the category: I have kept Bug Fix rather than self-selecting Critical Bug Fix. Per our contribution policy I do not pick the Critical Bug Fix category on my own; a maintainer applies it. Both fixes here are pre-existing trunk crashes surfaced by the AST fuzzer (not regressions in a released version), so a maintainer is best placed to decide whether the crash class warrants Critical Bug Fix. Happy to switch immediately if a reviewer confirms.
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 26/26 (100.00%) | Lost baseline coverage: none · Uncovered code |
|
CI finished on
|
|
📊 Cloud Performance Report ✅ AI verdict: This is a correctness fix: it stops the join disjunction (partial-predicate) push-down from applying a predicate built for a type-widened join output to a non-widened input column, and adds a scope guard so JOIN identifier resolution doesn't leave a dangling pointer when a dead branch throws. The change only alters behavior for joins with type-changing USING keys. All 8 flagged ClickBench improvements are on join-free single-table queries the PR cannot touch, and the 2 TPC-H improvements are on intrinsically noisy queries whose stable join types never hit the narrow type-mismatch path. All flagged improvements were downgraded to not-sure as run-to-run/environment variance; note the source build ran only 6 iterations versus 30 on base. clickbenchFlagged queries (8 of 43)
q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10) — the value the verdict is based on. tpch_adapted_1_officialFlagged queries (2 of 22)q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10) — the value the verdict is based on. Debug info
|
Cherry pick #107407 to 26.4: Fix logical error in disjunction filter push-down through JOIN with type-changing USING key
…h-down through JOIN with type-changing USING key
Cherry pick #107407 to 26.5: Fix logical error in disjunction filter push-down through JOIN with type-changing USING key
…h-down through JOIN with type-changing USING key
Backport #107407 to 26.4: Fix logical error in disjunction filter push-down through JOIN with type-changing USING key
Backport #107407 to 26.5: Fix logical error in disjunction filter push-down through JOIN with type-changing USING key

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix a logical error (
Unexpected return type from equals. Expected Nullable(UInt8). Got UInt8) when the disjunction (partial predicate) push-down optimization pushes a condition over aUSINGkey whose type is widened by the JOIN. Also fix a server crash (segmentation fault) in the analyzer when resolving identifiers inJOIN ... USINGqueries that contain constant-foldableif/multiIfbranches referencing unknown identifiers.Description
Found by the AST fuzzer on PR #68978: report, check
AST fuzzer (arm_asan_ubsan), STID 3262-3b6a. The crash is a pre-existing trunk bug unrelated to that PR. Same class as the open fuzz issue #89802.Minimal reproducer:
Root cause: the
USING (x)key isUInt32on the left andNullable(UInt32)on the right, so the column is widened toNullablein the JOIN output. A predicate overxis built against that widened type (soequalsreturnsNullable(UInt8)). The main filter push-down path repairs such type changes forJoinStepLogicalviafix_predicate_for_join_logical_step, andget_available_columns_for_filtertherefore intentionally allows type-changing columns through. But the disjunction (partial predicate) push-down path (tryToExtractPartialPredicate+addFilterOnTop) reuses the same column lists and builds aFilterStepdirectly against the non-widened JOIN input header, with no type-fixup.ActionsDAG::updateHeaderthen re-evaluatesequalsasUInt8while the node still claimsNullable(UInt8), tripping the result-type check inexecuteActionForPartialResultand aborting on debug/sanitizer builds.Fix: restrict the disjunction push-down to columns whose type is stable across the JOIN boundary (input type equals JOIN-output type), via a new
require_stable_typesflag onget_available_columns_for_filter. Partial push-down is an opportunistic broadening pre-filter (the full filter still runs above the JOIN), so skipping type-changing columns is safe and only forgoes a pre-filter that cannot be correctly typed without the fixup the main path applies. Push-down for stable-type columns is unchanged.A regression test is added in
tests/queries/0_stateless/04330_join_disjunctions_pushdown_using_type_mismatch.sql. It fails (aborts) without the fix and passes with it, and also asserts the optimization still fires for a stable-type disjunction.Additional fix in this PR: analyzer NULL deref in JOIN USING identifier resolution
While fuzzing the new 04330 seed query, the
AST fuzzer (amd_debug, targeted)hit a separate, pre-existing crash: a NULL pointer read (SIGSEGV) inIdentifierResolver::tryBindIdentifierToJoinUsingColumn. STID 5737-6ff0, checkAST fuzzer (amd_debug, targeted): report.Root cause:
scope.join_using_columnsis a list of raw pointers to a stack-local map thattryResolveIdentifierFromJoinpushes while recursing into one JOIN side and pops afterwards. The push/pop wrapped the recursivetryResolveIdentifierFromJoinTreeNodecall without an exception guard. Theif/multiIfconstant-fold special cases inresolveFunction.cppresolve statically-dead branches inside atry/catchand swallow the exception; when a dead branch referenced an unknown identifier, the resultingUNKNOWN_IDENTIFIERunwound past the pop, the stack-local map was destroyed, and the dangling pointer was left inscope.join_using_columns. A latertryBindIdentifierToJoinUsingColumndereferenced it and crashed.Fix: run the pop on every exit path with
SCOPE_EXIT, balanced against a captured flag so each push is matched by exactly one pop. Pre-existing bug, independent of the push-down change above. A regression query is added to the same 04330 test (crashes without the fix, returns a result with it).Version info
26.6.1.96026.5.5.8,26.4.5.86