Fix LOGICAL_ERROR for Nothing-typed JOIN ON residual predicate#106981
Conversation
A non-equi JOIN ON predicate over a column of type Nothing (for example the column produced by ARRAY JOIN []) aborted the server with Logical error: Unexpected expression in JOIN ON section. Expected boolean (UInt8), got 'Nothing'. toBoolIfNeeded() in JoinStepLogical coerced a non-UInt8 residual condition to boolean by wrapping it in and(x, true). That works for every type except a non-nullable Nothing: and(Nothing, true) stays Nothing because Nothing is the bottom type. The residual mixed-join expression then had output type Nothing, which HashJoin::validateAdditionalFilterExpression rejects. Cast a Nothing-typed condition to UInt8 (Nullable(UInt8) when the input is nullable) instead. The column has no values, so the predicate matches no rows. The multi-condition path in concatConditions had the same gap and is now also routed through toBoolIfNeeded. Closes ClickHouse#78761 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
cc @vdimir @KochetovNicolai — could you review this? It fixes a LOGICAL_ERROR ("Expected boolean (UInt8), got 'Nothing'") in the logical join planner: |
|
Workflow [PR], commit [5b37fe2] Summary: ❌
AI ReviewSummaryThis PR fixes Missing context / blind spots
Final Verdict✅ No new findings. |
The test creates a local MergeTree, runs deterministic JOINs over an ARRAY JOIN [] Nothing-typed residual predicate, and checks stable row values. There is no parallel-replicas incompatibility: the queries produce identical correct results under parallel replicas (verified locally with cluster_for_parallel_replicas, the JOIN dispatched via ReadFromRemoteParallelReplicas, 15/15 runs across 2-6 replicas and both parallel_replicas_local_plan modes). Remove the unnecessary exclusion so the test gets parallel-replicas coverage. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The regression test for ClickHouse#78761 failed on the old-analyzer CI variant (Stateless tests amd_llvm_coverage, old analyzer, ...) with Code: 47. DB::Exception: Missing columns: 'a0' ... (UNKNOWN_IDENTIFIER) The old analyzer (TreeRewriter/ExpressionAnalyzer) cannot resolve an ARRAY JOIN alias used inside a JOIN ON clause, so the query errors out before reaching the planner. The LOGICAL_ERROR this test guards against, and its fix in JoinStepLogical::toBoolIfNeeded, only exist in the new analyzer's planner. Pin enable_analyzer = 1 so the test exercises the fixed code path on every CI variant instead of erroring on the old one. Verified locally on a debug build: with the old analyzer forced (--client-option enable_analyzer=0) the test fails without this line and passes with it; 20/20 runs pass under the previously red configuration. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pre-PR validation gate (follow-up fixup
|
| # | Question | Answer |
|---|---|---|
| a | Deterministic repro? | Yes. On a debug build, run the test through clickhouse-test with the session forced to the old analyzer (--client-option enable_analyzer=0): the test fails 100% with Code: 47. UNKNOWN_IDENTIFIER: Missing columns: 'a0' — byte-for-byte the CI red. Equivalently, clickhouse-client --query "SELECT 1 FROM t tx ARRAY JOIN [] AS a0 LEFT JOIN t ON t.c0 = a0 AND t.c0 <> a0 SETTINGS enable_analyzer = 0" errors on demand. |
| b | Root cause explained? | The old analyzer (TreeRewriter/ExpressionAnalyzer) cannot resolve an ARRAY JOIN alias (a0) referenced inside a JOIN ON clause, so the query aborts in collectUsedColumns before the planner ever runs. The LOGICAL_ERROR this test guards against, and its fix in JoinStepLogical::toBoolIfNeeded, exist only in the new analyzer's planner. So on the old-analyzer CI variant the test query errored at name-resolution time rather than exercising the fix. |
| c | Fix matches root cause? | Yes. The fix pins SET enable_analyzer = 1; at the top of the test, so it always runs on the analyzer that contains the bug and the fix, on every CI variant — including the old-analyzer job. This is the dominant idiom in the suite (~700 stateless tests use it). No production code changed for this; the C++ fix is unchanged. |
| d | Test intent preserved / new tests added? | Preserved. The same six query shapes (single/multi/mixed residual, INNER, ConcurrentHashJoin, and the non-empty-array correctness check) still run and still assert the same results. Pinning the analyzer does not weaken what the test checks — it ensures the test reaches the guarded code path. |
| e | Both directions demonstrated? | Yes, through the actual clickhouse-test runner under the previously-red enable_analyzer=0 variant: pre-fix (no SET) → [FAIL] with Code: 47 UNKNOWN_IDENTIFIER: Missing columns: 'a0'; with-fix → [OK], 20/20 runs pass. The only failures under full randomization were Code: 36 Invalid time zone: zoneinfo/UTC, proven environmental (an unrelated control test 00001_select_1 reproduces the identical tz error at the same rate locally); zero JOIN/Nothing/UInt8 errors and zero output diffs. |
| f | Fix is general, not a narrow patch? | N/A for this follow-up (test-only analyzer pin). The underlying C++ fix in this PR already covers both affected code paths in JoinStepLogical — toBoolIfNeeded (single residual) and concatConditions (multi residual) — and preserves nullability via Nullable(UInt8) when the input is nullable. |
Session id: cron:clickhouse-worker-slot-18:20260610-203300
|
CI finished for HEAD
The PR's own regression test |
CI finish ledger — d441299Every failure below has an owner: a fixing PR (ours or external). Only
This PR (Nothing-typed JOIN ON residual predicate fix) is approved by @ vdimir. The earlier Session id: cron:our-pr-ci-monitor:20260630-220000 |
LLVM Coverage ReportChanged lines: Changed C/C++ lines covered: 26/26 (100.00%) · Uncovered code |
c8de357

A non-equi JOIN ON predicate over a
Nothing-typed column (e.g. fromARRAY JOIN []) aborted the server with a LOGICAL_ERROR:toBoolIfNeeded()coerced a non-UInt8residual condition to boolean by wrapping it inand(x, true). That works for every type except a non-nullableNothing:and(Nothing, true)staysNothing(the bottom type), so the residual mixed-join expression had output typeNothing, whichHashJoin::validateAdditionalFilterExpressionrejects. This fuzzer-found bug exists since v24.5.Fix: cast a
Nothing-typed condition toUInt8(Nullable(UInt8)when nullable). The column has no values, so the predicate matches no rows. The multi-condition path inconcatConditionshad the same gap and is now also routed throughtoBoolIfNeeded.Reproducer (from the issue):
Closes #78761
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fix a
LOGICAL_ERROR("Unexpected expression in JOIN ON section. Expected boolean (UInt8), got 'Nothing'") when a non-equiJOIN ... ONpredicate references aNothing-typed column, such as one produced byARRAY JOIN [].Version info
26.7.1.432