Fix LOGICAL_ERROR for Nothing-typed JOIN ON residual predicate by groeneai · Pull Request #106981 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix LOGICAL_ERROR for Nothing-typed JOIN ON residual predicate#106981

Merged
vdimir merged 6 commits into
ClickHouse:masterfrom
groeneai:fix-join-on-nothing-typed-residual
Jul 2, 2026
Merged

Fix LOGICAL_ERROR for Nothing-typed JOIN ON residual predicate#106981
vdimir merged 6 commits into
ClickHouse:masterfrom
groeneai:fix-join-on-nothing-typed-residual

Conversation

@groeneai

@groeneai groeneai commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

A non-equi JOIN ON predicate over a Nothing-typed column (e.g. from ARRAY JOIN []) aborted the server with a LOGICAL_ERROR:

Logical error: Unexpected expression in JOIN ON section. Expected boolean (UInt8), got 'Nothing'.

toBoolIfNeeded() coerced a non-UInt8 residual condition to boolean by wrapping it in and(x, true). That works for every type except a non-nullable Nothing: and(Nothing, true) stays Nothing (the bottom type), so the residual mixed-join expression had output type Nothing, which HashJoin::validateAdditionalFilterExpression rejects. This fuzzer-found bug exists since v24.5.

Fix: cast a Nothing-typed condition to UInt8 (Nullable(UInt8) when nullable). The column has no values, so the predicate matches no rows. The multi-condition path in concatConditions had the same gap and is now also routed through toBoolIfNeeded.

Reproducer (from the issue):

SELECT 1 FROM t1 tx ARRAY JOIN [] AS a0 LEFT JOIN t1 ON t1.c0 = a0 AND t1.c0 <> a0;

Closes #78761

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fix a LOGICAL_ERROR ("Unexpected expression in JOIN ON section. Expected boolean (UInt8), got 'Nothing'") when a non-equi JOIN ... ON predicate references a Nothing-typed column, such as one produced by ARRAY JOIN [].

Version info

  • Merged into: 26.7.1.432

A non-equi JOIN ON predicate over a column of type Nothing (for example
the column produced by ARRAY JOIN []) aborted the server with

  Logical error: Unexpected expression in JOIN ON section.
  Expected boolean (UInt8), got 'Nothing'.

toBoolIfNeeded() in JoinStepLogical coerced a non-UInt8 residual condition
to boolean by wrapping it in and(x, true). That works for every type except
a non-nullable Nothing: and(Nothing, true) stays Nothing because Nothing is
the bottom type. The residual mixed-join expression then had output type
Nothing, which HashJoin::validateAdditionalFilterExpression rejects.

Cast a Nothing-typed condition to UInt8 (Nullable(UInt8) when the input is
nullable) instead. The column has no values, so the predicate matches no rows.
The multi-condition path in concatConditions had the same gap and is now also
routed through toBoolIfNeeded.

Closes ClickHouse#78761

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

@groeneai

Copy link
Copy Markdown
Contributor Author

cc @vdimir @KochetovNicolai — could you review this? It fixes a LOGICAL_ERROR ("Expected boolean (UInt8), got 'Nothing'") in the logical join planner: toBoolIfNeeded coerced a non-equi JOIN ON residual predicate via and(x, true), which leaves a non-nullable Nothing-typed predicate (e.g. from ARRAY JOIN []) still typed Nothing; the fix casts it to a boolean instead. Closes #78761.

@nikitamikhaylov nikitamikhaylov added the can be tested Allows running workflows for external contributors label Jun 10, 2026
@clickhouse-gh

clickhouse-gh Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [5b37fe2]

Summary:

job_name test_name status info comment
AST fuzzer (amd_debug, targeted) FAIL
Logical error: 'or_argument_nodes.size() > 1' (STID: 2508-3fad) FAIL cidb

AI Review

Summary

This PR fixes JOIN ... ON residual predicate coercion for Nothing-typed expressions by casting Nothing / Nullable(Nothing) to UInt8 / Nullable(UInt8) in toBoolIfNeeded, and by routing multi-condition conjunctions through the same helper. I did not find any remaining correctness issues in the current head: the planner change matches HashJoin::validateAdditionalFilterExpression's UInt8 contract, and the regression test now pins enable_analyzer = 1 so it reaches the fixed planner path instead of failing earlier in the old analyzer.

Missing context / blind spots
  • ⚠️ I did not rerun the reproducer locally because this checkout has no build directory or clickhouse binary; confidence here is based on code inspection plus the PR's existing CI and coverage evidence.
Final Verdict

✅ No new findings.

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Jun 10, 2026
Comment thread tests/queries/0_stateless/04327_join_on_nothing_typed_residual.sql Outdated
groeneai and others added 2 commits June 10, 2026 20:34
The test creates a local MergeTree, runs deterministic JOINs over an
ARRAY JOIN [] Nothing-typed residual predicate, and checks stable row
values. There is no parallel-replicas incompatibility: the queries
produce identical correct results under parallel replicas (verified
locally with cluster_for_parallel_replicas, the JOIN dispatched via
ReadFromRemoteParallelReplicas, 15/15 runs across 2-6 replicas and both
parallel_replicas_local_plan modes). Remove the unnecessary exclusion so
the test gets parallel-replicas coverage.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The regression test for ClickHouse#78761 failed on the old-analyzer CI variant
(Stateless tests amd_llvm_coverage, old analyzer, ...) with

  Code: 47. DB::Exception: Missing columns: 'a0' ... (UNKNOWN_IDENTIFIER)

The old analyzer (TreeRewriter/ExpressionAnalyzer) cannot resolve an
ARRAY JOIN alias used inside a JOIN ON clause, so the query errors out
before reaching the planner. The LOGICAL_ERROR this test guards against,
and its fix in JoinStepLogical::toBoolIfNeeded, only exist in the new
analyzer's planner. Pin enable_analyzer = 1 so the test exercises the
fixed code path on every CI variant instead of erroring on the old one.

Verified locally on a debug build: with the old analyzer forced
(--client-option enable_analyzer=0) the test fails without this line and
passes with it; 20/20 runs pass under the previously red configuration.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate (follow-up fixup a019401b625d)

This addresses the only PR-caused CI failure on the prior HEAD: the new regression test 04327_join_on_nothing_typed_residual failed on the Stateless tests (amd_llvm_coverage, old analyzer, s3 storage, DatabaseReplicated, WasmEdge, parallel) variant.

# Question Answer
a Deterministic repro? Yes. On a debug build, run the test through clickhouse-test with the session forced to the old analyzer (--client-option enable_analyzer=0): the test fails 100% with Code: 47. UNKNOWN_IDENTIFIER: Missing columns: 'a0' — byte-for-byte the CI red. Equivalently, clickhouse-client --query "SELECT 1 FROM t tx ARRAY JOIN [] AS a0 LEFT JOIN t ON t.c0 = a0 AND t.c0 <> a0 SETTINGS enable_analyzer = 0" errors on demand.
b Root cause explained? The old analyzer (TreeRewriter/ExpressionAnalyzer) cannot resolve an ARRAY JOIN alias (a0) referenced inside a JOIN ON clause, so the query aborts in collectUsedColumns before the planner ever runs. The LOGICAL_ERROR this test guards against, and its fix in JoinStepLogical::toBoolIfNeeded, exist only in the new analyzer's planner. So on the old-analyzer CI variant the test query errored at name-resolution time rather than exercising the fix.
c Fix matches root cause? Yes. The fix pins SET enable_analyzer = 1; at the top of the test, so it always runs on the analyzer that contains the bug and the fix, on every CI variant — including the old-analyzer job. This is the dominant idiom in the suite (~700 stateless tests use it). No production code changed for this; the C++ fix is unchanged.
d Test intent preserved / new tests added? Preserved. The same six query shapes (single/multi/mixed residual, INNER, ConcurrentHashJoin, and the non-empty-array correctness check) still run and still assert the same results. Pinning the analyzer does not weaken what the test checks — it ensures the test reaches the guarded code path.
e Both directions demonstrated? Yes, through the actual clickhouse-test runner under the previously-red enable_analyzer=0 variant: pre-fix (no SET) → [FAIL] with Code: 47 UNKNOWN_IDENTIFIER: Missing columns: 'a0'; with-fix → [OK], 20/20 runs pass. The only failures under full randomization were Code: 36 Invalid time zone: zoneinfo/UTC, proven environmental (an unrelated control test 00001_select_1 reproduces the identical tz error at the same rate locally); zero JOIN/Nothing/UInt8 errors and zero output diffs.
f Fix is general, not a narrow patch? N/A for this follow-up (test-only analyzer pin). The underlying C++ fix in this PR already covers both affected code paths in JoinStepLogicaltoBoolIfNeeded (single residual) and concatConditions (multi residual) — and preserves nullability via Nullable(UInt8) when the input is nullable.

Session id: cron:clickhouse-worker-slot-18:20260610-203300

@groeneai

Copy link
Copy Markdown
Contributor Author

CI finished for HEAD a019401b625d. No PR-caused failures. Remaining reds are both unrelated to this change:

  • AST fuzzer (amd_debug, targeted) / Logical error 'Column identifier ... is already registered' (STID 4697-4326): chronic trunk AST-fuzzer bug that hits many unrelated PRs and master; tracked separately.
  • Stateless tests (arm_asan_ubsan, azure, sequential) Server died / Segfault: known 2026-06-10/11 Azure ASan-allocator OOM infra outage (95+ unrelated PRs in 24h; stack has no ClickHouse frames).

The PR's own regression test 04327_join_on_nothing_typed_residual passes on every variant. Ready for review.

@vdimir vdimir self-assigned this Jun 29, 2026
@groeneai

Copy link
Copy Markdown
Contributor Author

CI finish ledger — d441299

Every failure below has an owner: a fixing PR (ours or external). Only CH Inc sync is exempt.

Check / test Reason Owner / fixing PR
AST fuzzer (amd_debug, targeted) / Logical error or_argument_nodes.size() > 1 (STID 2508-3fad) logical error (trunk, analyzer) #107113 (ours, open)
AST fuzzer (arm_asan_ubsan) / Block structure mismatch in A stream (STID 0993-27f0) logical error (trunk, UnionStep) #107719 (ours, open)
Stress test (amd_tsan) / Hung check failed, possible deadlock deadlock (chronic trunk) #108212 (ours, merged) / #105905 (ours, open)
Stress test (arm_tsan) / Hung check failed, possible deadlock deadlock (chronic trunk) #108212 (ours, merged) / #105905 (ours, open)
CH Inc sync private fork-sync mirror CH Inc sync (private, not actionable)

This PR (Nothing-typed JOIN ON residual predicate fix) is approved by @ vdimir. The earlier 02354_vector_search_rescoring failures cleared after merging master (picked up #108812). None of the above are caused by this PR's diff.

Session id: cron:our-pr-ci-monitor:20260630-220000

@clickhouse-gh

clickhouse-gh Bot commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 85.40% 85.40% +0.00%
Functions 92.60% 92.60% +0.00%
Branches 77.60% 77.60% +0.00%

Changed lines: Changed C/C++ lines covered: 26/26 (100.00%) · Uncovered code

Full report · Diff report

@vdimir vdimir added this pull request to the merge queue Jul 2, 2026
Merged via the queue into ClickHouse:master with commit c8de357 Jul 2, 2026
172 of 174 checks passed
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jul 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Logical Error: Unexpected expression in JOIN ON section. Expected boolean (UInt8), got 'Nothing'

4 participants