Fix join runtime filter crash on nested Variant/Dynamic key by groeneai · Pull Request #106931 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix join runtime filter crash on nested Variant/Dynamic key#106931

Merged
Avogar merged 5 commits into
ClickHouse:masterfrom
groeneai:fix-runtime-filter-nested-variant
Jun 16, 2026
Merged

Fix join runtime filter crash on nested Variant/Dynamic key#106931
Avogar merged 5 commits into
ClickHouse:masterfrom
groeneai:fix-runtime-filter-nested-variant

Conversation

@groeneai

@groeneai groeneai commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Fixes a server crash (LOGICAL_ERROR) when a JOIN key contains a Variant or Dynamic nested inside a Tuple, Array, or Map and exactly one distinct value is present on the right side of the join.

The join runtime filter decides whether the single-element "equals" lookup path is safe by checking if the key can hold a NULL. It used hasNullable() plus a top-level isVariant()/isDynamic() check. hasNullable() recurses into Array/Tuple/Map but does not treat Variant/Dynamic as null-bearing, and the isVariant()/isDynamic() checks only inspect the top-level type. So a nested Variant was not detected, the "equals" path was chosen, and equals over a Variant argument returns Nullable(UInt8), tripping the __applyFilter return-type assertion.

A new recursive hasTypeThatCanContainNulls() helper detects any null-capable type (Nullable, LowCardinality(Nullable), Variant, Dynamic) anywhere in the key type. hasNullable() is left unchanged on purpose because it is also used for MergeTree key validation, where Variant is rejected by a separate check.

Reproducer (crashed before, returns (true,1) now):

SET enable_analyzer = 1, enable_join_runtime_filters = 1;
SELECT * FROM
    (SELECT tuple(v, 1) AS k FROM format(TSV, 'v Variant(String, Bool)', 'true\nfalse')) AS t1
    JOIN
    (SELECT tuple(v, 1) AS k FROM format(TSV, 'v Variant(String, Bool)', 'true')) AS t2
    USING (k);

Closes #101415

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix a server crash (LOGICAL_ERROR) in the join runtime filter when the join key contains a Variant or Dynamic type nested inside a Tuple, Array, or Map and the right side of the join has a single distinct value.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Version info

  • Merged into: 26.6.1.864

A join key that contains a Variant or Dynamic nested inside a compound type
(Tuple/Array/Map) crashed with LOGICAL_ERROR 'Unexpected return type from
__applyFilter. Expected UInt8. Got Nullable(UInt8)' when exactly one distinct
value was present on the right side of the join.

The runtime filter computes whether the key can be NULL to decide whether the
single-element "equals" lookup path is safe. It used hasNullable() plus a
top-level isVariant()/isDynamic() check. hasNullable() recurses into
Array/Tuple/Map but does not treat Variant/Dynamic as null-bearing, and the
isVariant()/isDynamic() checks only inspect the top-level type, so a nested
Variant was not detected. The "equals" path was then chosen and produced a
Nullable(UInt8), tripping the __applyFilter return-type assertion.

Detect Variant/Dynamic anywhere in the key type via a new recursive
hasVariantOrDynamic() helper and use it in the runtime filter. hasNullable() is
intentionally left unchanged because it is also used for MergeTree key
validation, where Variant is rejected by a separate check.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

@groeneai

Copy link
Copy Markdown
Contributor Author

cc @vdimir @Avogar — could you review this? It fixes a join runtime filter crash (Unexpected return type from __applyFilter) when the join key has a Variant/Dynamic nested inside a Tuple/Array/Map: the single-element "equals" path was chosen because nested Variant was not recognized as null-bearing. Fix adds a recursive hasVariantOrDynamic() for the runtime filter's nullability check (hasNullable() left unchanged since it is shared with MergeTree key validation). Closes #101415.

@Algunenano Algunenano added the can be tested Allows running workflows for external contributors label Jun 10, 2026
@clickhouse-gh

clickhouse-gh Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [0d70b12]

Summary:

job_name test_name status info comment
Stateless tests (arm_asan_ubsan, azure, sequential) FAIL
00991_system_parts_race_condition_long ERROR cidb IGNORED
Server died FAIL cidb, issue ISSUE EXISTS
Segmentation fault (STID: 0883-3e4b) FAIL cidb IGNORED

AI Review

Summary

This PR fixes the JOIN runtime-filter single-value path by making RuntimeFilterBase use hasTypeThatCanContainNulls for nested Nullable, LowCardinality(Nullable), Variant, and Dynamic components. The current code and 04327_runtime_filter_nested_variant coverage address the earlier Dynamic, LowCardinality(Nullable), LEFT ANTI JOIN, and Map key-position review points; I found no remaining code-level blockers or majors.

PR Metadata
  • 💡 The Bug Fix category is correct and a changelog entry is required and present, but the entry should say exception rather than crash for a LOGICAL_ERROR.
  • Suggested changelog entry:
Fix a `LOGICAL_ERROR` exception in the `JOIN` runtime filter when the join key contains a `Variant` or `Dynamic` type nested inside a `Tuple`, `Array`, or `Map` and the right side of the join has a single distinct value.
Final Verdict

Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Jun 10, 2026
Comment thread tests/queries/0_stateless/04327_runtime_filter_nested_variant.sql
The fix detects Variant or Dynamic nested in Tuple/Array/Map, but the test
only exercised nested Variant. A regression that stopped detecting nested
Dynamic in the one-distinct-value runtime filter path would have passed.

Add Tuple(Dynamic), Array(Dynamic) and Map(String, Dynamic) join keys under
allow_dynamic_type_in_join_keys, covering both the single-value "equals" path
and the multi-value set-lookup path. Verified the single-value Dynamic cases
abort with the same 'Unexpected return type from __applyFilter' logical error
before the fix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

Added nested-Dynamic regression cases to 04327_runtime_filter_nested_variant in e6b6796, covering Tuple(Dynamic), Array(Dynamic) and Map(String, Dynamic) join keys under allow_dynamic_type_in_join_keys = 1, with both the single-distinct-value "equals" path and the multi-value set-lookup path.

Verified both directions on a debug build: the single-value nested-Dynamic cases abort with Logical error: 'Unexpected return type from __applyFilter. Expected UInt8. Got Nullable(UInt8)' when RuntimeFilterLookup.h uses the old top-level-only isDynamic()/isVariant() check, and pass (20/20, plus all four join algorithms) with hasVariantOrDynamic(). Existing 04051_runtime_filter_variant_type still passes.

New HEAD: e6b6796.

Session: cron:clickhouse-worker-slot-0:20260610-101200

@Avogar Avogar self-assigned this Jun 10, 2026
Comment thread src/DataTypes/hasNullable.h Outdated
…ntainNulls

Per review feedback, replace the hasNullable() || hasVariantOrDynamic()
pair at the runtime-filter call site with a single recursive helper
hasTypeThatCanContainNulls() that reports any type which can hold a NULL:
Nullable, LowCardinality(Nullable), Variant, and Dynamic, at the top level
or nested inside Array/Tuple/Map.

The helper is built on the existing canContainNull() predicate (Nullable ||
LowCardinality(Nullable) || Variant || Dynamic), so it subsumes both former
checks: hasNullable()'s Nullable/LowCardinality(Nullable) coverage and the
nested Variant/Dynamic coverage. This also closes a gap for keys nesting
LowCardinality(Nullable) inside a compound type: equals() over
Tuple(LowCardinality(Nullable(...)), ...) returns Nullable(UInt8), so the
single-distinct-value "equals" runtime-filter path would otherwise be chosen
and trip the __applyFilter return-type assertion, exactly as for nested
Variant/Dynamic.

hasNullable() itself is left unchanged because it is also consumed by
MergeTree key validation and default-expression handling, where Variant /
Dynamic / LowCardinality semantics must not change.

Extends test 04327 with LowCardinality(Nullable) nested-key cases for the
one-distinct-value and set-lookup paths.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate

# Question Answer
a Deterministic repro? Yes. With the old top-level-only check, clickhouse local --enable_analyzer=1 --enable_join_runtime_filters=1 on a single-distinct-value join with a Tuple(LowCardinality(Nullable(String)), UInt8) key aborts every time: Logical error: 'Unexpected return type from __applyFilter. Expected UInt8. Got Nullable(UInt8)'.
b Root cause explained? equals() over a key element that can hold a NULL returns Nullable(UInt8). The prior call site only OR-ed hasNullable() (Nullable / LowCardinality(Nullable)) with top-level-or-nested hasVariantOrDynamic(). A LowCardinality(Nullable(...)) nested inside a Tuple/Array/Map is not detected by either branch (hasNullable recurses but the equals over the compound key yields Nullable(UInt8) via the LC(Nullable) element), so argument_can_have_nulls is false, the one-distinct-value "equals" path is chosen, and __applyFilter asserts on the Nullable(UInt8) return.
c Fix matches root cause? Yes. The single recursive hasTypeThatCanContainNulls() (built on the existing canContainNull() = Nullable / LowCardinality(Nullable) / Variant / Dynamic) is now used for argument_can_have_nulls, so any null-capable type at any nesting depth takes the set-lookup path. It subsumes both former checks.
d Test intent preserved / new tests added? Yes. 04327 keeps all Variant and Dynamic nested cases and adds Tuple(LowCardinality(Nullable(String))) one-distinct-value and set-lookup cases.
e Both directions demonstrated? Yes. Old top-level-only check: full 04327 aborts (exit 134); the isolated LC(Nullable) one-path also aborts. With the fix: full 04327 matches the reference and passes under join_algorithm = hash / parallel_hash / grace_hash / full_sorting_merge.
f Fix is general, not a narrow patch? Yes. hasNullable() is left unchanged (it is shared with MergeTree key validation and default-expression handling, where Variant/Dynamic/LC semantics must not change); only the runtime-filter call site adopts the broader predicate, which is the sole consumer needing it.

Session id: cron:clickhouse-worker-slot-3:20260610-123700

@groeneai

Copy link
Copy Markdown
Contributor Author

@Avogar done in ff72865. Replaced the hasNullable() || hasVariantOrDynamic() pair with a single hasTypeThatCanContainNulls() that reports Nullable, LowCardinality(Nullable), Variant, and Dynamic at any nesting depth. It is built on the existing canContainNull() predicate, so it subsumes both former checks.

hasNullable() is left untouched since it is also used by MergeTree key validation and default-expression handling.

This also closes the LowCardinality(Nullable) gap you implied: equals() over Tuple(LowCardinality(Nullable(...)), ...) returns Nullable(UInt8), so the one-distinct-value path used to crash exactly like nested Variant/Dynamic. Added a Tuple(LowCardinality(Nullable(String))) one-path and set-lookup case to 04327; verified the new case aborts under the old check and passes with the fix.

Comment thread src/Processors/QueryPlan/RuntimeFilterLookup.h
@groeneai

Copy link
Copy Markdown
Contributor Author

CI status for ff72865: the only red check is Stateless tests (arm_asan_ubsan, azure, parallel), where 03634_autopr_input_bytes_estimation failed. That test is flaky independently of this PR: 30-day CIDB shows failures across 6 unrelated PRs and 8 sanitizer/arch variants (33 hits on one unrelated PR). This PR only changes runtime-filter null-capability detection, so the failure is not caused by this change. Finish Workflow and Mergeable Check are green.

@groeneai

Copy link
Copy Markdown
Contributor Author

CI fully finished on HEAD ff72865880088b04c45c28ec58cf9c0def0c9441 (Finish Workflow and Mergeable Check both green). One red per-test check, not PR-caused:

All other checks pass. Ready for review.

…apable keys

Covers the ExactNotContainsRuntimeFilter (LEFT ANTI JOIN) path for keys that
carry a NULL the top-level isNull bypass does not see: Variant/Dynamic, and
nested Nullable/LowCardinality(Nullable) inside Tuple/Array/Map.

These keys compare null-safely in the join (tuple(NULL,1) = tuple(NULL,1)
matches; a top-level Variant/Dynamic NULL also matches NULL), so a nested-NULL
key present on both sides is correctly dropped by the anti-join, which is what
the exclusion runtime filter with transform_null_in does. The existing
addNullBypassForAntiJoin therefore only needs to bypass top-level
Nullable/LowCardinality(Nullable), where NULL = NULL is false in the join.

Each case asserts the LEFT ANTI JOIN result with enable_join_runtime_filters=0
equals the result with it = 1, on the equals path and the set (negative lookup)
path, for Variant/Dynamic/LowCardinality(Nullable) nested keys with NULLs on
both sides.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate

For the LEFT ANTI JOIN runtime-filter regression added in commit 62f84fe (new HEAD 62f84fe4ce22):

# Question Answer
a Deterministic repro? Yes. The bot's hypothesized wrong-result is NOT reproducible: on a local debug build with the runtime filter confirmed applied (EXPLAIN ... BuildRuntimeFilter), LEFT ANTI JOIN on nested null-capable keys returns identical rows with enable_join_runtime_filters = 0 and = 1, deterministically, across every shape tested.
b Root cause explained? Yes. ClickHouse join equality is null-SAFE for Variant/Dynamic (top-level NULL = NULL matches, INNER count 1) and for any nested NULL inside Tuple/Array/Map (tuple(NULL,1) = tuple(NULL,1) matches). It is null-UNsafe only for top-level Nullable/LowCardinality(Nullable) (NULL = NULL is false, count 0). The exclusion runtime filter uses transform_null_in = true, which treats NULL as a value, so it agrees with the join exactly where the join is null-safe. addNullBypassForAntiJoin therefore only needs the top-level Nullable / LC(Nullable) bypass, which it has. (isNull(tuple(...)) is always false, so the bypass cannot apply to nested keys regardless.)
c Fix matches root cause? Yes. No code change: the broadened hasTypeThatCanContainNulls only affects the equals-vs-Set choice (the original crash fix) and does not weaken anti-join correctness. The regression test locks in the ON==OFF invariant, which is the actual property in question. A code change to the bypass would be a speculative change for a discrepancy that does not reproduce.
d Test intent preserved / new tests added? New tests added. 6 self-validating LEFT ANTI JOIN cases appended to 04327_runtime_filter_nested_variant.sql, comparing full row multiset OFF vs ON for Tuple/Array/Map of Variant/Dynamic/LowCardinality(Nullable) with nested NULLs on both sides, on both the equals path and the set/negative-lookup path. The existing crash-regression cases are unchanged.
e Both directions demonstrated? Yes. Each new case asserts result(OFF) = result(ON) and returns 1. The original #101415 crash repro still returns (true,1) with the filter on (no crash). Test passes via clickhouse-test (35/35 ignoring an unrelated local zoneinfo/UTC tz-resolution infra error that also hits trivial unrelated tests).
f Fix is general, not a narrow patch? N/A (no source-code change). The investigation covered all sibling shapes (Tuple/Array/Map x Variant/Dynamic/LC(Nullable)), single and multi-key, one and both sides NULL, join_use_nulls = 1, and grace_hash/parallel_hash; no mismatch found in any.

Session id: cron:clickhouse-worker-slot-3:20260610-200000

Comment thread tests/queries/0_stateless/04327_runtime_filter_nested_variant.sql
hasTypeThatCanContainNulls() recurses into both the Map key type and the
value type (DataTypeMap::getKeyType || getValueType), but 04327 only
covered Map(String, Variant) / Map(String, Dynamic), exercising the value
branch. A Variant/Dynamic Map key (Map(Variant(...), String),
Map(Dynamic, String)) is a valid join key that reaches the same
runtime-filter path; the older value-only hasNullable() would report it
as not null-capable, so without the key-side recursion the
single-distinct-value "equals" path would be chosen and trip the
__applyFilter return-type assertion.

Add single-distinct-value equals-path cases and LEFT ANTI JOIN
runtime-filters off==on cases for Variant and Dynamic Map keys, so the
Map coverage exercises both key and value positions.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate

Covers HEAD 0d70b12 (fixup over 62f84fe adding Map key-position coverage per the review comment).

# Question Answer
a Deterministic repro? Yes for the equals-path crash guard: Map(Variant(String,Bool), String) / Map(Dynamic, String) join keys with one distinct value on the right deterministically reach the __applyFilter runtime-filter path (confirmed via EXPLAIN actions=1: BuildRuntimeFilter + __applyFilter(...) over the Map key). The added LEFT ANTI cases are self-validating off==on comparisons, deterministic by construction.
b Root cause explained? hasTypeThatCanContainNulls() recurses into both Map key and value (`getKeyType()
c Fix matches root cause? This fixup is test-only. It adds the missing coverage for the key-side branch the helper already implements, so the regression guards both Map positions. No production code change (the helper was already correct; the gap was test coverage).
d Test intent preserved / new tests added? Preserved and extended. Four new self-validating cases added (Variant Map key + Dynamic Map key, each an equals-path crash guard and a LEFT ANTI off==on case). No existing case weakened.
e Both directions demonstrated? The equals-path cases exercise the exact path that asserted before the helper fix (value-only hasNullable() => not null-capable => equals path => Nullable(UInt8) => assertion). With the key-side recursion present they return {true:'x'}. 04327 passes 10/10 deterministically; original #101415 repro still (true,1), no crash.
f Fix is general, not a narrow patch? N/A for code (test-only). Coverage-wise: the new cases close the only untested branch of hasTypeThatCanContainNulls (Map key) so both Map positions and all compound shapes (Tuple/Array/Map, Variant/Dynamic/LowCardinality(Nullable)) are now exercised.

Session id: cron:clickhouse-worker-slot-21:20260610-221400

@clickhouse-gh

clickhouse-gh Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.70% 84.70% +0.00%
Functions 92.30% 92.30% +0.00%
Branches 77.30% 77.30% +0.00%

Changed lines: Changed C/C++ lines covered by tests: 18/18 (100.00%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

@groeneai

Copy link
Copy Markdown
Contributor Author

CI status — all failures triaged (commit 0d70b12)

CI has fully finished (Finish Workflow + Mergeable both green). The single red functional check is not caused by this PR (which touches src/DataTypes/hasNullable.{h,cpp} + src/Processors/QueryPlan/RuntimeFilterLookup.h + a runtime-filter test):

Check Failure Classification Disposition
Stateless tests (arm_asan_ubsan, azure, sequential) Server died / Segmentation fault STID 0883-3e4b — stack is __sanitizer::LargeMmapAllocator (ASan internal allocator), no ClickHouse frame Infrastructure — the documented 2026-06-10→11 arm_asan_ubsan, azure, sequential ASan-allocator-OOM outage that hit many unrelated PRs and master in the same window Not actionable on this PR; ASan-internal OOM, no CH code involved
CH Inc sync internal mirror-sync check Infra (not a test) n/a

All other 163 checks passed. Nothing in this PR's code path failed. Ready for review (cc @vdimir @Avogar — the hasTypeThatCanContainNulls generalization @Avogar requested is in this HEAD).

@Avogar

Avogar commented Jun 16, 2026

Copy link
Copy Markdown
Member

Stateless tests (arm_asan_ubsan, azure, sequential) - fixed in master

@Avogar Avogar added this pull request to the merge queue Jun 16, 2026
Merged via the queue into ClickHouse:master with commit f9c3eda Jun 16, 2026
164 of 166 checks passed
@clickgapai

Copy link
Copy Markdown
Contributor

Hi @groeneai @Avogar — while reviewing this PR I found the following:

Happy to discuss — close anything that's wrong or already addressed.

@groeneai

Copy link
Copy Markdown
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Runtime filter crash on Tuple(Variant(...), ...) join key — incomplete fix for nested Variant

5 participants