Treat float NaN keys as NULL on JOIN ON#106540
Conversation
Per IEEE 754 / SQL JOIN ON semantics, NaN never compares equal to NaN, but ClickHouse's hash table compared float keys bitwise via bitEquals (memcmp). Two NaN values with matching bit patterns therefore wrongly joined - INNER JOIN ON NaN = NaN returned rows instead of zero, and ANTI JOIN dropped the NaN-keyed left row instead of keeping it. Fold NaN rows into the JOIN-key null map so they are skipped on both build and probe sides, mirroring how NULL keys behave. Done at three layers: * HashJoin / ConcurrentHashJoin: extend the build- and probe-side null_map after extractNestedColumnsAndNullMap via a new helper JoinCommon::extendJoinKeyNullMapWithFloatNaNs. * FullSortingMergeJoin: in FullMergeJoinCursor::setChunk, OR NaN positions into the per-key null_map so nullableCompareAt short-circuits. * BuildRuntimeFilterTransform: drop NaN rows before they enter the bloom/exact runtime filter, so the runtime filter never carries a NaN entry that would mis-prune a probe-side NaN. GROUP BY / DISTINCT / uniq / IN keep their existing NaN-coalescing behaviour - this changes only the JOIN ON path. Closes: ClickHouse#106531
|
cc @KochetovNicolai @vdimir — could you review this? It fixes wrong results for INNER/LEFT/RIGHT/FULL/SEMI/ANTI JOIN ON with float keys: the hash table compared NaN keys bitwise via |
| DROP TABLE IF EXISTS l; | ||
| DROP TABLE IF EXISTS r; | ||
|
|
||
| CREATE TABLE l (k Float64, v String) ENGINE = Memory(); |
There was a problem hiding this comment.
@groeneai can you add test examples with BFloat16 as well?
There was a problem hiding this comment.
Added in d51a467.
The fix already dispatched on BFloat16 in all three NaN-detection paths (extendJoinKeyNullMapWithFloatNaNs in JoinUtils.cpp, foldFloatNaNsIntoNullMap in MergeJoinTransform.cpp, filterOutNaNs in BuildRuntimeFilterTransform.cpp); the new test block exercises that dispatch end-to-end. New cases mirror the Float32 grid: hash INNER/LEFT/ANTI and full_sorting_merge INNER over BFloat16 keys with nan / -nan mixed with a finite value.
Local verification (build-asan, --no-random-settings --test-runs 30): 30/30 pass.
Session: cron:clickhouse-ci-task-worker:20260605-144800
Per @PedroTadim review on PR ClickHouse#106540: cover BFloat16 in addition to Float32 / Float64 / Nullable(Float64). The fix already dispatches on BFloat16 in all three NaN-detection paths (extendJoinKeyNullMapWithFloatNaNs, foldFloatNaNsIntoNullMap, filterOutNaNs); this commit just exercises that dispatch end-to-end. The new block covers BFloat16 keys for hash INNER / LEFT / ANTI and full_sorting_merge INNER, mirroring the Float32 grid.
Pre-PR validation gate
Session id: cron:clickhouse-ci-task-worker:20260605-144800 |
|
Workflow [PR], commit [deba70c] Summary: ✅
AI ReviewSummaryThis PR changes float-key PR Metadata
Findings💡 Nits
Final VerdictStatus: Minimum required action: update the PR description and changelog entry to match the current |
CI Ledger — covered HEAD: `d51a4676af70`Pulled the failure set after CI fully completed (last check 20:44:02Z, >20min CIDB ingestion buffer met).
`Mergeable Check` reports failed because of the perf job. `Finish Workflow` SUCCESS. All 100+ stateless / integration / stress / fuzzer / unit-test jobs pass on this HEAD. Separately, the AI Review on this run flagged a Major in `BuildRuntimeFilterTransform.cpp:157` — `filterOutNaNs` only handles top-level `ColumnVector(Float)` and would miss `Nullable(Float)` / `LowCardinality(Nullable(Float))` / `Tuple` runtime-filter keys, leaving an `ANTI JOIN` mis-prune window. This is review-feedback rather than a CI test failure; it will be addressed in a fix-up commit. Session: cron:our-pr-ci-monitor:20260605-210000 |
The previous helper only recognised a top-level `ColumnFloat32`/`ColumnFloat64`/ `ColumnVector<BFloat16>`. Runtime filters can be built over `Nullable(Float*)`, `LowCardinality(Nullable(Float*))`, or - for multi-key `LEFT ANTI` - a temporary `Tuple` of common types, so a probe-side `NaN` was still pre-pruned by `ExactNotContainsRuntimeFilter` before the join saw the row. Recursively traverse those wrappers and OR-mark every row whose float payload is `NaN`, then filter the original (still-wrapped) column once. Already-NULL rows do not need to be dropped: the runtime filter stores `NULL` natively (`argument_can_have_nulls`), so there is no bitwise match for them on the probe side. `isNaN` is templated and works on `Float32`, `Float64` and `BFloat16`, so the fix covers all three element types under any of the three wrappers. Test additions in `04318_106531_join_on_nan_keys.sql` cover, with `enable_join_runtime_filters = 1` and `join_algorithm = 'hash'`: * `Nullable(Float64)` `ANTI`/`INNER` * `LowCardinality(Nullable(Float64))` `ANTI` * `Tuple(Float64, Float64)` `LEFT ANTI` (multi-key path)
Pre-PR validation gate (commit 1b3d5dc)
Session id: cron:clickhouse-ci-task-worker:20260605-235700 |
CI Ledger — covered HEAD:
|
| job | classification | disposition |
|---|---|---|
Build (arm_tidy) — cppcoreguidelines-init-variables on src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp:731,733 (last_version, compression_method) |
trunk regression, NOT caused by this PR | Introduced by master commit 9c9b2b9a0da7 (merge of #106102, "catalogs with schema evolutions + concurrent checks", merged 2026-06-05T17:43:31Z). Hit on master and 21+ unrelated concurrent PRs in the 18:04Z–21:25Z window. Trivial 2-line init fix already filed by us as #106615 (APPROVED by @ alexey-milovidov 23:35Z, awaiting CI). 155 downstream jobs are SKIPPED because they depend on arm_tidy. Not actionable on this PR's branch. |
posted-comment:
All other checks for HEAD 1b3d5dc0bf15 passed or skipped-due-to-arm_tidy. The ledger covers the full set; nothing on this PR's diff is failing.
`extendJoinKeyNullMapWithFloatNaNs` (HashJoin / ConcurrentHashJoin path) and `foldFloatNaNsIntoNullMap` (full_sorting_merge path) only dispatched on top-level `ColumnFloat32` / `ColumnFloat64` / `ColumnVector<BFloat16>`. A `ColumnTuple` key fell through, so `tuple(NaN) = tuple(NaN)` (whether planner-generated or written explicitly as `ON tuple(l.k) = tuple(r.k)`) still produced a wrong match because the hash table or merge cursor compared the tuple payload bitwise. Mirror the recursion landed at 1b3d5dc in `BuildRuntimeFilterTransform::filterOutNaNs`: unwrap `Nullable` (respect existing null map for the element-Nullable case), unwrap `LowCardinality` via `convertToFullColumnIfLowCardinality`, and recurse into `ColumnTuple` (a row is dropped if ANY float element is `NaN`, matching scalar tuple equality semantics: `(a, b) = (c, d)` iff `a = c AND b = d`, with `NaN != NaN`). `extendJoinKeyNullMapWithFloatNaNs` first runs a type-only walk (`containsFloatPayload`) so a fresh null map is still allocated lazily; purely integer/string/date keys keep the previous fast path. Tuple-keyed `JOIN ON` test cases added to `04318_106531_join_on_nan_keys.sql` for `hash` and `full_sorting_merge`, covering plain `tuple(Float64)`, multi-element `tuple(Float64, Float64)` where only one element is `NaN`, `tuple(Nullable(Float64))` (preserves NULL-safe tuple equality), and `tuple(LowCardinality(Nullable(Float64)))`. Demonstrated both directions on a debug build: pre-fix (HEAD~) the tuple-keyed `INNER JOIN` returns the matched row plus the wrongly-joined NaN rows; post-fix it returns only the matched row. ANTI returns the NaN-keyed rows after the fix. Closes: ClickHouse#106531
Pre-PR validation gateFollowing commit a08706c (
Session id: cron:clickhouse-ci-task-worker:20260606-043300 |
The HashJoin / FullSortingMergeJoin / BuildRuntimeFilter NaN-as-null treatment did not extend to `SET join_algorithm = 'direct'`. The MergeTree direct path builds the right-side lookup with `IN` (which intentionally keeps coalescing NaN per this PR's design) and then maps found rows back to probe keys via `ColumnsHashing::HashMethodHashed`, which compares the key payload bitwise. Two `NaN` keys with identical bit patterns therefore matched for `INNER` / `LEFT SEMI` and were suppressed for `LEFT ANTI`. Force every probe row whose key holds a `NaN` to the not-found path (sentinel selector + `null_map = false`) inside `getByKeys`, mirroring `JoinCommon::extendJoinKeyNullMapWithFloatNaNs`. The recursion walks `Nullable` / `LowCardinality` / `Tuple` so the same shapes covered on the hash path work here too. `joinKeyContainsFloatPayload` and `markFloatNaNRowsAsNull` are exposed in `JoinCommon` so the helper is shared with the hash and merge paths. New tests in `04318_106531_join_on_nan_keys.sql` cover `SET join_algorithm = 'direct'` over MergeTree-keyed `Float64` and `Nullable(Float64)` for `INNER` / `LEFT` / `LEFT SEMI` / `LEFT ANTI`. Address-requested in clickhouse-gh[bot] inline review on commit a08706c (PR ClickHouse#106540) at 2026-06-06T05:03:46Z. Closes: ClickHouse#106531
Pre-PR validation gate
Session id: cron:clickhouse-ci-task-worker:20260606-065700 |
…umnConst Addresses 5 follow-up review items on PR ClickHouse#106540 (issue ClickHouse#106531): 1. arm_tidy parameter-name mismatch: align JoinUtils.h declaration with .cpp definition (mutable_null_map). Fixes the readability-inconsistent-declaration-parameter-name warning that broke the arm_tidy build on commit c183aff. 2. ColumnNullable branch must respect the outer null map. A ColumnNullable may carry arbitrary trash bytes in its nested column for rows where the null map says NULL. If those bytes happen to be a NaN bit pattern, the previous markFloatNaNRowsAsNull marked the whole row in the join null map and broke null-safe tuple equality (tuple(NULL) = tuple(NULL) wrongly became a non-match). The recursion now threads is_null down so an already-NULL row is left alone. 3. MergeJoinTransform foldFloatNaNsIntoNullMap and BuildRuntimeFilterTransform filterOutNaNs gain a type-only joinKeyContainsFloatPayload guard. Most full-sorting-merge and runtime-filter keys are integer/string/date and never carry a NaN, so we now skip the per-chunk ColumnUInt8/Filter allocation in those cases. 4. Defence-in-depth ColumnConst handling in joinKeyContainsFloatPayload and markFloatNaNRowsAsNull (and the parallel helpers in MergeJoinTransform and BuildRuntimeFilterTransform). Direct joins pass the original probe key column without convertToFullColumnIfConst, so a constant NaN can reach this helper as ColumnConst(ColumnFloat*(NaN), N). Marking all rows when the underlying scalar is NaN keeps the constant-probe semantics aligned with the materialised path. Test: 04318_106531_join_on_nan_keys gains a Tuple(Nullable(Float64)) null-safe-match case where NULL slots carry a NaN trash payload. Pre-fix this returned 0 (BUG); post-fix it returns 1.
Pre-PR validation gate (commit 25b677f)
Session id: cron:clickhouse-ci-task-worker:20260606-084500 |
Address three follow-up CRs from clickhouse-gh[bot] on commit 25b677f (PR ClickHouse#106540, NaN-as-null treatment for JOIN ON): CR 1 — `ColumnSparse` was not unwrapped in `joinKeyContainsFloatPayload` / `markFloatNaNRowsAsNullImpl` (`JoinUtils.cpp`), `markNaNRows` (`MergeJoinTransform.cpp`), and `markNaNRowsImpl` (`BuildRuntimeFilterTransform.cpp`). HashJoin and full_sorting_merge materialise sparse columns upstream, but direct joins and runtime-filter builds can see them raw. Added a `column.isSparse()` branch in each helper that calls `convertToFullColumnIfSparse` and recurses on the dense form. CR 2 — The `ColumnConst` branch in `markFloatNaNRowsAsNullImpl` recursed into the one-row data column while keeping the full-size `mutable_null_map`, so the leaf branch's `chassert(data.size() == mutable_null_map.size())` would trip in debug/ASan builds. Rewrote the branch to evaluate the scalar into a one-row `scalar_mask`, then spread the mark across all rows not excluded by `is_null`. Mirrors the pattern already used in `BuildRuntimeFilterTransform.cpp::markNaNRowsImpl`. Same fix applied to `MergeJoinTransform.cpp::markNaNRows`. CR 3 — Re-confirms CR 2 on the `HashJoin.cpp:792` thread; addressed by the same JoinUtils.cpp rewrite. Test additions to `tests/queries/0_stateless/04318_106531_join_on_nan_keys`: - Constant `NaN` probe key (hash) covering the rewritten ColumnConst branch via the helper's contract. - Build-side sparse `Float64` key with runtime filter enabled, exercising the new `ColumnSparse` unwrap in `markNaNRowsImpl`. Local verification (debug build, x86-64-v2): 30/30 passes via `clickhouse local --queries-file` on the test file, sibling `03214_join_on_tuple_comparison_elimination_bug` continues to pass.
Pre-PR validation gate
Session id: cron:clickhouse-ci-task-worker:20260606-102600 |
The legacy MergeJoin (used by `SET join_algorithm = 'partial_merge'`) sorts both sides and merges by `nullableCompareAt`, which falls through to `compareAt` and compares Float32/Float64/BFloat16 bitwise. Two NaNs with identical bit patterns then form an equal range, so a NaN row on either side spuriously matches another NaN row, violating IEEE 754 `NaN != NaN`. The other algorithms patched in this PR (HashJoin, ConcurrentHashJoin, GraceHashJoin via runtime filter, DirectKeyValueJoin, FullSortingMergeJoin) all fold NaN into the join-key null map. MergeJoin's sort-then-merge contract makes the same fold here require a schema change (wrap float keys in `Nullable` for the cursor view) that the surrounding output path is not set up for, so we reject instead. The constructor throws `NOT_IMPLEMENTED` with a message pointing the user to `hash` / `parallel_hash` / `grace_hash` / `full_sorting_merge`. The new `MergeJoin::isSupported(table_join, right_sample_block)` overload is used by `AUTO` and `PARTIAL_MERGE` selection in `PlannerJoins` and `ExpressionAnalyzer` so existing AUTO queries with float keys silently fall back to HashJoin (which now handles NaN correctly) instead of throwing mid-query when JoinSwitcher would attempt to switch to MergeJoin. A new type-only helper `JoinCommon::joinKeyTypeContainsFloatPayload` walks `Nullable` / `LowCardinality` / `Tuple` so all four wrapper shapes are covered. Test cases added to `tests/queries/0_stateless/04318_106531_join_on_nan_keys.sql`: Float64, Nullable(Float64), LowCardinality(Nullable(Float64)) and Tuple(Float64, Float64) all expected to throw `NOT_IMPLEMENTED` under `partial_merge`; an Int32 partial_merge case verifies non-float keys still work. Pre-fix repro: `SET join_algorithm = 'partial_merge'; SELECT v, w FROM lpm INNER JOIN rpm ON lpm.k = rpm.k` with `(nan, 'nan-l'), (1.5, 'm')` left-side and `(nan, 999), (1.5, 100)` right-side returns BOTH `m\t100` AND `nan-l\t999` (BUG: NaN matches NaN). Post-fix: throws Code 48 NOT_IMPLEMENTED. AUTO with float keys correctly falls back to HashJoin and returns only `m\t100`. clickhouse-gh[bot] CR id 4442281467 / comment 3367XXX (partial_merge algorithm gap) on ClickHouse#106540. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pre-PR validation gateIteration 9 — addressing clickhouse-gh[bot] CR (review id 4442439049, comment 3367178592) on
Session id: cron:clickhouse-ci-task-worker:20260606-114000 |
CI Ledger — covered SHA:
|
| Failure | Classification | Disposition |
|---|---|---|
Stateless tests (amd_debug, parallel) — Logical error: Virtual row boundary violated in MergingSortedAlgorithm ... announced Decimal64_'0.000000000' but the source then produced Decimal64_'1704060000.000000000' (STID 2651-3359) + Server died |
trunk regression / chronic debug-assertion (NOT PR-caused) | Tracked as P0 in 2026-06-06-p0-stid-2651-3359--virtual-row-boundary (master regression in MergingSortedAlgorithm::checkVirtualRowBoundary). Cross-PR: 11 distinct PRs hit this STID in 30d (#105386 x2, #79393, #98472, #94937, master, #48941, #93462, #49966, #67954, #106412, ours). Stack lands in Processors/Merges/Algorithms/MergingSortedAlgorithm.cpp:45 (checkVirtualRowBoundary) on a __table1.timestamp Decimal64 column — completely unrelated to this PR's diff (JoinUtils, MergeJoinTransform, BuildRuntimeFilterTransform, MergeJoin, Planner, ExpressionAnalyzer). |
PR-caused failure check: none. PR diff is in JOIN-key NaN handling code paths (JoinUtils, HashJoin, MergeJoinTransform, BuildRuntimeFilterTransform, MergeJoin, planner). The triggering query in the failure stack is a sort-merge over a single timestamp column, no JOIN involved — different code path. The test triggering the assertion in the cross-PR set spans many years (PR #48941 from 2023 through #106412 today), and master itself was in the hit list — pure trunk regression in the merge-sorted virtual-row machinery.
Iteration 10 (2026-06-06 19:46Z) was metadata-only (PR body changelog disclosing partial_merge rejection migration path) — no new commit; the head SHA stays b201d205cacc.
Moving to on_review.
|
The PR looks ok to me. @vdimir are the changes in a hot loop that may cause a performance degradation? Can you give a final look? |
vdimir
left a comment
There was a problem hiding this comment.
I do not see performance implications, but overall looks a bit over-complicated, I left some comments. Also can we check how nan handling implemented in other places (such as aggregation or whatever we also do some comparisons)?
| const auto & key_type = right_sample_block.getByName(key_name).type; | ||
| if (key_type && JoinCommon::joinKeyTypeContainsFloatPayload(*key_type)) | ||
| throw Exception( | ||
| ErrorCodes::NOT_IMPLEMENTED, |
There was a problem hiding this comment.
This is backward incompatible. Better to keep the existing behavior and add a note to the docs for the algorithm that it may treat NaNs unexpectedly.
| return isSupported(table_join->kind(), table_join->strictness()) && table_join->oneDisjunct(); | ||
| } | ||
|
|
||
| bool MergeJoin::isSupported(const std::shared_ptr<TableJoin> & table_join, const Block & right_sample_block) |
| return markNaNRows(*full, mutable_null_map, is_null); | ||
| } | ||
|
|
||
| if (const auto * tuple = typeid_cast<const ColumnTuple *>(&column)) |
There was a problem hiding this comment.
For tuple we treat tuple(null) == tuple(null), because null here is not on top level, keeping such behavour for compatibility
| /// Extend the JOIN-key null map so any row whose float key holds a `NaN` is treated as `NULL`. | ||
| /// Per `IEEE 754` (and SQL `JOIN ON` semantics) `NaN != NaN`, but the hash table compares float | ||
| /// keys bitwise via `bitEquals`, so two `NaN`s with identical bit patterns wrongly match. Folding | ||
| /// `NaN` rows into the null map makes both build and probe sides skip them, consistent with how | ||
| /// `NULL` keys never join. | ||
| /// | ||
| /// `key_columns` are the raw key columns AFTER `extractNestedColumnsAndNullMap` peeled off any | ||
| /// outer `Nullable`. If `null_map_holder` is empty and at least one float column has a `NaN`, | ||
| /// a fresh `ColumnUInt8` is allocated and assigned. `null_map` is updated to point into the | ||
| /// (possibly new) holder. Non-float columns are skipped. | ||
| void extendJoinKeyNullMapWithFloatNaNs( | ||
| const ColumnRawPtrs & key_columns, | ||
| ColumnPtr & null_map_holder, | ||
| ConstNullMapPtr & null_map); | ||
|
|
||
| /// Quick type-only walk: does `column` (post `extractNestedColumnsAndNullMap` peeling) carry | ||
| /// any float payload at all? Recurses into `Nullable`, `LowCardinality` and `Tuple`. Used to | ||
| /// avoid scanning purely integer/string/date keys for `NaN`. | ||
| bool joinKeyContainsFloatPayload(const IColumn & column); | ||
|
|
||
| /// Same idea but inspecting the static `DataTypePtr` rather than a runtime column. Used in |
There was a problem hiding this comment.
I'm not quite sure why we have such complexity here with sevelra helpers. Why not to introduce getNanMask method for ColumnVector and just use it in implemnetations and disjunct with null_mask?
There was a problem hiding this comment.
Done in 6243982. Introduced ColumnVector<T>::getNanMask(PaddedPODArray<UInt8> &) (OR-marks NaN rows for floating-point T, no-op otherwise) and collapsed the three duplicated NaN cascades (JoinUtils.cpp, BuildRuntimeFilterTransform.cpp, MergeJoinTransform.cpp) into the single shared JoinCommon recursion, which now calls getNanMask at the leaf. Each consumer ORs the resulting mask into its existing null map (disjunct with null_mask), as you suggested. Net -250 lines.
The tuple(NULL) = tuple(NULL) compatibility you raised on MergeJoinTransform.cpp:430 is preserved by the same change: the shared recursion's Nullable branch skips already-NULL rows, so a tuple-wrapped NULL is never marked as NaN. 04318_106531_join_on_nan_keys (which covers that case) passes 50/50 with randomization.
CI Ledger — covered HEAD:
|
| job | failure | classification | disposition |
|---|---|---|---|
Stateless tests (amd_llvm_coverage, old analyzer, s3 storage, DatabaseReplicated, WasmEdge, sequential) |
02440_mutations_finalization (1 initial fail, 0/3 reruns failed → flaky) |
chronic flaky, NOT PR-caused | Tracked in our task 2026-06-09-p3-chronic-flaky-02440mutationsfinaliz. Cross-PR (30 d): 6 distinct unrelated PRs (#77210, #96130, #105822, #103706, #101783, #104976), low single-digit per-PR hits. The recently closed maintainer issue #106390 (closed 2026-06-04) and older #65585 confirm the chronic profile. The test exercises ALTER … UPDATE mutation finalization on MergeTree with replicas — zero overlap with this PR's JOIN-key NaN diff (JoinUtils.{h,cpp}, HashJoin.cpp, AddedColumns.cpp, MergeJoinTransform.cpp, BuildRuntimeFilterTransform.cpp, new test 04318_106531_join_on_nan_keys). |
Mergeable Check reports failed only because of that single chronic-flaky-test row; Finish Workflow is green. Nothing on this PR's diff is failing.
Posted by cron:our-pr-ci-monitor:20260609-160000
Per `@vdimir` `CHANGES_REQUESTED` review on PR ClickHouse#106540, restore the legacy `partial_merge` (`MergeJoin`) behaviour for float `JOIN ON` keys (`NaN` matches `NaN` bitwise) and document the gotcha instead of throwing `NOT_IMPLEMENTED`. The previous commit (`b201d205cacc`) was backward incompatible and rejected existing user queries. Reverted: - `MergeJoin::MergeJoin` constructor `NOT_IMPLEMENTED` throw on float key types. - `MergeJoin::isSupported(table_join, right_sample_block)` overload. - `JoinCommon::joinKeyTypeContainsFloatPayload` helper (now unused). - Test cases asserting `serverError NOT_IMPLEMENTED` for the four float key shapes; the test now verifies that `partial_merge` returns the legacy bitwise-equal output. Added a docs `:::note Float keys` block in the `join_algorithm` setting description so users can pick a `NaN`-safe algorithm (`hash`, `parallel_hash`, `grace_hash`, `full_sorting_merge`, `direct`) when it matters. The other algorithms in this PR (`HashJoin`, `ConcurrentHashJoin`, `GraceHashJoin` runtime filter, `DirectKeyValueJoin`, `FullSortingMergeJoin`) keep their `NaN`-as-null fold and continue to match IEEE 754 `NaN != NaN`. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pre-PR validation gate (commit
|
| # | Question | Answer |
|---|---|---|
| a | Deterministic repro? | Yes. Pre-revert: SET join_algorithm='partial_merge'; SELECT v, w FROM lpm INNER JOIN rpm ON lpm.k = rpm.k with (nan, 'nan-l'), (1.5, 'm') left and (nan, 999), (1.5, 100) right throws Code: 48 NOT_IMPLEMENTED from the new constructor check at MergeJoin.cpp:619. Post-revert: same query returns m\t100\nnan-l\t999 (legacy bitwise-equal NaN behaviour). |
| b | Root cause explained? | The previous commit (b201d205cacc) added a hard NOT_IMPLEMENTED throw in MergeJoin for any float JOIN ON key, which broke existing user queries that used partial_merge with float keys. partial_merge cannot express NaN != NaN through its sort-then-merge comparator, but as @vdimir pointed out, rejecting the query is backward incompatible — pre-PR users got NaN-matches-NaN bitwise, and rejecting at construction time turns previously-passing queries into runtime errors. The right fix is to keep the legacy behaviour and document the algorithm-specific NaN semantics. |
| c | Fix matches root cause? | Yes. The change is a clean revert of (1) the constructor throw, (2) the new isSupported(table_join, right_sample_block) overload, (3) its three call-site updates in ExpressionAnalyzer.cpp / PlannerJoins.cpp, (4) the now-unused JoinCommon::joinKeyTypeContainsFloatPayload helper, and adds a :::note Float keys block to the join_algorithm setting docs (src/Core/Settings.cpp). No symptom-masking. |
| d | Test intent preserved / new tests added? | Yes. The four -- { serverError NOT_IMPLEMENTED } cases for Float64/Nullable/LC(Nullable)/Tuple are removed (they only existed to assert the rejection that we're undoing). They are replaced with a single positive case asserting that partial_merge returns m\t100\nnan-l\t999 — i.e. the legacy bitwise-equal output — which fails without this revert (the constructor would throw before the SELECT can run). The remaining hash/parallel_hash/grace_hash/full_sorting_merge/direct cases that exercise the NaN-as-null fold are unchanged and still verify that those algorithms canonicalize NaN. |
| e | Both directions demonstrated? | Yes. Pre-revert (b201d205cacc): the new test case throws Code 48 — verified locally before reverting. Post-revert (4cb2d05a182b): tests/clickhouse-test --no-random-settings --test-runs 50 04318_106531_join_on_nan_keys passes 50/50; the directly-run partial_merge query returns m\t100\nnan-l\t999, the directly-run hash query returns just m\t100 (NaN-safe), confirming the algorithm-specific NaN handling matches the docs note. 01010_partial_merge_join, 01010_partial_merge_join_const_and_lc, 01010_partial_merge_join_negative, 01147_partial_merge_full_join all pass. |
| f | Fix is general, not a narrow patch? | N/A — this is a documentation + revert change. partial_merge keeps its historical behaviour; the symptom (NaN matches NaN for float keys in partial_merge) is now a documented and explained gotcha rather than a silent correctness bug, and users who need IEEE 754 NaN != NaN are pointed at hash / parallel_hash / grace_hash / full_sorting_merge / direct (all of which canonicalize NaN to NULL via this PR). The other 5 algorithms still get the proper NaN-as-null fold. |
This addresses @vdimir CR comments #1 (MergeJoin.cpp:625) and #2 (MergeJoin.cpp:1296). The remaining vdimir feedback (#3 tuple null compat, #4 getNanMask refactor, top-level NaN-handling audit) and the bot's DirectKeyValueJoin / EmbeddedRocksDB gap will be addressed in follow-up commits.
Session id: cron:clickhouse-ci-task-worker:20260609-161751
|
@vdimir re your question "can we check how nan handling is implemented in other places (aggregation or wherever we do comparisons)?" - I audited the other equality/comparison contexts. Short answer: every other place uses bitwise (
The bitwise behavior in the hash table is load-bearing, not accidental: the comment at Why leaving the others unchanged is correct:
The existing test already pins this ( On your other comments: I will address the |
Address @vdimir review feedback on PR ClickHouse#106540 (the JOIN ON NaN-keys fix for issue ClickHouse#106531). The previous version carried three near-identical ~80-line recursive NaN-detection cascades (in JoinUtils.cpp, BuildRuntimeFilterTransform.cpp and MergeJoinTransform.cpp), each re-implementing the same Nullable/LowCardinality/Tuple/ColumnConst/ ColumnSparse unwrap plus a per-float-type leaf loop. Replace them with a single per-column primitive: ColumnVector<T>::getNanMask, which OR-marks NaN rows for floating-point T (no-op otherwise). The shared JoinCommon recursion (markFloatNaNRowsAsNull / joinKeyContainsFloatPayload) keeps the wrapper unwrap in one place and calls getNanMask at the leaf; the two transforms now reuse that shared recursion instead of private copies. Behaviour is unchanged: NaN keys are still treated as never-matching for JOIN ON, while null-safe tuple equality (tuple(NULL) = tuple(NULL)) is preserved because the Nullable branch skips already-NULL rows. Net -250 lines across the four join code paths. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pre-PR validation gateThis push (
Session id: cron:clickhouse-worker-slot-8:20260610-230810 |
…erMap) The generic direct (key-value) JOIN path did not honour IEEE 754 / SQL JOIN ON NaN != NaN for IKeyValueEntity backends. DirectKeyValueJoin::joinBlock forwarded the probe key to storage->getByKeys; StorageEmbeddedRocksDB (and Redis / KeeperMap) serialize the key Field bitwise, so a probe NaN matched a build-side NaN with the same bit pattern. Only DirectJoinMergeTreeEntity masked NaN internally, so the direct algorithm overlisted NaN-safety in the join_algorithm docs for the generic backends. Fix: fold NaN probe rows to the not-found path inside DirectKeyValueJoin::joinBlock, gated on offsets.empty() so only the generic backends (which return a 1:1 result with empty offsets) are touched; DirectJoinMergeTreeEntity already masks NaN and returns non-empty offsets, so it is left unchanged. Reuses the shared JoinCommon::markFloatNaNRowsAsNull recursion (Nullable / LowCardinality / Tuple), so the masking matches the hash / full_sorting_merge paths exactly. NaN rows are remapped to a default sentinel and null_map=false, mirroring how the MergeTree entity handles its own NaN rows, which yields the correct shape for INNER (drop), LEFT (default / NULL), LEFT SEMI (drop), and LEFT ANTI (keep). Adds 04318_106531_join_on_nan_keys_rocksdb_direct (tagged no-fasttest, use-rocksdb) covering all four join kinds plus join_use_nulls, Float32 keys, non-NaN regression, and a String key literally equal to 'nan' (must still match). Split from the parent 04318 test because EmbeddedRocksDB is not in the fast-test build. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Pre-PR validation gate (commit
|
| # | Question | Answer |
|---|---|---|
| a | Deterministic repro? | Yes. CREATE TABLE rdb (k Float64, v String) ENGINE = EmbeddedRocksDB PRIMARY KEY k; INSERT VALUES (1.0,'one'),(nan,'rocks_nan'),(2.0,'two'); then SELECT l.x, r.v FROM (SELECT arrayJoin([1.0, nan, 2.0]) AS x) l INNER JOIN rdb r ON l.x = r.k SETTINGS join_algorithm='direct' returns nan -> rocks_nan every time on 6243982. EXPLAIN confirms FilledJoin (the direct path), not a fallback. |
| b | Root cause explained? | DirectKeyValueJoin::joinBlock (DirectJoin.cpp:159) forwards the raw probe key to storage->getByKeys. For generic IKeyValueEntity backends, StorageEmbeddedRocksDB::getByKeys serializes the probe Field bitwise (StorageEmbeddedRocksDB.cpp:940-947), so a probe NaN matches a build-side NaN with the same bit pattern. Only DirectJoinMergeTreeEntity::getByKeys masked NaN internally, so the generic path stayed unmasked. |
| c | Fix matches root cause? | Yes. Masks NaN probe rows at the consumer (DirectKeyValueJoin::joinBlock), gated on offsets.empty() so only the generic backends (which return a 1:1 result with empty offsets) are touched; DirectJoinMergeTreeEntity (non-empty offsets) is left unchanged. Reuses the shared JoinCommon::markFloatNaNRowsAsNull recursion introduced in 6243982, so masking matches the hash / full_sorting_merge paths. |
| d | Test intent preserved / new tests added? | New test 04318_106531_join_on_nan_keys_rocksdb_direct (no-fasttest, use-rocksdb): all four direct join kinds (INNER drop, LEFT default, LEFT SEMI drop, LEFT ANTI keep) + join_use_nulls (NaN -> NULL) + Float32 keys + non-NaN regression + a String key literally 'nan' that must still match. Split from the parent 04318 because RocksDB is not in the fast-test build; parent 04318 stays dependency-free and still passes. |
| e | Both directions demonstrated? | Yes. Without the fix (6243982): INNER and LEFT both return nan -> rocks_nan. With the fix (deba70cd): INNER drops NaN, LEFT yields nan -> '' (or NULL with join_use_nulls), SEMI drops, ANTI keeps. Matches the already-fixed hash reference exactly. New test 30/30 deterministic passes; parent 04318 still passes. |
| f | Fix is general, not a narrow patch? | Yes. The fix lives in DirectKeyValueJoin::joinBlock, the single consumer of getByKeys, so it covers ALL generic IKeyValueEntity backends (EmbeddedRocksDB, Redis, KeeperMap), not just RocksDB. The offsets.empty() gate is the structural discriminator between self-masking (MergeTree) and non-masking backends, not a RocksDB-specific check. Not a defensive symptom guard: the masking is the same canonicalization the other algorithms apply, applied at the point the unmasked key enters the lookup. |
Session id: cron:clickhouse-worker-slot-15:20260611-004100
| When using the `partial_merge` algorithm, ClickHouse sorts the data and dumps it to the disk. The `partial_merge` algorithm in ClickHouse differs slightly from the classic realization. First, ClickHouse sorts the right table by joining keys in blocks and creates a min-max index for sorted blocks. Then it sorts parts of the left table by the `join key` and joins them over the right table. The min-max index is also used to skip unneeded right table blocks. | ||
|
|
||
| :::note Float keys | ||
| Float `JOIN ON` keys are compared bitwise by `partial_merge`, so two `NaN` rows with identical bit patterns spuriously match. The other algorithms (`hash`, `parallel_hash`, `grace_hash`, `full_sorting_merge`, `direct`) treat `NaN` as never equal, matching IEEE 754 semantics. Use one of those algorithms when `NaN`-safety matters. |
There was a problem hiding this comment.
The code now documents and tests partial_merge as the legacy exception that still bitwise-matches same-pattern NaN keys, but the PR description and changelog entry still say partial_merge now rejects float JOIN ON keys with NOT_IMPLEMENTED and that users must migrate. That release note is now wrong for the current code: users will expect an exception/migration, while partial_merge still returns same-bit NaN matches.
Please update the PR description and changelog entry to say that partial_merge keeps its legacy bitwise behavior and is the documented exception; only hash, parallel_hash, grace_hash, full_sorting_merge, and direct treat NaN as never equal.
There was a problem hiding this comment.
Fixed the PR description and changelog entry to match the code.
After the 4cb2d05 revert (per the backward-compat request on MergeJoin.cpp), partial_merge no longer rejects float keys with NOT_IMPLEMENTED. It keeps its legacy behavior: float JOIN ON keys are compared bitwise, so two NaN rows with identical bit patterns still match. The description and changelog now state exactly that and point users at the NaN-safe algorithms (hash, parallel_hash, grace_hash, full_sorting_merge, direct). This matches the :::note Float keys block already in this join_algorithm doc.
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 167/190 (87.89%) | Lost baseline coverage: none · Uncovered code |
| auto & selector_data = selector->getData(); | ||
| selector_data.resize(num_rows); | ||
| for (size_t i = 0; i < num_rows; ++i) | ||
| { | ||
| if (nan_mask[i]) | ||
| { | ||
| selector_data[i] = num_rows; | ||
| null_map[i] = 0; | ||
| } | ||
| else | ||
| selector_data[i] = i; | ||
| } | ||
| MutableColumns chunk_columns = joined_chunk.mutateColumns(); | ||
| for (auto & col : chunk_columns) | ||
| { | ||
| col->insertDefault(); | ||
| col = IColumn::mutate(col->index(*selector, 0)); |
There was a problem hiding this comment.
Why to use selector rather then just IColumn::filter ?
There was a problem hiding this comment.
IColumn::filter can only delete rows, but a NaN probe key must not be deleted for every kind: regular LEFT keeps the left row with a default/NULL right side, and LEFT ANTI keeps the row entirely. The generic backend returns a 1:1 result (one right row per probe row) plus a null_map; setting null_map[i]=0 alone handles INNER/SEMI/ANTI (they filter on null_map at the end), but for regular LEFT the row is kept and would still show the stale bitwise-matched right value. So I remap NaN rows to an appended default sentinel via index() and set null_map[i]=0, which routes them through the exact same not-found path the code already produces for a genuinely missing key.
This mirrors DirectJoinMergeTreeEntity::getByKeys, which uses the same selector-to-sentinel pattern for its own not-found rows. Verified against EmbeddedRocksDB (Float64 PK with NaN): all four kinds match the hash reference (INNER -> drop, LEFT -> default right, SEMI -> drop, ANTI -> keep). A plain filter would give the wrong shape for LEFT and ANTI.
That said, I agree the block is a bit heavy. If you prefer, I can drop the index() selector and instead blank only the right columns for NaN rows in place (the chunk is already 1:1 here, so no reindex is needed), keeping null_map[i]=0 to drive the filtering. Happy to switch to whichever you find cleaner.
| /// `Tuple` like `JoinCommon::extendJoinKeyNullMapWithFloatNaNs` does on the hash path. | ||
| PaddedPODArray<UInt8> nan_mask; | ||
| bool any_nan = false; | ||
| if (JoinCommon::joinKeyContainsFloatPayload(*keys[0].column)) |
There was a problem hiding this comment.
Won't it work automatically with direct join with MT, since we build plan with IN which should handle nans proerly?
There was a problem hiding this comment.
I checked this empirically and the IN plan does not make the masking redundant, because IN coalesces NaN (the opposite of JOIN ON):
nan IN (nan) -> 1 (true)
So buildFilterStepWithIn keeps the right-side NaN row rather than excluding it. The IN step is only a coarse candidate pre-filter to fetch matching right rows; the actual key matching happens afterwards in the ColumnsHashing findKey loop, which compares the key payload bitwise, so a probe NaN then matches the retained right NaN.
Confirmed by disabling just the mask and rebuilding: MT direct INNER JOIN ON l.k = r.k with NaN on both sides returns the NaN row (wrong) without the mask, and drops it (correct, matching hash) with it. So the DirectJoinMergeTreeEntity mask is load-bearing, not redundant.
There was a problem hiding this comment.
For NULL we treat tuple(NULL) == tuple(NULL), for backward compatibility perhaps for nans it could be consistent with nulls
There was a problem hiding this comment.
Good question, and it is worth settling explicitly. The reason I treat tuple-nested NaN differently from tuple-nested NULL is that the two are not actually consistent in ClickHouse's own comparison semantics:
nan = nan -> 0 (false)
nan IS NOT DISTINCT FROM nan -> 0 (false) <-- NaN is distinct from itself
NULL IS NOT DISTINCT FROM NULL -> 1 (true)
tuple(nan) = tuple(nan) -> 0 (false)
tuple(NULL) = tuple(NULL) -> NULL (not true)
The null-safe tuple wrap in PlannerJoins.cpp exists specifically so tuple(NULL) matches tuple(NULL), i.e. it implements a IS NOT DISTINCT FROM b. But that very operator says nan IS NOT DISTINCT FROM nan is false, so making tuple-nested NaN match would make the JOIN disagree with the scalar <=> it is supposed to implement. Concretely, on Nullable(Float64):
SELECT v FROM la INNER JOIN ra ON la.k <=> ra.k -- la=(1.5,'m'),(nan,'nan-l'),(NULL,'null-l')
master: m, nan-l, null-l (nan-l wrongly matches)
this PR: m, null-l (NULL still matches null-safe; NaN dropped, matching nan<=>nan=false)
So the current behavior keeps NULL matching (compat preserved) while making NaN consistent with both scalar = and <=>. If you would still prefer tuple-nested NaN to match for strict backward compatibility with pre-PR partial_merge-style bitwise behavior, I can scope the fold to top-level float keys only (plain ON a = b and multi-key AND, which arrive as separate top-level columns, not as a ColumnTuple), leaving explicit/<=>-wrapped tuple keys untouched. That is a smaller change and also drops the tuple recursion. Let me know which semantics you want and I will adjust.

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fixed
INNER JOIN ON(andLEFT/RIGHT/FULL/SEMI/ANTIvariants) on float keys returning wrong results when both sides evaluated toNaN. Per IEEE 754 / SQLJOIN ONsemanticsNaN != NaN, but the hash table compared float keys bitwise so matching bit patterns wrongly joined. The fix applies to thehash,parallel_hash,grace_hash,full_sorting_merge, anddirectalgorithms and the runtime filter path. The legacypartial_mergealgorithm still compares float keys bitwise (so twoNaNrows with identical bit patterns match); use one of the other algorithms whenNaN-safety matters.GROUP BY/DISTINCT/uniq/INsemantics onNaNare unchanged.Description
Per IEEE 754 (and SQL
JOIN ONsemantics)NaNnever compares equal toNaN, but ClickHouse's hash table compared float keys bitwise viabitEquals(memcmp). TwoNaNvalues with matching bit patterns therefore wrongly joined:INNER JOIN ON NaN = NaNreturned rows instead of zero, andANTI JOINdropped theNaN-keyed left row instead of keeping it. Reproducer from the issue:Both sides of the
ONclause evaluate toNaN; two of the three build-sideNaNs have bit pattern0xFFF8000000000000matching the probe-sideNaNfromfmod, sobitEqualsreported equality.The fix folds
NaNrows into the JOIN-key null map so they are skipped on both build and probe sides, mirroring howNULLkeys behave. This is applied at every algorithm so they all see the same semantics:HashJoin/ConcurrentHashJoin- extend the build- and probe-sidenull_mapafterextractNestedColumnsAndNullMapviaJoinCommon::extendJoinKeyNullMapWithFloatNaNs(built on the newColumnVector::getNanMaskprimitive).FullSortingMergeJoin- inFullMergeJoinCursor::setChunk, ORNaNpositions into the per-keynull_mapsonullableCompareAtshort-circuits.BuildRuntimeFilterTransform- dropNaNrows before they enter the bloom / exact runtime filter, so the runtime filter never carries aNaNentry that would mis-prune a probe-sideNaN.direct-DirectJoinMergeTreeEntity::getByKeys(MergeTree-keyed) andDirectKeyValueJoin::joinBlock(genericIKeyValueEntitybackends such asEmbeddedRocksDB) routeNaNprobe rows to the not-found path so they honourNaN != NaN.The legacy
MergeJoin(partial_mergealgorithm) keeps its existing behavior: it sorts both sides and merges vianullableCompareAt, which compares float keys bitwise, so twoNaNrows with identical bit patterns still match. Changing it to foldNaNinto the null map is not feasible without rewriting the surrounding output path (copyLeftRange,addRightColumns, fixedright_table_keysschema), and doing so would be a backward-incompatible change for existingpartial_mergequeries on float keys. Thejoin_algorithmsetting documentation gains a:::note Float keysblock describing this, and points users at theNaN-safe algorithms.GROUP BY/DISTINCT/uniq/INkeep their existingNaN-coalescing behaviour. This change only touches the JOIN ON equality path.The new test
04318_106531_join_on_nan_keyscoversINNER/LEFT/RIGHT/FULL/SEMI/ANTIjoins onFloat32/Float64/Nullable(Float64)keys for bothhashandfull_sorting_mergealgorithms, tuple-keyed and runtime-filter variants, and confirmsGROUP BY/DISTINCTsemantics onNaNare unchanged.04318_106531_join_on_nan_keys_rocksdb_directcovers the genericdirectpath overEmbeddedRocksDB.Closes #106531