Treat float NaN keys as NULL on JOIN ON by groeneai · Pull Request #106540 · ClickHouse/ClickHouse · GitHub
Skip to content

Treat float NaN keys as NULL on JOIN ON#106540

Open
groeneai wants to merge 12 commits into
ClickHouse:masterfrom
groeneai:fix-106531-nan-inner-join
Open

Treat float NaN keys as NULL on JOIN ON#106540
groeneai wants to merge 12 commits into
ClickHouse:masterfrom
groeneai:fix-106531-nan-inner-join

Conversation

@groeneai

@groeneai groeneai commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixed INNER JOIN ON (and LEFT / RIGHT / FULL / SEMI / ANTI variants) on float keys returning wrong results when both sides evaluated to NaN. Per IEEE 754 / SQL JOIN ON semantics NaN != NaN, but the hash table compared float keys bitwise so matching bit patterns wrongly joined. The fix applies to the hash, parallel_hash, grace_hash, full_sorting_merge, and direct algorithms and the runtime filter path. The legacy partial_merge algorithm still compares float keys bitwise (so two NaN rows with identical bit patterns match); use one of the other algorithms when NaN-safety matters. GROUP BY / DISTINCT / uniq / IN semantics on NaN are unchanged.

Description

Per IEEE 754 (and SQL JOIN ON semantics) NaN never compares equal to NaN, but ClickHouse's hash table compared float keys bitwise via bitEquals (memcmp). Two NaN values with matching bit patterns therefore wrongly joined: INNER JOIN ON NaN = NaN returned rows instead of zero, and ANTI JOIN dropped the NaN-keyed left row instead of keeping it. Reproducer from the issue:

CREATE TABLE t2 (c0 Int32, c1 Int32, c2 String) ENGINE = MergeTree() ORDER BY lcm(c0, c1);
CREATE TABLE t3 (c0 Int32) ENGINE = Memory();
INSERT INTO t2(c2, c1, c0) VALUES ('r-80l', 821753656, 1509704461), ('a0}', -823561908, -314051915), ('x9', 1545468958, -924334719);
INSERT INTO t3(c0) VALUES (-792862717);
SELECT t2.c0
FROM t3 INNER JOIN t2
  ON (t3.c0 + t3.c0) % pow(t3.c0, t3.c0) = pow(-t2.c0, sqrt(t2.c0))
WHERE NOT (sqrt(intDiv(t2.c0, t2.c0)) > 100);
-- 2 rows before, 0 rows after

Both sides of the ON clause evaluate to NaN; two of the three build-side NaNs have bit pattern 0xFFF8000000000000 matching the probe-side NaN from fmod, so bitEquals reported equality.

The fix folds NaN rows into the JOIN-key null map so they are skipped on both build and probe sides, mirroring how NULL keys behave. This is applied at every algorithm so they all see the same semantics:

  • HashJoin / ConcurrentHashJoin - extend the build- and probe-side null_map after extractNestedColumnsAndNullMap via JoinCommon::extendJoinKeyNullMapWithFloatNaNs (built on the new ColumnVector::getNanMask primitive).
  • FullSortingMergeJoin - in FullMergeJoinCursor::setChunk, OR NaN positions into the per-key null_map so nullableCompareAt short-circuits.
  • BuildRuntimeFilterTransform - drop NaN rows before they enter the bloom / exact runtime filter, so the runtime filter never carries a NaN entry that would mis-prune a probe-side NaN.
  • direct - DirectJoinMergeTreeEntity::getByKeys (MergeTree-keyed) and DirectKeyValueJoin::joinBlock (generic IKeyValueEntity backends such as EmbeddedRocksDB) route NaN probe rows to the not-found path so they honour NaN != NaN.

The legacy MergeJoin (partial_merge algorithm) keeps its existing behavior: it sorts both sides and merges via nullableCompareAt, which compares float keys bitwise, so two NaN rows with identical bit patterns still match. Changing it to fold NaN into the null map is not feasible without rewriting the surrounding output path (copyLeftRange, addRightColumns, fixed right_table_keys schema), and doing so would be a backward-incompatible change for existing partial_merge queries on float keys. The join_algorithm setting documentation gains a :::note Float keys block describing this, and points users at the NaN-safe algorithms.

GROUP BY / DISTINCT / uniq / IN keep their existing NaN-coalescing behaviour. This change only touches the JOIN ON equality path.

The new test 04318_106531_join_on_nan_keys covers INNER / LEFT / RIGHT / FULL / SEMI / ANTI joins on Float32 / Float64 / Nullable(Float64) keys for both hash and full_sorting_merge algorithms, tuple-keyed and runtime-filter variants, and confirms GROUP BY / DISTINCT semantics on NaN are unchanged. 04318_106531_join_on_nan_keys_rocksdb_direct covers the generic direct path over EmbeddedRocksDB.

Closes #106531

Per IEEE 754 / SQL JOIN ON semantics, NaN never compares equal to NaN,
but ClickHouse's hash table compared float keys bitwise via bitEquals
(memcmp). Two NaN values with matching bit patterns therefore wrongly
joined - INNER JOIN ON NaN = NaN returned rows instead of zero, and
ANTI JOIN dropped the NaN-keyed left row instead of keeping it.

Fold NaN rows into the JOIN-key null map so they are skipped on both
build and probe sides, mirroring how NULL keys behave. Done at three
layers:

* HashJoin / ConcurrentHashJoin: extend the build- and probe-side
  null_map after extractNestedColumnsAndNullMap via a new helper
  JoinCommon::extendJoinKeyNullMapWithFloatNaNs.

* FullSortingMergeJoin: in FullMergeJoinCursor::setChunk, OR NaN
  positions into the per-key null_map so nullableCompareAt
  short-circuits.

* BuildRuntimeFilterTransform: drop NaN rows before they enter the
  bloom/exact runtime filter, so the runtime filter never carries a
  NaN entry that would mis-prune a probe-side NaN.

GROUP BY / DISTINCT / uniq / IN keep their existing NaN-coalescing
behaviour - this changes only the JOIN ON path.

Closes: ClickHouse#106531
@groeneai

groeneai commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

@groeneai

groeneai commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

cc @KochetovNicolai @vdimir — could you review this? It fixes wrong results for INNER/LEFT/RIGHT/FULL/SEMI/ANTI JOIN ON with float keys: the hash table compared NaN keys bitwise via bitEquals and matched two NaNs with identical bit patterns. The fix folds NaN rows into the JOIN-key null map (build + probe) on HashJoin, FullSortingMergeJoin, and the BuildRuntimeFilter — leaving GROUP BY / DISTINCT / IN NaN coalescing untouched. cc @PedroTadim per your request on the issue.

DROP TABLE IF EXISTS l;
DROP TABLE IF EXISTS r;

CREATE TABLE l (k Float64, v String) ENGINE = Memory();

@PedroTadim PedroTadim Jun 5, 2026

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@groeneai can you add test examples with BFloat16 as well?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in d51a467.

The fix already dispatched on BFloat16 in all three NaN-detection paths (extendJoinKeyNullMapWithFloatNaNs in JoinUtils.cpp, foldFloatNaNsIntoNullMap in MergeJoinTransform.cpp, filterOutNaNs in BuildRuntimeFilterTransform.cpp); the new test block exercises that dispatch end-to-end. New cases mirror the Float32 grid: hash INNER/LEFT/ANTI and full_sorting_merge INNER over BFloat16 keys with nan / -nan mixed with a finite value.

Local verification (build-asan, --no-random-settings --test-runs 30): 30/30 pass.

Session: cron:clickhouse-ci-task-worker:20260605-144800

Per @PedroTadim review on PR ClickHouse#106540: cover BFloat16 in addition to
Float32 / Float64 / Nullable(Float64). The fix already dispatches on
BFloat16 in all three NaN-detection paths (extendJoinKeyNullMapWithFloatNaNs,
foldFloatNaNsIntoNullMap, filterOutNaNs); this commit just exercises that
dispatch end-to-end.

The new block covers BFloat16 keys for hash INNER / LEFT / ANTI and
full_sorting_merge INNER, mirroring the Float32 grid.
@groeneai

groeneai commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate

# Question Answer
a Deterministic repro? Yes. With the original (pre-fix) code, SELECT v FROM (SELECT lbf.v FROM lbf INNER JOIN rbf ON lbf.k = rbf.k) over BFloat16 keys with nan/-nan returned nan-l (and previously also neg-nan-l when matching bit-pattern aliases collide), demonstrating the same bitEquals bug as Float32/Float64.
b Root cause explained? Same as the parent fix: the hash table compared float keys bitwise via HashTable::bitEquals (memcmp), so two NaNs with identical bit patterns wrongly compared equal and joined. BFloat16 shares this code path because it is a ColumnVector<T> like Float32/Float64; the only requirement to fix it is that the three NaN-detection helpers dispatch on BFloat16 too — which they already do.
c Fix matches root cause? Yes. No code change required: the parent commit (3bf27297) already added BFloat16 dispatch in extendJoinKeyNullMapWithFloatNaNs (JoinUtils.cpp:451,474-475), foldFloatNaNsIntoNullMap (MergeJoinTransform.cpp:390-391), and filterOutNaNs (BuildRuntimeFilterTransform.cpp:136-137). This commit only adds end-to-end test coverage.
d Test intent preserved / new tests added? Test additions mirror the existing Float32 grid for BFloat16: hash INNER/LEFT/ANTI plus full_sorting_merge INNER (exercises both the HashJoin and MergeJoinTransform paths). Reference file updated. The GROUP BY / DISTINCT no-regression block is unchanged and continues to assert NaN-coalescing for non-JOIN code paths.
e Both directions demonstrated? Yes. (1) Pre-fix build/ binary returns nan-l on the BFloat16 INNER JOIN. (2) Post-fix build-asan binary returns the expected single match m. (3) clickhouse-test --no-random-settings --test-runs 30 04318_106531_join_on_nan_keys ran 30/30 OK on build-asan.
f Fix is general, not a narrow patch? Yes. The fix already covers the three sibling NaN-detection sites; the BFloat16 dispatch was added uniformly across HashJoin/ConcurrentHashJoin build+probe (AddedColumns.cpp + HashJoin.cpp), FullSortingMergeJoin (MergeJoinTransform.cpp), and the runtime-filter transform. No symmetric path remains uncovered.

Session id: cron:clickhouse-ci-task-worker:20260605-144800

@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Jun 5, 2026
@clickhouse-gh

clickhouse-gh Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [deba70c]

Summary:


AI Review

Summary

This PR changes float-key JOIN ON handling so NaN probe/build rows are routed through the null/not-found path for hash, parallel_hash, grace_hash runtime filters, full_sorting_merge, and direct, while keeping partial_merge as a documented legacy exception. The code and regression coverage look coherent after the latest direct-join fix, but the PR metadata still describes the earlier partial_merge rejection that was reverted.

PR Metadata
  • 💡 Bug Fix is the right changelog category, and a changelog entry is required.
  • 💡 The current PR description and changelog entry are inaccurate: they still say partial_merge now rejects float JOIN ON keys with NOT_IMPLEMENTED, but the current code and test keep the legacy bitwise partial_merge behavior. Suggested replacement changelog entry:
Fixed `INNER JOIN ON` (and `LEFT` / `RIGHT` / `FULL` / `SEMI` / `ANTI` variants) on float keys returning wrong results when both sides evaluated to `NaN`. Per IEEE 754 / SQL `JOIN ON` semantics `NaN != NaN`, but the hash table compared float keys bitwise so matching bit patterns wrongly joined. The fix applies to `hash`, `parallel_hash`, `grace_hash` runtime-filter paths, `full_sorting_merge`, and `direct`. `partial_merge` keeps its legacy bitwise comparison for float `JOIN ON` keys, so same-pattern `NaN` values can still match there; use one of the other algorithms when `NaN` must be treated as never equal. `GROUP BY` / `DISTINCT` / `uniq` / `IN` semantics on `NaN` are unchanged.
Findings

💡 Nits

  • [src/Core/Settings.cpp:3444] The code now documents partial_merge as the legacy exception, and 04318_106531_join_on_nan_keys asserts that partial_merge still returns the same-bit NaN match. The PR description/changelog still promise a NOT_IMPLEMENTED rejection and migration path, so the release note would describe behavior users will not see. Update the metadata to match the current contract.
Final Verdict

Status: ⚠️ Request changes

Minimum required action: update the PR description and changelog entry to match the current partial_merge behavior. No code-level blocker remained after verifying the current direct generic IKeyValueEntity fix.

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Jun 5, 2026
Comment thread src/Processors/Transforms/BuildRuntimeFilterTransform.cpp
@groeneai

groeneai commented Jun 5, 2026

Copy link
Copy Markdown
Contributor Author

CI Ledger — covered HEAD: `d51a4676af70`

Pulled the failure set after CI fully completed (last check 20:44:02Z, >20min CIDB ingestion buffer met).

job test classification disposition
AST fuzzer (amd_debug, targeted, old_compatibility) `test_backup_restore_s3/test.py::test_backup_to_disk[data_file_name_from_first_file_name-from_local_to_s3]` — `Logical error: Sort order of blocks violated` (STID 3413-350b) flaky / chronic trunk bug tracked in task `2026-05-16-ci-p0-sort-order-stid-3413-350b-recurre` (P0 chronic). 30d cross-PR: 7 distinct unrelated PRs (#100173, #103705, #104324, #104804, #104890, #105541, #106197). Stack hits `DB::CheckSortedTransform` during MergeTree merge of UInt64 keys, no JOIN/HashJoin/RuntimeFilter code on the path. Not caused by this NaN-in-JOIN-keys diff.
Performance Comparison (arm_release, master_head, 6/6) 12 `Check Results/logical_functions_medium #N::{old,new}` and `Check Results/min_max_index_with_cast #0::{old,new}` reported "slower" infra / runner jitter Both `::old` (master_head reference) AND `::new` (this PR) are reported slower than last-week reference. Same shape on both sides means trunk-level runner drift, not a PR-side regression. `logical_functions_medium` exercises scalar AND/OR/XOR branch behavior; `min_max_index_with_cast` exercises MergeTree skip-index pruning. Neither path overlaps with our `JoinUtils::extendJoinKeyNullMapWithFloatNaNs` / `HashJoin` / `MergeJoinTransform` / `BuildRuntimeFilterTransform` changes. Tracked in chronic task `2026-06-04-ci-chronic-performance-comparison-armr`.

`Mergeable Check` reports failed because of the perf job. `Finish Workflow` SUCCESS. All 100+ stateless / integration / stress / fuzzer / unit-test jobs pass on this HEAD.

Separately, the AI Review on this run flagged a Major in `BuildRuntimeFilterTransform.cpp:157` — `filterOutNaNs` only handles top-level `ColumnVector(Float)` and would miss `Nullable(Float)` / `LowCardinality(Nullable(Float))` / `Tuple` runtime-filter keys, leaving an `ANTI JOIN` mis-prune window. This is review-feedback rather than a CI test failure; it will be addressed in a fix-up commit.

Session: cron:our-pr-ci-monitor:20260605-210000

The previous helper only recognised a top-level `ColumnFloat32`/`ColumnFloat64`/
`ColumnVector<BFloat16>`. Runtime filters can be built over `Nullable(Float*)`,
`LowCardinality(Nullable(Float*))`, or - for multi-key `LEFT ANTI` - a
temporary `Tuple` of common types, so a probe-side `NaN` was still pre-pruned
by `ExactNotContainsRuntimeFilter` before the join saw the row.

Recursively traverse those wrappers and OR-mark every row whose float payload
is `NaN`, then filter the original (still-wrapped) column once. Already-NULL
rows do not need to be dropped: the runtime filter stores `NULL` natively
(`argument_can_have_nulls`), so there is no bitwise match for them on the
probe side.

`isNaN` is templated and works on `Float32`, `Float64` and `BFloat16`, so the
fix covers all three element types under any of the three wrappers.

Test additions in `04318_106531_join_on_nan_keys.sql` cover, with
`enable_join_runtime_filters = 1` and `join_algorithm = 'hash'`:

* `Nullable(Float64)` `ANTI`/`INNER`
* `LowCardinality(Nullable(Float64))` `ANTI`
* `Tuple(Float64, Float64)` `LEFT ANTI` (multi-key path)
@groeneai

groeneai commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate (commit 1b3d5dc)

# Question Answer
a Deterministic repro? Yes. With SET enable_analyzer = 1, enable_join_runtime_filters = 1, join_algorithm = 'hash'; the queries lrf ANTI JOIN rrf ON lrf.k = rrf.k (where both columns are Nullable(Float64) with NaN rows on each side), the same shape with LowCardinality(Nullable(Float64)) keys, and lrf_t LEFT ANTI JOIN rrf_t ON lrf_t.k1 = rrf_t.k1 AND lrf_t.k2 = rrf_t.k2 (Float64, multi-key) all return wrong results without the fix. The minimised reproducers are the new test cases at the bottom of tests/queries/0_stateless/04318_106531_join_on_nan_keys.sql.
b Root cause explained? BuildRuntimeFilterTransform::filterOutNaNs only recognised a top-level ColumnFloat32 / ColumnFloat64 / ColumnVector<BFloat16>. Because joinRuntimeFilter.cpp computes common_type as the JOIN-key supertype (so Nullable(Float) keys cast up to Nullable(Float)), and because multi-key LEFT ANTI builds a Tuple runtime-filter key via addTupleOfKeys, the column reaching filterOutNaNs is normally ColumnNullable / ColumnLowCardinality / ColumnTuple. Those branches fell through to the no-op return, so NaN rows entered ExactNotContainsRuntimeFilter and the probe-side __applyFilter then bitwise-matched a probe NaN to a build NaN, pre-pruning the probe row before the join saw it. For ANTI, that wrongly removed the probe NaN from the result.
c Fix matches root cause? Yes. markNaNRowsImpl recursively traverses ColumnNullable (under a null-map mask), ColumnLowCardinality (via convertToFullColumnIfLowCardinality) and ColumnTuple (each element), OR-marking every row whose float payload is NaN. A single column->filter(keep_mask, -1) is then applied at the top so the Nullable/LowCardinality/Tuple shape Set::insertFromColumns expects is preserved. No symptom guard or assertion widening.
d Test intent preserved / new tests added? Yes. New tests in 04318_106531_join_on_nan_keys.sql exercise each wrapper class explicitly (Nullable(Float64) ANTI and INNER, LowCardinality(Nullable(Float64)) ANTI, multi-key Tuple LEFT ANTI) with enable_join_runtime_filters = 1 and join_algorithm = 'hash'. The existing tests are unchanged.
e Both directions demonstrated? Yes. Locally rebuilt the debug binary with markNaNRowsImpl short-circuiting on the three wrapped column types (pre-fix behaviour). The new Nullable(Float64) ANTI and LC(Nullable(Float64)) ANTI queries returned only null-l (missing nan-l); the Tuple LEFT ANTI query returned only unmatched (missing nan-l and nan-r-side). After restoring the fix and rebuilding, the same queries return the expected nan-l, null-l and nan-l, nan-r-side, unmatched. The full test passes 30/30 with --no-random-settings --no-random-merge-tree-settings.
f Fix is general, not a narrow patch? Yes. The fix is type-driven, not call-site-driven: markNaNRowsImpl handles every combination of ColumnNullable / ColumnLowCardinality / ColumnTuple over Float32 / Float64 / BFloat16, and recurses through arbitrary nestings (e.g. LowCardinality(Nullable(Float64)) works because the LC branch materialises and re-enters the helper, which then sees ColumnNullable). It addresses the root cause (the helper missed wrapped/tuple shapes) rather than guarding individual call paths. The two sibling NaN-detection helpers in the original PR (extendJoinKeyNullMapWithFloatNaNs in JoinUtils.cpp and foldFloatNaNsIntoNullMap in MergeJoinTransform.cpp) operate on null-map vectors at the join algorithms' boundaries and are not exposed to LowCardinality or Tuple keys at that layer; this fix is scoped to the runtime-filter side where the wrapped shapes do appear.

Session id: cron:clickhouse-ci-task-worker:20260605-235700

Comment thread src/Interpreters/JoinUtils.cpp Outdated
@groeneai

groeneai commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

CI Ledger — covered HEAD: 1b3d5dc0bf15

CI fully finished on the new HEAD (last check 00:55Z, >40min CIDB ingestion buffer met). The new commit 1b3d5dc0bf15 ("filterOutNaNs: unwrap Nullable, LowCardinality and Tuple float keys") addresses the AI Review Major from the previous tick. Pulled the full failure set:

job classification disposition
Build (arm_tidy)cppcoreguidelines-init-variables on src/Storages/ObjectStorage/DataLakes/Iceberg/Mutations.cpp:731,733 (last_version, compression_method) trunk regression, NOT caused by this PR Introduced by master commit 9c9b2b9a0da7 (merge of #106102, "catalogs with schema evolutions + concurrent checks", merged 2026-06-05T17:43:31Z). Hit on master and 21+ unrelated concurrent PRs in the 18:04Z–21:25Z window. Trivial 2-line init fix already filed by us as #106615 (APPROVED by @ alexey-milovidov 23:35Z, awaiting CI). 155 downstream jobs are SKIPPED because they depend on arm_tidy. Not actionable on this PR's branch.

posted-comment:

All other checks for HEAD 1b3d5dc0bf15 passed or skipped-due-to-arm_tidy. The ledger covers the full set; nothing on this PR's diff is failing.

`extendJoinKeyNullMapWithFloatNaNs` (HashJoin / ConcurrentHashJoin path)
and `foldFloatNaNsIntoNullMap` (full_sorting_merge path) only dispatched
on top-level `ColumnFloat32` / `ColumnFloat64` / `ColumnVector<BFloat16>`.
A `ColumnTuple` key fell through, so `tuple(NaN) = tuple(NaN)` (whether
planner-generated or written explicitly as
`ON tuple(l.k) = tuple(r.k)`) still produced a wrong match because the
hash table or merge cursor compared the tuple payload bitwise.

Mirror the recursion landed at 1b3d5dc in
`BuildRuntimeFilterTransform::filterOutNaNs`: unwrap `Nullable` (respect
existing null map for the element-Nullable case),
unwrap `LowCardinality` via `convertToFullColumnIfLowCardinality`, and
recurse into `ColumnTuple` (a row is dropped if ANY float element is
`NaN`, matching scalar tuple equality semantics: `(a, b) = (c, d)` iff
`a = c AND b = d`, with `NaN != NaN`).

`extendJoinKeyNullMapWithFloatNaNs` first runs a type-only walk
(`containsFloatPayload`) so a fresh null map is still allocated lazily;
purely integer/string/date keys keep the previous fast path.

Tuple-keyed `JOIN ON` test cases added to
`04318_106531_join_on_nan_keys.sql` for `hash` and `full_sorting_merge`,
covering plain `tuple(Float64)`, multi-element `tuple(Float64, Float64)`
where only one element is `NaN`, `tuple(Nullable(Float64))` (preserves
NULL-safe tuple equality), and `tuple(LowCardinality(Nullable(Float64)))`.

Demonstrated both directions on a debug build: pre-fix (HEAD~) the
tuple-keyed `INNER JOIN` returns the matched row plus the wrongly-joined
NaN rows; post-fix it returns only the matched row. ANTI returns the
NaN-keyed rows after the fix.

Closes: ClickHouse#106531
@groeneai

groeneai commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate

Following commit a08706c (JoinUtils/MergeJoinTransform recursive unwrap for NaN keys, addressing clickhouse-gh[bot] CR on src/Interpreters/JoinUtils.cpp).

# Question Answer
a Deterministic repro? Yes. tests/queries/0_stateless/04318_106531_join_on_nan_keys.sql includes INNER JOIN ... ON tuple(l.k) = tuple(r.k) over Float64 NaN keys (and multi-element tuple, tuple(Nullable(Float64)), tuple(LowCardinality(Nullable(Float64)))) for both hash and full_sorting_merge. Pre-fix every INNER returns the matched row plus the NaN-keyed rows; post-fix only the matched row.
b Root cause explained? JoinCommon::extendJoinKeyNullMapWithFloatNaNs (HashJoin/ConcurrentHashJoin via AddedColumns.cpp:20 and HashJoin.cpp:792) and foldFloatNaNsIntoNullMap (full_sorting_merge via FullMergeJoinCursor::setChunk) only dispatched on top-level ColumnFloat32/ColumnFloat64/ColumnVector<BFloat16>. A ColumnTuple key (planner-generated tuple(a) wrapper for null-safe comparison, or explicit ON tuple(l.k) = tuple(r.k)) fell through, so the hash table / merge cursor compared the tuple payload bitwise via bitEquals / compareAt, and two same-bit-pattern NaNs wrongly matched.
c Fix matches root cause? Yes. Recursive helper markNaNRows(column, mutable_null_map) mirrors BuildRuntimeFilterTransform::markNaNRowsImpl (1b3d5dc): unwrap ColumnNullable (respect existing null map for inner-Nullable), unwrap ColumnLowCardinality via convertToFullColumnIfLowCardinality, recurse into ColumnTuple (drop row if any float element is NaN, matching (a,b)=(c,d) iff a=c AND b=d with NaN!=NaN). extendJoinKeyNullMapWithFloatNaNs keeps lazy allocation via a type-only containsFloatPayload walk so non-float keys don't pay an allocation.
d Test intent preserved / new tests added? Pre-existing assertions kept. New tuple-keyed test cases added to 04318_106531_join_on_nan_keys.sql for hash + full_sorting_merge: tuple(Float64) INNER/ANTI, multi-element tuple(Float64, Float64) where only one element is NaN, tuple(Nullable(Float64)) (NULL-safe tuple equality preserved), tuple(LowCardinality(Nullable(Float64))).
e Both directions demonstrated? Yes. Local debug build, pre-fix run: tuple-keyed INNER JOIN returned matched row + wrongly-joined NaN rows (m, nan-l for tuple(Float64); m, nan-l-k1, nan-l-k2 for the 2-element tuple); post-fix returns only m. ANTI flipped accordingly. 30/30 passes with --no-random-settings. Sibling test 03214_join_on_tuple_comparison_elimination_bug still passes.
f Fix is general, not a narrow patch? Yes. The same NaN-key gap existed in three sibling code paths and is now fixed identically in all three: BuildRuntimeFilterTransform::filterOutNaNs (1b3d5dc), JoinCommon::extendJoinKeyNullMapWithFloatNaNs (this commit), and foldFloatNaNsIntoNullMap (this commit). The recursion handles arbitrary nestings (Tuple(LowCardinality(Nullable(Float))), Nullable(Tuple(...)) with inner-Nullable elements, etc.) by walking column wrappers. Not a defensive guard — fixes the symptom at the root: the helpers now correctly OR-mark every reachable float-NaN row regardless of wrapper depth.

Session id: cron:clickhouse-ci-task-worker:20260606-043300

Comment thread src/Interpreters/HashJoin/HashJoin.cpp
The HashJoin / FullSortingMergeJoin / BuildRuntimeFilter NaN-as-null
treatment did not extend to `SET join_algorithm = 'direct'`. The MergeTree
direct path builds the right-side lookup with `IN` (which intentionally
keeps coalescing NaN per this PR's design) and then maps found rows back
to probe keys via `ColumnsHashing::HashMethodHashed`, which compares the
key payload bitwise. Two `NaN` keys with identical bit patterns therefore
matched for `INNER` / `LEFT SEMI` and were suppressed for `LEFT ANTI`.

Force every probe row whose key holds a `NaN` to the not-found path
(sentinel selector + `null_map = false`) inside `getByKeys`, mirroring
`JoinCommon::extendJoinKeyNullMapWithFloatNaNs`. The recursion walks
`Nullable` / `LowCardinality` / `Tuple` so the same shapes covered on
the hash path work here too. `joinKeyContainsFloatPayload` and
`markFloatNaNRowsAsNull` are exposed in `JoinCommon` so the helper is
shared with the hash and merge paths.

New tests in `04318_106531_join_on_nan_keys.sql` cover
`SET join_algorithm = 'direct'` over MergeTree-keyed `Float64` and
`Nullable(Float64)` for `INNER` / `LEFT` / `LEFT SEMI` / `LEFT ANTI`.

Address-requested in clickhouse-gh[bot] inline review on commit
a08706c (PR ClickHouse#106540) at 2026-06-06T05:03:46Z.

Closes: ClickHouse#106531
@groeneai

groeneai commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate

# Question Answer
a Deterministic repro? Yes. SET join_algorithm = 'direct'; SET enable_analyzer = 1; SELECT v, k FROM (SELECT ld.v, ld.k FROM ld INNER JOIN rd ON ld.k = rd.k) ORDER BY v; over ld(k Float64, v String) ENGINE = Memory() and rd(k Float64, w Int32) ENGINE = MergeTree() ORDER BY k with (1.5,'m'),(nan,'nan-l'),(2.5,'unmatched') and (1.5,100),(nan,9999) returns m, 1.5 AND nan-l, nan pre-fix on every run. Same shape for LEFT (nan-l row carries right w=9999 instead of default), LEFT SEMI (nan-l survives), LEFT ANTI (nan-l wrongly suppressed).
b Root cause explained? DirectJoinMergeTreeEntity::buildFilterStepWithIn builds the right lookup with IN (intentionally NaN-coalescing per this PR), so the matching nan rows from the right MergeTree make it past the filter. DirectJoinMergeTreeEntity::getByKeys then maps found right rows back to probe keys using ColumnsHashing::HashMethodHashed, which compares the key payload bitwise via UInt128TrivialHash. Two NaN keys with matching bit patterns hash to the same bucket and bitEquals returns true → findKey reports "found", selector_data points at the matching right row, null_map[k] = true. The downstream INNER / LEFT SEMI filter keeps the row; LEFT ANTI drops it; LEFT returns the right NaN row's payload.
c Fix matches root cause? Yes. getByKeys now pre-computes a per-input-row nan_mask (using JoinCommon::joinKeyContainsFloatPayload + JoinCommon::markFloatNaNRowsAsNull factored out of the hash path), and forces selector_data.push_back(num_found_rows) + out_null_map.push_back(false) for every NaN row. That puts the row on the SAME path the storage already takes for "key not found" — so the downstream selectFirstMatchForEachKey collapse (ANY/SEMI/ANTI) and the null_map-driven filter at the end of joinBlock see exactly the shape they expect for an unmatched probe row. No band-aid: the bug is "the hash map wrongly matched a NaN" and the fix says "treat NaN keys as never-matching, just like NULL". The same recursion the hash path uses for Nullable(Float) / LowCardinality(Nullable(Float)) / tuple-of-Float is reused.
d Test intent preserved / new tests added? Tests added, none weakened. 04318_106531_join_on_nan_keys.sql gets 8 new queries under SET join_algorithm = 'direct': Float64 INNER / LEFT / LEFT SEMI / LEFT ANTI plus Nullable(Float64) INNER / LEFT ANTI. Reference rows mirror the existing hash + full_sorting_merge semantics: NaN dropped on INNER/SEMI, NaN row kept with default right side on LEFT, NaN+unmatched kept on ANTI. The existing assertions for the original #106531 reproducer + the hash, merge, runtime-filter, BFloat16 and tuple shapes are unchanged.
e Both directions demonstrated? Yes. Pre-fix run (rebuilt with the fix git stash-ed away): the new direct-join queries return the bug-shape rows above. Post-fix run: same queries return correct results (only m for INNER, NaN row with default right side for LEFT, only m for SEMI, nan-l, unmatched for ANTI). 30/30 PASS for the full test file via direct clickhouse client --multiquery --queries-file on the post-fix binary.
f Fix is general, not a narrow patch? Yes — verified two questions: (1) Does the same bug pattern exist in sibling code paths? HashJoin / ConcurrentHashJoin (via extendJoinKeyNullMapWithFloatNaNs in JoinUtils.cpp), FullSortingMergeJoin (via foldFloatNaNsIntoNullMap in MergeJoinTransform.cpp), BuildRuntimeFilterTransform (via markNaNRowsImpl) — all already handled in the prior 4 commits of this PR (3bf2729, 1b3d5dc, a08706c). The remaining IKeyValueEntity backends — StorageEmbeddedRocksDB, StorageRedis, StorageKeeperMap, IDictionary — use string/integer-typed primary keys; float keys reach DirectKeyValueJoin only via DirectJoinMergeTreeEntity. The shared JoinCommon::joinKeyContainsFloatPayload + JoinCommon::markFloatNaNRowsAsNull helpers are now exported, so any future float-keyed backend can apply the same treatment in one line. (2) Am I fixing root cause or guarding symptom? Root cause. The bug originates because ColumnsHashing hashes float keys bitwise; we cannot make ColumnsHashing IEEE-754-aware without breaking other consumers. The fix is to not insert NaN keys into the lookup at the only public seam — the boundary between DirectKeyValueJoin (the JOIN-aware caller) and IKeyValueEntity::getByKeys (the storage-level lookup). That seam is exactly where extractNestedColumnsAndNullMap already does the analogous work for Nullable.

Session id: cron:clickhouse-ci-task-worker:20260606-065700

Comment thread src/Interpreters/JoinUtils.cpp Outdated
Comment thread src/Processors/Transforms/MergeJoinTransform.cpp
Comment thread src/Processors/Transforms/BuildRuntimeFilterTransform.cpp
…umnConst

Addresses 5 follow-up review items on PR ClickHouse#106540 (issue ClickHouse#106531):

1. arm_tidy parameter-name mismatch: align JoinUtils.h declaration with
   .cpp definition (mutable_null_map). Fixes the
   readability-inconsistent-declaration-parameter-name warning that
   broke the arm_tidy build on commit c183aff.

2. ColumnNullable branch must respect the outer null map. A
   ColumnNullable may carry arbitrary trash bytes in its nested column
   for rows where the null map says NULL. If those bytes happen to be a
   NaN bit pattern, the previous markFloatNaNRowsAsNull marked the
   whole row in the join null map and broke null-safe tuple equality
   (tuple(NULL) = tuple(NULL) wrongly became a non-match). The
   recursion now threads is_null down so an already-NULL row is left
   alone.

3. MergeJoinTransform foldFloatNaNsIntoNullMap and
   BuildRuntimeFilterTransform filterOutNaNs gain a type-only
   joinKeyContainsFloatPayload guard. Most full-sorting-merge and
   runtime-filter keys are integer/string/date and never carry a NaN,
   so we now skip the per-chunk ColumnUInt8/Filter allocation in those
   cases.

4. Defence-in-depth ColumnConst handling in joinKeyContainsFloatPayload
   and markFloatNaNRowsAsNull (and the parallel helpers in
   MergeJoinTransform and BuildRuntimeFilterTransform). Direct joins
   pass the original probe key column without convertToFullColumnIfConst,
   so a constant NaN can reach this helper as
   ColumnConst(ColumnFloat*(NaN), N). Marking all rows when the
   underlying scalar is NaN keeps the constant-probe semantics aligned
   with the materialised path.

Test: 04318_106531_join_on_nan_keys gains a Tuple(Nullable(Float64))
null-safe-match case where NULL slots carry a NaN trash payload.
Pre-fix this returned 0 (BUG); post-fix it returns 1.
@groeneai

groeneai commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate (commit 25b677f)

# Question Answer
a Deterministic repro? Yes for every CR. CR 1: INSERT INTO l_tnf SELECT tuple(CAST(if(number = 0, NULL, nan) AS Nullable(Float64))) FROM numbers(2) then INNER JOIN returns 0 pre-fix vs 1 post-fix on demand. CR 0 (arm_tidy): clang-tidy readability-inconsistent-declaration-parameter-name on JoinUtils.h:119 vs .cpp:444 — observable in the iter6 CI report. CR 2 / CR 4: foldFloatNaNsIntoNullMap and filterOutNaNs previously allocated a per-chunk ColumnUInt8 / IColumn::Filter for any merge / runtime-filter key — observable by reading the code. CR 3: defence-in-depth, no user-facing repro because the analyzer rejects const = column in JOIN ON, but the type-only joinKeyContainsFloatPayload(*ColumnConst*) path is now exercised by the helpers themselves.
b Root cause explained? (CR 0) Header parameter name mutable_null_mask mismatched the .cpp definition mutable_null_map after the c183aff extraction, tripping readability-inconsistent-declaration-parameter-name. (CR 1) markFloatNaNRowsAsNull recursed into ColumnNullable::getNestedColumn() without using the inner null map. A ColumnNullable is allowed to carry arbitrary trash bytes in its nested column for NULL slots; if those bytes are a NaN bit pattern, the join null-map gets a 1 for an already-NULL row, and tuple(NULL) = tuple(NULL) is wrongly broken. (CR 2 / CR 4) The helpers allocated their per-row state before checking whether the key type can carry a NaN at all. (CR 3) The shared helpers did not recognise ColumnConst, so a constant NaN probe in direct join would fall through.
c Fix matches root cause? (CR 0) Renamed the header parameter to mutable_null_map so it matches the .cpp definition. (CR 1) markFloatNaNRowsAsNull (and its MergeJoinTransform and BuildRuntimeFilterTransform siblings) now thread is_null through the recursion; the ColumnNullable branch ORs the inner null map onto is_null and skips rows that are already NULL. (CR 2 / CR 4) Added a joinKeyContainsFloatPayload guard at the very top of foldFloatNaNsIntoNullMap and filterOutNaNs, before any per-row allocation. (CR 3) Extended joinKeyContainsFloatPayload and markFloatNaNRowsAsNull (and the parallel MergeJoinTransform and BuildRuntimeFilterTransform helpers) to recurse through ColumnConst::getDataColumn(); when the scalar is NaN, the mark is spread across every row that is not already excluded by is_null.
d Test intent preserved / new tests added? New regression case in 04318_106531_join_on_nan_keys.sql: Tuple(Nullable(Float64)) null-safe match where the NULL slot carries a NaN trash payload. Pre-fix returns 0 (CR 1 BUG), post-fix returns 1. The existing 105 lines of test cases (Float32 / Float64 / Nullable(Float64) / BFloat16 / hash + merge + direct grids, tuple variants, runtime filters) all still pass, demonstrating the fast-path guards (CR 2 / CR 4) and ColumnConst recursion (CR 3) are non-regressions.
e Both directions demonstrated? Yes for CR 1 — pre-fix repro shows count() = 0, post-fix shows count() = 1 on the exact same SQL. For CR 0 / CR 2 / CR 3 / CR 4 the validation is the existing test suite + targeted non-float / direct-join cases that already covered them; they all pass post-fix.
f Fix is general, not a narrow patch? Yes. The CR 1 fix (respect outer null map) is applied symmetrically to all three NaN-detection helpers (JoinCommon::markFloatNaNRowsAsNull, MergeJoinTransform::markNaNRows, BuildRuntimeFilterTransform::markNaNRowsImpl). The CR 3 ColumnConst recursion is added at the same three sites for consistency. The CR 2 / CR 4 fast-path guards reuse the shared JoinCommon::joinKeyContainsFloatPayload (introduced in c183aff). I traced the trash-payload bug to its origin (ColumnNullable contract — null rows can hold any nested value); the fix lives at the helper level where the scan happens, not at any specific call site. No symptom guard was added.

Session id: cron:clickhouse-ci-task-worker:20260606-084500

Comment thread src/Interpreters/JoinUtils.cpp
Comment thread src/Interpreters/JoinUtils.cpp Outdated
Address three follow-up CRs from clickhouse-gh[bot] on commit 25b677f
(PR ClickHouse#106540, NaN-as-null treatment for JOIN ON):

CR 1 — `ColumnSparse` was not unwrapped in `joinKeyContainsFloatPayload` /
`markFloatNaNRowsAsNullImpl` (`JoinUtils.cpp`), `markNaNRows`
(`MergeJoinTransform.cpp`), and `markNaNRowsImpl`
(`BuildRuntimeFilterTransform.cpp`). HashJoin and full_sorting_merge
materialise sparse columns upstream, but direct joins and runtime-filter
builds can see them raw. Added a `column.isSparse()` branch in each helper
that calls `convertToFullColumnIfSparse` and recurses on the dense form.

CR 2 — The `ColumnConst` branch in `markFloatNaNRowsAsNullImpl` recursed
into the one-row data column while keeping the full-size `mutable_null_map`,
so the leaf branch's `chassert(data.size() == mutable_null_map.size())`
would trip in debug/ASan builds. Rewrote the branch to evaluate the scalar
into a one-row `scalar_mask`, then spread the mark across all rows not
excluded by `is_null`. Mirrors the pattern already used in
`BuildRuntimeFilterTransform.cpp::markNaNRowsImpl`. Same fix applied to
`MergeJoinTransform.cpp::markNaNRows`.

CR 3 — Re-confirms CR 2 on the `HashJoin.cpp:792` thread; addressed by
the same JoinUtils.cpp rewrite.

Test additions to `tests/queries/0_stateless/04318_106531_join_on_nan_keys`:
- Constant `NaN` probe key (hash) covering the rewritten ColumnConst branch
  via the helper's contract.
- Build-side sparse `Float64` key with runtime filter enabled, exercising
  the new `ColumnSparse` unwrap in `markNaNRowsImpl`.

Local verification (debug build, x86-64-v2): 30/30 passes via
`clickhouse local --queries-file` on the test file, sibling
`03214_join_on_tuple_comparison_elimination_bug` continues to pass.
@groeneai

groeneai commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate

# Question Answer
a Deterministic repro? Iter8 addresses three bot CRs that are mostly defensive: CR 2/3 is a debug/ASan chassert(data.size() == mutable_null_map.size()) in the ColumnConst recursion of markFloatNaNRowsAsNullImpl; CR 1 is a ColumnSparse fall-through. Both are reachable in principle from DirectJoinMergeTreeEntity::getByKeys and BuildRuntimeFilterTransform::filterOutNaNs, but in practice upstream materializeColumns / convertToFullColumnIfConst / convertToFullColumnIfSparse calls in HashJoin / MergeTreeRangeReader / runtime-filter stages mostly cover them. The fix follows the bot's defensive guidance: the helper's contract must accept arbitrary column shapes (ColumnConst with N>1 rows, ColumnSparse with float payload) without tripping or silently missing NaN. No user-facing query repro was constructed.
b Root cause explained? (CR 2/3) Iter7's ColumnConst branch called markFloatNaNRowsAsNullImpl(col_const->getDataColumn(), mutable_null_map, is_null). getDataColumn() returns a 1-row scalar column. The recursion lands in markNaNsOfFloatColumn<T> whose first line is chassert(data.size() == mutable_null_map.size()). With data.size() == 1 and mutable_null_map.size() == N, debug/ASan trips. (CR 1) The helper trees in JoinUtils.cpp, MergeJoinTransform.cpp, BuildRuntimeFilterTransform.cpp had no ColumnSparse case; sparse columns fell through typeid_cast chains and the fall-through return false; at the bottom — so a sparse Float64(NaN) build-side or probe-side key was never marked.
c Fix matches root cause? Yes for both. (CR 2/3) Rewrote the ColumnConst branch to evaluate the scalar into a one-row PaddedPODArray<UInt8> scalar_mask(1, 0), recurse into getDataColumn() with that, then spread the mark across mutable_null_map rows not excluded by is_null. Mirrors BuildRuntimeFilterTransform.cpp::markNaNRowsImpl exactly, which the bot itself cited as the correct pattern. (CR 1) Added column.isSparse() branch in all three helper trees (and in joinKeyContainsFloatPayload) that calls convertToFullColumnIfSparse() and recurses on the dense form. No symptom guards, no widening, no narrow patch.
d Test intent preserved / new tests added? Two new test cases added to tests/queries/0_stateless/04318_106531_join_on_nan_keys.sql: (1) a constant NaN probe key under hash join — exercises the rewritten ColumnConst branch through the helper's contract (HashJoin materialises upstream so the chassert was only theoretically reachable in user code, but the test is forward-looking regression coverage of the helper's new defensive shape); (2) a build-side sparse Float64 table with enable_join_runtime_filters = 1, exercising the new ColumnSparse unwrap in markNaNRowsImpl. Existing test cases (Float32/Float64/BFloat16/Nullable/Tuple/LowCardinality, hash + full_sorting_merge + direct + runtime-filter, INNER/LEFT/RIGHT/FULL/SEMI/ANTI) are preserved unchanged.
e Both directions demonstrated? Both pre-fix (iter7 source restored via git stash) and post-fix builds run the full 04318_106531_join_on_nan_keys.sql end-to-end via clickhouse local --queries-file and produce the same correct output. Note: in user-facing query paths neither the iter7 chassert nor the iter7 ColumnSparse fall-through ends up reached (upstream materialisation), so a user-visible diff was not constructed; the fix is per the bot's defensive guidance. Post-fix iteration loop: 30/30 PASS (`for i in 1..30; do clickhouse local --queries-file ...
f Fix is general, not a narrow patch? Yes. (CR 1 ColumnSparse) Mirrored across all three NaN-detection helper trees (HashJoin/runtime-filter via JoinUtils.cpp, full_sorting_merge via MergeJoinTransform.cpp, runtime-filter build via BuildRuntimeFilterTransform.cpp) and into joinKeyContainsFloatPayload. Not a single-call-site patch. (CR 2/3 ColumnConst) The fix replaces a brittle helper contract with a robust one — the helper now accepts arbitrary multi-row ColumnConst shapes without size-mismatch issues, regardless of whether HashJoin/MergeJoin/Direct/RuntimeFilter happen to materialise upstream today. (Root cause vs symptom) The bug is that the helper's recursion-with-shared-mask pattern doesn't compose with ColumnConst's 1-row data column. Fix is at the helper level (the producer of the bad shape — col_const->getDataColumn() is 1 row by design), not a defensive guard in callers.

Session id: cron:clickhouse-ci-task-worker:20260606-102600

Comment thread src/Processors/Transforms/MergeJoinTransform.cpp
The legacy MergeJoin (used by `SET join_algorithm = 'partial_merge'`) sorts
both sides and merges by `nullableCompareAt`, which falls through to
`compareAt` and compares Float32/Float64/BFloat16 bitwise. Two NaNs with
identical bit patterns then form an equal range, so a NaN row on either
side spuriously matches another NaN row, violating IEEE 754
`NaN != NaN`.

The other algorithms patched in this PR (HashJoin, ConcurrentHashJoin,
GraceHashJoin via runtime filter, DirectKeyValueJoin, FullSortingMergeJoin)
all fold NaN into the join-key null map. MergeJoin's sort-then-merge
contract makes the same fold here require a schema change (wrap float keys
in `Nullable` for the cursor view) that the surrounding output path is not
set up for, so we reject instead.

The constructor throws `NOT_IMPLEMENTED` with a message pointing the user
to `hash` / `parallel_hash` / `grace_hash` / `full_sorting_merge`. The new
`MergeJoin::isSupported(table_join, right_sample_block)` overload is used
by `AUTO` and `PARTIAL_MERGE` selection in `PlannerJoins` and
`ExpressionAnalyzer` so existing AUTO queries with float keys silently
fall back to HashJoin (which now handles NaN correctly) instead of
throwing mid-query when JoinSwitcher would attempt to switch to
MergeJoin.

A new type-only helper `JoinCommon::joinKeyTypeContainsFloatPayload`
walks `Nullable` / `LowCardinality` / `Tuple` so all four wrapper shapes
are covered.

Test cases added to `tests/queries/0_stateless/04318_106531_join_on_nan_keys.sql`:
Float64, Nullable(Float64), LowCardinality(Nullable(Float64)) and
Tuple(Float64, Float64) all expected to throw `NOT_IMPLEMENTED` under
`partial_merge`; an Int32 partial_merge case verifies non-float keys
still work.

Pre-fix repro: `SET join_algorithm = 'partial_merge'; SELECT v, w FROM lpm INNER JOIN rpm ON lpm.k = rpm.k`
with `(nan, 'nan-l'), (1.5, 'm')` left-side and `(nan, 999), (1.5, 100)`
right-side returns BOTH `m\t100` AND `nan-l\t999` (BUG: NaN matches NaN).
Post-fix: throws Code 48 NOT_IMPLEMENTED. AUTO with float keys correctly
falls back to HashJoin and returns only `m\t100`.

clickhouse-gh[bot] CR id 4442281467 / comment 3367XXX (partial_merge
algorithm gap) on ClickHouse#106540.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

groeneai commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate

Iteration 9 — addressing clickhouse-gh[bot] CR (review id 4442439049, comment 3367178592) on src/Processors/Transforms/MergeJoinTransform.cpp:549 flagging the legacy MergeJoin (partial_merge algorithm) gap.

# Question Answer
a Deterministic repro? Yes. SET join_algorithm = 'partial_merge'; SELECT v, w FROM lpm INNER JOIN rpm ON lpm.k = rpm.k with lpm containing (nan, 'nan-l'), (1.5, 'm') and rpm containing (nan, 999), (1.5, 100) returns nan-l\t999 AND m\t100 pre-fix on every run (BUG). Post-fix: throws Code: 48 NOT_IMPLEMENTED.
b Root cause explained? MergeJoin::nullableCompareAt falls through to IColumn::compareAt for non-Nullable float keys. compareAt on ColumnFloat* uses FloatCompareHelper::compare which compares Float bits via the integer encoding intCompare, returning 0 for two NaNs with the same bit pattern. nullableCompareTrackAt has the same gap. After sortBlock, same-bit-pattern NaNs land adjacent and getNextEqualRangeImpl forms an equal range, so joinEquals joins them.
c Fix matches root cause? Yes. Rejects float JOIN keys at MergeJoin construction with a clear NOT_IMPLEMENTED error (option 2 of the bot's two-option suggestion). The bot accepted "reject/fall back" as one of the two valid fix shapes. The alternative (wrap float keys as Nullable for the cursor) requires schema gymnastics that the legacy block-output path is not set up for.
d Test intent preserved / new tests added? Yes. tests/queries/0_stateless/04318_106531_join_on_nan_keys.sql extended with 5 new partial_merge cases: 4 rejection cases (Float64, Nullable(Float64), LowCardinality(Nullable(Float64)), Tuple(Float64, Float64)) all expected to throw NOT_IMPLEMENTED, plus 1 Int32 "non-float still works" case. Sibling tests 01115_join_with_dictionary (partial_merge with non-float keys + Float64 columns) and 03214_join_on_tuple_comparison_elimination_bug continue to pass.
e Both directions demonstrated? Yes. Pre-fix binary (rebuilt with the fix git-stashed): the BUG repro returns 2 rows incorrectly (NaN matches NaN). Post-fix binary: same query throws Code 48 NOT_IMPLEMENTED with the explicit message; AUTO with float keys silently falls back to HashJoin and returns only m. 30/30 deterministic test passes via clickhouse local --queries-file.
f Fix is general, not a narrow patch? Yes. New type-only helper JoinCommon::joinKeyTypeContainsFloatPayload(IDataType) recurses through Nullable / LowCardinality / Tuple so all four wrapper shapes are covered. Both Planner paths (src/Planner/PlannerJoins.cpp PARTIAL_MERGE+AUTO, src/Interpreters/ExpressionAnalyzer.cpp PARTIAL_MERGE+AUTO) use the new MergeJoin::isSupported(table_join, right_sample_block) overload — JoinSwitcher (the AUTO path with HashJoin → MergeJoin spill) is gated by the same predicate so float keys never reach a switch attempt. Constructor throws as defense-in-depth even if a future caller uses the legacy isSupported(table_join) overload. No symptom widening.

Session id: cron:clickhouse-ci-task-worker:20260606-114000

Comment thread src/Interpreters/MergeJoin.cpp Outdated
@groeneai

groeneai commented Jun 6, 2026

Copy link
Copy Markdown
Contributor Author

CI Ledger — covered SHA: b201d205cacc1c25e1a963e53fcbce8922293a7a

CI fully finished (Finish Workflow ✅ at 2026-06-06T13:00Z, last test completion 13:06:48Z, ≥20min CIDB ingestion buffer met).

Failure Classification Disposition
Stateless tests (amd_debug, parallel)Logical error: Virtual row boundary violated in MergingSortedAlgorithm ... announced Decimal64_'0.000000000' but the source then produced Decimal64_'1704060000.000000000' (STID 2651-3359) + Server died trunk regression / chronic debug-assertion (NOT PR-caused) Tracked as P0 in 2026-06-06-p0-stid-2651-3359--virtual-row-boundary (master regression in MergingSortedAlgorithm::checkVirtualRowBoundary). Cross-PR: 11 distinct PRs hit this STID in 30d (#105386 x2, #79393, #98472, #94937, master, #48941, #93462, #49966, #67954, #106412, ours). Stack lands in Processors/Merges/Algorithms/MergingSortedAlgorithm.cpp:45 (checkVirtualRowBoundary) on a __table1.timestamp Decimal64 column — completely unrelated to this PR's diff (JoinUtils, MergeJoinTransform, BuildRuntimeFilterTransform, MergeJoin, Planner, ExpressionAnalyzer).

PR-caused failure check: none. PR diff is in JOIN-key NaN handling code paths (JoinUtils, HashJoin, MergeJoinTransform, BuildRuntimeFilterTransform, MergeJoin, planner). The triggering query in the failure stack is a sort-merge over a single timestamp column, no JOIN involved — different code path. The test triggering the assertion in the cross-PR set spans many years (PR #48941 from 2023 through #106412 today), and master itself was in the hit list — pure trunk regression in the merge-sorted virtual-row machinery.

Iteration 10 (2026-06-06 19:46Z) was metadata-only (PR body changelog disclosing partial_merge rejection migration path) — no new commit; the head SHA stays b201d205cacc.

Moving to on_review.

@PedroTadim

Copy link
Copy Markdown
Member

The PR looks ok to me. @vdimir are the changes in a hot loop that may cause a performance degradation? Can you give a final look?

@vdimir vdimir left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see performance implications, but overall looks a bit over-complicated, I left some comments. Also can we check how nan handling implemented in other places (such as aggregation or whatever we also do some comparisons)?

Comment thread src/Interpreters/MergeJoin.cpp Outdated
const auto & key_type = right_sample_block.getByName(key_name).type;
if (key_type && JoinCommon::joinKeyTypeContainsFloatPayload(*key_type))
throw Exception(
ErrorCodes::NOT_IMPLEMENTED,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is backward incompatible. Better to keep the existing behavior and add a note to the docs for the algorithm that it may treat NaNs unexpectedly.

Comment thread src/Interpreters/MergeJoin.cpp Outdated
return isSupported(table_join->kind(), table_join->strictness()) && table_join->oneDisjunct();
}

bool MergeJoin::isSupported(const std::shared_ptr<TableJoin> & table_join, const Block & right_sample_block)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for compatibility

return markNaNRows(*full, mutable_null_map, is_null);
}

if (const auto * tuple = typeid_cast<const ColumnTuple *>(&column))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For tuple we treat tuple(null) == tuple(null), because null here is not on top level, keeping such behavour for compatibility

Comment thread src/Interpreters/JoinUtils.h Outdated
Comment on lines +93 to +113
/// Extend the JOIN-key null map so any row whose float key holds a `NaN` is treated as `NULL`.
/// Per `IEEE 754` (and SQL `JOIN ON` semantics) `NaN != NaN`, but the hash table compares float
/// keys bitwise via `bitEquals`, so two `NaN`s with identical bit patterns wrongly match. Folding
/// `NaN` rows into the null map makes both build and probe sides skip them, consistent with how
/// `NULL` keys never join.
///
/// `key_columns` are the raw key columns AFTER `extractNestedColumnsAndNullMap` peeled off any
/// outer `Nullable`. If `null_map_holder` is empty and at least one float column has a `NaN`,
/// a fresh `ColumnUInt8` is allocated and assigned. `null_map` is updated to point into the
/// (possibly new) holder. Non-float columns are skipped.
void extendJoinKeyNullMapWithFloatNaNs(
const ColumnRawPtrs & key_columns,
ColumnPtr & null_map_holder,
ConstNullMapPtr & null_map);

/// Quick type-only walk: does `column` (post `extractNestedColumnsAndNullMap` peeling) carry
/// any float payload at all? Recurses into `Nullable`, `LowCardinality` and `Tuple`. Used to
/// avoid scanning purely integer/string/date keys for `NaN`.
bool joinKeyContainsFloatPayload(const IColumn & column);

/// Same idea but inspecting the static `DataTypePtr` rather than a runtime column. Used in

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure why we have such complexity here with sevelra helpers. Why not to introduce getNanMask method for ColumnVector and just use it in implemnetations and disjunct with null_mask?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 6243982. Introduced ColumnVector<T>::getNanMask(PaddedPODArray<UInt8> &) (OR-marks NaN rows for floating-point T, no-op otherwise) and collapsed the three duplicated NaN cascades (JoinUtils.cpp, BuildRuntimeFilterTransform.cpp, MergeJoinTransform.cpp) into the single shared JoinCommon recursion, which now calls getNanMask at the leaf. Each consumer ORs the resulting mask into its existing null map (disjunct with null_mask), as you suggested. Net -250 lines.

The tuple(NULL) = tuple(NULL) compatibility you raised on MergeJoinTransform.cpp:430 is preserved by the same change: the shared recursion's Nullable branch skips already-NULL rows, so a tuple-wrapped NULL is never marked as NaN. 04318_106531_join_on_nan_keys (which covers that case) passes 50/50 with randomization.

@groeneai

groeneai commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

CI Ledger — covered HEAD: 62c5d9b982a2

CI fully finished on this HEAD: Finish Workflow SUCCESS at 2026-06-09T15:22:45Z, no in-flight checks, ≥40 min ingestion buffer met.

job failure classification disposition
Stateless tests (amd_llvm_coverage, old analyzer, s3 storage, DatabaseReplicated, WasmEdge, sequential) 02440_mutations_finalization (1 initial fail, 0/3 reruns failed → flaky) chronic flaky, NOT PR-caused Tracked in our task 2026-06-09-p3-chronic-flaky-02440mutationsfinaliz. Cross-PR (30 d): 6 distinct unrelated PRs (#77210, #96130, #105822, #103706, #101783, #104976), low single-digit per-PR hits. The recently closed maintainer issue #106390 (closed 2026-06-04) and older #65585 confirm the chronic profile. The test exercises ALTER … UPDATE mutation finalization on MergeTree with replicas — zero overlap with this PR's JOIN-key NaN diff (JoinUtils.{h,cpp}, HashJoin.cpp, AddedColumns.cpp, MergeJoinTransform.cpp, BuildRuntimeFilterTransform.cpp, new test 04318_106531_join_on_nan_keys).

Mergeable Check reports failed only because of that single chronic-flaky-test row; Finish Workflow is green. Nothing on this PR's diff is failing.

Posted by cron:our-pr-ci-monitor:20260609-160000

Per `@vdimir` `CHANGES_REQUESTED` review on PR ClickHouse#106540, restore
the legacy `partial_merge` (`MergeJoin`) behaviour for float `JOIN ON`
keys (`NaN` matches `NaN` bitwise) and document the gotcha instead of
throwing `NOT_IMPLEMENTED`. The previous commit (`b201d205cacc`) was
backward incompatible and rejected existing user queries.

Reverted:
- `MergeJoin::MergeJoin` constructor `NOT_IMPLEMENTED` throw on float
  key types.
- `MergeJoin::isSupported(table_join, right_sample_block)` overload.
- `JoinCommon::joinKeyTypeContainsFloatPayload` helper (now unused).
- Test cases asserting `serverError NOT_IMPLEMENTED` for the four float
  key shapes; the test now verifies that `partial_merge` returns the
  legacy bitwise-equal output.

Added a docs `:::note Float keys` block in the `join_algorithm` setting
description so users can pick a `NaN`-safe algorithm (`hash`,
`parallel_hash`, `grace_hash`, `full_sorting_merge`, `direct`) when it
matters.

The other algorithms in this PR (`HashJoin`, `ConcurrentHashJoin`,
`GraceHashJoin` runtime filter, `DirectKeyValueJoin`,
`FullSortingMergeJoin`) keep their `NaN`-as-null fold and continue to
match IEEE 754 `NaN != NaN`.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

groeneai commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate (commit 4cb2d05a182b — revert partial_merge float-key rejection per @vdimir feedback)

# Question Answer
a Deterministic repro? Yes. Pre-revert: SET join_algorithm='partial_merge'; SELECT v, w FROM lpm INNER JOIN rpm ON lpm.k = rpm.k with (nan, 'nan-l'), (1.5, 'm') left and (nan, 999), (1.5, 100) right throws Code: 48 NOT_IMPLEMENTED from the new constructor check at MergeJoin.cpp:619. Post-revert: same query returns m\t100\nnan-l\t999 (legacy bitwise-equal NaN behaviour).
b Root cause explained? The previous commit (b201d205cacc) added a hard NOT_IMPLEMENTED throw in MergeJoin for any float JOIN ON key, which broke existing user queries that used partial_merge with float keys. partial_merge cannot express NaN != NaN through its sort-then-merge comparator, but as @vdimir pointed out, rejecting the query is backward incompatible — pre-PR users got NaN-matches-NaN bitwise, and rejecting at construction time turns previously-passing queries into runtime errors. The right fix is to keep the legacy behaviour and document the algorithm-specific NaN semantics.
c Fix matches root cause? Yes. The change is a clean revert of (1) the constructor throw, (2) the new isSupported(table_join, right_sample_block) overload, (3) its three call-site updates in ExpressionAnalyzer.cpp / PlannerJoins.cpp, (4) the now-unused JoinCommon::joinKeyTypeContainsFloatPayload helper, and adds a :::note Float keys block to the join_algorithm setting docs (src/Core/Settings.cpp). No symptom-masking.
d Test intent preserved / new tests added? Yes. The four -- { serverError NOT_IMPLEMENTED } cases for Float64/Nullable/LC(Nullable)/Tuple are removed (they only existed to assert the rejection that we're undoing). They are replaced with a single positive case asserting that partial_merge returns m\t100\nnan-l\t999 — i.e. the legacy bitwise-equal output — which fails without this revert (the constructor would throw before the SELECT can run). The remaining hash/parallel_hash/grace_hash/full_sorting_merge/direct cases that exercise the NaN-as-null fold are unchanged and still verify that those algorithms canonicalize NaN.
e Both directions demonstrated? Yes. Pre-revert (b201d205cacc): the new test case throws Code 48 — verified locally before reverting. Post-revert (4cb2d05a182b): tests/clickhouse-test --no-random-settings --test-runs 50 04318_106531_join_on_nan_keys passes 50/50; the directly-run partial_merge query returns m\t100\nnan-l\t999, the directly-run hash query returns just m\t100 (NaN-safe), confirming the algorithm-specific NaN handling matches the docs note. 01010_partial_merge_join, 01010_partial_merge_join_const_and_lc, 01010_partial_merge_join_negative, 01147_partial_merge_full_join all pass.
f Fix is general, not a narrow patch? N/A — this is a documentation + revert change. partial_merge keeps its historical behaviour; the symptom (NaN matches NaN for float keys in partial_merge) is now a documented and explained gotcha rather than a silent correctness bug, and users who need IEEE 754 NaN != NaN are pointed at hash / parallel_hash / grace_hash / full_sorting_merge / direct (all of which canonicalize NaN to NULL via this PR). The other 5 algorithms still get the proper NaN-as-null fold.

This addresses @vdimir CR comments #1 (MergeJoin.cpp:625) and #2 (MergeJoin.cpp:1296). The remaining vdimir feedback (#3 tuple null compat, #4 getNanMask refactor, top-level NaN-handling audit) and the bot's DirectKeyValueJoin / EmbeddedRocksDB gap will be addressed in follow-up commits.

Session id: cron:clickhouse-ci-task-worker:20260609-161751

Comment thread src/Core/Settings.cpp
@groeneai

Copy link
Copy Markdown
Contributor Author

@vdimir re your question "can we check how nan handling is implemented in other places (aggregation or wherever we do comparisons)?" - I audited the other equality/comparison contexts. Short answer: every other place uses bitwise (bitEquals/memcmp) semantics that coalesce same-bit NaNs, and that is intentional. JOIN ON is the only context that is a SQL predicate (a = b must be false for NaN), which is why this PR changes only JOIN ON.

Context Primitive Same-bit NaN behavior Location
GROUP BY / DISTINCT / uniq bitEquals (memcmp) coalesce: one group / one distinct value Common/HashTable/HashTable.h:74-82
IN / Set membership bitEquals (same HashSet as GROUP BY) NaN IN (NaN) matches HashTable.h:74-82 (via Interpreters/SetVariants.cpp)
float key hashing intHashCRC32 raw bits hashed; only 0.0/-0.0 canonicalized, NaN is not Common/HashTable/Hash.h:109-127
ORDER BY / sort FloatCompareHelper::compare NaN == NaN for ordering (returns 0); NaN ordered per nan_direction_hint Core/CompareHelper.h:67-81
JOIN ON (this PR) was bitEquals; now NaN folded into key null map NaN = NaN does not match Interpreters/JoinUtils.cpp

The bitwise behavior in the hash table is load-bearing, not accidental: the comment at HashTable.h:70-72 says float keys must be compared by bit equality "otherwise the invariants in hash table probing do not [get] met when NaNs are present." So I deliberately did not touch the hash-table layer. The fix sits one level up, in the JOIN-key null map, so GROUP BY / IN / DISTINCT / uniq keep their current semantics and only the JOIN ON predicate changes.

Why leaving the others unchanged is correct:

  • GROUP BY / DISTINCT / uniq bucket identical values; coalescing same-bit NaNs is the expected behavior (matches std::hash<float> and Postgres, where NaN groups with itself).
  • ORDER BY needs a total order, so NaN must have a defined position and two NaNs must compare equal. Changing that would break sorting.
  • IN is set membership, same bucketing argument as GROUP BY.

The existing test already pins this (04318_106531_join_on_nan_keys.sql:408-414): GROUP BY and DISTINCT on NaN still collapse to one group/value after the JOIN fix.

On your other comments: I will address the getNanMask simplification (replace the helper cascade with a per-column primitive disjuncted into the null map) and the MergeJoinTransform tuple-null compatibility note in follow-up commits.

Address @vdimir review feedback on PR ClickHouse#106540 (the JOIN ON NaN-keys fix
for issue ClickHouse#106531). The previous version carried three near-identical
~80-line recursive NaN-detection cascades (in JoinUtils.cpp,
BuildRuntimeFilterTransform.cpp and MergeJoinTransform.cpp), each
re-implementing the same Nullable/LowCardinality/Tuple/ColumnConst/
ColumnSparse unwrap plus a per-float-type leaf loop.

Replace them with a single per-column primitive: ColumnVector<T>::getNanMask,
which OR-marks NaN rows for floating-point T (no-op otherwise). The shared
JoinCommon recursion (markFloatNaNRowsAsNull / joinKeyContainsFloatPayload)
keeps the wrapper unwrap in one place and calls getNanMask at the leaf;
the two transforms now reuse that shared recursion instead of private copies.

Behaviour is unchanged: NaN keys are still treated as never-matching for
JOIN ON, while null-safe tuple equality (tuple(NULL) = tuple(NULL)) is
preserved because the Nullable branch skips already-NULL rows. Net -250
lines across the four join code paths.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate

This push (6243982) addresses @vdimir review item C ("introduce getNanMask method for ColumnVector and just use it in implementations and disjunct with null_mask"). It is a behaviour-preserving refactor of the NaN-handling code already validated in CI on the prior HEAD.

# Question Answer
a Deterministic repro? Yes. SELECT count() FROM (SELECT nan::Float64 AS k) l JOIN (SELECT nan::Float64 AS k) r ON l.k=r.k returns 1 without the original fix and 0 with it, deterministically, across hash / full_sorting_merge / direct and Float32/Float64/BFloat16.
b Root cause explained? The three join code paths each carried a near-identical ~80-line recursive NaN cascade (Nullable/LowCardinality/Tuple/ColumnConst/ColumnSparse unwrap + a per-float-type leaf loop). That duplication is the "over-complicated, several helpers" vdimir flagged. The actual NaN test is a one-liner per float column.
c Fix matches root cause? Yes. Adds the named primitive ColumnVector<T>::getNanMask (OR-marks NaN rows; no-op for non-float T) and collapses the three copies into the single shared JoinCommon recursion, which now calls getNanMask at the leaf. Consumers OR the mask into their existing null map ("disjunct with null_mask").
d Test intent preserved / new tests added? Preserved. No test assertions changed. 04318_106531_join_on_nan_keys already covers tuple(Nullable(Float64)) null-safe match, trash-NaN-payload NULL rows, direct/runtime-filter/full-sorting-merge — all still pass. Item B (tuple(NULL)=tuple(NULL) still matches) is preserved structurally: the Nullable branch skips already-NULL rows, so a tuple-wrapped NULL is never marked NaN.
e Both directions demonstrated? Yes. 04318_106531_join_on_nan_keys passes 50/50 with full randomization on a local debug build; direct queries confirm NaN keys still drop (0 matches) and non-NaN floats still match (1) on hash/full_sorting_merge/BFloat16, and tuple(NULL,1)=tuple(NULL,1) still matches (1).
f Fix is general, not a narrow patch? Yes. This is the generalization: instead of three independent NaN cascades drifting apart, NaN detection lives in one place (ColumnVector::getNanMask) and the unwrap logic in one place (JoinCommon). All four join paths (HashJoin, FullSortingMergeJoin, BuildRuntimeFilter, DirectKeyValueJoin) share it. Net -250 lines.

Session id: cron:clickhouse-worker-slot-8:20260610-230810

…erMap)

The generic direct (key-value) JOIN path did not honour IEEE 754 / SQL
JOIN ON NaN != NaN for IKeyValueEntity backends. DirectKeyValueJoin::joinBlock
forwarded the probe key to storage->getByKeys; StorageEmbeddedRocksDB (and
Redis / KeeperMap) serialize the key Field bitwise, so a probe NaN matched a
build-side NaN with the same bit pattern. Only DirectJoinMergeTreeEntity masked
NaN internally, so the direct algorithm overlisted NaN-safety in the
join_algorithm docs for the generic backends.

Fix: fold NaN probe rows to the not-found path inside DirectKeyValueJoin::joinBlock,
gated on offsets.empty() so only the generic backends (which return a 1:1 result
with empty offsets) are touched; DirectJoinMergeTreeEntity already masks NaN and
returns non-empty offsets, so it is left unchanged. Reuses the shared
JoinCommon::markFloatNaNRowsAsNull recursion (Nullable / LowCardinality / Tuple),
so the masking matches the hash / full_sorting_merge paths exactly. NaN rows are
remapped to a default sentinel and null_map=false, mirroring how the MergeTree
entity handles its own NaN rows, which yields the correct shape for INNER (drop),
LEFT (default / NULL), LEFT SEMI (drop), and LEFT ANTI (keep).

Adds 04318_106531_join_on_nan_keys_rocksdb_direct (tagged no-fasttest, use-rocksdb)
covering all four join kinds plus join_use_nulls, Float32 keys, non-NaN regression,
and a String key literally equal to 'nan' (must still match). Split from the parent
04318 test because EmbeddedRocksDB is not in the fast-test build.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate (commit deba70cd)

Addresses the clickhouse-gh[bot] change-request on Settings.cpp (the generic direct / EmbeddedRocksDB NaN gap). Chose Option A (close the gap) over Option B (bound scope + reword docs), so direct now legitimately belongs in the NaN-safe algorithm list.

# Question Answer
a Deterministic repro? Yes. CREATE TABLE rdb (k Float64, v String) ENGINE = EmbeddedRocksDB PRIMARY KEY k; INSERT VALUES (1.0,'one'),(nan,'rocks_nan'),(2.0,'two'); then SELECT l.x, r.v FROM (SELECT arrayJoin([1.0, nan, 2.0]) AS x) l INNER JOIN rdb r ON l.x = r.k SETTINGS join_algorithm='direct' returns nan -> rocks_nan every time on 6243982. EXPLAIN confirms FilledJoin (the direct path), not a fallback.
b Root cause explained? DirectKeyValueJoin::joinBlock (DirectJoin.cpp:159) forwards the raw probe key to storage->getByKeys. For generic IKeyValueEntity backends, StorageEmbeddedRocksDB::getByKeys serializes the probe Field bitwise (StorageEmbeddedRocksDB.cpp:940-947), so a probe NaN matches a build-side NaN with the same bit pattern. Only DirectJoinMergeTreeEntity::getByKeys masked NaN internally, so the generic path stayed unmasked.
c Fix matches root cause? Yes. Masks NaN probe rows at the consumer (DirectKeyValueJoin::joinBlock), gated on offsets.empty() so only the generic backends (which return a 1:1 result with empty offsets) are touched; DirectJoinMergeTreeEntity (non-empty offsets) is left unchanged. Reuses the shared JoinCommon::markFloatNaNRowsAsNull recursion introduced in 6243982, so masking matches the hash / full_sorting_merge paths.
d Test intent preserved / new tests added? New test 04318_106531_join_on_nan_keys_rocksdb_direct (no-fasttest, use-rocksdb): all four direct join kinds (INNER drop, LEFT default, LEFT SEMI drop, LEFT ANTI keep) + join_use_nulls (NaN -> NULL) + Float32 keys + non-NaN regression + a String key literally 'nan' that must still match. Split from the parent 04318 because RocksDB is not in the fast-test build; parent 04318 stays dependency-free and still passes.
e Both directions demonstrated? Yes. Without the fix (6243982): INNER and LEFT both return nan -> rocks_nan. With the fix (deba70cd): INNER drops NaN, LEFT yields nan -> '' (or NULL with join_use_nulls), SEMI drops, ANTI keeps. Matches the already-fixed hash reference exactly. New test 30/30 deterministic passes; parent 04318 still passes.
f Fix is general, not a narrow patch? Yes. The fix lives in DirectKeyValueJoin::joinBlock, the single consumer of getByKeys, so it covers ALL generic IKeyValueEntity backends (EmbeddedRocksDB, Redis, KeeperMap), not just RocksDB. The offsets.empty() gate is the structural discriminator between self-masking (MergeTree) and non-masking backends, not a RocksDB-specific check. Not a defensive symptom guard: the masking is the same canonicalization the other algorithms apply, applied at the point the unmasked key enters the lookup.

Session id: cron:clickhouse-worker-slot-15:20260611-004100

Comment thread src/Core/Settings.cpp
When using the `partial_merge` algorithm, ClickHouse sorts the data and dumps it to the disk. The `partial_merge` algorithm in ClickHouse differs slightly from the classic realization. First, ClickHouse sorts the right table by joining keys in blocks and creates a min-max index for sorted blocks. Then it sorts parts of the left table by the `join key` and joins them over the right table. The min-max index is also used to skip unneeded right table blocks.

:::note Float keys
Float `JOIN ON` keys are compared bitwise by `partial_merge`, so two `NaN` rows with identical bit patterns spuriously match. The other algorithms (`hash`, `parallel_hash`, `grace_hash`, `full_sorting_merge`, `direct`) treat `NaN` as never equal, matching IEEE 754 semantics. Use one of those algorithms when `NaN`-safety matters.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code now documents and tests partial_merge as the legacy exception that still bitwise-matches same-pattern NaN keys, but the PR description and changelog entry still say partial_merge now rejects float JOIN ON keys with NOT_IMPLEMENTED and that users must migrate. That release note is now wrong for the current code: users will expect an exception/migration, while partial_merge still returns same-bit NaN matches.

Please update the PR description and changelog entry to say that partial_merge keeps its legacy bitwise behavior and is the documented exception; only hash, parallel_hash, grace_hash, full_sorting_merge, and direct treat NaN as never equal.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@groeneai check this

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the PR description and changelog entry to match the code.

After the 4cb2d05 revert (per the backward-compat request on MergeJoin.cpp), partial_merge no longer rejects float keys with NOT_IMPLEMENTED. It keeps its legacy behavior: float JOIN ON keys are compared bitwise, so two NaN rows with identical bit patterns still match. The description and changelog now state exactly that and point users at the NaN-safe algorithms (hash, parallel_hash, grace_hash, full_sorting_merge, direct). This matches the :::note Float keys block already in this join_algorithm doc.

@clickhouse-gh

clickhouse-gh Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.70% 84.70% +0.00%
Functions 92.30% 92.30% +0.00%
Branches 77.30% 77.30% +0.00%

Changed lines: Changed C/C++ lines covered by tests: 167/190 (87.89%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

Comment on lines +179 to +195
auto & selector_data = selector->getData();
selector_data.resize(num_rows);
for (size_t i = 0; i < num_rows; ++i)
{
if (nan_mask[i])
{
selector_data[i] = num_rows;
null_map[i] = 0;
}
else
selector_data[i] = i;
}
MutableColumns chunk_columns = joined_chunk.mutateColumns();
for (auto & col : chunk_columns)
{
col->insertDefault();
col = IColumn::mutate(col->index(*selector, 0));

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why to use selector rather then just IColumn::filter ?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IColumn::filter can only delete rows, but a NaN probe key must not be deleted for every kind: regular LEFT keeps the left row with a default/NULL right side, and LEFT ANTI keeps the row entirely. The generic backend returns a 1:1 result (one right row per probe row) plus a null_map; setting null_map[i]=0 alone handles INNER/SEMI/ANTI (they filter on null_map at the end), but for regular LEFT the row is kept and would still show the stale bitwise-matched right value. So I remap NaN rows to an appended default sentinel via index() and set null_map[i]=0, which routes them through the exact same not-found path the code already produces for a genuinely missing key.

This mirrors DirectJoinMergeTreeEntity::getByKeys, which uses the same selector-to-sentinel pattern for its own not-found rows. Verified against EmbeddedRocksDB (Float64 PK with NaN): all four kinds match the hash reference (INNER -> drop, LEFT -> default right, SEMI -> drop, ANTI -> keep). A plain filter would give the wrong shape for LEFT and ANTI.

That said, I agree the block is a bit heavy. If you prefer, I can drop the index() selector and instead blank only the right columns for NaN rows in place (the chunk is already 1:1 here, so no reindex is needed), keeping null_map[i]=0 to drive the filtering. Happy to switch to whichever you find cleaner.

/// `Tuple` like `JoinCommon::extendJoinKeyNullMapWithFloatNaNs` does on the hash path.
PaddedPODArray<UInt8> nan_mask;
bool any_nan = false;
if (JoinCommon::joinKeyContainsFloatPayload(*keys[0].column))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't it work automatically with direct join with MT, since we build plan with IN which should handle nans proerly?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked this empirically and the IN plan does not make the masking redundant, because IN coalesces NaN (the opposite of JOIN ON):

nan IN (nan)  -> 1   (true)

So buildFilterStepWithIn keeps the right-side NaN row rather than excluding it. The IN step is only a coarse candidate pre-filter to fetch matching right rows; the actual key matching happens afterwards in the ColumnsHashing findKey loop, which compares the key payload bitwise, so a probe NaN then matches the retained right NaN.

Confirmed by disabling just the mask and rebuilding: MT direct INNER JOIN ON l.k = r.k with NaN on both sides returns the NaN row (wrong) without the mask, and drops it (correct, matching hash) with it. So the DirectJoinMergeTreeEntity mask is load-bearing, not redundant.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For NULL we treat tuple(NULL) == tuple(NULL), for backward compatibility perhaps for nans it could be consistent with nulls

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, and it is worth settling explicitly. The reason I treat tuple-nested NaN differently from tuple-nested NULL is that the two are not actually consistent in ClickHouse's own comparison semantics:

nan = nan                        -> 0      (false)
nan IS NOT DISTINCT FROM nan     -> 0      (false)   <-- NaN is distinct from itself
NULL IS NOT DISTINCT FROM NULL   -> 1      (true)
tuple(nan)  = tuple(nan)         -> 0      (false)
tuple(NULL) = tuple(NULL)        -> NULL   (not true)

The null-safe tuple wrap in PlannerJoins.cpp exists specifically so tuple(NULL) matches tuple(NULL), i.e. it implements a IS NOT DISTINCT FROM b. But that very operator says nan IS NOT DISTINCT FROM nan is false, so making tuple-nested NaN match would make the JOIN disagree with the scalar <=> it is supposed to implement. Concretely, on Nullable(Float64):

SELECT v FROM la INNER JOIN ra ON la.k <=> ra.k   -- la=(1.5,'m'),(nan,'nan-l'),(NULL,'null-l')
master:     m, nan-l, null-l     (nan-l wrongly matches)
this PR:    m, null-l            (NULL still matches null-safe; NaN dropped, matching nan<=>nan=false)

So the current behavior keeps NULL matching (compat preserved) while making NaN consistent with both scalar = and <=>. If you would still prefer tuple-nested NaN to match for strict backward compatibility with pre-PR partial_merge-style bitwise behavior, I can scope the fold to top-level float keys only (plain ON a = b and multi-key AND, which arrive as separate top-level columns, not as a ColumnTuple), leaving explicit/<=>-wrapped tuple keys untouched. That is a smaller change and also drops the tuple recursion. Let me know which semantics you want and I will adjust.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default

Projects

None yet

Development

Successfully merging this pull request may close these issues.

INNER JOIN on NaN condition returns incorrect rows with HAVING + UNION DISTINCT when enable_optimize_predicate_expression=0

4 participants