Fix UB in BSINumericIndexedVector#100086
Conversation
…IndexedVector` `UInt64(std::floor(rhs))` is undefined behavior when `rhs` is negative: for integer types, `std::floor` implicitly converts to double first; for float types, casting a negative double to `UInt64` is UB per C++ standard. `checkValidValue` throws for negative values at runtime, but the compiler may not see that line 1281 is unreachable for negatives, and UBSan fires. Fix by using `if constexpr` to skip `std::floor` for integer types and an explicit non-negative check for float types. https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=99869&sha=9b37a8c67f4b0fceacf60174b3bb77d2b8477431&name_0=PR&name_1=AST%20fuzzer%20%28amd_ubsan%29 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR aims to eliminate undefined behavior in BSINumericIndexedVector when pointwise operations (notably ...PointwiseEqual) are invoked with negative scalar values, and adds a stateless regression test for the scenario.
Changes:
- Adds a new stateless query test for negative scalar inputs to
numericIndexedVectorPointwiseEqual. - Introduces a negative-value rejection in
checkValidValue. - Adjusts scalar-to-integer conversion logic in
BSINumericIndexedVector::pointwiseEqual(lhs, rhs).
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
src/AggregateFunctions/AggregateFunctionGroupNumericIndexedVectorDataBSI.h |
Adds validation and changes scalar conversion logic in pointwise equality path. |
tests/queries/0_stateless/04049_numeric_indexed_vector_negative_scalar_ubsan.sql |
New regression test asserting INCORRECT_DATA for negative scalar inputs. |
tests/queries/0_stateless/04049_numeric_indexed_vector_negative_scalar_ubsan.reference |
Reference output file for the new stateless test. |
You can also share your feedback on Copilot code review. Take the survey.
| if (value < 0) | ||
| { | ||
| throw Exception(ErrorCodes::INCORRECT_DATA, "NumericIndexedVector does not support negative values"); | ||
| } |
| UInt64 long_value; | ||
| if constexpr (std::is_floating_point_v<ValueType>) | ||
| { | ||
| auto floored = std::floor(rhs); | ||
| long_value = floored >= 0 ? static_cast<UInt64>(floored) : static_cast<UInt64>(static_cast<Int64>(floored)); | ||
| } |
| -- Negative scalar values passed to NumericIndexedVector pointwise operations | ||
| -- should throw INCORRECT_DATA, not trigger undefined behavior. | ||
|
|
||
| DROP TABLE IF EXISTS t_int; | ||
| CREATE TABLE t_int (ds Date, uin UInt32, value Int64) ENGINE = MergeTree() ORDER BY ds; | ||
| INSERT INTO t_int VALUES ('2023-12-26', 1, 1); | ||
|
|
||
| SELECT numericIndexedVectorPointwiseEqual(groupNumericIndexedVectorState(uin, value), -1) FROM t_int; -- { serverError INCORRECT_DATA } | ||
|
|
|
Please ping me when it's ready for review, I'll take a look |
…e` instead of rejecting negative values The blanket negative value check in `checkValidValue` broke all tests using signed types (Int8/16/32/64). BSI vectors support negative values via two's complement. The actual UBSan issue is the unguarded `Int64(value * scaling)` cast in `initializeFromVectorAndValue` when the float value is out of Int64 range. Add an overflow check matching the existing pattern in the `add` method. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ric_indexed_vector
| if constexpr (std::is_floating_point_v<ValueType>) | ||
| { | ||
| auto floored = std::floor(rhs); | ||
| long_value = floored >= 0 ? static_cast<UInt64>(floored) : static_cast<UInt64>(static_cast<Int64>(floored)); |
There was a problem hiding this comment.
pointwiseEqual still has UB here for out-of-range floating values:
floored >= 0branch:static_cast<UInt64>(floored)is undefined whenfloored > UInt64::max().floored < 0branch:static_cast<Int64>(floored)is undefined whenfloored < Int64::lowest().
checkValidValue only rejects NaN/Inf, so finite values like 1e30 / -1e30 can still hit this path.
Please add an explicit bounds check (or reuse float64ToUInt64, which already clamps both sides) before integer casts.
|
@rienath, I think it is ready for review. |
…ric_indexed_vector
…oat scalars The previous fix addressed the float-to-UInt64 cast for negative values by going through Int64 first, but the `decimal_value` computation was still broken for negative floats (converting the two's complement UInt64 back to float gives a huge positive number, leading to another UB cast). Rewrite `pointwiseEqual` to use the same unified fixed-point conversion as `initializeFromVectorAndValue`: compute `Int64 scaled_value` with an explicit overflow guard, then compare all bits of the two's complement representation in a single loop. For out-of-range scalars, return an empty bitmap (no element can match) instead of triggering UB. Also add test coverage for `pointwiseEqual` with out-of-range and negative float scalars. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ric_indexed_vector
…m shift underflow
When `total_bit_num` is 0 (possible with `groupNumericIndexedVectorState('BSI', 0, 0)`),
the expression `bit_pattern >> (total_bit_num - 1)` causes undefined behavior because
`total_bit_num - 1` wraps to `UINT32_MAX`. Since a zero-bit BSI can only represent 0,
and the `rhs == 0` case is already handled above, return an empty bitmap immediately.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ric_indexed_vector
`static_cast<Float64>(std::numeric_limits<Int64>::max())` rounds up to 2^63 in Float64, so values at exactly 2^63 passed the old `fabs(value) > lim` guard and then hit `static_cast<Int64>(...)` which is undefined behavior. Replace the rounded-Float64 bound with an exact-integer-domain check: compute the scaled value first, then compare against 2^63 (which is exactly representable in Float64) using `>= int64_upper || < -int64_upper`. Applied to all three call sites: `initializeFromVectorAndValue`, `pointwiseEqual`, and `addValue`. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LLVM Coverage Report
Changed lines: 92.54% (62/67) | lost baseline coverage: 9 line(s) · Uncovered code |
alexey-milovidov
left a comment
There was a problem hiding this comment.
The code is quite complex, but the changes look good.

Closes #100052
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Make correctly processing negative values inside NumericIndexedVectorDataBSI
Version info
26.4.1.778