Fix UB in BSINumericIndexedVector by divanik · Pull Request #100086 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix UB in BSINumericIndexedVector#100086

Merged
alexey-milovidov merged 11 commits into
masterfrom
divanik/fix_UB_numeric_indexed_vector
Apr 10, 2026
Merged

Fix UB in BSINumericIndexedVector#100086
alexey-milovidov merged 11 commits into
masterfrom
divanik/fix_UB_numeric_indexed_vector

Conversation

@divanik

@divanik divanik commented Mar 19, 2026

Copy link
Copy Markdown
Contributor

Closes #100052

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Make correctly processing negative values inside NumericIndexedVectorDataBSI

Version info

  • Merged into: 26.4.1.778

divanik and others added 2 commits March 19, 2026 17:57
…IndexedVector`

`UInt64(std::floor(rhs))` is undefined behavior when `rhs` is negative:
for integer types, `std::floor` implicitly converts to double first;
for float types, casting a negative double to `UInt64` is UB per C++ standard.

`checkValidValue` throws for negative values at runtime, but the compiler
may not see that line 1281 is unreachable for negatives, and UBSan fires.

Fix by using `if constexpr` to skip `std::floor` for integer types and
an explicit non-negative check for float types.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=99869&sha=9b37a8c67f4b0fceacf60174b3bb77d2b8477431&name_0=PR&name_1=AST%20fuzzer%20%28amd_ubsan%29

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented Mar 19, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Mar 19, 2026
@divanik divanik changed the title Fix undefined behaviour Fix UB in BSINumericIndexedVector Mar 19, 2026
@divanik divanik requested review from Copilot and rienath March 19, 2026 17:07

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to eliminate undefined behavior in BSINumericIndexedVector when pointwise operations (notably ...PointwiseEqual) are invoked with negative scalar values, and adds a stateless regression test for the scenario.

Changes:

  • Adds a new stateless query test for negative scalar inputs to numericIndexedVectorPointwiseEqual.
  • Introduces a negative-value rejection in checkValidValue.
  • Adjusts scalar-to-integer conversion logic in BSINumericIndexedVector::pointwiseEqual(lhs, rhs).

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 4 comments.

File Description
src/AggregateFunctions/AggregateFunctionGroupNumericIndexedVectorDataBSI.h Adds validation and changes scalar conversion logic in pointwise equality path.
tests/queries/0_stateless/04049_numeric_indexed_vector_negative_scalar_ubsan.sql New regression test asserting INCORRECT_DATA for negative scalar inputs.
tests/queries/0_stateless/04049_numeric_indexed_vector_negative_scalar_ubsan.reference Reference output file for the new stateless test.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +120 to +123
if (value < 0)
{
throw Exception(ErrorCodes::INCORRECT_DATA, "NumericIndexedVector does not support negative values");
}
Comment on lines +1281 to +1286
UInt64 long_value;
if constexpr (std::is_floating_point_v<ValueType>)
{
auto floored = std::floor(rhs);
long_value = floored >= 0 ? static_cast<UInt64>(floored) : static_cast<UInt64>(static_cast<Int64>(floored));
}
Comment thread src/AggregateFunctions/AggregateFunctionGroupNumericIndexedVectorDataBSI.h Outdated
Comment on lines +1 to +9
-- Negative scalar values passed to NumericIndexedVector pointwise operations
-- should throw INCORRECT_DATA, not trigger undefined behavior.

DROP TABLE IF EXISTS t_int;
CREATE TABLE t_int (ds Date, uin UInt32, value Int64) ENGINE = MergeTree() ORDER BY ds;
INSERT INTO t_int VALUES ('2023-12-26', 1, 1);

SELECT numericIndexedVectorPointwiseEqual(groupNumericIndexedVectorState(uin, value), -1) FROM t_int; -- { serverError INCORRECT_DATA }

@rienath rienath removed their request for review March 23, 2026 12:04
@rienath

rienath commented Mar 23, 2026

Copy link
Copy Markdown
Member

Please ping me when it's ready for review, I'll take a look

alexey-milovidov and others added 2 commits March 24, 2026 15:49
…e` instead of rejecting negative values

The blanket negative value check in `checkValidValue` broke all tests using signed types
(Int8/16/32/64). BSI vectors support negative values via two's complement.

The actual UBSan issue is the unguarded `Int64(value * scaling)` cast in
`initializeFromVectorAndValue` when the float value is out of Int64 range.
Add an overflow check matching the existing pattern in the `add` method.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
if constexpr (std::is_floating_point_v<ValueType>)
{
auto floored = std::floor(rhs);
long_value = floored >= 0 ? static_cast<UInt64>(floored) : static_cast<UInt64>(static_cast<Int64>(floored));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pointwiseEqual still has UB here for out-of-range floating values:

  • floored >= 0 branch: static_cast<UInt64>(floored) is undefined when floored > UInt64::max().
  • floored < 0 branch: static_cast<Int64>(floored) is undefined when floored < Int64::lowest().

checkValidValue only rejects NaN/Inf, so finite values like 1e30 / -1e30 can still hit this path.

Please add an explicit bounds check (or reuse float64ToUInt64, which already clamps both sides) before integer casts.

@alexey-milovidov

Copy link
Copy Markdown
Member

@rienath, I think it is ready for review.

alexey-milovidov and others added 2 commits April 9, 2026 20:11
…oat scalars

The previous fix addressed the float-to-UInt64 cast for negative values by
going through Int64 first, but the `decimal_value` computation was still
broken for negative floats (converting the two's complement UInt64 back to
float gives a huge positive number, leading to another UB cast).

Rewrite `pointwiseEqual` to use the same unified fixed-point conversion as
`initializeFromVectorAndValue`: compute `Int64 scaled_value` with an explicit
overflow guard, then compare all bits of the two's complement representation
in a single loop. For out-of-range scalars, return an empty bitmap (no element
can match) instead of triggering UB.

Also add test coverage for `pointwiseEqual` with out-of-range and negative
float scalars.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…m shift underflow

When `total_bit_num` is 0 (possible with `groupNumericIndexedVectorState('BSI', 0, 0)`),
the expression `bit_pattern >> (total_bit_num - 1)` causes undefined behavior because
`total_bit_num - 1` wraps to `UINT32_MAX`. Since a zero-bit BSI can only represent 0,
and the `rhs == 0` case is already handled above, return an empty bitmap immediately.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread src/AggregateFunctions/AggregateFunctionGroupNumericIndexedVectorDataBSI.h Outdated
Comment thread src/AggregateFunctions/AggregateFunctionGroupNumericIndexedVectorDataBSI.h Outdated
alexey-milovidov and others added 2 commits April 10, 2026 00:31
`static_cast<Float64>(std::numeric_limits<Int64>::max())` rounds up to
2^63 in Float64, so values at exactly 2^63 passed the old `fabs(value) > lim`
guard and then hit `static_cast<Int64>(...)` which is undefined behavior.

Replace the rounded-Float64 bound with an exact-integer-domain check:
compute the scaled value first, then compare against 2^63 (which is
exactly representable in Float64) using `>= int64_upper || < -int64_upper`.

Applied to all three call sites: `initializeFromVectorAndValue`,
`pointwiseEqual`, and `addValue`.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented Apr 10, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.00% 84.00% +0.00%
Functions 90.90% 90.90% +0.00%
Branches 76.50% 76.50% +0.00%

Changed lines: 92.54% (62/67) | lost baseline coverage: 9 line(s) · Uncovered code

Full report · Diff report

@alexey-milovidov alexey-milovidov left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is quite complex, but the changes look good.

@alexey-milovidov alexey-milovidov self-assigned this Apr 10, 2026
@alexey-milovidov alexey-milovidov merged commit 23720c1 into master Apr 10, 2026
162 of 163 checks passed
@alexey-milovidov alexey-milovidov deleted the divanik/fix_UB_numeric_indexed_vector branch April 10, 2026 05:17
@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 10, 2026
@clickgapai

Copy link
Copy Markdown
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

UndefinedBehaviorSanitizer: undefined behavior (STID: 2527-362b)

6 participants