Tests for changes in #107196 by Ergus · Pull Request #107355 · ClickHouse/ClickHouse · GitHub
Skip to content

Tests for changes in #107196#107355

Merged
Avogar merged 7 commits into
ClickHouse:masterfrom
Ergus:pick_tests_60670
Jun 17, 2026
Merged

Tests for changes in #107196#107355
Avogar merged 7 commits into
ClickHouse:masterfrom
Ergus:pick_tests_60670

Conversation

@Ergus

@Ergus Ergus commented Jun 12, 2026

Copy link
Copy Markdown
Member

Related: #107196

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Version info

  • Merged into: 26.6.1.923

@clickhouse-gh

clickhouse-gh Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Jun 12, 2026
@Ergus Ergus requested review from Avogar and davenger June 12, 2026 17:39
Comment thread src/Columns/ColumnString.cpp Outdated
Comment thread src/Columns/tests/gtest_column_string.cpp
…rom`

The consistency guard added for inconsistent source offsets computed
`nested_length` before validating it. With unsigned arithmetic, a source
whose offsets decrease (`offsetAt(start + length) < offsetAt(start)`)
underflows `nested_length`, and `nested_offset + nested_length` wraps back
to the smaller end offset, so the `> chars.size()` check silently passes and
the following `resize`/`memcpy` operate on the huge wrapped length.

Validate the end offset directly (`nested_end < nested_offset ||
nested_end > chars.size()`) before computing `nested_length`, and add a
regression test with decreasing source offsets that asserts `INCORRECT_DATA`.

Addresses the AI review blocker on
ClickHouse#107355

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
/// (the same invariant the copy constructor enforces). Validate the end offset before computing
/// the length: the offsets must be monotonic (nested_end >= nested_offset) and must stay within
/// chars, otherwise the memcpy below would read out of bounds. Computing nested_length first would
/// underflow for decreasing offsets and wrap nested_offset + nested_length back below chars.size(),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new endpoint check still lets malformed intermediate offsets corrupt the destination column. For example, with source offsets [4, 2, 12], chars.size() == 12, and insertRangeFrom(src, 1, 2), the guard sees nested_offset == 4 and nested_end == 12, so the byte copy is bounded. But the offset rewrite below then computes src.offsets[1] - nested_offset, i.e. 2 - 4, which underflows and stores a huge destination offset. The result is a ColumnString whose offsets no longer describe its chars, so later reads can go out of bounds. Please validate every copied offset is monotonic and within [nested_offset, nested_end] before copying, or build offsets in a way that rejects any intermediate offset below the previous one.

@Avogar Avogar self-assigned this Jun 15, 2026
// Intermediate offsets can still be non-monotonic
/// A copied offset that dips below the previous one (in particular below nested_offset) would underflow the subtraction and
/// store a corrupt destination offset, so reject it here.
if (src_offset < prev_src_offset)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This validation still runs too late and does not cover the fast path above. With source offsets [4, 2, 12], chars.size() == 12, and insertRangeFrom(src, 0, 3) into an empty destination, the endpoint guard passes, the bytes are copied, and the start == 0 && offsets.empty() branch assigns the non-monotonic offsets unchanged because this loop is skipped. For a non-zero start, this check can also throw only after chars has been resized/copied and offsets has been resized, so a caught INCORRECT_DATA exception leaves the destination partially mutated. Please validate every copied source offset is within [nested_offset, nested_end] and monotonic before mutating chars or offsets, then perform the copy/assign.

@clickhouse-gh

clickhouse-gh Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 85.20% 85.10% -0.10%
Functions 92.30% 92.30% +0.00%
Branches 77.40% 77.30% -0.10%

Changed lines: Changed C/C++ lines covered by tests: 134/135 (99.26%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

@Avogar Avogar added this pull request to the merge queue Jun 17, 2026
Merged via the queue into ClickHouse:master with commit bcb49a5 Jun 17, 2026
327 of 328 checks passed
@robot-ch-test-poll3 robot-ch-test-poll3 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 17, 2026
@Ergus

Ergus commented Jun 17, 2026

Copy link
Copy Markdown
Member Author

Hi @Avogar

Since these changes I have been wondering how these changes (specially the last commit) could affect performance (by inhibiting vectorization due to the if-throw inside the look)

Also the checked path is only the "slow" one, the direct one is not checked and similarly with the other copy.

I actually added this commit to the branch: 69b03b7 after it was merged in order to always check monotonicity but only on debug and asan builds...

There is a trade off here... WDYT?

@Avogar

Avogar commented Jun 17, 2026

Copy link
Copy Markdown
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants