Ergus · 2026-06-12T17:38:22Z

Related: #107196

Changelog category (leave one):

Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Version info

Merged into: 26.6.1.923

clickhouse-gh · 2026-06-12T17:39:02Z

Workflow [PR], commit [599f6bd]

Summary: ✅

Performance Comparison: Performance dashboard

AI Review

Summary

This PR adds regression coverage around readBigStrict string deserialization and hardens ColumnString::insertRangeFrom against corrupted source offsets. The endpoint guard and the new non-monotonic-offset test cover part of the original failure, but the current implementation still allows malformed intermediate offsets through the empty-destination fast path and can mutate the destination before throwing on the checked path.

Missing context / blind spots

⚠️ The Praktika PR report for 599f6bd27863decde4d319dd4ffadf84a9fe929c returned zero test results, so I could not use CI as evidence that the new tests ran. A completed unit-test/CI run for the current head would close this gap.

Findings

❌ Blockers

[src/Columns/ColumnString.cpp:187] ColumnString::insertRangeFrom promises to reject malformed source offsets before they can corrupt the destination, but the monotonicity validation is inside only the non-fast-path branch and runs after chars and offsets have already been mutated. A source with offsets [4, 2, 12] and insertRangeFrom(src, 0, 3) into an empty destination passes the endpoint check and assigns the non-monotonic offsets unchanged; for non-zero start, the new INCORRECT_DATA exception can fire after the destination has been partially resized/copied. Validate every copied offset is within [nested_offset, nested_end] and monotonic before mutating chars or offsets, then perform the copy/assign.

Tests

⚠️ Add a focused ColumnString test for a full-range copy into an empty destination from offsets like [4, 2, 12], asserting INCORRECT_DATA instead of accepting a corrupt result.
⚠️ Add a focused ColumnString test for the non-zero start malformed-offset path with a pre-populated destination, asserting the destination remains unchanged when INCORRECT_DATA is thrown.

Final Verdict

Status: ❌ Block

Minimum required action: move source-offset validation before any destination mutation and make it cover both the empty-destination fast path and the offset-rewrite path, with the focused regression tests above.

…rom`

The consistency guard added for inconsistent source offsets computed `nested_length` before validating it. With unsigned arithmetic, a source whose offsets decrease (`offsetAt(start + length) < offsetAt(start)`) underflows `nested_length`, and `nested_offset + nested_length` wraps back to the smaller end offset, so the `> chars.size()` check silently passes and the following `resize`/`memcpy` operate on the huge wrapped length.

Validate the end offset directly (`nested_end < nested_offset || nested_end > chars.size()`) before computing `nested_length`, and add a regression test with decreasing source offsets that asserts `INCORRECT_DATA`.

Addresses the AI review blocker on ClickHouse#107355

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
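
The wrap-around described in this commit message can be reproduced in isolation. The sketch below is not the ClickHouse code: `lengthFirstGuardAccepts` and `endpointGuardAccepts` are hypothetical helper names, and `uint64_t` is assumed as the offset type. It only models the two orderings of the check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the ORIGINAL guard: the length is computed first.
// Returns true when the range would be accepted.
inline bool lengthFirstGuardAccepts(uint64_t nested_offset, uint64_t nested_end, uint64_t chars_size)
{
    // Unsigned subtraction wraps modulo 2^64 when nested_end < nested_offset.
    const uint64_t nested_length = nested_end - nested_offset;
    // The sum wraps again and lands exactly on nested_end, so the check passes.
    return !(nested_offset + nested_length > chars_size);
}

// Hypothetical model of the FIXED guard: the end offset is validated directly,
// before any length is derived from it.
inline bool endpointGuardAccepts(uint64_t nested_offset, uint64_t nested_end, uint64_t chars_size)
{
    return !(nested_end < nested_offset || nested_end > chars_size);
}
```

With `nested_offset = 8`, `nested_end = 4` and `chars_size = 12`, the length-first variant accepts the range even though the derived length is close to 2^64, while the endpoint variant rejects it.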

clickhouse-gh · 2026-06-13T22:04:29Z

+    /// (the same invariant the copy constructor enforces). Validate the end offset before computing
+    /// the length: the offsets must be monotonic (nested_end >= nested_offset) and must stay within
+    /// chars, otherwise the memcpy below would read out of bounds. Computing nested_length first would
+    /// underflow for decreasing offsets and wrap nested_offset + nested_length back below chars.size(),


The new endpoint check still lets malformed intermediate offsets corrupt the destination column. For example, with source offsets [4, 2, 12], chars.size() == 12, and insertRangeFrom(src, 1, 2), the guard sees nested_offset == 4 and nested_end == 12, so the byte copy is bounded. But the offset rewrite below then computes src.offsets[1] - nested_offset, i.e. 2 - 4, which underflows and stores a huge destination offset. The result is a ColumnString whose offsets no longer describe its chars, so later reads can go out of bounds. Please validate every copied offset is monotonic and within [nested_offset, nested_end] before copying, or build offsets in a way that rejects any intermediate offset below the previous one.
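
To make the `2 - 4` case concrete, here is a minimal sketch of an unchecked offset rewrite. It is a simplified model, not the actual `ColumnString` implementation: `rewriteOffsetsUnchecked` is a hypothetical name, and offsets are assumed to be `uint64_t` end positions, one per row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical model of the offset rewrite without any per-offset validation.
// Each copied source offset is rebased from the source range onto the
// destination, exactly as "src.offsets[i] - nested_offset + old size" would.
inline std::vector<uint64_t> rewriteOffsetsUnchecked(
    const std::vector<uint64_t> & src_offsets, size_t start, size_t length, uint64_t dst_chars_old_size)
{
    // Start of the first copied row: the previous row's end, or 0 for row 0.
    const uint64_t nested_offset = start == 0 ? 0 : src_offsets[start - 1];

    std::vector<uint64_t> out;
    out.reserve(length);
    for (size_t i = 0; i < length; ++i)
        out.push_back(src_offsets[start + i] - nested_offset + dst_chars_old_size); // may wrap
    return out;
}
```

Calling `rewriteOffsetsUnchecked({4, 2, 12}, 1, 2, 0)` yields a first destination offset of 2^64 - 2, followed by 8: the endpoints were in bounds, yet the stored offsets no longer describe the copied bytes.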

clickhouse-gh · 2026-06-16T15:34:41Z

+            // Intermediate offsets can still be non-monotonic
+            /// A copied offset that dips below the previous one (in particular below nested_offset) would underflow the subtraction and
+            /// store a corrupt destination offset, so reject it here.
+            if (src_offset < prev_src_offset)


This validation still runs too late and does not cover the fast path above. With source offsets [4, 2, 12], chars.size() == 12, and insertRangeFrom(src, 0, 3) into an empty destination, the endpoint guard passes, the bytes are copied, and the start == 0 && offsets.empty() branch assigns the non-monotonic offsets unchanged because this loop is skipped. For a non-zero start, this check can also throw only after chars has been resized/copied and offsets has been resized, so a caught INCORRECT_DATA exception leaves the destination partially mutated. Please validate every copied source offset is within [nested_offset, nested_end] and monotonic before mutating chars or offsets, then perform the copy/assign.
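
The "validate first, then mutate" ordering requested here can be sketched on a toy column. Everything below is an illustration under assumptions: `ToyStringColumn` is a hypothetical type, `std::runtime_error` stands in for the `INCORRECT_DATA` exception, and there is no separate fast path, so both the empty-destination and the non-zero `start` cases go through the same validation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical, simplified stand-in for a string column:
// offsets[i] is the end position of row i inside chars.
struct ToyStringColumn
{
    std::vector<char> chars;
    std::vector<uint64_t> offsets;

    uint64_t offsetAt(size_t i) const { return i == 0 ? 0 : offsets[i - 1]; }

    void insertRangeFrom(const ToyStringColumn & src, size_t start, size_t length)
    {
        if (length == 0)
            return;
        if (start > src.offsets.size() || length > src.offsets.size() - start)
            throw std::out_of_range("range is out of bounds");

        const uint64_t nested_offset = src.offsetAt(start);
        const uint64_t nested_end = src.offsetAt(start + length);

        // 1. Endpoint check, done before any length is computed.
        if (nested_end < nested_offset || nested_end > src.chars.size())
            throw std::runtime_error("INCORRECT_DATA: inconsistent end offset");

        // 2. Every copied offset must be monotonic and within [nested_offset, nested_end].
        uint64_t prev = nested_offset;
        for (size_t i = 0; i < length; ++i)
        {
            const uint64_t cur = src.offsets[start + i];
            if (cur < prev || cur > nested_end)
                throw std::runtime_error("INCORRECT_DATA: non-monotonic source offset");
            prev = cur;
        }

        // 3. Reserve up front so the appends below do not reallocate midway.
        const uint64_t old_chars_size = chars.size();
        chars.reserve(chars.size() + (nested_end - nested_offset));
        offsets.reserve(offsets.size() + length);

        // 4. Only now mutate the destination.
        chars.insert(chars.end(),
                     src.chars.begin() + static_cast<std::ptrdiff_t>(nested_offset),
                     src.chars.begin() + static_cast<std::ptrdiff_t>(nested_end));
        for (size_t i = 0; i < length; ++i)
            offsets.push_back(src.offsets[start + i] - nested_offset + old_chars_size);
    }
};
```

In this model a source with offsets `[4, 2, 12]` is rejected for both `insertRangeFrom(src, 0, 3)` into an empty destination and `insertRangeFrom(src, 1, 2)` into a pre-populated one, and the destination is left untouched in each case. The cost is an extra pass over the copied offsets, which is the trade-off raised later in the thread.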

clickhouse-gh · 2026-06-16T20:04:15Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	85.20%	85.10%	-0.10%
Functions	92.30%	92.30%	+0.00%
Branches	77.40%	77.30%	-0.10%

Changed lines: Changed C/C++ lines covered by tests: 134/135 (99.26%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

Ergus · 2026-06-17T18:08:43Z

Hi @Avogar

Since these changes went in, I have been wondering how they (especially the last commit) could affect performance, by inhibiting vectorization due to the if-throw inside the loop.

Also, only the "slow" path is checked; the direct one is not, and the same applies to the other copy.

I actually added this commit to the branch after it was merged: 69b03b7, in order to always check monotonicity, but only in debug and ASan builds...

There is a trade-off here... WDYT?
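
One possible middle ground for the trade-off raised above, sketched under assumptions: keep the copy loop free of throws and run the validation as a separate, branch-free reduction that produces a single flag. `offsetsAreMonotonic` is a hypothetical helper, not something from the PR or from 69b03b7, and whether a compiler actually vectorizes it has not been measured here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true when offsets[0..count) never decrease,
// start at or above `lower`, and never exceed `upper`.
// There is no early exit and no throw inside the loop; the caller inspects
// the flag once and raises the error outside the hot path.
inline bool offsetsAreMonotonic(const uint64_t * offsets, size_t count, uint64_t lower, uint64_t upper)
{
    bool ok = true;
    uint64_t prev = lower;
    for (size_t i = 0; i < count; ++i)
    {
        const uint64_t cur = offsets[i];
        // Bitwise '&' on purpose: both comparisons are evaluated unconditionally.
        ok &= (cur >= prev) & (cur <= upper);
        prev = cur;
    }
    return ok;
}
```

A caller could invoke this once before touching the destination and throw if it returns `false`; the call itself could also sit behind a debug or sanitizer build switch, which is the option proposed in the comment above.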

Avogar · 2026-06-17T18:29:12Z

Add unit tests from @davenger

ffadfd1

clickhouse-gh Bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Jun 12, 2026

Ergus requested review from Avogar and davenger June 12, 2026 17:39

clickhouse-gh Bot reviewed Jun 12, 2026

Comment thread src/Columns/ColumnString.cpp Outdated

Merge remote-tracking branch 'principal/master' into pick_tests_60670

d098554

clickhouse-gh Bot reviewed Jun 12, 2026

Comment thread src/Columns/tests/gtest_column_string.cpp

Merge branch 'master' into pick_tests_60670

83d0277

alexey-milovidov mentioned this pull request Jun 13, 2026

Skip macOS-incompatible distributed tests 04327 and 04336 #107376

Merged

clickhouse-gh Bot reviewed Jun 13, 2026

Avogar self-assigned this Jun 15, 2026

Ergus added 3 commits June 16, 2026 14:57

Merge remote-tracking branch 'origin/master' into pick_tests_60670

874d0d2

Add tests suggested by the agent

5bffa61

Add monotonicity check as siggested by the agent

599f6bd

clickhouse-gh Bot reviewed Jun 16, 2026

Avogar approved these changes Jun 17, 2026

Avogar added this pull request to the merge queue Jun 17, 2026

Merged via the queue into ClickHouse:master with commit bcb49a5 Jun 17, 2026
327 of 328 checks passed

robot-ch-test-poll3 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 17, 2026

Ergus mentioned this pull request Jun 17, 2026

Make monotnicity checks but only in debug/asan builds. #107774

Open

Tests for changes in #107196 #107355

Avogar merged 7 commits into ClickHouse:master from Ergus:pick_tests_60670

4 participants
