iframe-proxy

bharatnc · 2025-12-18T17:19:47Z

Handle a potential size_t overflow in FileCacheUtils::roundUpToMultiple when num + multiple - 1 would exceed SIZE_MAX, and make FileCacheUtils.h self-contained by including <cstddef> / <cstdint> directly instead of relying on transitive includes from Core/Types.h.

Ref: #85552

Changelog category (leave one):

Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix a potential overflow in roundUpToMultiple when computing filesystem cache boundary alignment for very large offsets.

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

Note

Medium Risk
Touches filesystem cache range alignment math and introduces a new overflow-throwing path, which could change behavior for extreme inputs/configs.

Overview
Fixes a potential size_t overflow in FileCacheUtils::roundUpToMultiple by replacing the num + multiple - 1 trick with remainder-based rounding and an explicit overflow check that throws std::overflow_error when the rounded value can’t fit.

Makes FileCacheUtils.h self-contained and header-friendly by switching helpers to inline and including <cstddef>, <cstdint>, and <stdexcept> instead of relying on transitive Core/Types.h includes.

^{Reviewed by Cursor Bugbot for commit 3a1b284. Bugbot is set up for automated code reviews on this repo. Configure here.}

Version info

Merged into: 26.6.1.490

clickhouse-gh · 2025-12-18T17:20:15Z

kssenii · 2026-01-05T11:54:47Z

C++ exception with description "Setting 'boundary_alignment' (4194304) cannot be greater than 'max_file_segment_size' (5)" thrown in the test body.
seems unit tests need to be adjusted...

bharatnc · 2026-01-12T12:48:00Z

I'm not able to reproduce this anymore in master.

bharatnc · 2026-01-19T01:24:01Z

I'm not able to reproduce this anymore in master.

But it might still be nice to have these checks, I'll try to keep some of the changes and remove bug fix label.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ary-algn # Conflicts: # src/Interpreters/Cache/FileCacheSettings.cpp

…ary-algn

…ble test Add `#include <cstddef>` and `#include <cstdint>` to `FileCacheUtils.h` to replace the removed `#include <Core/Types.h>`, ensuring `size_t` and `SIZE_MAX` are properly declared. Remove test `03734_filesystem_cache_boundary_alignment` because the underlying bug is no longer reproducible on master, causing bugfix validation to fail. The `boundary_alignment` validation was already added to master independently. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

alexey-milovidov · 2026-04-07T00:22:31Z

The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix.

…ary-algn

… smaller value The previous overflow guard returned `roundDownToMultiple(SIZE_MAX, multiple)`, which can be strictly less than `num` (e.g. `num = SIZE_MAX - 1`, `multiple = 4` yields `SIZE_MAX - 3`). This violated the post-condition `result >= num` that callers rely on (`chassert(desired_size >= downloaded_size)` in `FileSegment.cpp` and `chassert(aligned_end_offset >= initial_range.right)` in `FileCache.cpp`), and in release builds could underflow subsequent size arithmetic. `FileCacheSettings::validate` already rejects absurdly large `boundary_alignment` values, so the overflow branch is defensive and unreachable on valid configurations. Throw `std::overflow_error` instead of returning a silently corrupted value. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…w check The previous `num > SIZE_MAX - multiple + 1` guard detects overflow of the `num + multiple - 1` implementation trick, not overflow of the mathematical result. For `multiple = SIZE_MAX, num = 2` it computes `limit = 1` and throws, but the correct rounded-up multiple `SIZE_MAX` is representable. Switch to a remainder-based computation: `delta = multiple - (num % multiple)`, throw only when `num + delta` would overflow `size_t`. This keeps the post-condition `result >= num` without rejecting representable cases. Make both helpers `inline` instead of `static` so translation units that include the header without calling one of them (e.g. `FileSegment.cpp` no longer exercises `roundDownToMultiple` after the new implementation removes the call) do not hit `-Werror,-Wunused-function`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ary-algn

Adds focused gtests that pin the contract of `roundUpToMultiple` at the `size_t` extremes: - Representable boundaries (e.g. `num = 2`, `multiple = SIZE_MAX` returns `SIZE_MAX`) are not rejected by the new remainder-based rounding (the old `num + multiple - 1` trick would have falsely overflowed). - True overflows (e.g. `num = SIZE_MAX`, `multiple = SIZE_MAX - 1`) throw `std::overflow_error` rather than wrapping. Also covers the basic and zero-multiple cases for `roundDownToMultiple` and `roundUpToMultiple` so a future refactor of the rounding formula cannot silently regress the post-condition `result >= num`.

alexey-milovidov · 2026-05-26T07:41:08Z

@groeneai, please fix this deadlock Hung check failed, possible deadlock found and send a PR.

groeneai · 2026-05-26T08:38:27Z

@alexey-milovidov investigated. The Hung check on Stress test (arm_debug) here is not caused by this PR.

Where it hangs. Single stuck query in hung_check.log: an AsyncInsertFlush of INSERT INTO test_l8brwlta.t (c1) FORMAT Values from 03164_analyzer_validate_tree_size.sql, elapsed 553s, blocked on:

DB::CachePriorityGuard::writeLock()
DB::FileCache::doTryReserve  -- FileCache.cpp:1212  (auto lock = cache_guard.writeLock();)
DB::FileCache::tryReserve
DB::FileSegment::reserve
DB::FileSegmentRangeWriter::write
DB::CachedOnDiskWriteBufferFromFile::cacheData
...
DB::MergeTreeSink::consume

So the thread is waiting on the global cache_guard (a SharedMutex) exclusive lock that protects the priority queue, on a cached_azure disk under stress.

Why this PR cannot cause it. PR #92579 changes one inline helper in src/Interpreters/FileCache/FileCacheUtils.h (roundUpToMultiple) plus a gtest. Its four callers are:

FileCache.cpp:822-825 (read-range alignment in getCacheReadRanges)
FileSegment.cpp:720 (shrinkFileSegmentSize)
FileSegment.cpp:770,772 (getBackgroundDownloadDesiredSize)

None of these appear in the hung stack. The hang is at cache_guard.writeLock(), which is a SharedMutex acquisition for priority-queue mutation, never reached from roundUpToMultiple and unaffected by the new overflow check.

Cross-PR / master data (30d). Hung check on this exact lane:

check_name	hits	PRs	master
`Stress test (arm_debug)`	17	15	0

Across all stress lanes the chronic Hung check family has 565 master hits in 30d, dominated by Stress test (experimental, serverfuzz, ...) variants. This sighting matches the established PR-only flaky shape for the regular Stress test (arm_debug) lane.

Recommendation. Retrigger Stress test (arm_debug) to unblock the PR. The underlying FileCache priority writeLock contention under heavy async-insert + cached_azure stress is owned by kssenii and recent FileCache.cpp work (c52f7c23, e7321d6d, 9ed6da55) is in that area, but it is not actionable from this PR.

I have not opened a separate fix PR for the cache_guard.writeLock contention because I cannot reproduce it locally and the FileCache priority queue lock design needs the component owner's review. Happy to dig deeper if you have a deterministic repro or want me to file an issue with the CIDB data above.

Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=92579&sha=cc53b300b1fbbfbf7cf87655c1e5d4789e8516cc&name_0=PR&name_1=Stress+test+%28arm_debug%29

…ary-algn

alexey-milovidov · 2026-06-03T14:36:41Z

Updated the branch with master (clean merge, net diff still just the two files: FileCacheUtils.h + the new gtest_file_cache_utils.cpp).

All four review threads are resolved, and #101239 (the arm_msan SimSIMD fix Alexey referenced) was merged on 2026-04-07 and is already included.

Built unit_tests_dbms on the merged branch — the whole FileCache tree (including the roundUpToMultiple callers in FileCache.cpp/FileSegment.cpp) compiles, and the new gtest passes all 4 cases (representable SIZE_MAX edge returns SIZE_MAX; true overflow throws).

The four CI failures are all unrelated chronic master flakes, none touching FileCacheUtils or cache range math:

Server died — 1979 failures across 752 PRs in 30d.
Assertion 'px != 0' failed (STID: 3631-2e69) in BackgroundJobsAssignee::threadFunc (pure-virtual-call abort) — also seen on unrelated PRs #100706, #105375, #105157.
test_kafka_consumer_failover — Kafka integration flake.
Hung check — already shown unrelated by @groeneai (stuck on cache_guard.writeLock(), never reached from roundUpToMultiple).

clickhouse-gh · 2026-06-03T18:00:28Z

📊 Cloud Performance Report

✅ AI verdict: no_change — no significant changes across 39 queries analysed

This PR only changes a file-cache size-rounding overflow check (and tightens thread query-id handling); for normal inputs the rounding result is identical and neither change sits on the query execution path that any of the flagged queries run. The two ClickBench improvements (Q4 -14%, Q15 -13%) and two TPC-H regressions (Q2 +15%, Q4 +19%) all land on code these queries never execute, so they read as run-to-run variance rather than real effects of this change — the TPC-H queries were also noisy. All four flagged verdicts are downgraded to not-sure; no actionable performance change is attributable to this PR.

clickbench

⚠️ 2 inconclusive

Flagged queries (2 of 43)

	Query	Verdict	Baseline med (ms)	PR med (ms)	Change	q-value	Hint
⚠️	4	not_sure	265	227	-14.3%	<0.0001	File-cache size-rounding overflow fix is off this SELECT's execution path; -14% is run-to-run variance.
⚠️	15	not_sure	240	210	-12.7%	<0.0001	Same off-path rounding/query-id change; this scan-filter SELECT is untouched, -12.7% is noise.

_{q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10) — the value the verdict is based on.}

tpch_adapted_1_official

⚠️ 2 inconclusive

Flagged queries (2 of 22)

	Query	Verdict	Baseline med (ms)	PR med (ms)	Change	q-value	Hint
⚠️	2	not_sure	73	84	+15.1%	<0.0001	Overflow-safe rounding fix is off this query's path; +15% is run-to-run variance, not a PR effect.
⚠️	4	not_sure	2868	3418	+19.2%	0.0001	Off-path correctness fix can't add 550ms to this query; measurements were noisy, reads as variance.

_{q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10) — the value the verdict is based on.}

Debug info

StressHouse run: 715284e1-d27d-4292-9a47-be8ef6ef1327
MIRAI run: b285d114-a2c7-4b0c-9f91-0d1cef9d9193
PR check IDs:
- clickbench_364482_1780559057
- clickbench_364503_1780559057
- tpch_adapted_1_official_364508_1780559057
- tpch_adapted_1_official_364511_1780559057
- tpch_adapted_1_official_364521_1780559057

alexey-milovidov

Can't accept performance degradations.

alexey-milovidov · 2026-06-04T03:52:33Z

Edited manually on GitHub - if it has mistakes, they will be corrected on the next commit.

The previous commits left `FileCacheUtils::roundUpToMultiple` in a non-compiling and incorrect state: - `<common/arithmeticOverflow.h>` is not a valid include path; the header is `<base/arithmeticOverflow.h>`. - `common::addOverflow` takes the result as a third out-parameter (`addOverflow(x, y, res)`), but it was called with two arguments, which failed to compile in the Fast test build. - The body called `roundDownToMultiple(num, multiple)`, which rounds down, dropping the rounding-up entirely. Implement the remainder-based rounding that was requested in review: add `multiple - (num % multiple)` and signal overflow only when the actual rounded-up value does not fit in `size_t`. This is exact at the extreme top of the `size_t` range (the `num + multiple - 1` trick would falsely overflow there) and does a single division plus an add rather than a division and a multiply. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

clickhouse-gh · 2026-06-04T09:15:37Z

LLVM Coverage Report

Changed lines: Changed C/C++ lines covered by tests: 33/33 (100.00%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Dec 18, 2025

fix possible crash with filesystem_cache_boundary_alignment

70706ef

bharatnc force-pushed the ncb/fix-cache-boundary-algn branch from f110148 to 70706ef Compare December 19, 2025 07:49

kssenii self-assigned this Dec 19, 2025

bharatnc force-pushed the ncb/fix-cache-boundary-algn branch 2 times, most recently from 2a08cc5 to 515c457 Compare December 21, 2025 17:50

try fix test

9129ff2

bharatnc force-pushed the ncb/fix-cache-boundary-algn branch from 515c457 to 9129ff2 Compare December 21, 2025 17:51

kssenii reviewed Dec 22, 2025

View reviewed changes

Comment thread src/Interpreters/FileCache/FileCacheUtils.h Outdated

add extra validation

29e757d

bharatnc force-pushed the ncb/fix-cache-boundary-algn branch from ee98161 to 29e757d Compare January 2, 2026 10:36

kssenii reviewed Jan 5, 2026

View reviewed changes

Comment thread src/Interpreters/Cache/FileCache.cpp Outdated

Update 03734_filesystem_cache_boundary_alignment.sql

1a64260

alexey-milovidov unassigned kssenii Apr 6, 2026

alexey-milovidov and others added 2 commits April 6, 2026 22:52

Move boundary_alignment validation into FileCacheSettings::validate

56ea151

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge remote-tracking branch 'origin/master' into ncb/fix-cache-bound…

b31684a

…ary-algn # Conflicts: # src/Interpreters/Cache/FileCacheSettings.cpp

alexey-milovidov self-assigned this Apr 6, 2026

clickhouse-gh Bot reviewed Apr 6, 2026

View reviewed changes

Comment thread src/Interpreters/FileCache/FileCacheUtils.h Outdated

alexey-milovidov and others added 2 commits April 6, 2026 23:49

Merge remote-tracking branch 'origin/master' into ncb/fix-cache-bound…

b4b322e

…ary-algn

clickhouse-gh Bot added pr-improvement Pull request with some product improvements and removed pr-bugfix Pull request with bugfix, not backported by default labels Apr 6, 2026

alexey-milovidov mentioned this pull request Apr 6, 2026

Fix non-deterministic Map ordering in test 03546_json_input_output_map_as_array #101892

Merged

1 task

alexey-milovidov added 2 commits April 6, 2026 17:47

Merge branch 'master' into ncb/fix-cache-boundary-algn

26d715c

Merge remote-tracking branch 'origin/master' into ncb/fix-cache-bound…

85eb338

…ary-algn

clickhouse-gh Bot reviewed May 20, 2026

View reviewed changes

Comment thread src/Interpreters/FileCache/FileCacheUtils.h Outdated

alexey-milovidov and others added 2 commits May 23, 2026 13:54

Merge remote-tracking branch 'origin/master' into ncb/fix-cache-bound…

896e2a9

…ary-algn

clickhouse-gh Bot reviewed May 24, 2026

View reviewed changes

Comment thread src/Interpreters/FileCache/FileCacheUtils.h

alexey-milovidov added 4 commits May 29, 2026 07:06

Merge remote-tracking branch 'origin/master' into ncb/fix-cache-bound…

61bf7c4

…ary-algn

Merge remote-tracking branch 'origin/master' into ncb/fix-cache-bound…

aa50f19

…ary-algn

Merge remote-tracking branch 'origin/master' into ncb/fix-cache-bound…

8cf559c

…ary-algn

Merge remote-tracking branch 'origin/master' into ncb/fix-cache-bound…

7681786

…ary-algn

alexey-milovidov approved these changes Jun 4, 2026

View reviewed changes

alexey-milovidov requested changes Jun 4, 2026

View reviewed changes

Update FileCacheUtils.h

5a7f802

Update FileCacheUtils.h

c9b4227

alexey-milovidov approved these changes Jun 4, 2026

View reviewed changes

clickhouse-gh Bot reviewed Jun 4, 2026

View reviewed changes

Comment thread src/Interpreters/FileCache/FileCacheUtils.h Outdated

groeneai mentioned this pull request Jun 7, 2026

Add support for compact format in EXPLAIN indexes #102031

Draft

alexey-milovidov added this pull request to the merge queue Jun 8, 2026

Merged via the queue into master with commit 5abd1c9 Jun 8, 2026
166 checks passed

alexey-milovidov deleted the ncb/fix-cache-boundary-algn branch June 8, 2026 02:02

robot-clickhouse added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 8, 2026

Sunbelt Computer Software

PL/B Language Development and Support

Uh oh!

Conversation

bharatnc commented Dec 18, 2025 • edited by robot-clickhouse Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Documentation entry for user-facing changes

Version info

Uh oh!

clickhouse-gh Bot commented Dec 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

Missing context / blind spots

Final Verdict

Uh oh!

Uh oh!

Uh oh!

kssenii commented Jan 5, 2026

Uh oh!

bharatnc commented Jan 12, 2026

Uh oh!

bharatnc commented Jan 19, 2026

Uh oh!

Uh oh!

alexey-milovidov commented Apr 7, 2026

Uh oh!

Uh oh!

Uh oh!

alexey-milovidov commented May 26, 2026

Uh oh!

groeneai commented May 26, 2026

Uh oh!

alexey-milovidov commented Jun 3, 2026

Uh oh!

clickhouse-gh Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

clickbench

tpch_adapted_1_official

Uh oh!

alexey-milovidov left a comment

Choose a reason for hiding this comment

Uh oh!

alexey-milovidov commented Jun 4, 2026

Uh oh!

Uh oh!

clickhouse-gh Bot commented Jun 4, 2026

LLVM Coverage Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

bharatnc commented Dec 18, 2025 •

edited by robot-clickhouse

Loading

clickhouse-gh Bot commented Dec 18, 2025 •

edited

Loading

clickhouse-gh Bot commented Jun 3, 2026 •

edited

Loading