fix possible crash with filesystem_cache_boundary_alignment#92579
Conversation
f110148 to
70706ef
Compare
2a08cc5 to
515c457
Compare
515c457 to
9129ff2
Compare
ee98161 to
29e757d
Compare
|
|
|
I'm not able to reproduce this anymore in master. |
But it might still be nice to have these checks, I'll try to keep some of the changes and remove bug fix label. |
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ary-algn # Conflicts: # src/Interpreters/Cache/FileCacheSettings.cpp
…ble test Add `#include <cstddef>` and `#include <cstdint>` to `FileCacheUtils.h` to replace the removed `#include <Core/Types.h>`, ensuring `size_t` and `SIZE_MAX` are properly declared. Remove test `03734_filesystem_cache_boundary_alignment` because the underlying bug is no longer reproducible on master, causing bugfix validation to fail. The `boundary_alignment` validation was already added to master independently. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix. |
… smaller value The previous overflow guard returned `roundDownToMultiple(SIZE_MAX, multiple)`, which can be strictly less than `num` (e.g. `num = SIZE_MAX - 1`, `multiple = 4` yields `SIZE_MAX - 3`). This violated the post-condition `result >= num` that callers rely on (`chassert(desired_size >= downloaded_size)` in `FileSegment.cpp` and `chassert(aligned_end_offset >= initial_range.right)` in `FileCache.cpp`), and in release builds could underflow subsequent size arithmetic. `FileCacheSettings::validate` already rejects absurdly large `boundary_alignment` values, so the overflow branch is defensive and unreachable on valid configurations. Throw `std::overflow_error` instead of returning a silently corrupted value. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…w check The previous `num > SIZE_MAX - multiple + 1` guard detects overflow of the `num + multiple - 1` implementation trick, not overflow of the mathematical result. For `multiple = SIZE_MAX, num = 2` it computes `limit = 1` and throws, but the correct rounded-up multiple `SIZE_MAX` is representable. Switch to a remainder-based computation: `delta = multiple - (num % multiple)`, throw only when `num + delta` would overflow `size_t`. This keeps the post-condition `result >= num` without rejecting representable cases. Make both helpers `inline` instead of `static` so translation units that include the header without calling one of them (e.g. `FileSegment.cpp` no longer exercises `roundDownToMultiple` after the new implementation removes the call) do not hit `-Werror,-Wunused-function`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds focused gtests that pin the contract of `roundUpToMultiple` at the `size_t` extremes: - Representable boundaries (e.g. `num = 2`, `multiple = SIZE_MAX` returns `SIZE_MAX`) are not rejected by the new remainder-based rounding (the old `num + multiple - 1` trick would have falsely overflowed). - True overflows (e.g. `num = SIZE_MAX`, `multiple = SIZE_MAX - 1`) throw `std::overflow_error` rather than wrapping. Also covers the basic and zero-multiple cases for `roundDownToMultiple` and `roundUpToMultiple` so a future refactor of the rounding formula cannot silently regress the post-condition `result >= num`.
|
@groeneai, please fix this deadlock |
|
@alexey-milovidov investigated. The Hung check on Where it hangs. Single stuck query in So the thread is waiting on the global Why this PR cannot cause it. PR #92579 changes one inline helper in
None of these appear in the hung stack. The hang is at Cross-PR / master data (30d). Hung check on this exact lane:
Across all stress lanes the chronic Hung check family has 565 master hits in 30d, dominated by Recommendation. Retrigger I have not opened a separate fix PR for the |
|
Updated the branch with All four review threads are resolved, and Built The four CI failures are all unrelated chronic master flakes, none touching
|
|
📊 Cloud Performance Report ✅ AI verdict: This PR only changes a file-cache size-rounding overflow check (and tightens thread query-id handling); for normal inputs the rounding result is identical and neither change sits on the query execution path that any of the flagged queries run. The two ClickBench improvements (Q4 -14%, Q15 -13%) and two TPC-H regressions (Q2 +15%, Q4 +19%) all land on code these queries never execute, so they read as run-to-run variance rather than real effects of this change — the TPC-H queries were also noisy. All four flagged verdicts are downgraded to not-sure; no actionable performance change is attributable to this PR. clickbenchFlagged queries (2 of 43)
q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10) — the value the verdict is based on. tpch_adapted_1_officialFlagged queries (2 of 22)
q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10) — the value the verdict is based on. Debug info
|
alexey-milovidov
left a comment
There was a problem hiding this comment.
Can't accept performance degradations.
|
Edited manually on GitHub - if it has mistakes, they will be corrected on the next commit. |
The previous commits left `FileCacheUtils::roundUpToMultiple` in a non-compiling and incorrect state: - `<common/arithmeticOverflow.h>` is not a valid include path; the header is `<base/arithmeticOverflow.h>`. - `common::addOverflow` takes the result as a third out-parameter (`addOverflow(x, y, res)`), but it was called with two arguments, which failed to compile in the Fast test build. - The body called `roundDownToMultiple(num, multiple)`, which rounds down, dropping the rounding-up entirely. Implement the remainder-based rounding that was requested in review: add `multiple - (num % multiple)` and signal overflow only when the actual rounded-up value does not fit in `size_t`. This is exact at the extreme top of the `size_t` range (the `num + multiple - 1` trick would falsely overflow there) and does a single division plus an add rather than a division and a multiply. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
LLVM Coverage ReportChanged lines: Changed C/C++ lines covered by tests: 33/33 (100.00%) | Lost baseline coverage: none · Uncovered code |

Handle a potential
size_toverflow inFileCacheUtils::roundUpToMultiplewhennum + multiple - 1would exceedSIZE_MAX, and makeFileCacheUtils.hself-contained by including<cstddef>/<cstdint>directly instead of relying on transitive includes fromCore/Types.h.Ref: #85552
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix a potential overflow in
roundUpToMultiplewhen computing filesystem cache boundary alignment for very large offsets.Documentation entry for user-facing changes
Note
Medium Risk
Touches filesystem cache range alignment math and introduces a new overflow-throwing path, which could change behavior for extreme inputs/configs.
Overview
Fixes a potential
size_toverflow inFileCacheUtils::roundUpToMultipleby replacing thenum + multiple - 1trick with remainder-based rounding and an explicit overflow check that throwsstd::overflow_errorwhen the rounded value can’t fit.Makes
FileCacheUtils.hself-contained and header-friendly by switching helpers toinlineand including<cstddef>,<cstdint>, and<stdexcept>instead of relying on transitiveCore/Types.hincludes.Reviewed by Cursor Bugbot for commit 3a1b284. Bugbot is set up for automated code reviews on this repo. Configure here.
Version info
26.6.1.490