
alexey-milovidov · 2026-06-29T20:16:52Z

UndefinedBehaviorSanitizer reported a signed integer overflow in the master Stress test (amd_asan_ubsan) job:

src/Common/DateLUTImpl.h:1148:32: runtime error: signed integer overflow: -2208988800 + -9223372034646541216 cannot be represented in type 'Time' (aka 'long')
    #0 DateLUTImpl::toStartOfHourInterval<long>(long, unsigned long) const src/Common/DateLUTImpl.h:1148:32
    #2 execute<...DataTypeDateTime64...(IntervalKind::Kind)5> src/Functions/toStartOfInterval.cpp:293:82

The AST fuzzer mutated the regression test added in #108766 into a multi-hour toIntervalHour variant and reached the final date + time reconstruction in DateLUTImpl::toStartOfHourInterval, which overflows below the minimum of Int64 for DateTime64 values far outside any valid date range.

Unlike toStartOfMinuteInterval and roundDown (fixed in #108766), toStartOfHourInterval has no fast path, so even UTC goes through this computation. The INTERVAL 1 HOUR case in the existing test short-circuits to toStartOfHour and never reached the overflowing line, so only intervals of more than one hour exercise it.

The fix adds a private static helper addSaturating that performs the final addition saturating at the Time boundaries instead of overflowing. For valid in-range arguments the result is unchanged (verified by a standalone -fsanitize=undefined equivalence check over the valid range); for out-of-range extremes (whose result is meaningless anyway) it saturates rather than invoking undefined behavior. This is a long-standing latent bug, not user-visible in release builds.
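
A minimal sketch of such a saturating addition, for illustration only. The name `addSaturating` and the use of `__builtin_add_overflow` come from this thread; the body, the `Time` alias, and the choice of which boundary to return are assumptions, not the code in `DateLUTImpl.h`.

```cpp
#include <cstdint>
#include <limits>

// Assumption: stands in for `Time` (aka `long`) from the sanitizer report.
using Time = std::int64_t;

/// Illustrative stand-in for the helper described above, not the actual ClickHouse code.
inline Time addSaturating(Time a, Time b)
{
    Time res = 0; // always overwritten by the builtin; initialized to keep static checks quiet
    if (__builtin_add_overflow(a, b, &res)) // GCC/Clang builtin: returns true on overflow
    {
        // Signed addition can only overflow when both operands have the same sign,
        // so the sign of either operand tells which boundary was crossed.
        return b < 0 ? std::numeric_limits<Time>::min() : std::numeric_limits<Time>::max();
    }
    return res;
}
```

With the operands from the report, `addSaturating(-2208988800, -9223372034646541216)` returns the minimum of `Int64` instead of overflowing, while in-range sums are returned unchanged.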

A new regression test tests/queries/0_stateless/04498_tostartof_hour_interval_extreme_overflow.sql covers multi-hour intervals over extreme DateTime64 values (the hours != 1 path the existing 04415 test missed).

While addressing review feedback, a second latent issue in the same function was fixed: the hour count is only validated to be positive, so an extreme but positive toIntervalHour makes UInt64 seconds = hours * 3600 wrap. For toIntervalHour(4611686018427387904) it wraps to exactly 0 (2^62 * 3600 ≡ 0 mod 2^64), and the following time / seconds divided by zero before reaching the saturated reconstruction. The conversion now detects the multiplication overflow with __builtin_mul_overflow and saturates the divisor to the maximum, so such meaningless interval counts round down to the start of the day instead of trapping. The regression test was extended with the overflowing interval-count cases.
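
The guard described above might look roughly like the following. The function name `hoursToSecondsSaturating` is hypothetical, and saturating to the maximum of `UInt64` is an assumption (the thread only says "the maximum"); the real conversion lives inline in `toStartOfHourInterval`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

/// Hypothetical helper: converts a positive hour count into a divisor in seconds,
/// saturating instead of wrapping modulo 2^64.
inline std::uint64_t hoursToSecondsSaturating(std::uint64_t hours)
{
    assert(hours > 0); // the caller is described as rejecting non-positive counts

    std::uint64_t seconds = 0;
    if (__builtin_mul_overflow(hours, std::uint64_t{3600}, &seconds)) // GCC/Clang builtin
        seconds = std::numeric_limits<std::uint64_t>::max();          // assumed saturation value
    return seconds;
}
```

A saturated divisor is larger than any time of day, so `time / seconds` is zero and the value rounds down to the start of its day, which matches the behavior described above.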

The same wrap-to-zero affected the MINUTE sibling toStartOfMinuteInterval, where Int64 divisor = 60 * minutes wraps for toIntervalMinute(4611686018427387904) (60 * 2^62 ≡ 0 mod 2^64), and roundDownToMultiple then divided by zero (a trap on x86, undefined behavior under sanitizers; on ARM the hardware division returns zero, masking it). The minute conversion now applies the same __builtin_mul_overflow guard and saturates the divisor to the maximum of Int64. A new regression test tests/queries/0_stateless/04499_tostartof_minute_interval_extreme_overflow.sql covers the overflowing interval-count cases for the MINUTE path. The other interval siblings are safe from this exact wrap-to-zero: WEEK multiplies by 7 (coprime to two), and DAY/MONTH/SECOND divide by the raw positive interval count without a constant multiply.
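
The wrap-to-zero arithmetic can be checked at compile time. This is a standalone sketch: `wrapsToZero` and `two_pow_62` are hypothetical names introduced here, and the argument for `WEEK` relies on 7 having a multiplicative inverse modulo 2^64.

```cpp
#include <cstdint>

/// Hypothetical helper: true if `count * factor` is congruent to 0 modulo 2^64.
constexpr bool wrapsToZero(std::uint64_t count, std::uint64_t factor)
{
    return count * factor == 0; // unsigned multiplication wraps modulo 2^64 by definition
}

constexpr std::uint64_t two_pow_62 = std::uint64_t{1} << 62; // 4611686018427387904

static_assert(wrapsToZero(two_pow_62, 3600)); // 3600 = 2^4 * 225, so the product is 2^66 * 225
static_assert(wrapsToZero(two_pow_62, 60));   // 60 = 2^2 * 15, so the product is 2^64 * 15
static_assert(!wrapsToZero(two_pow_62, 7));   // 7 contributes no factor of two

// 7 is invertible modulo 2^64 (7 * 7905747460161236407 = 3 * 2^64 + 1),
// so 7 * n wrapping to zero would force n itself to be zero.
static_assert(std::uint64_t{7} * 7905747460161236407ULL == 1);
```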

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=f4c91fb747098bd4f69e0889ea3ca47f52322b06&name_0=MasterCI&name_1=Stress%20test%20%28amd_asan_ubsan%29

Changelog category (leave one):

CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

`UndefinedBehaviorSanitizer` reported a signed integer overflow in the master `Stress test (amd_asan_ubsan)` job: the AST fuzzer mutated the regression test added in #108766 into a multi-hour `toIntervalHour` variant and reached the final `date + time` reconstruction in `DateLUTImpl::toStartOfHourInterval`, which overflows below the minimum of `Int64` for `DateTime64` values far outside any valid date range.

Unlike `toStartOfMinuteInterval` and `roundDown`, `toStartOfHourInterval` has no fast path, so even UTC goes through this computation. The `INTERVAL 1 HOUR` case in the existing test short-circuits to `toStartOfHour` and never reached the overflowing line, so only intervals of more than one hour exercise it.

Add a private static helper `addSaturating` that performs the final addition saturating at the `Time` boundaries instead of overflowing. For valid in-range arguments the result is unchanged; for out-of-range extremes (whose result is meaningless anyway) it saturates rather than invoking undefined behavior.

Regression test: `tests/queries/0_stateless/04489_tostartof_hour_interval_extreme_overflow.sql`.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=f4c91fb747098bd4f69e0889ea3ca47f52322b06&name_0=MasterCI&name_1=Stress%20test%20%28amd_asan_ubsan%29

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

clickhouse-gh · 2026-06-29T20:17:29Z

…terval-overflow

The `Build (arm_tidy)` job failed because `Time res;` in the new `addSaturating` helper is declared uninitialized. `clang-tidy` treats `cppcoreguidelines-init-variables` as an error. Initialize it to `0` (the value is always overwritten by `__builtin_add_overflow`, this just satisfies the check). CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=108848&sha=4ad680f167c2e9b50648f6fe3a53cf8073b6460c&name_0=PR&name_1=Build%20%28arm_tidy%29

…st overflow

`toStartOfInterval` only validates that the interval value is positive, and `IntervalHour` is backed by `Int64`, so a huge but positive hour count reaches `toStartOfHourInterval`. There `UInt64 seconds = hours * 3600` wraps; for `toIntervalHour(4611686018427387904)` it wraps to exactly `0`, and the following `time / seconds` divides by zero (undefined behavior / `SIGFPE`) before reaching the saturated reconstruction added earlier in this pull request.

Detect the multiplication overflow with `__builtin_mul_overflow` and saturate the divisor to the maximum, consistent with the saturating philosophy of this change: the rounding result for such out-of-range interval counts is meaningless anyway, so a normal value simply rounds down to the start of its day instead of trapping.

Extend `04489_tostartof_hour_interval_extreme_overflow` with the overflowing interval-count cases (found by the AI review).

alexey-milovidov · 2026-06-30T01:10:40Z

Continued work on this PR:

Fixed the Build (arm_tidy) red (the only CI failure): clang-tidy flagged the new addSaturating helper's Time res; as cppcoreguidelines-init-variables. Initialized it to 0 (92a92b72ba5).
Addressed the AI-review blocker (src/Common/DateLUTImpl.h:1161): toStartOfInterval only rejects non-positive interval values, so a huge but positive toIntervalHour reaches toStartOfHourInterval, where UInt64 seconds = hours * 3600 wraps. toIntervalHour(4611686018427387904) wraps it to exactly 0 and the following time / seconds divided by zero before reaching the saturated reconstruction. The conversion now uses __builtin_mul_overflow and saturates the divisor to the maximum (3f05677c341), consistent with the saturating philosophy of this fix. Extended 04489_tostartof_hour_interval_extreme_overflow with the overflowing interval-count cases and resolved the thread.
Merged the latest master (the branch was 154 commits behind).

Verification (full binary rebuild from this widely-included header is infeasible here):

src/Functions/toStartOfInterval.cpp (which instantiates toStartOfHourInterval) compiles cleanly with the change.
Called the real DateLUTImpl::toStartOfHourInterval against the built clickhouse_common_io: toStartOfHourInterval<Int64>(1624331862, 4611686018427387904) now returns 1624320000 (2021-06-22 00:00:00) instead of trapping; the normal INTERVAL 2/5 HOUR cases return 1624327200/1624320000 as expected, matching the test reference.
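
The expected values above can be reproduced with plain arithmetic. This sketch assumes UTC, a non-negative timestamp, and no lookup table, so it is a simplification of `DateLUTImpl::toStartOfHourInterval` and not its implementation; `startOfHourIntervalUtc` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

/// Hypothetical UTC-only model: split the timestamp into start-of-day and time-of-day,
/// round the time-of-day down to a multiple of `seconds`, then add the two parts back.
inline std::int64_t startOfHourIntervalUtc(std::int64_t t, std::uint64_t seconds)
{
    assert(t >= 0 && seconds > 0); // only the simple in-range case is modelled here

    constexpr std::int64_t day = 86400;
    const std::int64_t date = t / day * day;                          // 1624320000 for the sample
    const std::uint64_t time = static_cast<std::uint64_t>(t - date);  // 11862 s = 03:17:42
    return date + static_cast<std::int64_t>(time / seconds * seconds);
}
```

For `t = 1624331862`, a divisor of 7200 gives 1624327200, a divisor of 18000 gives 1624320000, and a saturated divisor also gives 1624320000, the start of 2021-06-22.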

…gainst overflow

This is the `MINUTE` sibling of the `toStartOfHourInterval` overflow guarded in the previous commit. `toStartOfInterval` only validates that the interval value is positive, so a huge but positive `toIntervalMinute` reaches `toStartOfMinuteInterval`, where `Int64 divisor = 60 * minutes` wraps. `INTERVAL 4611686018427387904 MINUTE` wraps the divisor to exactly zero (`60 * 2^62 ≡ 0 mod 2^64`), and the following division by it is undefined behaviour: a division-by-zero trap on x86 and an `UndefinedBehaviorSanitizer` report under sanitizers (on ARM the hardware returns zero, which is why the release binary silently produced `1970-01-01 00:00:00`).

The conversion now detects the multiplication overflow with `__builtin_mul_overflow` and also clamps a product that exceeds the maximum of `Int64`, saturating the divisor to `Int64::max`. For in-range interval counts the divisor is unchanged; for meaningless extreme counts a normal value simply rounds down to the start of the epoch instead of trapping.

A new regression test `04489_tostartof_minute_interval_extreme_overflow` covers the overflowing interval-count cases for the `MINUTE` path that the `HOUR` test missed.

alexey-milovidov · 2026-06-30T15:54:56Z

Continued work on this PR:

Addressed the AI-review blocker (src/Common/DateLUTImpl.h:1140, the MINUTE sibling of the previously-guarded HOUR path): toStartOfInterval only rejects non-positive interval values, so a huge but positive minute count reaches toStartOfMinuteInterval, where Int64 divisor = 60 * minutes wraps. INTERVAL 4611686018427387904 MINUTE wraps the divisor to exactly zero (60 * 2^62 ≡ 0 mod 2^64), and roundDownToMultiple(t, divisor) then divides by zero: a trap on x86 and an UndefinedBehaviorSanitizer report under sanitizers (on ARM the hardware division returns zero, which is why the release binary silently produced 1970-01-01 00:00:00). The conversion now uses __builtin_mul_overflow and clamps a product exceeding the maximum of Int64, saturating the divisor to Int64::max (70c34a87b00). Extended coverage with a new regression test 04489_tostartof_minute_interval_extreme_overflow and resolved the thread.
Confirmed the other interval siblings are safe from this exact wrap-to-zero: WEEK multiplies by 7 (coprime to two, never wraps to zero), and DAY/MONTH/SECOND divide by the raw positive interval count without a constant multiply.
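
A rough sketch of the MINUTE guard follows. `minutesToDivisorSaturating` is a hypothetical name, and `roundDownToMultiple` here is a simplified stand-in that handles only non-negative inputs; neither is copied from `DateLUTImpl.h`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

/// Hypothetical helper: computes `60 * minutes` as a signed divisor, saturating to the
/// maximum of Int64 both when the product wraps modulo 2^64 and when it exceeds Int64.
inline std::int64_t minutesToDivisorSaturating(std::uint64_t minutes)
{
    assert(minutes > 0); // the caller is described as rejecting non-positive counts

    constexpr std::int64_t max = std::numeric_limits<std::int64_t>::max();
    std::uint64_t product = 0;
    if (__builtin_mul_overflow(minutes, std::uint64_t{60}, &product) // GCC/Clang builtin
        || product > static_cast<std::uint64_t>(max))
        return max;
    return static_cast<std::int64_t>(product);
}

/// Simplified stand-in for `roundDownToMultiple`, restricted to non-negative `t`.
inline std::int64_t roundDownToMultiple(std::int64_t t, std::int64_t divisor)
{
    assert(t >= 0 && divisor > 0);
    return t / divisor * divisor;
}
```

Under these assumptions, a count of 5 yields a divisor of 300, while 4611686018427387904 yields `Int64::max`, so `roundDownToMultiple(1624331862, divisor)` returns 0 rather than dividing by zero.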

Verification (full binary rebuild from this widely-included header is infeasible here):

The edited src/Common/DateLUTImpl.h parses cleanly with clang++ -std=c++23 -fsyntax-only; both new variables are initialized (no cppcoreguidelines-init-variables regression).
The saturating divisor arithmetic was checked against a verbatim copy of roundDownToMultiple under -fsanitize=undefined with -fno-sanitize-recover=all: normal divisors are unchanged and the extreme count produces no undefined behaviour (roundDownToMultiple(1624331862, Int64::max) = 0).
The full test runs clean against a real clickhouse local and matches the reference; the unfixed code reaches the division by 0 for the extreme MINUTE count.

The three remaining red Stress test jobs are the known Hung check failed, possible deadlock found flake (#107941, linked by the CI comment itself); the hung stacks are in QueryAnalyzer::resolveQuery / executeASTFuzzerQueries / TCPHandler with no frames in the changed DateLUTImpl code.

`04489_max_threads_auto_parsing_compat` has since merged to `master`, taking the `04489` prefix this PR's two regression tests were using. Renumber them to the next free prefixes against `master` so they do not share a number with an already-merged test (matching what `add-test` would assign today):
- `04489_tostartof_hour_interval_extreme_overflow` -> `04490_...`
- `04489_tostartof_minute_interval_extreme_overflow` -> `04491_...`

The cross-reference comment in the `MINUTE` test is updated to point at the renamed `HOUR` test. The SQL and reference contents are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

alexey-milovidov · 2026-06-30T19:03:41Z

Continued work on this PR:

Renumbered the two regression tests to avoid an 04489 prefix collision on master (5e1fac428a6). Since the previous CI run, 04489_max_threads_auto_parsing_compat merged to master and took the 04489 prefix that this PR's two tests were using, which would have left three tests sharing 04489 after merge. Renamed to the next free prefixes against master (what add-test would assign today):
- 04489_tostartof_hour_interval_extreme_overflow → 04490_tostartof_hour_interval_extreme_overflow
- 04489_tostartof_minute_interval_extreme_overflow → 04491_tostartof_minute_interval_extreme_overflow
  The cross-reference comment in the MINUTE test was updated to point at the renamed HOUR test; SQL and reference contents are unchanged, and the PR description was updated to reference the new names.

State: AI Review verdict is ✅ Approve, all review threads resolved (0 unresolved), and the PR is MERGEABLE with no conflicts. The branch is 208 commits behind master but the merge-base is from yesterday (< 1 day) and CI is green, so no re-merge was warranted. CI re-runs on the renamed tests; the only previously-red jobs were the known Stress test Hung check flake (#107941, unrelated — no frames in the changed DateLUTImpl code).

…terval-overflow

…ster

Since the previous CI run, master gained new stateless tests using the `04490` and `04491` prefixes that this PR's two regression tests were occupying (`04490_dict_get_keys_float_signed_zero_consistency`, `04490_table_readonly_partition_commands`, `04491_dict_get_keys_const_path_non_coordinator_layout`, `04491_table_readonly_move_partition_to_table`). Merging this PR as-is would leave three tests sharing each of those prefixes. Renamed to the next free prefixes against `master` (what `add-test` would assign today):
- 04490_tostartof_hour_interval_extreme_overflow -> 04492_...
- 04491_tostartof_minute_interval_extreme_overflow -> 04493_...

The cross-reference comment in the `MINUTE` test was updated to point at the renamed `HOUR` test. SQL and reference contents are otherwise unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The `MINUTE` regression test's header comment pointed at the old `04490_tostartof_hour_interval_extreme_overflow` name; update it to the renamed `04492_tostartof_hour_interval_extreme_overflow`. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

alexey-milovidov · 2026-07-01T02:59:46Z

Continued work on this PR:

Re-merged the latest master and renumbered the two regression tests again to avoid a fresh 04490/04491 prefix collision (abdd64cc87d, 52516bcf9ce). Since the last CI run, master gained new stateless tests that took the 04490 and 04491 prefixes this PR's tests were using (04490_dict_get_keys_float_signed_zero_consistency, 04490_table_readonly_partition_commands, 04491_dict_get_keys_const_path_non_coordinator_layout, 04491_table_readonly_move_partition_to_table), which would have left three tests sharing each prefix after merge. Renamed to the next free prefixes against master (what add-test would assign today):
- 04490_tostartof_hour_interval_extreme_overflow → 04492_tostartof_hour_interval_extreme_overflow
- 04491_tostartof_minute_interval_extreme_overflow → 04493_tostartof_minute_interval_extreme_overflow
The cross-reference comment in the MINUTE test was updated to point at the renamed HOUR test; SQL and reference contents are otherwise unchanged, and the PR description was updated to reference the new names. The master merge is conflict-free and the src/Common/DateLUTImpl.h fix (the addSaturating helper plus the __builtin_mul_overflow divisor guards for both the HOUR and MINUTE paths) is intact.

State: AI Review verdict is ✅ Approve, both review threads (the HOUR and MINUTE blockers) are resolved (0 unresolved), and the PR is MERGEABLE with no conflicts.

The three red jobs on the previous head were all known master-wide flakes, unrelated to this date-arithmetic change:

Stress test (amd_tsan) / Stress test (arm_tsan) — Hung check failed, possible deadlock found (#107941, linked by the CI comment itself); the hung stacks are in QueryAnalyzer / executeASTFuzzerQueries / TCPHandler, with no frames in the changed DateLUTImpl code.
Upgrade check (amd_release) — Error message in clickhouse-server.log, a PostgreSQLConnectionPool connection timeout to the RFC 5737 documentation IP 192.0.2.1 from a DatabasePostgreSQL background pool. This is a master-wide flake hitting ~20 unrelated PRs per day (per CIDB) and cannot be caused by a date-arithmetic UBSan fix.

CI re-runs from scratch on the renumbered head.

alexey-milovidov · 2026-07-01T03:02:30Z

@groeneai, please investigate this unrelated CI failure and provide a fix in a separate PR (or link an in-progress one). It is not caused by this PR (a DateLUTImpl date-arithmetic UBSan fix) — it is a master-wide flake.

Job: Upgrade check (amd_release) — test Error message in clickhouse-server.log (see upgrade_error_messages.txt)
Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=108848&sha=5e1fac428a60be06f0ae02a2b549f16348d72a06&name_0=PR&name_1=Upgrade%20check%20%28amd_release%29
Signature: a PostgreSQLConnectionPool connection timeout to the RFC 5737 documentation IP 192.0.2.1, logged at <Error> level from DB::DatabasePostgreSQL::removeOutdatedTables() (Code: 614 ... Try 2. Connection to 192.0.2.1:5432 failed ... timeout expired).

Findings so far:

The DatabasePostgreSQL at 192.0.2.1 comes from tests/queries/0_stateless/04210_show_remote_databases_in_system_tables.sql (added 2026-05-13, b471f65a25f, unchanged since). Its background removeOutdatedTables cleaner fires while the database exists (between CREATE and the DROP at the end of the test), tries to connect to the unreachable documentation host, times out, and logs the <Error> line the upgrade check flags.
CIDB shows this exact error is a master-wide flake: first seen 2026-06-25 21:40, 53 occurrences through 2026-07-01 across ~20 unrelated PRs per day. The test predates the onset by six weeks, so the trigger is a master change around 2026-06-24/25 (e.g. to the PostgreSQL engine's background cleaner scheduling, its logging level, or the upgrade-check error allowlist), not the test itself.
No open tracking issue was found. Please open one and link it here, and consider either downgrading this expected-in-tests connection failure below <Error> for the PostgreSQL engine, allowlisting it in the upgrade check, or preventing the cleaner from running for a just-created unreachable database.

groeneai · 2026-07-01T03:51:19Z

@alexey-milovidov Confirmed: this is the 04210_show_remote_databases_in_system_tables.sql fixture noise, unrelated to the DateLUTImpl change. The test creates ENGINE = PostgreSQL('192.0.2.1:5432', ...) and ENGINE = MySQL('192.0.2.1:3306', ...) at the RFC 5737 TEST-NET-1 unreachable host; on the upgrade restart the PG removeOutdatedTables cleaner and the MySQL pool connect, time out, and log <Error>, tripping the zero-<Error> assertion. CIDB matches your numbers: 53 hits, first 2026-06-25 21:40, 52 distinct PRs, 0 on master.

Fix is already approved and green: #108560 allowlists both the PG (PostgreSQLConnectionPool / DatabasePostgreSQL::removeOutdatedTables) and MySQL (mysqlxx::Pool / DatabaseMySQL) fixture connection-error lines in tests/docker_scripts/upgrade_runner.sh, each matcher requiring component AND connection-failure symptom together so real engine errors still surface. You approved it on 2026-06-28 and confirmed Upgrade check (amd_release) green after the 2026-06-30 /continue-pr master merge. It just needs to merge; then this line clears for new and rebased runs, including this one.

Tracking issue: since #108560 is an approved, green fix awaiting merge, it already serves as the tracking artifact, so I did not open a duplicate. If you still want a standalone issue to cross-reference the ~52 affected PRs independently of the PR lifecycle, say the word and I will open one.

Agreed on the ~2026-06-25 onset (a master change to the cleaner scheduling/log-level or the upgrade-check allowlist, not the six-week-old fixture). The #108560 matchers are signature-based with no hardcoded IP, so they cover it regardless of the underlying trigger.

Since the last master merge, master gained `04492_regexp_pattern_with_nul_byte`, which took the `04492` prefix this PR's `04492_tostartof_hour_interval_extreme_overflow` was using. Renamed both tests to the next free prefixes against `master` (what `add-test` would assign today):
- 04492_tostartof_hour_interval_extreme_overflow -> 04493_tostartof_hour_interval_extreme_overflow
- 04493_tostartof_minute_interval_extreme_overflow -> 04494_tostartof_minute_interval_extreme_overflow

SQL and reference contents are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

alexey-milovidov · 2026-07-01T06:07:36Z