Fix signed integer overflow in DateLUTImpl::toStartOfHourInterval by alexey-milovidov · Pull Request #108848 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix signed integer overflow in DateLUTImpl::toStartOfHourInterval#108848

Merged
alexey-milovidov merged 13 commits into
masterfrom
fix-tostartofhourinterval-overflow
Jul 3, 2026
Merged

Fix signed integer overflow in DateLUTImpl::toStartOfHourInterval#108848
alexey-milovidov merged 13 commits into
masterfrom
fix-tostartofhourinterval-overflow

Conversation

@alexey-milovidov

@alexey-milovidov alexey-milovidov commented Jun 29, 2026

Copy link
Copy Markdown
Member

UndefinedBehaviorSanitizer reported a signed integer overflow in the master Stress test (amd_asan_ubsan) job:

src/Common/DateLUTImpl.h:1148:32: runtime error: signed integer overflow: -2208988800 + -9223372034646541216 cannot be represented in type 'Time' (aka 'long')
    #0 DateLUTImpl::toStartOfHourInterval<long>(long, unsigned long) const src/Common/DateLUTImpl.h:1148:32
    #2 execute<...DataTypeDateTime64...(IntervalKind::Kind)5> src/Functions/toStartOfInterval.cpp:293:82

The AST fuzzer mutated the regression test added in #108766 into a multi-hour toIntervalHour variant and reached the final date + time reconstruction in DateLUTImpl::toStartOfHourInterval, which overflows below the minimum of Int64 for DateTime64 values far outside any valid date range.

Unlike toStartOfMinuteInterval and roundDown (fixed in #108766), toStartOfHourInterval has no fast path, so even UTC goes through this computation. The INTERVAL 1 HOUR case in the existing test short-circuits to toStartOfHour and never reached the overflowing line, so only intervals of more than one hour exercise it.

The fix adds a private static helper addSaturating that performs the final addition saturating at the Time boundaries instead of overflowing. For valid in-range arguments the result is unchanged (verified by a standalone -fsanitize=undefined equivalence check over the valid range); for out-of-range extremes (whose result is meaningless anyway) it saturates rather than invoking undefined behavior. This is a long-standing latent bug, not user-visible in release builds.

A new regression test tests/queries/0_stateless/04498_tostartof_hour_interval_extreme_overflow.sql covers multi-hour intervals over extreme DateTime64 values (the hours != 1 path the existing 04415 test missed).

While addressing review feedback, a second latent issue in the same function was fixed: the hour count is only validated to be positive, so an extreme but positive toIntervalHour makes UInt64 seconds = hours * 3600 wrap. For toIntervalHour(4611686018427387904) it wraps to exactly 0 (2^62 * 3600 ≡ 0 mod 2^64), and the following time / seconds divided by zero before reaching the saturated reconstruction. The conversion now detects the multiplication overflow with __builtin_mul_overflow and saturates the divisor to the maximum, so such meaningless interval counts round down to the start of the day instead of trapping. The regression test was extended with the overflowing interval-count cases.

The same wrap-to-zero affected the MINUTE sibling toStartOfMinuteInterval, where Int64 divisor = 60 * minutes wraps for toIntervalMinute(4611686018427387904) (60 * 2^62 ≡ 0 mod 2^64), and roundDownToMultiple then divided by zero (a trap on x86, undefined behavior under sanitizers; on ARM the hardware division returns zero, masking it). The minute conversion now applies the same __builtin_mul_overflow guard and saturates the divisor to the maximum of Int64. A new regression test tests/queries/0_stateless/04499_tostartof_minute_interval_extreme_overflow.sql covers the overflowing interval-count cases for the MINUTE path. The other interval siblings are safe from this exact wrap-to-zero: WEEK multiplies by 7 (coprime to two), and DAY/MONTH/SECOND divide by the raw positive interval count without a constant multiply.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=f4c91fb747098bd4f69e0889ea3ca47f52322b06&name_0=MasterCI&name_1=Stress%20test%20%28amd_asan_ubsan%29

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

`UndefinedBehaviorSanitizer` reported a signed integer overflow in the
master `Stress test (amd_asan_ubsan)` job: the AST fuzzer mutated the
regression test added in #108766 into a multi-hour `toIntervalHour`
variant and reached the final `date + time` reconstruction in
`DateLUTImpl::toStartOfHourInterval`, which overflows below the minimum
of `Int64` for `DateTime64` values far outside any valid date range.

Unlike `toStartOfMinuteInterval` and `roundDown`, `toStartOfHourInterval`
has no fast path, so even UTC goes through this computation. The
`INTERVAL 1 HOUR` case in the existing test short-circuits to
`toStartOfHour` and never reached the overflowing line, so only intervals
of more than one hour exercise it.

Add a private static helper `addSaturating` that performs the final
addition saturating at the `Time` boundaries instead of overflowing. For
valid in-range arguments the result is unchanged; for out-of-range
extremes (whose result is meaningless anyway) it saturates rather than
invoking undefined behavior.

Regression test:
`tests/queries/0_stateless/04489_tostartof_hour_interval_extreme_overflow.sql`.

CI report:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=f4c91fb747098bd4f69e0889ea3ca47f52322b06&name_0=MasterCI&name_1=Stress%20test%20%28amd_asan_ubsan%29

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-ci label Jun 29, 2026
Comment thread src/Common/DateLUTImpl.h
The `Build (arm_tidy)` job failed because `Time res;` in the new
`addSaturating` helper is declared uninitialized. `clang-tidy` treats
`cppcoreguidelines-init-variables` as an error. Initialize it to `0`
(the value is always overwritten by `__builtin_add_overflow`, this just
satisfies the check).

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=108848&sha=4ad680f167c2e9b50648f6fe3a53cf8073b6460c&name_0=PR&name_1=Build%20%28arm_tidy%29
…st overflow

`toStartOfInterval` only validates that the interval value is positive,
and `IntervalHour` is backed by `Int64`, so a huge but positive hour
count reaches `toStartOfHourInterval`. There `UInt64 seconds = hours * 3600`
wraps; for `toIntervalHour(4611686018427387904)` it wraps to exactly `0`,
and the following `time / seconds` divides by zero (undefined behavior /
`SIGFPE`) before reaching the saturated reconstruction added earlier in
this pull request.

Detect the multiplication overflow with `__builtin_mul_overflow` and
saturate the divisor to the maximum, consistent with the saturating
philosophy of this change: the rounding result for such out-of-range
interval counts is meaningless anyway, so a normal value simply rounds
down to the start of its day instead of trapping.

Extend `04489_tostartof_hour_interval_extreme_overflow` with the
overflowing interval-count cases (found by the AI review).
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Continued work on this PR:

  • Fixed the Build (arm_tidy) red (the only CI failure): clang-tidy flagged the new addSaturating helper's Time res; as cppcoreguidelines-init-variables. Initialized it to 0 (92a92b72ba5).
  • Addressed the AI-review blocker (src/Common/DateLUTImpl.h:1161): toStartOfInterval only rejects non-positive interval values, so a huge but positive toIntervalHour reaches toStartOfHourInterval, where UInt64 seconds = hours * 3600 wraps. toIntervalHour(4611686018427387904) wraps it to exactly 0 and the following time / seconds divided by zero before reaching the saturated reconstruction. The conversion now uses __builtin_mul_overflow and saturates the divisor to the maximum (3f05677c341), consistent with the saturating philosophy of this fix. Extended 04489_tostartof_hour_interval_extreme_overflow with the overflowing interval-count cases and resolved the thread.
  • Merged the latest master (the branch was 154 commits behind).

Verification (full binary rebuild from this widely-included header is infeasible here):

  • src/Functions/toStartOfInterval.cpp (which instantiates toStartOfHourInterval) compiles cleanly with the change.
  • Called the real DateLUTImpl::toStartOfHourInterval against the built clickhouse_common_io: toStartOfHourInterval<Int64>(1624331862, 4611686018427387904) now returns 1624320000 (2021-06-22 00:00:00) instead of trapping; the normal INTERVAL 2/5 HOUR cases return 1624327200/1624320000 as expected, matching the test reference.

Comment thread src/Common/DateLUTImpl.h
…gainst overflow

This is the `MINUTE` sibling of the `toStartOfHourInterval` overflow guarded in the previous commit.
`toStartOfInterval` only validates that the interval value is positive, so a huge but positive
`toIntervalMinute` reaches `toStartOfMinuteInterval`, where `Int64 divisor = 60 * minutes` wraps.
`INTERVAL 4611686018427387904 MINUTE` wraps the divisor to exactly zero (`60 * 2^62 ≡ 0 mod 2^64`),
and the following division by it is undefined behaviour: a division-by-zero trap on x86 and a
`UndefinedBibSanitizer` report under sanitizers (on ARM the hardware returns zero, which is why the
release binary silently produced `1970-01-01 00:00:00`).

The conversion now detects the multiplication overflow with `__builtin_mul_overflow` and also clamps a
product that exceeds the maximum of `Int64`, saturating the divisor to `Int64::max`. For in-range
interval counts the divisor is unchanged; for meaningless extreme counts a normal value simply rounds
down to the start of the epoch instead of trapping.

A new regression test `04489_tostartof_minute_interval_extreme_overflow` covers the overflowing
interval-count cases for the `MINUTE` path that the `HOUR` test missed.
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Continued work on this PR:

  • Addressed the AI-review blocker (src/Common/DateLUTImpl.h:1140, the MINUTE sibling of the previously-guarded HOUR path): toStartOfInterval only rejects non-positive interval values, so a huge but positive minute count reaches toStartOfMinuteInterval, where Int64 divisor = 60 * minutes wraps. INTERVAL 4611686018427387904 MINUTE wraps the divisor to exactly zero (60 * 2^62 ≡ 0 mod 2^64), and roundDownToMultiple(t, divisor) then divides by zero — a trap on x86 and a UndefinedBehaviorSanitizer report under sanitizers (on ARM the hardware division returns zero, which is why the release binary silently produced 1970-01-01 00:00:00). The conversion now uses __builtin_mul_overflow and clamps a product exceeding the maximum of Int64, saturating the divisor to Int64::max (70c34a87b00). Extended coverage with a new regression test 04489_tostartof_minute_interval_extreme_overflow and resolved the thread.
  • Confirmed the other interval siblings are safe from this exact wrap-to-zero: WEEK multiplies by 7 (coprime to two, never wraps to zero), and DAY/MONTH/SECOND divide by the raw positive interval count without a constant multiply.

Verification (full binary rebuild from this widely-included header is infeasible here):

  • The edited src/Common/DateLUTImpl.h parses cleanly with clang++ -std=c++23 -fsyntax-only; both new variables are initialized (no cppcoreguidelines-init-variables regression).
  • The saturating divisor arithmetic was checked against a verbatim copy of roundDownToMultiple under -fsanitize=undefined with -fno-sanitize-recover=all: normal divisors are unchanged and the extreme count produces no undefined behaviour (roundDownToMultiple(1624331862, Int64::max) = 0).
  • The full test runs clean against a real clickhouse local and matches the reference; the unfixed code reaches the division by 0 for the extreme MINUTE count.

The three remaining red Stress test jobs are the known Hung check failed, possible deadlock found flake (#107941, linked by the CI comment itself); the hung stacks are in QueryAnalyzer::resolveQuery / executeASTFuzzerQueries / TCPHandler with no frames in the changed DateLUTImpl code.

`04489_max_threads_auto_parsing_compat` has since merged to `master`,
taking the `04489` prefix this PR's two regression tests were using.
Renumber them to the next free prefixes against `master` so they do not
share a number with an already-merged test (matching what `add-test`
would assign today):

- `04489_tostartof_hour_interval_extreme_overflow`   -> `04490_...`
- `04489_tostartof_minute_interval_extreme_overflow` -> `04491_...`

The cross-reference comment in the `MINUTE` test is updated to point at
the renamed `HOUR` test. The SQL and reference contents are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Continued work on this PR:

  • Renumbered the two regression tests to avoid an 04489 prefix collision on master (5e1fac428a6). Since the previous CI run, 04489_max_threads_auto_parsing_compat merged to master and took the 04489 prefix that this PR's two tests were using, which would have left three tests sharing 04489 after merge. Renamed to the next free prefixes against master (what add-test would assign today):
    • 04489_tostartof_hour_interval_extreme_overflow04490_tostartof_hour_interval_extreme_overflow
    • 04489_tostartof_minute_interval_extreme_overflow04491_tostartof_minute_interval_extreme_overflow
      The cross-reference comment in the MINUTE test was updated to point at the renamed HOUR test; SQL and reference contents are unchanged, and the PR description was updated to reference the new names.

State: AI Review verdict is ✅ Approve, all review threads resolved (0 unresolved), and the PR is MERGEABLE with no conflicts. The branch is 208 commits behind master but the merge-base is from yesterday (< 1 day) and CI is green, so no re-merge was warranted. CI re-runs on the renamed tests; the only previously-red jobs were the known Stress test Hung check flake (#107941, unrelated — no frames in the changed DateLUTImpl code).

alexey-milovidov and others added 3 commits July 1, 2026 02:56
…ster

Since the previous CI run, master gained new stateless tests using the
`04490` and `04491` prefixes that this PR's two regression tests were
occupying (`04490_dict_get_keys_float_signed_zero_consistency`,
`04490_table_readonly_partition_commands`,
`04491_dict_get_keys_const_path_non_coordinator_layout`,
`04491_table_readonly_move_partition_to_table`). Merging this PR as-is
would leave three tests sharing each of those prefixes.

Renamed to the next free prefixes against `master` (what `add-test` would
assign today):

  04490_tostartof_hour_interval_extreme_overflow   -> 04492_...
  04491_tostartof_minute_interval_extreme_overflow -> 04493_...

The cross-reference comment in the `MINUTE` test was updated to point at
the renamed `HOUR` test. SQL and reference contents are otherwise
unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The `MINUTE` regression test's header comment pointed at the old
`04490_tostartof_hour_interval_extreme_overflow` name; update it to the
renamed `04492_tostartof_hour_interval_extreme_overflow`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Continued work on this PR:

  • Re-merged the latest master and renumbered the two regression tests again to avoid a fresh 04490/04491 prefix collision (abdd64cc87d, 52516bcf9ce). Since the last CI run, master gained new stateless tests that took the 04490 and 04491 prefixes this PR's tests were using (04490_dict_get_keys_float_signed_zero_consistency, 04490_table_readonly_partition_commands, 04491_dict_get_keys_const_path_non_coordinator_layout, 04491_table_readonly_move_partition_to_table), which would have left three tests sharing each prefix after merge. Renamed to the next free prefixes against master (what add-test would assign today):

    • 04490_tostartof_hour_interval_extreme_overflow04492_tostartof_hour_interval_extreme_overflow
    • 04491_tostartof_minute_interval_extreme_overflow04493_tostartof_minute_interval_extreme_overflow

    The cross-reference comment in the MINUTE test was updated to point at the renamed HOUR test; SQL and reference contents are otherwise unchanged, and the PR description was updated to reference the new names. The master merge is conflict-free and the src/Common/DateLUTImpl.h fix (the addSaturating helper plus the __builtin_mul_overflow divisor guards for both the HOUR and MINUTE paths) is intact.

State: AI Review verdict is ✅ Approve, both review threads (the HOUR and MINUTE blockers) are resolved (0 unresolved), and the PR is MERGEABLE with no conflicts.

The three red jobs on the previous head were all known master-wide flakes, unrelated to this date-arithmetic change:

  • Stress test (amd_tsan) / Stress test (arm_tsan)Hung check failed, possible deadlock found (#107941, linked by the CI comment itself); the hung stacks are in QueryAnalyzer / executeASTFuzzerQueries / TCPHandler, with no frames in the changed DateLUTImpl code.
  • Upgrade check (amd_release)Error message in clickhouse-server.log, a PostgreSQLConnectionPool connection timeout to the RFC 5737 documentation IP 192.0.2.1 from a DatabasePostgreSQL background pool. This is a master-wide flake hitting ~20 unrelated PRs per day (per CIDB) and cannot be caused by a date-arithmetic UBSan fix.

CI re-runs from scratch on the renumbered head.

@alexey-milovidov

Copy link
Copy Markdown
Member Author

@groeneai, please investigate this unrelated CI failure and provide a fix in a separate PR (or link an in-progress one). It is not caused by this PR (a DateLUTImpl date-arithmetic UBSan fix) — it is a master-wide flake.

Findings so far:

  • The DatabasePostgreSQL at 192.0.2.1 comes from tests/queries/0_stateless/04210_show_remote_databases_in_system_tables.sql (added 2026-05-13, b471f65a25f, unchanged since). Its background removeOutdatedTables cleaner fires while the database exists (between CREATE and the DROP at the end of the test), tries to connect to the unreachable documentation host, times out, and logs the <Error> line the upgrade check flags.
  • CIDB shows this exact error is a master-wide flake: first seen 2026-06-25 21:40, 53 occurrences through 2026-07-01 across ~20 unrelated PRs per day. The test predates the onset by six weeks, so the trigger is a master change around 2026-06-24/25 (e.g. to the PostgreSQL engine's background cleaner scheduling, its logging level, or the upgrade-check error allowlist), not the test itself.
  • No open tracking issue was found. Please open one and link it here, and consider either downgrading this expected-in-tests connection failure below <Error> for the PostgreSQL engine, allowlisting it in the upgrade check, or preventing the cleaner from running for a just-created unreachable database.

@groeneai

groeneai commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

@alexey-milovidov Confirmed: this is the 04210_show_remote_databases_in_system_tables.sql fixture noise, unrelated to the DateLUTImpl change. The test creates ENGINE = PostgreSQL('192.0.2.1:5432', ...) and ENGINE = MySQL('192.0.2.1:3306', ...) at the RFC 5737 TEST-NET-1 unreachable host; on the upgrade restart the PG removeOutdatedTables cleaner and the MySQL pool connect, time out, and log <Error>, tripping the zero-<Error> assertion. CIDB matches your numbers: 53 hits, first 2026-06-25 21:40, 52 distinct PRs, 0 on master.

Fix is already approved and green: #108560 allowlists both the PG (PostgreSQLConnectionPool / DatabasePostgreSQL::removeOutdatedTables) and MySQL (mysqlxx::Pool / DatabaseMySQL) fixture connection-error lines in tests/docker_scripts/upgrade_runner.sh, each matcher requiring component AND connection-failure symptom together so real engine errors still surface. You approved it on 2026-06-28 and confirmed Upgrade check (amd_release) green after the 2026-06-30 /continue-pr master merge. It just needs to merge; then this line clears for new and rebased runs, including this one.

Tracking issue: since #108560 is an approved, green fix awaiting merge, it already serves as the tracking artifact, so I did not open a duplicate. If you still want a standalone issue to cross-reference the ~52 affected PRs independently of the PR lifecycle, say the word and I will open one.

Agreed on the ~2026-06-25 onset (a master change to the cleaner scheduling/log-level or the upgrade-check allowlist, not the six-week-old fixture). The #108560 matchers are signature-based with no hardcoded IP, so they cover it regardless of the underlying trigger.

alexey-milovidov and others added 2 commits July 1, 2026 06:04
Since the last master merge, master gained `04492_regexp_pattern_with_nul_byte`,
which took the `04492` prefix this PR's `04492_tostartof_hour_interval_extreme_overflow`
was using. Renamed both tests to the next free prefixes against `master` (what
`add-test` would assign today):

  04492_tostartof_hour_interval_extreme_overflow   -> 04493_tostartof_hour_interval_extreme_overflow
  04493_tostartof_minute_interval_extreme_overflow -> 04494_tostartof_minute_interval_extreme_overflow

SQL and reference contents are unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Continued work on this PR (/continue-pr):

  • Renumbered the two regression tests again to avoid a fresh 04492 prefix collision on master (28c7ceda4cb, 6afd46a3338). Since the last merge, master gained 04492_regexp_pattern_with_nul_byte (from f135d0e70e7, which landed on master after this branch's merge-base, so it wasn't visible during the previous renumber), taking the 04492 prefix this PR's hour test was using. Renamed both tests to the next free prefixes against master (what add-test would assign today):

    • 04492_tostartof_hour_interval_extreme_overflow04493_tostartof_hour_interval_extreme_overflow
    • 04493_tostartof_minute_interval_extreme_overflow04494_tostartof_minute_interval_extreme_overflow

    The cross-reference comment in the MINUTE test was updated to point at the renamed HOUR test; SQL and reference contents are otherwise unchanged, and the PR description was updated to reference the new names. The src/Common/DateLUTImpl.h fix (the addSaturating helper plus the __builtin_mul_overflow divisor guards for both the HOUR and MINUTE paths) is intact.

State: AI Review verdict is ✅ Approve, 0 unresolved review threads, and the PR is MERGEABLE. No re-merge was needed — the merge-base is less than a day old and master has not touched src/Common/DateLUTImpl.h since. CI re-runs from scratch on the renumbered head 6afd46a3338.

The previously-noted red jobs were all master-wide flakes unrelated to this date-arithmetic change (Stress test (amd/arm_tsan) Hung check #107941; Upgrade check (amd_release) PostgreSQLConnectionPool/mysqlxx::Pool fixture noise from 04210, fixed by the approved-and-green #108560 per @groeneai's confirmation above).

@yariks5s yariks5s self-assigned this Jul 1, 2026
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Continued work on this PR (/continue-pr):

The only non-green CI result on the current head (6afd46a3) is one flaky stateless shard, Stateless tests (amd_msan, WasmEdge, parallel, 2/2)00115_shard_in_incomplete_result. It failed with the master-wide distributed desync signature:

Code: 102. DB::Exception: Unexpected packet from server 127.0.0.2:9000
(expected TablesStatusResponse, got ProfileInfo): While executing Remote. (UNEXPECTED_PACKET_FROM_SERVER)

This is unrelated to this PR (a DateLUTImpl date-arithmetic UBSan fix touching only src/Common/DateLUTImpl.h, src/Functions/toStartOfInterval.cpp, and two new regression tests — nothing to do with distributed remote() queries or TablesStatusResponse). It is the known master-wide flake tracked in #93018, caused by #108630, whose fix is the open PR #108854 (not yet merged into master, so a re-merge here would not clear it).

I re-ran the failed shard (workflow run 28497289397, --failed) to try to green the gate. CH Inc sync is pending (private), not failed. Mergeable Check/PR/Report messages are red only because they aggregate the flaky shard above. No source or test changes were needed; the diff remains AI-approved with 0 unresolved review threads. The two regression tests keep their 04493/04494 prefixes — master has no test at those prefixes, so there is no collision to renumber.

@alexey-milovidov

Copy link
Copy Markdown
Member Author

Continued work on this PR (/continue-pr):

The flaky shard I re-ran in the previous update (Stateless tests (amd_msan, WasmEdge, parallel, 2/2)00115_shard_in_incomplete_result, the master-wide UNEXPECTED_PACKET_FROM_SERVER desync #93018 / open fix #108854) passed on the re-run, so the gate cleared. The current head 6afd46a3 is now fully green:

  • Public CI report: 0 failed (148 passed, 22 skipped); every job is OK.
  • CH Inc sync, Mergeable Check, and the overall PR gate are all success.
  • AI Review verdict is ✅ Approve on the current head, with 0 unresolved review threads.
  • mergeable is MERGEABLE; mergeStateStatus is UNSTABLE only because of the non-required Report messages context — the required Mergeable Check gate is success (stamped after Report messages), and the full re-run report has zero non-OK rows, so it does not block.

No source or test changes were needed. No re-merge was warranted — the merge-base is from yesterday (< 1 day old), CI is green, and master has not touched src/Common/DateLUTImpl.h since. The two regression tests keep their 04493/04494 prefixes.

…on on master

Since the last CI run, `master` gained multiple new stateless tests at the
`04493` and `04494` prefixes this PR's two tests were using (e.g.
`04493_clickhouse_local_max_num_to_throw`, `04493_distributed_alias_column_reference`,
`04493_jit_simple_regexp`, `04494_jit_simple_regexp_extract_replace`,
`04494_qbit_stride_dot_product_transposed`), which would have left several tests
sharing each prefix after merge.

Renamed both tests to the next free prefixes against `master` (what `add-test`
would assign today):
  04493_tostartof_hour_interval_extreme_overflow   -> 04498_tostartof_hour_interval_extreme_overflow
  04494_tostartof_minute_interval_extreme_overflow -> 04499_tostartof_minute_interval_extreme_overflow

The cross-reference comment in the MINUTE test was updated to point at the renamed
HOUR test; SQL and reference contents are otherwise unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Continued work on this PR (/continue-pr):

  • Renumbered the two regression tests again to avoid a fresh 04493/04494 prefix collision on master (d5d03cdb172). Since the last CI run (2026-07-01), master gained several new stateless tests at both prefixes this PR's tests were using — 04493_clickhouse_local_max_num_to_throw, 04493_distributed_alias_column_reference, 04493_docs_web_ui_badge_strip, 04493_jit_simple_regexp, 04493_qbit_distance_nullable_reference, 04494_jit_simple_regexp_extract_replace, and 04494_qbit_stride_dot_product_transposed — which would have left several tests sharing each prefix after merge. Renamed both tests to the next free prefixes against master (what add-test would assign today, highest occupied prefix is 04497):

    • 04493_tostartof_hour_interval_extreme_overflow04498_tostartof_hour_interval_extreme_overflow
    • 04494_tostartof_minute_interval_extreme_overflow04499_tostartof_minute_interval_extreme_overflow

    The cross-reference comment in the MINUTE test was updated to point at the renamed HOUR test; SQL and reference contents are otherwise unchanged (the rename diff is 100% similar apart from that one comment line), and the PR description was updated to reference the new names.

State: the actual fix in src/Common/DateLUTImpl.h (the addSaturating helper plus the __builtin_mul_overflow divisor guards for both the HOUR and MINUTE paths) and src/Functions/toStartOfInterval.cpp is intact — master has not touched either file since the merge-base, so there is no code conflict. No re-merge was warranted: the PR is MERGEABLE with no conflicts and the merge-base is less than a week old (2026-06-30). The last full CI run on 6afd46a3 was green (0 failed), AI Review verdict was ✅ Approve, @yariks5s approved, and there are 0 unresolved review threads. CI re-runs from scratch on the renumbered head d5d03cdb172.

@alexey-milovidov

Copy link
Copy Markdown
Member Author

Continued work on this PR (/continue-pr):

The fresh CI on the current head (d5d03cdb1, the renumbered tests) settled with two red checks, both unrelated to this change (which touches only DateLUTImpl.h and the two regression tests):

  • Fast test (arm_darwin)04319_skip_unavailable_shards_mode_table_missing ("Some queries hung"). This is a master-wide macOS hang that started on 2026-07-03 — it is currently failing on ~12 unrelated PRs (a client socket poll hangs and the harness suspends it). A fix is already in flight: Skip 04319_skip_unavailable_shards_mode_table_missing on macOS fast test #109254 (skip the test on macOS). Since that PR is not yet in master, re-merging here would not clear the red — I'll re-merge to pick up the skip once it lands.
  • Stateless tests (amd_tsan, parallel, 1/2)04312_client_chime_on_slow_query_92718. A genuine flake in this new timing-sensitive terminal-bell test: over the last 7 days it is 16984 OK / 2 FAIL, and both failures are on unrelated PRs (this one and Add test for predicate pushdown through view over Distributed UNION ALL #108943). @groeneai, could you take a look and open a tracking issue/fix in a separate PR? It has no tracking issue yet.

Neither failure exercises toStartOfHourInterval/toStartOfMinuteInterval. The fix code and the yariks5s approval are unchanged; AI Review is still ✅ Approve with no unresolved threads.

I did not re-merge, re-run, or change any code this pass (the darwin fix must land in master first; the chime shard is a flake that a re-merge will re-run anyway).

@groeneai

groeneai commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

@alexey-milovidov Already root-caused and fixed: merged PR #109001 (2026-07-01).

Root cause: cases 4 and 11 ran SELECT sleep(1.5), throwIf(1 = 1, 'expected error') as two sibling projection columns whose evaluation order is unspecified. When throwIf evaluates before sleep, the query throws at ~0s elapsed, below the --chime 1 (1s) threshold (ClientBase.cpp:2677-2683), so the bell is suppressed (correctly) and the pty case prints no BEL while 'expected error' is still present. tsan/asan thread-fuzzer slowdown widens the window. The fix nests the sleep inside the predicate: SELECT throwIf(sleep(1.5) = 0, 'expected error'), so sleep(1.5) always runs before the throw and elapsed is always >= 1.5s, making the chime deterministic while still throwing the expected error.

The FAIL here is stale base. #108848's commit d5d03cdb still carries the old sibling-column form (lines 60 and 124) and does not contain #109001 (diverged from the merge, behind_by 35). Merging master past 2026-07-01 05:51Z clears it. On master, 04312 is 858 OK / 0 FAIL since the merge (2 days), and the only 04312 FAIL anywhere after the merge is that one stale-base run on this PR.

Since the fix is already merged, a separate tracking issue seems unnecessary, but happy to open one linking #109001 if you would like it for the record.

@alexey-milovidov

Copy link
Copy Markdown
Member Author

Re-merged master (d5d03cdb172..2be840fbaa6) now that the two external causes of the prior red CI have landed:

The merge was conflict-free; master touched neither DateLUTImpl.h nor toStartOfInterval.cpp since the merge-base, so the fix diff is unchanged (+110/-3 across DateLUTImpl.h + the two extreme-overflow tests). Fresh CI is running; Mergeable Check should go green.

@clickhouse-gh

clickhouse-gh Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 85.60% 85.60% +0.00%
Functions 92.70% 92.70% +0.00%
Branches 77.80% 77.70% -0.10%

Changed lines: Changed C/C++ lines covered: 19/20 (95.00%) · Uncovered code

Full report · Diff report

@alexey-milovidov alexey-milovidov added this pull request to the merge queue Jul 3, 2026
Merged via the queue into master with commit f687901 Jul 3, 2026
174 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-tostartofhourinterval-overflow branch July 3, 2026 21:47
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jul 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-ci pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants