Add `--chime` option for ASCII BEL on slow query finish (issue #92718) by groeneai · Pull Request #104545 · ClickHouse/ClickHouse · GitHub
Skip to content

Add --chime option for ASCII BEL on slow query finish (issue #92718)#104545

Merged
alexey-milovidov merged 19 commits into
ClickHouse:masterfrom
groeneai:groeneai/issue-92718-bel-chime-on-slow-query
Jun 15, 2026
Merged

Add --chime option for ASCII BEL on slow query finish (issue #92718)#104545
alexey-milovidov merged 19 commits into
ClickHouse:masterfrom
groeneai:groeneai/issue-92718-bel-chime-on-slow-query

Conversation

@groeneai

@groeneai groeneai commented May 10, 2026

Copy link
Copy Markdown
Contributor

Adds the --chime command-line flag to ClientBase so that clickhouse-client, clickhouse-local, and the embedded client emit the ASCII BEL control character (\x07) on stderr when a query finishes after running for at least N seconds. Whether the terminal makes a sound or a visual flash is decided by the terminal's own user preferences — exactly the design @alexey-milovidov suggested in the issue.

Motivation: when a user is waiting on a long-running query and works on something else, the terminal stays silent — the user has to keep checking. Reusing the standard BEL control character lets every terminal emulator handle the alert according to the user's existing audio/visual preferences, without ClickHouse needing to bundle audio assets.

Usage (default-on per reviewer request):

(omitted)        enabled with the default 5-second threshold
--chime          enabled with the default 5-second threshold
--chime N        enabled with a custom threshold of N seconds
--chime 0        disabled (no BEL is emitted)

Behaviour:

  • The BEL is emitted at the end of ClientBase::processParsedSingleQuery, after the existing Elapsed: / Processed rows: reporting.
  • It fires on both success and error paths, so a query that fails after running long still chimes — that is when the user most wants to know.
  • It uses progress_indication.elapsedSeconds, which is reset per query, so each statement in a multi-statement script is timed independently.

Test: tests/queries/0_stateless/04219_client_chime_on_slow_query_92718.sh covers six cases — --chime 1 with a slow query (BEL), --chime 10 with a fast query (no BEL), no --chime flag with a query below the default 5-second threshold (no BEL), --chime 1 with a slow query that ends in an error (BEL), clickhouse-local --chime 1 with a slow query (BEL) to confirm the wiring is in ClientBase rather than the network-bound client only, and --chime 0 with a slow query to confirm the explicit disable path (no BEL).

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Added the --chime command-line option to clickhouse-client, clickhouse-local, and the embedded client. When a query finishes (on success or on error) after running for at least N seconds, the client writes the ASCII BEL control character (\x07) to stderr. Terminals decide whether to make a sound or a visual flash based on the user's preferences. Enabled by default with a 5-second threshold; pass --chime N for a custom threshold or --chime 0 to disable. Closes #92718.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Note

Medium Risk
Changes default CLI behavior for interactive/TTY runs by emitting \x07 after slow queries, which could affect terminal output expectations. Logic is gated by a threshold and TTY detection, but still touches the query-completion path shared by clickhouse-client and clickhouse-local.

Overview
Adds a new --chime[=N] option (default threshold 5s, 0 disables) that writes ASCII BEL (\x07) to stderr when a query finishes after running longer than the configured threshold.

The chime is emitted on both success and error paths, but only when stderr is a TTY to avoid contaminating redirected/captured stderr. Includes a new stateless test validating threshold behavior, TTY suppression, and both clickhouse-client and clickhouse-local coverage.

Reviewed by Cursor Bugbot for commit 80f543a. Bugbot is set up for automated code reviews on this repo. Configure here.

…ouse#92718)

Implements the chime feature requested in ClickHouse#92718: an optional command-line
flag that emits the ASCII `BEL` control character (`\x07`) on stderr after
a query finishes when it ran for at least `--chime N` seconds. This lets a
user attending to other work be alerted when a long-running query completes;
whether the terminal makes a sound or a visual flash is decided by the
terminal's own user preferences.

The option is wired in `ClientBase::addCommonOptions`, so it works in
`clickhouse-client`, `clickhouse-local`, and the embedded client as
@alexey-milovidov requested. Usage:

  --chime         enable with the default 10-second threshold
  --chime N       enable with a custom threshold of N seconds
  (omitted) or 0  disabled (no BEL is emitted)

The BEL is emitted at the end of `processParsedSingleQuery`, so it fires on
both success and error paths.

Test: `tests/queries/0_stateless/04219_client_chime_on_slow_query_92718.sh`
covers the five interesting cases — `--chime 1` with slow query (BEL),
`--chime 10` with fast query (no BEL), no `--chime` (no BEL), `--chime 1`
with slow query that errors (BEL), and the same in `clickhouse-local`.

Closes ClickHouse#92718.
@groeneai

Copy link
Copy Markdown
Contributor Author

@groeneai

Copy link
Copy Markdown
Contributor Author

cc @alexey-milovidov @azat — could you review this? It adds the --chime N CLI flag in ClientBase, which emits the ASCII BEL control character (\x07) on stderr when a query finishes after running for at least N seconds. Wired through ClientBase::addCommonOptions, so it applies to clickhouse-client, clickhouse-local, and the embedded client per the request in #92718.

Comment thread src/Client/ClientBase.cpp Outdated
@alexey-milovidov alexey-milovidov self-assigned this May 10, 2026
@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label May 10, 2026
@clickhouse-gh

clickhouse-gh Bot commented May 10, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [79c181b]

Summary:


AI Review

Summary

This PR adds a default-on --chime client option that emits BEL on stderr for slow TTY query completions, wires the threshold through ClientBase, documents the option, and adds focused stateless coverage for TTY emission, redirected-stderr suppression, explicit disable, implicit/default values, and the server-error path. I found no new correctness, safety, compatibility, or documentation findings in the current diff; the prior bot-authored inline findings are addressed in the current code.

Missing context / blind spots
  • ⚠️ The latest workflow for commit 79c181bd3fc27d06054c4eb104bf021358891a4b was still running during this review, and I did not run 04312_client_chime_on_slow_query_92718 locally. A completed CI run covering the new stateless test would close this runtime-validation gap.
Final Verdict

✅ No blockers or majors found. No new inline review comments were posted.

@clickhouse-gh clickhouse-gh Bot added the pr-improvement Pull request with some product improvements label May 10, 2026
Change the `--chime` option on `clickhouse-client`, `clickhouse-local`,
and the embedded client from `implicit_value(10)->default_value(0)` to
`implicit_value(5)->default_value(5)`, per @alexey-milovidov's review on
PR ClickHouse#104545: "Let's set both implicit and default values to 5 seconds."

New semantics:
* omit `--chime`         -> 5-second threshold (chime enabled by default)
* `--chime`              -> 5-second threshold (implicit value)
* `--chime N`            -> N-second threshold (custom)
* `--chime 0`            -> disabled (no BEL emitted)

Also update the option's help text accordingly and extend the regression
test (`tests/queries/0_stateless/04219_client_chime_on_slow_query_92718.sh`)
with a new case 6 that verifies `--chime 0` explicitly disables the chime
even on a query above the default threshold. Case 3's description is
updated to reflect that "no `--chime`" with a 1.5s query stays silent
because 1.5s is still below the 5s default, not because chime is off.

Verified locally:
* `--chime 1`  + `sleep(1.5)`                  -> BEL emitted on stderr
* (no flag)    + `sleep(1.5)` (< default)      -> no BEL
* (no flag)    + `sleepEachRow(2) x 3 = 6s`    -> BEL emitted (default-on)
* `--chime 0`  + `sleep(1.5)`                  -> no BEL (explicit disable)
* full regression test 04219 passes (8.65s).

Related thread: ClickHouse#104545 (comment)
Comment thread tests/queries/0_stateless/04312_client_chime_on_slow_query_92718.sh
groeneai and others added 2 commits May 11, 2026 01:48
In case ClickHouse#4 the previous assertion only checked for `BEL` on stderr; it
never verified that the query actually failed. If `throwIf` semantics
change or the query is rewritten so it no longer errors, case ClickHouse#4 could
silently keep printing `BEL` (since the query runs longer than the
threshold) and the test would still pass — losing the "chime on
failure" coverage this case was added to provide.

Now case ClickHouse#4 explicitly:
  1. asserts a non-zero exit code (query must have failed),
  2. asserts that `'expected error'` is present in stderr (the error
     came from `throwIf` and not from an unrelated client/network
     issue),
  3. only then checks for `BEL`.

Any of the first two conditions failing prints a diagnostic
"FAIL: ..." line which won't match the reference, so the test fails
loudly with a clear reason.

Issue: ClickHouse#92718
PR: ClickHouse#104545
Review thread: ClickHouse#104545 (comment)
The `--chime` option emits ASCII `BEL` (`\x07`) to stderr when a query
runs longer than the threshold. With the default-on behaviour at 5
seconds, this writes BEL bytes into every stderr stream — including
captured ones. The result is 10+ Fast test failures (e.g.
`00156_max_execution_speed_sample_merge`,
`02703_max_local_write_bandwidth`, `03217_primary_index_memory_leak`)
where `clickhouse-test` redirects stderr and rejects any non-empty
content (`Reason: having stderror`).

Add a `stderr_is_a_tty` guard at the emission site in `ClientBase`.
BEL now only fires when stderr is attached to a terminal — which is the
only context where a chime is meaningful anyway. Redirected stderr
(file, pipe, automation) is left untouched, so captured stderr stays
clean. The interactive default-on behaviour requested in the review
is preserved.

Strengthen test `04219_client_chime_on_slow_query_92718.sh`:
- Cases 1, 4, 5: verify the TTY guard suppresses BEL when stderr is
  redirected to a file — exercises the fix directly.
- Cases 7, 8: run the client under `script -qc` to attach stderr to a
  pseudo-terminal and verify BEL still fires on a TTY (positive case)
  and that `--chime 0` overrides it on a TTY (explicit disable).
- Case 4 still asserts the error path actually fires (non-zero exit
  code plus `expected error` in stderr) before checking BEL.

Verified locally:
  - `04219_client_chime_on_slow_query_92718`: all 8 cases pass.
  - `00156_max_execution_speed_sample_merge`: passes (45s query, stderr empty).
  - `02703_max_local_write_bandwidth`: passes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

Fix: suppress BEL when stderr is not a TTY (commit 8cd83de)

CI flagged a PR-caused regression on this branch: 10+ Fast tests started failing with Reason: having stderror and \x07\x07 bytes in their captured stderr. The cause is exactly the trade-off discussed in the default-on review: with the chime enabled by default at 5s, any long-running query writes a BEL into the stderr stream, and clickhouse-test rejects non-empty stderr.

Affected tests (from CIDB): 00156_max_execution_speed_sample_merge, 02703_max_local_write_bandwidth, 03217_primary_index_memory_leak, 02900_buffer_table_alter_race, 03257_async_insert_native_empty_block, 03014_async_with_dedup_part_log_rmt, 03233_async_insert_infile_format, 02072_rdb_recovery_escaped_name, 03006_mv_deduplication_throw_if_async_insert, 03595_set_query_no_eq_set_to_one.

Root cause

ClientBase::processParsedSingleQuery emits '\x07' on error_stream unconditionally when the query exceeds the chime threshold. There is no terminal precondition — so when stderr is redirected to a regular file or pipe (Fast test, integration test, any automation), the BEL bytes contaminate the captured stream.

Fix

Add a stderr_is_a_tty guard at the BEL emission site (member already exists in ClientBase, set in the constructor via isatty(err_fd_)):

if (chime_threshold_seconds > 0
    && stderr_is_a_tty
    && progress_indication.elapsedSeconds() >= ...)
{
    error_stream << '\x07';
    error_stream.flush();
}

A "terminal bell" is only meaningful when there is a terminal — emitting it to a captured stream is by definition useless and only causes harm. The interactive default-on behaviour you requested is preserved: users running clickhouse-client in a real terminal still get the chime by default at 5s.

Test 04219 strengthened

The previous version asserted BEL appears whenever --chime N is set on a slow query. That assertion was actually false in CI (stateless tests redirect stderr — non-TTY) and the test only passed because the BEL bytes were silently accepted as stderr garbage. The new test exercises both halves of the fix:

# Scenario Stderr Expectation
1 --chime 1 + slow query redirected to file no BEL (TTY guard suppresses)
2 --chime 10 + fast query redirected no BEL (below threshold)
3 no flag + 1.5s query redirected no BEL (below default 5s)
4 --chime 1 + slow query then error redirected no BEL + rc!=0 + 'expected error' present
5 clickhouse-local --chime 1 + slow redirected no BEL (same suppression cross-binary)
6 --chime 0 + slow redirected no BEL (explicit disable)
7 --chime 1 + slow under script -qc pty BEL (positive TTY case)
8 --chime 0 + slow under script -qc pty no BEL (explicit disable wins on TTY)

Cases 7 and 8 run the client under script -qc, which allocates a pseudo-terminal and writes the combined pty output to a file we then grep. That's the only way to verify the positive case from a stateless test.

Verification

  • 04219_client_chime_on_slow_query_92718: all 8 cases pass.
  • 00156_max_execution_speed_sample_merge: passes (45s query, captured stderr empty).
  • 02703_max_local_write_bandwidth: passes.
  • Manual clickhouse-client -q 'SELECT sleep(2),sleep(2),sleep(2)' 2>/tmp/err → stderr file is 0 bytes (was contaminated with BEL pre-fix).

Pre-PR Validation Gate

a) Deterministic repro: yes — any query >5s with redirected stderr writes BEL pre-fix.
b) Root cause explained: yes — missing TTY precondition at the emission site.
c) Fix matches root cause: yes — adds the missing precondition; semantically correct for a "terminal bell".
d) Test intent preserved: yes, strengthened — the new test exercises both the TTY-only positive path and the non-TTY suppression path.
e) Both directions demonstrated: yes — affected Fast tests pass; positive TTY case still emits BEL via script -qc.

Status: pushed as a separate commit per the no-amend rule. Existing approvals on prior commits remain visible in the review history; reviewers are welcome to re-approve on the new commit.

Comment thread tests/queries/0_stateless/04312_client_chime_on_slow_query_92718.sh
groeneai and others added 2 commits May 11, 2026 14:55
Adds case 9 to `04219_client_chime_on_slow_query_92718.sh`: under a
pseudo-terminal allocated by `/usr/bin/script -qc`, run `clickhouse-client
--chime 10` with `SELECT sleep(0.1)`. Expects no `BEL` because the query
finishes well under the 10-second threshold.

This is the only path where `BEL` is actually emitted (cases 7-8 also
run under a pty), so it is the only place where a regression of the
threshold gate (`elapsedSeconds() >= chime_threshold_seconds`) would
surface. Cases 1-6 cannot detect such a regression because the TTY
guard suppresses `BEL` regardless of the threshold.

Addresses inline review feedback on PR ClickHouse#104545.
@alexey-milovidov

Copy link
Copy Markdown
Member

@groeneai, investigate the failure: Unit tests (tsan, function_prop_fuzzer) -> FunctionsStress.stress (job) and provide a fix in a separate PR. This is a known flaky test tracked in #104877 and is unrelated to the --chime changes in this PR (which only touch src/Client/ClientBase.cpp and the new 04219_client_chime_on_slow_query_92718 test). The failure surface is gtest_functions_stress reporting 1 threads got stuck while exercising math/SQL functions like roundDown. If a fix is already in progress, link it here.

Comment thread tests/queries/0_stateless/04312_client_chime_on_slow_query_92718.sh
Comment thread tests/queries/0_stateless/04312_client_chime_on_slow_query_92718.sh
@groeneai

Copy link
Copy Markdown
Contributor Author

Confirmed — this is the chronic FunctionsStress.stress "threads got stuck" family, unrelated to the --chime changes here.

The CI hit on commit 7609050917a (2026-05-11 15:26 UTC) is at gtest_functions_stress.cpp:2270 — exactly the May-11 peak shape where the test hit 84 failures across 77 unrelated PRs (asan_ubsan / tsan / msan) in a single day.

Fix status

  • @nikitamikhaylov's PR #104694 ("Wire functions stress test to ProcessList for KILL-QUERY-style timeouts") merged 2026-05-15 14:06 UTC. Post-merge the rate dropped from ~84/day to 1–4/day across all sanitizer builds, with the residual shape moving from line 2270 → 2274 → 2417. The roundDown / math-function shape called out in the directive is the original May-11 surface (line 2270/2274), which Wire functions stress test to ProcessList for KILL-QUERY-style timeouts #104694 substantially addressed.
  • Sibling broken-determinism shapes (caseWithExpression, nullIf(String, FixedString), isDistinctFrom, if(materialize(...)), etc.) under the same FunctionsStress.stress #104877 umbrella are covered by PR #104858 (@Algunenano, OPEN — stringVectorConstantFixedString in FunctionsComparison.h) and PR #104804 (us, OPEN — transform/caseWithExpression wrapper-layer castColumn fix).

The residual line-2417 hangs at ~1 hit/day across all builds in the last 48h are still under investigation — they look like a smaller leftover not addressed by #104694. I'm tracking it in the existing #104877 umbrella; no separate fix PR yet for the post-#104694 residual.

I rebased PR #104545 onto current master (9004589d3c) at 12:36 UTC, so the new CI run should reflect the post-#104694 fix state. Will monitor and re-ping if anything else surfaces.

@alexey-milovidov

Copy link
Copy Markdown
Member

This was fixed by #105146. Let's update the branch.

alexey-milovidov and others added 2 commits May 17, 2026 20:54
clickhouse-gh[bot] inline review on 2026-05-17 flagged two remaining
coverage gaps in `tests/queries/0_stateless/04219_client_chime_on_slow_query_92718.sh`:

1. Default-on threshold is unverified under `pty`. Cases 3 and 5
   redirect stderr, so the TTY guard suppresses `BEL` and they cannot
   tell whether omitting `--chime` correctly picks up the default
   5-second threshold. If `implicit_value` / `default_value` regressed
   to `0` (i.e. accidentally disabled by default) the existing cases
   would still print `no BEL`.

2. Error-path `BEL` is unverified under `pty`. Case 4 captures the
   error contract under redirected stderr where `BEL` is suppressed
   anyway, so a regression that emitted `BEL` only on success would
   still pass.

Add two new TTY cases under `script -qc`:

- Case 10: no `--chime` flag, slow query (~6s) via
  `SELECT sleepEachRow(2) FROM numbers(3) SETTINGS function_sleep_max_microseconds_per_block = 0`,
  expects `BEL`. Catches regressions in the `implicit_value` /
  `default_value` wiring.

- Case 11: `--chime 1`, slow failing query
  `SELECT sleep(1.5), throwIf(1 = 1, 'expected error')` under
  `script -eqc` (propagates child exit code), asserts non-zero
  rc, `'expected error'` in the pty stream, and `BEL`. Catches
  regressions that drop `BEL` on the error path.

Verification (debug build, 5/5 manual runs):
- `clickhouse-test 04219_client_chime_on_slow_query_92718` → OK 20.42s

Bot review IDs: 3254645106 (default-on coverage), 3254645298 (error-path coverage).
@alexey-milovidov

Copy link
Copy Markdown
Member

@groeneai, many tests have failed, and we have no other way than investigating and fixing all of them.

master added several 04303_* and 04304_* stateless tests since the previous
04219 -> 04303 renumber, so the chime test collided again. Renumber it to the
next free prefix 04312.

Also address the reviewer request on the pty matrix:
- Case 8 (`--chime 0` on a TTY) now runs a ~6s query (above the default 5s
  threshold) instead of a 1.5s one, so it genuinely distinguishes explicit
  disable from default-on: if `--chime 0` were ignored and the 5s default used,
  this case would emit `BEL`.
- Add case 12 exercising `--chime` passed without a value (the `implicit_value`
  path, distinct from omitting the flag in case 10) on a slow query over a pty,
  which must emit `BEL`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member

Updated the branch (1b5f31441d7..aa4ac83a746):

  • Merged master (branch was 277 commits behind and red) — zero conflicts. No changes to ClientBase.cpp; the option already matches the requested defaults (implicit_value(5)->default_value(5)).
  • Renumbered the stateless test 04303_client_chime_on_slow_query_9271804312_..., because master has since added several 04303_* and 04304_* tests, recreating a numeric-prefix collision.
  • Addressed the open review thread on the pty matrix:
    • Case 8 (--chime 0 over a pty) now runs a ~6s query (above the default 5s threshold) instead of 1.5s, so it genuinely distinguishes explicit disable from default-on.
    • Added case 12 for --chime passed without a value (the implicit_value(5) path, distinct from omitting the flag in case 10), which must emit BEL.

Verified on a freshly built binary (clang, ARM): the full test passes all 12 cases, and the two new/changed assertions were confirmed independently — --chime (implicit) on a ~6s query over a pty emits BEL, --chime 0 on the same query emits no BEL, and the default-on path (no flag) emits BEL.

Both previously-open review threads are now resolved.

The remaining CI failures (Server died, FunctionsStress #104877, BuzzHouse stack-use-after-scope STID 2518-4199, and the MutateTask splitAndModifyMutationCommands LOGICAL_ERROR STID 5052-4746 #91015) are the chronic, master-wide flakes already triaged in this thread — none of them touch the client-side BEL emission path.

Comment thread src/Client/ClientBase.cpp
Add a --chime [N] entry to the execution-details table in the client
docs, covering the default 5 second threshold, the no-value form using
the default threshold, TTY-only emission, and --chime 0 to disable.

Requested by the docs review on PR ClickHouse#104545 (Issue ClickHouse#92718).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread docs/en/interfaces/client.md Outdated
Align the client docs row with the option help text and the
implementation (stderr_is_a_tty check). The bell is written to stderr,
not stdout, so redirecting stdout does not suppress it while
redirecting stderr does.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

groeneai commented Jun 4, 2026

Copy link
Copy Markdown
Contributor Author

CI Status — covered SHA 395d0b98

CI fully finished. All failures are pre-existing chronic flakies, not PR-caused.

Check Test Disposition
Upgrade check (amd_release) Error message in clickhouse-server.log (see upgrade_error_messages.txt) Chronic upgrade-check flake. The trigger line is a StorageKafka2 / Kafka-test 408 timeout — broker / infra noise, unrelated to the --chime client option. Tracked under task 2026-04-26-ci-p2-cannot-start-clickhouse-server--.
Integration tests (amd_tsan, 1/6) test_refreshable_mat_view_replicated/test.py::test_circular_dependencies_survive_restart Chronic flaky — 14 PRs / 30d, 1 master. Tracked under task 2026-06-04-ci-p2-testcirculardependenciessurviv (P2).
Mergeable Check / PR / CH Inc sync n/a Aggregator/infra rollup.

Code change is purely client-side (--chime option for ASCII BEL on slow-query finish); neither failing area touches client argument parsing or output formatting.

cron:our-pr-ci-monitor:20260604-073000

@alexey-milovidov

Copy link
Copy Markdown
Member

@groeneai, update the branch to see if the upgrade check has been fixed (we had a plenty of fixes recently).

@alexey-milovidov

Copy link
Copy Markdown
Member

@groeneai please look at the comment above ^

…8-bel-chime-on-slow-query

Update branch to latest master per @alexey-milovidov directive
to verify recent fixes resolve the chronic Upgrade check flake.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

groeneai commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

@alexey-milovidov merged master into the branch (commit d92a147, fast-forward push from 395d0b9). 1132 commits caught up, no conflicts. CI is now running on the new HEAD; will report back once it finishes whether the prior chronic Upgrade check / test_circular_dependencies_survive_restart flakies are gone.

@groeneai

groeneai commented Jun 8, 2026

Copy link
Copy Markdown
Contributor Author

CI Status — covered SHA d92a147f5c1

CI fully finished after the master merge. The single failure is a pre-existing chronic flake on the TPC-DS test infra, not PR-caused.

Check Test Disposition
Stateless tests (arm_asan_ubsan, azure, parallel) 04033_tpc_ds_q61 transient OOM, NOT PR-caused. MEMORY_LIMIT_EXCEEDED on TPC-DS Q61 join over web-disk-resident store_sales (RSS 53.70 GiB at 53.69 GiB cap). The "Failure reproducibility check: Failed 0 out of 3 reruns" line confirms it passed all 3 reruns — pure transient memory pressure on the runner. Heavy TPC-DS query path is completely unrelated to the --chime BEL client option this PR adds (only src/Client/ClientBase.cpp, docs/en/interfaces/client.md, and a stateless test were touched).

PR is client-only — none of the failing code paths can be reached by this diff.

Session: cron:our-pr-ci-monitor:20260608-083000

@alexey-milovidov

Copy link
Copy Markdown
Member

@groeneai, either fix the transient OOM in 04033_tpc_ds_q61 or disable this test, and send a separate PR. Tag @rienath and @rschu1ze in the fix.

@groeneai

groeneai commented Jun 9, 2026

Copy link
Copy Markdown
Contributor Author

@alexey-milovidov sent a separate PR per your directive: #106797. Disables 04033_tpc_ds_q61 mirroring the existing 04033_tpc_ds_q35 SKIP pattern (host-level OvercommitTracker arbitration on arm_asan_ubsan/azure/parallel is not reachable from any per-query setting). cc'd @rienath and @rschu1ze on the new PR.

@groeneai

Copy link
Copy Markdown
Contributor Author

Follow-up on the q61 OOM directive: I closed the separate PR (#106797) instead of disabling. rienath reviewed it and asked to find the regression rather than skip the test.

Result: there is no q61 memory regression. q61 has 0 master failures across CIDB's full 90-day retention; its only memory-typed failure ever is the single 2026-06-08 OvercommitTracker OOM on this PR's run, which passed all 3 reruns. It was one casualty of a runner-wide memory event on Stateless tests (arm_asan_ubsan, azure, parallel): MEMORY_LIMIT_EXCEEDED hit 14 tests/10 PRs on 06-08, 311/29 on 06-09, 295/57 on 06-10, with ~24 sibling TPC-DS queries OOMing alongside q61. Host-level RSS contention (current RSS 53.70 GiB, maximum 53.69 GiB), not q61's own ~5 GiB.

So neither fix nor disable applies; q61 stays enabled. For this PR's shard flake, a rerun clears it. Full details on #106797.

@alexey-milovidov

Copy link
Copy Markdown
Member

@groeneai, the root cause of the failure of 04033_tpc_ds_q61 test was fixed. Let's update the branch so we will be able to merge this PR.

@groeneai

Copy link
Copy Markdown
Contributor Author

@alexey-milovidov done. Merged latest master into the branch (clean, no conflicts); new HEAD is eda342cee6f805eff96060d34e89e9ab09b44f3f. Net diff vs master is unchanged (the 4 --chime files only). CI is now running on the updated HEAD; I will confirm once it finishes whether the 04033_tpc_ds_q61 failure is gone.

@groeneai

Copy link
Copy Markdown
Contributor Author

CI summary for eda342cee6f805eff96060d34e89e9ab09b44f3f (master-merge to pick up the q61 fix)

CI has fully finished (Finish Workflow: pass, 0 checks pending).

  • 04033_tpc_ds_q61 — the transient OOM that was the prior blocker is now gone on this HEAD: the test runs OK across all stateless jobs (amd_debug, amd_tsan, amd/arm asan_ubsan, llvm_coverage, arm_binary, msan, etc.). The master-side root-cause fix referenced above is picked up by this merge.
  • Sole red: Integration tests (arm_binary, distributed plan, 1/4) / test_refreshable_mv_keeper_loss::test_keeper_session_loss_during_coordinated_refresh — a known flaky fault-injection test (keeper-session-loss on a coordinated refreshable MV), unrelated to this PR's --chime client feature. It fails across 8 distinct unrelated PRs in 30 days with 0 master hits. A dedicated test-only fix is already in flight (PR Fix flaky test_keeper_session_loss_during_coordinated_refresh #107413).
  • Mergeable Check / PR — aggregators reflecting only the flaky integration row above.

No PR-caused failures; the --chime feature diff is unchanged. The branch is up to date with master and ready to merge.

@alexey-milovidov

Copy link
Copy Markdown
Member

@groeneai, as the test for refreshable materialized views was fixed, update the branch.

@groeneai

Copy link
Copy Markdown
Contributor Author

@alexey-milovidov branch updated to 79c181bd3fc27d06054c4eb104bf021358891a4b, now includes the merged #107413 fix for test_keeper_session_loss_during_coordinated_refresh. Clean merge, net diff vs master unchanged (the 4 chime files only). CI running.

@clickhouse-gh

clickhouse-gh Bot commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.90% 84.90% +0.00%
Functions 92.30% 92.30% +0.00%
Branches 77.30% 77.30% +0.00%

Changed lines: Changed C/C++ lines covered by tests: 19/19 (100.00%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

@groeneai

Copy link
Copy Markdown
Contributor Author

@alexey-milovidov alexey-milovidov added this pull request to the merge queue Jun 15, 2026
Merged via the queue into ClickHouse:master with commit 252e11e Jun 15, 2026
166 checks passed
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-improvement Pull request with some product improvements pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add an optional chime sound to ClickHouse Client that plays when a query finishes over a runtime threshold

3 participants