Make Ctrl+C terminate the output of a result set in the client promptly#106430
Conversation
When the client prints a large result set to a slow or stuck output sink (for example a slow terminal), pressing Ctrl+C had no visible effect: the client was blocked inside a single large `write` call, and the interrupting signal could be delivered to a thread other than the one performing the write, so the write was never interrupted and the cancellation flag was only noticed much later (after the whole buffer had drained). Add an optional cancellation hook to `WriteBufferFromFileDescriptor`. While the hook is installed, the buffer waits for the descriptor to become writable using `poll` with a short timeout, checks the hook in between, and writes only a bounded chunk at a time, so a single `write` cannot block for long. Once cancellation is requested, the buffered data is discarded instead of being written. The hook is installed by the client on its standard output only while a query is being processed, so all other writers are unaffected. Closes #22426 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… cover INTO OUTFILE AND STDOUT Addresses review feedback on the Ctrl+C output cancellation: * `poll` only promises that *some* space is available, not space for the whole chunk, so a blocking `write` of a large chunk could still sleep on a pipe that drained by a single page right before becoming writable. While the cancellation hook is installed, bound each write to `PIPE_BUF` after a successful `poll`: a write of at most `PIPE_BUF` bytes is guaranteed not to block on a pipe (and such a small write does not block on a socket or terminal either), so the cancellation hook is consulted promptly. The bounded-chunk path is only used for descriptors that can actually block (pipes, sockets, terminals); regular files keep using a single large write for throughput. The descriptor kind is determined in `setCancellationHook`, which is now defined out of line. * `SELECT ... INTO OUTFILE ... AND STDOUT` writes stdout through a separate `WriteBufferFromFileDescriptor` inside a `ForkWriteBuffer`, not through the persistent `std_out`. Install the same cancellation hook on that buffer so Ctrl+C also promptly aborts the output in this mode when the stdout sink is blocked. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Renumber the test from `04313` to `04318` to avoid sharing the prefix with `04313_limit_by_sorted_prefix_key`, which landed on master in the meantime. * Make reaching the running/blocked state mandatory: if the client exits before the query is observed running, fail explicitly and print the client's stderr instead of sending `SIGINT` to a gone process (which printed `No such process` to stderr and flagged the test as "having stderror") and then reporting a false `OK`. * Keep the test cheap and immune to the randomized settings used by the flaky check: small blocks, a single thread and disabled memory/result limits so the client cannot error out early (e.g. on a low `max_memory_usage`) before reaching the blocked state we want to exercise. This also avoids stressing the server with a heavy result set across the many flaky-check repetitions. * Use a single `trap` for cleanup and silence expected `kill` failures so the test never leaks processes or writes to stderr. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Merged Code (
Test (
Verified locally against a fresh build: the feature test passes 10/10 with randomized settings and 5/5 without; a single Ctrl+C terminates the blocked client in ~200 ms in both plain and Closes #22426 |
…FILE AND STDOUT in the test Address review feedback: * With `--pager`, the result set is written through the pager's stdin pipe (`pager_cmd->in`) rather than through `std_out`, so the cancellation hook installed on `std_out` was never consulted. A stuck or slow pager could fill its stdin pipe and the first Ctrl+C would appear to have no effect. Install the same hook on `pager_cmd->in` (recreated per query, so no removal needed). * `04318_client_cancel_blocked_output.sh` only exercised the plain `SELECT` path through the persistent `std_out`. Refactor it into a reusable function and add a variant that writes `SELECT ... INTO OUTFILE ... AND STDOUT` to the same stuck FIFO, so the separate `stdout_buf` used by `ForkWriteBuffer` is also covered. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Continued work on this PR:
Pushed as 0a1f6f4. CI was green prior to this commit. |
Address review: `setCancellationHook` previously routed every descriptor that is not a regular file through the responsive bounded-chunk path, including non-tty character devices such as `/dev/null`. A common benchmark or batch pattern like `clickhouse-client --query ... > /dev/null` would then regress to one `poll` plus one small `write` per chunk instead of flushing the normal large buffer. Classify only descriptors that can actually block on `write` - pipes/FIFOs (`S_ISFIFO`), sockets (`S_ISSOCK`) and terminals (`isatty`) - as needing the responsive path, keeping regular files and non-tty character devices on the single large write for throughput. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@groeneai, the only CI failure on this PR is an The crash is a A monotonic function in the key is applied to a Could you investigate and provide a fix in a separate PR? If a fix is already in progress, please link it here. |
|
@alexey-milovidov The AST fuzzer crash on this PR ( Reproduced locally with a minimal query on master: SET allow_suspicious_low_cardinality_types = 1;
CREATE TABLE t (id Nullable(UInt32), b LowCardinality(Bool) STATISTICS(minmax))
ENGINE = MergeTree ORDER BY id
SETTINGS index_granularity = 8, allow_nullable_key = 1;
INSERT INTO t SELECT number, multiIf(number<8,true,number<16,false,NULL) FROM numbers(24);
OPTIMIZE TABLE t FINAL;
SELECT count() FROM t WHERE b < toLowCardinality(toNullable(7));Fix in #106793 (separate PR with regression test). Bidirectional repro confirmed: crash without fix, clean with fix. Session id: cron:clickhouse-ci-task-worker:20260608-231400 |
Address review feedback on `04318_client_cancel_blocked_output.sh`: * The cancellation hook installed on `pager_cmd->in` in `ClientBase` was not exercised by the test, which only covered the persistent `std_out` and `INTO OUTFILE ... AND STDOUT` paths. Add a third case that runs the client with `--pager "cat FIFO > /dev/null"`, a pager that blocks on opening a FIFO with no writer and therefore never drains its stdin, so the client fills the pager's stdin pipe and blocks inside `write`. A single Ctrl+C must still terminate the client. * When the setup branch failed, the early `return` skipped the per-case cleanup, leaking the `sleep 1000` holder (and possibly the client) into the stateless worker, after which the `trap` could no longer reap them. Factor the teardown into `reap_case` and call it on the failure path as well. `reap_case` also releases a stuck pager by opening its FIFO read-write (which never blocks even with no reader), so it cannot linger. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Addressed the two remaining review threads in commit
Verified end-to-end against a locally built server: 3/3 cases pass, 5/5 runs stable (~1.1s each), no leaked processes. The reference now has three |
|
The sole CI red on the latest commit ( This one is the chronic DELETE FROM test_qualify WHERE number IN (SELECT number FROM test_qualify__fuzz_46 QUALIFY ...) SETTINGS mutations_sync = 2;The stack is entirely in the analyzer / mutations path — It is a chronic trunk issue: over the last 90 days it reproduces on |
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered by tests: 46/52 (88.46%) | Lost baseline coverage: none · Uncovered code |
alexey-milovidov
left a comment
There was a problem hiding this comment.
Reviewed the change - all good. Though it touches one of the most basic components in the code, it sounds reasonable.
Addressing review feedback on the previous placement: clearing std_out's cancellation hook immediately after output_format.reset() removes the race (the parallel-formatting collector thread is joined by then) but also disables interruptibility for the remaining resetOutput() flushes - out_file_buf and std_out_wrapper finalization and the final std_out->next(). If one of those buffers bytes to a slow or stuck stdout, Ctrl+C could no longer discard the buffer, regressing the blocked-output behavior from ClickHouse#106430. Register the clear as a SCOPE_EXIT right after output_format.reset() instead: it still fires strictly after the collector is joined (so no race), but only at the end of resetOutput(), after the final flushes, so the hook stays active through them and Ctrl+C keeps working. Using a scope guard also keeps the clear on every exit path of resetOutput() (e.g. the SIGPIPE/SIGQUIT restore can throw), matching the original guard's no-leak guarantee. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Reworks the std_out cancellation-hook lifetime to satisfy both the slow-stdout interruptibility of the blocked-output feature (ClickHouse#106430) and the requirement that already-produced output is not silently dropped on exception paths. query_interrupt_handler.cancelled() is true both after a genuine Ctrl+C and after the normal end-of-query stop() (both leave exit_after_signals <= 0), so it cannot by itself distinguish "user cancelled" from "query finished". The earlier revision kept that handler-based hook installed through the resetOutput() teardown flushes; on exception paths processOrdinaryQuery's scope guard has already run stop() before resetOutput() runs (via processParsedSingleQuery's scope guard), making cancelled() unconditionally true and causing the final std_out_wrapper / out_file_buf finalization and std_out->next() to discard output that was already produced. Snapshot whether the interrupt handler is still armed at resetOutput() entry and choose the cancellation source accordingly: armed (normal completion, reached via onEndOfStream while the handler is still active) keeps honoring the handler so a fresh Ctrl+C still discards a flush to a slow/stuck stdout; stopped (exception paths) falls back to the genuine cancellation flag, which is set only by cancelQuery(), so completion is never mistaken for cancellation. The hook is still only mutated after output_format.reset() has joined the parallel-formatting collector that also reads it, so the original data race (TSan, BuzzHouse amd_tsan) remains fixed. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The AST fuzzer found a server crash with stack:
applyFunctionForField (KeyCondition.cpp:1367)
applyMonotonicFunctionsChainToRange (KeyCondition.cpp:4284)
KeyCondition::checkInHyperrectangle (KeyCondition.cpp:4723)
StatisticsPartPruner::checkPartCanMatch (StatisticsPartPruner.cpp:160)
MergeTreeDataSelectExecutor::filterPartsByStatistics (line 783)
Reproducer (also added as a stateless regression test):
SET allow_suspicious_low_cardinality_types = 1;
CREATE TABLE t (id Nullable(UInt32), b LowCardinality(Bool) STATISTICS(minmax))
ENGINE = MergeTree ORDER BY id
SETTINGS index_granularity = 8, allow_nullable_key = 1;
INSERT INTO t SELECT number, multiIf(number < 8, true, number < 16, false, NULL)
FROM numbers(24);
OPTIMIZE TABLE t FINAL;
SELECT count() FROM t WHERE b < toLowCardinality(toNullable(7));
The query crashes the server in a debug build with:
Logical error: Bad cast from type DB::ColumnLowCardinality
to DB::ColumnVector<char8_t>
Root cause:
`KeyCondition::extractAtomFromTree` constructs `monotonic_functions_chain`
against an `LowCardinality`-stripped key expression type (line 3524 calls
`recursiveRemoveLowCardinality(key_expr_type)` before constructing the chain
of casts via `createInternalCast`). At read time, however,
`StatisticsPartPruner::checkPartCanMatch` passes the original column type
(which can still be `LowCardinality(...)`) into
`KeyCondition::checkInHyperrectangle`. When the chain is non-empty,
`checkInHyperrectangle` forwards `data_types[key_column]` to
`applyMonotonicFunctionsChainToRange`, which calls `applyFunctionForField`
with that runtime type. The latter builds
`arg_type->createColumnConst(1, value)` over the LC type, producing a
`ColumnConst` wrapping a `ColumnLowCardinality`. The cast wrapper was prepared
against the LC-stripped type, so its outer `prepareUnpackDictionaries`
LowCardinality unwrap closure was elided (line 2291 returns the inner wrapper
directly when neither side is `LowCardinality`). The inner cast wrapper is
`createUInt8ToBoolWrapper`, which calls
`checkAndGetColumn<ColumnUInt8>(*arguments[0].column)` on the LC column and
throws `Bad cast`.
Fix:
Strip `LowCardinality` from the runtime data type when forwarding it to
`applyMonotonicFunctionsChainToRange`, matching the convention the chain was
built with. The same `recursiveRemoveLowCardinality` is already used at
KeyCondition.cpp:2239 for the symmetric set-index path.
CI report:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106430&sha=0a1f6f4e08a576d96bfcef11818b9d1ce2b7ad23&name_0=PR&name_1=AST%20fuzzer%20%28amd_debug%29
PR: ClickHouse#106430
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Pressing Ctrl+C in
clickhouse-client/clickhouse-localnow promptly terminates the output of a result set, even when the client is blocked writing to a slow or stuck output sink (for example, a slow terminal).Previously, the client could block inside a single large
writeto its output. The interrupting signal could be delivered to a thread other than the one performing the write, so the write was never interrupted and the cancellation flag was only noticed much later, after the whole buffer had drained. As a result, the first Ctrl+C appeared to have no effect.The output buffer now waits for the descriptor to become writable using
pollwith a short timeout (checking for cancellation in between), writes only a bounded chunk at a time, and discards the remaining buffered data once cancellation is requested. The hook is installed by the client on its standard output only while a query is being processed, so all other writers are unaffected.Closes #22426
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Pressing Ctrl+C in
clickhouse-client/clickhouse-localnow promptly stops printing a large result set, even when the output is being written to a slow or stuck destination such as a slow terminal. Previously the first Ctrl+C could appear to have no effect.Documentation entry for user-facing changes
Version info
26.6.1.949