Make Ctrl+C terminate the output of a result set in the client promptly by alexey-milovidov · Pull Request #106430 · ClickHouse/ClickHouse · GitHub
Skip to content

Make Ctrl+C terminate the output of a result set in the client promptly#106430

Merged
alexey-milovidov merged 10 commits into
masterfrom
fix-22426-client-cancel-output
Jun 17, 2026
Merged

Make Ctrl+C terminate the output of a result set in the client promptly#106430
alexey-milovidov merged 10 commits into
masterfrom
fix-22426-client-cancel-output

Conversation

@alexey-milovidov

@alexey-milovidov alexey-milovidov commented Jun 4, 2026

Copy link
Copy Markdown
Member

Pressing Ctrl+C in clickhouse-client/clickhouse-local now promptly terminates the output of a result set, even when the client is blocked writing to a slow or stuck output sink (for example, a slow terminal).

Previously, the client could block inside a single large write to its output. The interrupting signal could be delivered to a thread other than the one performing the write, so the write was never interrupted and the cancellation flag was only noticed much later, after the whole buffer had drained. As a result, the first Ctrl+C appeared to have no effect.

The output buffer now waits for the descriptor to become writable using poll with a short timeout (checking for cancellation in between), writes only a bounded chunk at a time, and discards the remaining buffered data once cancellation is requested. The hook is installed by the client on its standard output only while a query is being processed, so all other writers are unaffected.

Closes #22426

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Pressing Ctrl+C in clickhouse-client/clickhouse-local now promptly stops printing a large result set, even when the output is being written to a slow or stuck destination such as a slow terminal. Previously the first Ctrl+C could appear to have no effect.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Version info

  • Merged into: 26.6.1.949

When the client prints a large result set to a slow or stuck output sink
(for example a slow terminal), pressing Ctrl+C had no visible effect: the
client was blocked inside a single large `write` call, and the interrupting
signal could be delivered to a thread other than the one performing the
write, so the write was never interrupted and the cancellation flag was
only noticed much later (after the whole buffer had drained).

Add an optional cancellation hook to `WriteBufferFromFileDescriptor`. While
the hook is installed, the buffer waits for the descriptor to become
writable using `poll` with a short timeout, checks the hook in between, and
writes only a bounded chunk at a time, so a single `write` cannot block for
long. Once cancellation is requested, the buffered data is discarded instead
of being written. The hook is installed by the client on its standard output
only while a query is being processed, so all other writers are unaffected.

Closes #22426

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Jun 4, 2026
Comment thread tests/queries/0_stateless/04313_client_cancel_blocked_output.sh Outdated
Comment thread src/IO/WriteBufferFromFileDescriptor.cpp
Comment thread src/Client/ClientBase.cpp
alexey-milovidov and others added 3 commits June 5, 2026 14:22
… cover INTO OUTFILE AND STDOUT

Addresses review feedback on the Ctrl+C output cancellation:

* `poll` only promises that *some* space is available, not space for the
  whole chunk, so a blocking `write` of a large chunk could still sleep on a
  pipe that drained by a single page right before becoming writable. While the
  cancellation hook is installed, bound each write to `PIPE_BUF` after a
  successful `poll`: a write of at most `PIPE_BUF` bytes is guaranteed not to
  block on a pipe (and such a small write does not block on a socket or
  terminal either), so the cancellation hook is consulted promptly. The
  bounded-chunk path is only used for descriptors that can actually block
  (pipes, sockets, terminals); regular files keep using a single large write
  for throughput. The descriptor kind is determined in `setCancellationHook`,
  which is now defined out of line.

* `SELECT ... INTO OUTFILE ... AND STDOUT` writes stdout through a separate
  `WriteBufferFromFileDescriptor` inside a `ForkWriteBuffer`, not through the
  persistent `std_out`. Install the same cancellation hook on that buffer so
  Ctrl+C also promptly aborts the output in this mode when the stdout sink is
  blocked.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
* Renumber the test from `04313` to `04318` to avoid sharing the prefix with
  `04313_limit_by_sorted_prefix_key`, which landed on master in the meantime.

* Make reaching the running/blocked state mandatory: if the client exits before
  the query is observed running, fail explicitly and print the client's stderr
  instead of sending `SIGINT` to a gone process (which printed `No such
  process` to stderr and flagged the test as "having stderror") and then
  reporting a false `OK`.

* Keep the test cheap and immune to the randomized settings used by the flaky
  check: small blocks, a single thread and disabled memory/result limits so the
  client cannot error out early (e.g. on a low `max_memory_usage`) before
  reaching the blocked state we want to exercise. This also avoids stressing the
  server with a heavy result set across the many flaky-check repetitions.

* Use a single `trap` for cleanup and silence expected `kill` failures so the
  test never leaks processes or writes to stderr.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Merged master (the branch was ~1337 commits behind and red), addressed all three review threads, and reworked the test.

Code (c4ba78e2777)

  • The poll + bounded-write path could still block: poll only guarantees that some space is available, so a large blocking write on a pipe that drained one page right before becoming writable could sleep. Each write is now bounded to PIPE_BUF after a successful poll, which is guaranteed not to block on a pipe (nor for such a small write on a socket/terminal). The bounded path is used only for descriptors that can actually block (pipes, sockets, terminals); regular files keep a single large write for throughput. setCancellationHook is now defined out of line and determines the descriptor kind via fstat.
  • SELECT ... INTO OUTFILE ... AND STDOUT writes stdout through a separate WriteBufferFromFileDescriptor in a ForkWriteBuffer; the cancellation hook is now installed on it too.

Test (36a30de53a8)

  • Renumbered 0431304318 (the 04313 prefix now collides with 04313_limit_by_sorted_prefix_key on master).
  • Root cause of the CI failures: under the flaky check, the randomized settings (large max_block_size × wide rows × parallel formatting) could trip a randomized max_memory_usage, so the client errored out early; the wait loop never observed the query, then kill -SIGINT hit a gone process (No such process → "having stderror") while the test still printed a false OK. The test now makes reaching the running state mandatory and fails loudly with the client's stderr otherwise, pins small blocks / single thread / disabled limits so randomization cannot kill it early, and silences expected kill failures via a single cleanup trap.

Verified locally against a fresh build: the feature test passes 10/10 with randomized settings and 5/5 without; a single Ctrl+C terminates the blocked client in ~200 ms in both plain and INTO OUTFILE ... AND STDOUT modes.

Closes #22426

Comment thread src/Client/ClientBase.cpp
Comment thread src/Client/ClientBase.cpp
…FILE AND STDOUT in the test

Address review feedback:

* With `--pager`, the result set is written through the pager's stdin pipe
  (`pager_cmd->in`) rather than through `std_out`, so the cancellation hook
  installed on `std_out` was never consulted. A stuck or slow pager could fill
  its stdin pipe and the first Ctrl+C would appear to have no effect. Install
  the same hook on `pager_cmd->in` (recreated per query, so no removal needed).

* `04318_client_cancel_blocked_output.sh` only exercised the plain `SELECT`
  path through the persistent `std_out`. Refactor it into a reusable function
  and add a variant that writes `SELECT ... INTO OUTFILE ... AND STDOUT` to the
  same stuck FIFO, so the separate `stdout_buf` used by `ForkWriteBuffer` is
  also covered.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Continued work on this PR:

  • Pager path (review feedback): the cancellation hook is now also installed on pager_cmd->in, so --pager with a stuck/slow pager is interrupted by the first Ctrl+C, not just the persistent std_out.
  • Test coverage (review feedback): 04318_client_cancel_blocked_output.sh now exercises both the std_out path (plain SELECT) and the separate stdout_buf path (SELECT ... INTO OUTFILE ... AND STDOUT) against the same stuck FIFO. Verified locally — both cases terminate on a single SIGINT.
  • Non-blocking write thread: replied with the rationale for the bounded-chunk approach (an O_NONBLOCK toggle on the shared stdout/pager descriptor would affect the parent shell and other processes sharing the open file description) and left the final call to @alexey-milovidov.

Pushed as 0a1f6f4. CI was green prior to this commit.

Comment thread src/IO/WriteBufferFromFileDescriptor.cpp Outdated
alexey-milovidov and others added 2 commits June 8, 2026 21:52
Address review: `setCancellationHook` previously routed every descriptor
that is not a regular file through the responsive bounded-chunk path,
including non-tty character devices such as `/dev/null`. A common
benchmark or batch pattern like `clickhouse-client --query ... > /dev/null`
would then regress to one `poll` plus one small `write` per chunk instead
of flushing the normal large buffer.

Classify only descriptors that can actually block on `write` - pipes/FIFOs
(`S_ISFIFO`), sockets (`S_ISSOCK`) and terminals (`isatty`) - as needing
the responsive path, keeping regular files and non-tty character devices
on the single large write for throughput.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

@groeneai, the only CI failure on this PR is an AST fuzzer (amd_debug) crash that is unrelated to the change here (this PR only touches client output buffering in WriteBufferFromFileDescriptor and ClientBase).

The crash is a Logical error: Bad cast from type A to B raised while pruning parts by statistics:

DB::applyFunctionForField (KeyCondition.cpp:1367)
DB::applyFunction (KeyCondition.cpp:1377)
DB::KeyCondition::applyMonotonicFunctionsChainToRange (KeyCondition.cpp:4284)
DB::KeyCondition::checkInHyperrectangle (KeyCondition.cpp:4723)
DB::StatisticsPartPruner::checkPartCanMatch (StatisticsPartPruner.cpp:160)
DB::MergeTreeDataSelectExecutor::filterPartsByStatistics (MergeTreeDataSelectExecutor.cpp:783)

A monotonic function in the key is applied to a Field of an unexpected type during statistics-based part pruning (filterPartsByStatistics -> checkInHyperrectangle). It reproduces independently of this PR.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106430&sha=0a1f6f4e08a576d96bfcef11818b9d1ce2b7ad23&name_0=PR&name_1=AST%20fuzzer%20%28amd_debug%29

Could you investigate and provide a fix in a separate PR? If a fix is already in progress, please link it here.

Comment thread src/Client/ClientBase.cpp
Comment thread tests/queries/0_stateless/04318_client_cancel_blocked_output.sh
@groeneai

groeneai commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

@alexey-milovidov The AST fuzzer crash on this PR (AST fuzzer (amd_debug), Bad cast from type DB::ColumnLowCardinality to DB::ColumnVector<char8_t>, STID 1499-6033) is unrelated to your client-output changes. It is a chronic trunk bug in KeyCondition::checkInHyperrectangle / StatisticsPartPruner: the monotonic-functions chain is built against an LowCardinality-stripped key type, but the runtime data type passed at read time can still be LowCardinality(...), causing the cast wrapper to receive a ColumnLowCardinality it does not expect.

Reproduced locally with a minimal query on master:

SET allow_suspicious_low_cardinality_types = 1;
CREATE TABLE t (id Nullable(UInt32), b LowCardinality(Bool) STATISTICS(minmax))
ENGINE = MergeTree ORDER BY id
SETTINGS index_granularity = 8, allow_nullable_key = 1;
INSERT INTO t SELECT number, multiIf(number<8,true,number<16,false,NULL) FROM numbers(24);
OPTIMIZE TABLE t FINAL;
SELECT count() FROM t WHERE b < toLowCardinality(toNullable(7));

Fix in #106793 (separate PR with regression test). Bidirectional repro confirmed: crash without fix, clean with fix.

Session id: cron:clickhouse-ci-task-worker:20260608-231400

Address review feedback on `04318_client_cancel_blocked_output.sh`:

* The cancellation hook installed on `pager_cmd->in` in `ClientBase` was
  not exercised by the test, which only covered the persistent `std_out`
  and `INTO OUTFILE ... AND STDOUT` paths. Add a third case that runs the
  client with `--pager "cat FIFO > /dev/null"`, a pager that blocks on
  opening a FIFO with no writer and therefore never drains its stdin, so
  the client fills the pager's stdin pipe and blocks inside `write`. A
  single Ctrl+C must still terminate the client.

* When the setup branch failed, the early `return` skipped the per-case
  cleanup, leaking the `sleep 1000` holder (and possibly the client) into
  the stateless worker, after which the `trap` could no longer reap them.
  Factor the teardown into `reap_case` and call it on the failure path as
  well. `reap_case` also releases a stuck pager by opening its FIFO
  read-write (which never blocks even with no reader), so it cannot linger.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Addressed the two remaining review threads in commit 074c3e186ab (test-only):

  • Added a third test case exercising the --pager output path with a pager that never reads its stdin (cat FIFO > /dev/null, blocked on a FIFO with no writer), confirming a single Ctrl+C terminates the client when blocked writing to a stuck pager.
  • Refactored the per-case teardown into reap_case and call it on the setup-failure path too, so a flaky setup cannot leak the holder/client into the stateless worker. reap_case also releases a stuck pager by opening its FIFO read-write (never blocks, even with no reader).

Verified end-to-end against a locally built server: 3/3 cases pass, 5/5 runs stable (~1.1s each), no leaked processes. The reference now has three OK lines.

Comment thread src/IO/WriteBufferFromFileDescriptor.cpp
@alexey-milovidov

Copy link
Copy Markdown
Member Author

The sole CI red on the latest commit (3f282d8e) is a different AST fuzzer (amd_debug) failure from the Bad cast ... ColumnLowCardinality one discussed above (that one was fixed in #106793).

This one is the chronic LOGICAL_ERROR: Column identifier ... is already registered (STID 4697-4326). The reproduce query is a mutation with an IN-subquery:

DELETE FROM test_qualify WHERE number IN (SELECT number FROM test_qualify__fuzz_46 QUALIFY ...) SETTINGS mutations_sync = 2;

The stack is entirely in the analyzer / mutations path — MutationsInterpreter::validatebuildSubqueryPlansForSetsAndAddPlanner::buildPlanForQueryNodeCollectSourceColumnsVisitor. It has nothing to do with this PR, which only touches client output buffering (WriteBufferFromFileDescriptor, ClientBase).

It is a chronic trunk issue: over the last 90 days it reproduces on master (pull_request_number = 0) on many days and on dozens of unrelated PRs. It is already tracked by #106649 and being fixed by #106025 (groeneai, branch fix-stid-4697-4326-mutations-column-identifier — same STID). No action needed on this PR for it.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106430&sha=3f282d8ee70d7e13954182f02f56d2faac5ed462&name_0=PR&name_1=AST%20fuzzer%20%28amd_debug%29

@clickhouse-gh

clickhouse-gh Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 85.10% 85.10% +0.00%
Functions 92.30% 92.30% +0.00%
Branches 77.40% 77.30% -0.10%

Changed lines: Changed C/C++ lines covered by tests: 46/52 (88.46%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

@alexey-milovidov alexey-milovidov left a comment

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the change - all good. Though it touches one of the most basic components in the code, it sounds reasonable.

@alexey-milovidov alexey-milovidov self-assigned this Jun 17, 2026
@alexey-milovidov alexey-milovidov added this pull request to the merge queue Jun 17, 2026
Merged via the queue into master with commit b481713 Jun 17, 2026
166 checks passed
@alexey-milovidov alexey-milovidov deleted the fix-22426-client-cancel-output branch June 17, 2026 23:40
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 18, 2026
groeneai added a commit to groeneai/ClickHouse that referenced this pull request Jun 18, 2026
Addressing review feedback on the previous placement: clearing std_out's
cancellation hook immediately after output_format.reset() removes the race
(the parallel-formatting collector thread is joined by then) but also disables
interruptibility for the remaining resetOutput() flushes - out_file_buf and
std_out_wrapper finalization and the final std_out->next(). If one of those
buffers bytes to a slow or stuck stdout, Ctrl+C could no longer discard the
buffer, regressing the blocked-output behavior from ClickHouse#106430.

Register the clear as a SCOPE_EXIT right after output_format.reset() instead:
it still fires strictly after the collector is joined (so no race), but only
at the end of resetOutput(), after the final flushes, so the hook stays active
through them and Ctrl+C keeps working. Using a scope guard also keeps the
clear on every exit path of resetOutput() (e.g. the SIGPIPE/SIGQUIT restore
can throw), matching the original guard's no-leak guarantee.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
groeneai added a commit to groeneai/ClickHouse that referenced this pull request Jun 18, 2026
Reworks the std_out cancellation-hook lifetime to satisfy both the slow-stdout
interruptibility of the blocked-output feature (ClickHouse#106430) and the requirement
that already-produced output is not silently dropped on exception paths.

query_interrupt_handler.cancelled() is true both after a genuine Ctrl+C and
after the normal end-of-query stop() (both leave exit_after_signals <= 0), so it
cannot by itself distinguish "user cancelled" from "query finished". The earlier
revision kept that handler-based hook installed through the resetOutput() teardown
flushes; on exception paths processOrdinaryQuery's scope guard has already run
stop() before resetOutput() runs (via processParsedSingleQuery's scope guard),
making cancelled() unconditionally true and causing the final std_out_wrapper /
out_file_buf finalization and std_out->next() to discard output that was already
produced.

Snapshot whether the interrupt handler is still armed at resetOutput() entry and
choose the cancellation source accordingly: armed (normal completion, reached via
onEndOfStream while the handler is still active) keeps honoring the handler so a
fresh Ctrl+C still discards a flush to a slow/stuck stdout; stopped (exception
paths) falls back to the genuine cancellation flag, which is set only by
cancelQuery(), so completion is never mistaken for cancellation. The hook is still
only mutated after output_format.reset() has joined the parallel-formatting
collector that also reads it, so the original data race (TSan, BuzzHouse
amd_tsan) remains fixed.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
pull Bot pushed a commit to Lobinson1/ClickHouse that referenced this pull request Jun 22, 2026
The AST fuzzer found a server crash with stack:
  applyFunctionForField (KeyCondition.cpp:1367)
  applyMonotonicFunctionsChainToRange (KeyCondition.cpp:4284)
  KeyCondition::checkInHyperrectangle (KeyCondition.cpp:4723)
  StatisticsPartPruner::checkPartCanMatch (StatisticsPartPruner.cpp:160)
  MergeTreeDataSelectExecutor::filterPartsByStatistics (line 783)

Reproducer (also added as a stateless regression test):

  SET allow_suspicious_low_cardinality_types = 1;
  CREATE TABLE t (id Nullable(UInt32), b LowCardinality(Bool) STATISTICS(minmax))
  ENGINE = MergeTree ORDER BY id
  SETTINGS index_granularity = 8, allow_nullable_key = 1;
  INSERT INTO t SELECT number, multiIf(number < 8, true, number < 16, false, NULL)
    FROM numbers(24);
  OPTIMIZE TABLE t FINAL;
  SELECT count() FROM t WHERE b < toLowCardinality(toNullable(7));

The query crashes the server in a debug build with:
  Logical error: Bad cast from type DB::ColumnLowCardinality
                                to DB::ColumnVector<char8_t>

Root cause:

`KeyCondition::extractAtomFromTree` constructs `monotonic_functions_chain`
against an `LowCardinality`-stripped key expression type (line 3524 calls
`recursiveRemoveLowCardinality(key_expr_type)` before constructing the chain
of casts via `createInternalCast`). At read time, however,
`StatisticsPartPruner::checkPartCanMatch` passes the original column type
(which can still be `LowCardinality(...)`) into
`KeyCondition::checkInHyperrectangle`. When the chain is non-empty,
`checkInHyperrectangle` forwards `data_types[key_column]` to
`applyMonotonicFunctionsChainToRange`, which calls `applyFunctionForField`
with that runtime type. The latter builds
`arg_type->createColumnConst(1, value)` over the LC type, producing a
`ColumnConst` wrapping a `ColumnLowCardinality`. The cast wrapper was prepared
against the LC-stripped type, so its outer `prepareUnpackDictionaries`
LowCardinality unwrap closure was elided (line 2291 returns the inner wrapper
directly when neither side is `LowCardinality`). The inner cast wrapper is
`createUInt8ToBoolWrapper`, which calls
`checkAndGetColumn<ColumnUInt8>(*arguments[0].column)` on the LC column and
throws `Bad cast`.

Fix:

Strip `LowCardinality` from the runtime data type when forwarding it to
`applyMonotonicFunctionsChainToRange`, matching the convention the chain was
built with. The same `recursiveRemoveLowCardinality` is already used at
KeyCondition.cpp:2239 for the symmetric set-index path.

CI report:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106430&sha=0a1f6f4e08a576d96bfcef11818b9d1ce2b7ad23&name_0=PR&name_1=AST%20fuzzer%20%28amd_debug%29
PR: ClickHouse#106430

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Pressing Ctrl+C in client should terminate the output of resultset.

4 participants