Allow KILL QUERY/MUTATION SELECT to return multiple blocks#104927
Allow KILL QUERY/MUTATION SELECT to return multiple blocks#104927groeneai wants to merge 8 commits into
Conversation
`InterpreterKillQueryQuery::getSelectResult` executes an internal
`SELECT` over `system.processes`, `system.mutations`,
`system.part_moves_between_shards`, or `system.transactions` to find
the queries / mutations to act on. The old implementation asserted
that the pipeline emits at most one block:
while (res.empty() && executor.pull(res));
while (executor.pull(tmp_block));
if (!tmp_block.empty())
throw Exception(ErrorCodes::LOGICAL_ERROR, "Expected one block from input stream");
This is wrong: the pipeline can legitimately produce several blocks.
The AST fuzzer found a minimal reproducer (ClickHouse#104857, STID 4569-3618):
KILL MUTATION
WHERE (SELECT 1 WHERE database = command) OR command LIKE '%UPDATE%'
TEST;
The per-row scalar subquery in `WHERE` causes the planner to emit one
block per row of `system.mutations`. A small `max_block_size` from CI
randomization can produce the same shape.
Fix: collect every produced block via `concatenateBlocks`. The result
set is bounded by the size of the corresponding `system.*` table, so
this is cheap. `materializeBlockInplace` is kept so existing
`typeid_cast<const ColumnString &>` callers remain valid.
Closes ClickHouse#104857.
|
cc @KochetovNicolai @alexey-milovidov @PedroTadim — could you review this? Fixes #104857 ( in |
|
Workflow [PR], commit [1ee52df] Summary: ❌
AI ReviewSummaryThe PR fixes Missing context / blind spots
Final Verdict✅ No new findings on the current diff. |
…idation
The regression test `04238_kill_mutation_subquery_multi_block` reproduces the
`LOGICAL_ERROR` in `InterpreterKillQueryQuery::getSelectResult` via
`${CLICKHOUSE_CLIENT}`, which means the `chassert` from the `Expected one block
from input stream` exception lands inside the long-running test server. The
functional-tests runner's hung-check then detects "server is not responding"
and terminates the runner before any test result is written to
`test_result.txt`. `FTResultsProcessor` only treats the exact strings
`Server died, terminating all processes` or `Server does not respond to
health check` as `server_died`; the runner's actual `Stopping tests,
terminating all processes...` message is not matched, so the run falls into
the `not success_finish` branch with status `ERROR` and an empty
`test_results` list. The `Bugfix validation` inversion loop only flips
`FAIL` -> `OK` (it iterates `test_result.results`), so with an empty list
`has_failure` stays `false` and the framework reports
`Failed to reproduce the bug` even though the bug did fire and left a core
dump.
Switch the reproducer to `clickhouse local` (multi-query) so the abort is
contained to a single short-lived subprocess. The test runner then observes
a normal non-zero exit code (134) and an empty stdout; the diff against
`.reference` produces a regular `FAIL` for this test, which `Bugfix
validation` then correctly inverts to `OK`. The same pattern was used for
`04214_iceberg_alter_drop_column_stale_metadata_cache_104295` in commit
c2f8daf for the identical reason.
Verified locally:
- Binary without the fix (old `InterpreterKillQueryQuery::getSelectResult`
that throws `Expected one block from input stream`):
test FAILs with exit 134 and the runner records a normal `FAIL`.
- Binary with the fix (`concatenateBlocks`-based collection): 30/30 passes
with full CI randomization.
CI report for the failing bugfix-validation run on `629e5c9`:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104927&sha=629e5c96e7c1112c15c3ab21855a6d1895bf6b0f&name_0=PR&name_1=Bugfix+validation+%28functional+tests%29
|
Bugfix-validation fix pushed (commit The previous CI run on Same shape as Verified locally:
The KILL MUTATION + per-row scalar subquery + |
|
@groeneai a CI test crashed, look |
|
@PedroTadim — the failing Fast test is chronic, unrelated to this PR. Failure: Stack shows the client blocked in Cross-PR check (last 14d, CIDB): 15 distinct PRs hit this exact test name, plus 3 merge-queue staging hits on master — completely unrelated authors and code areas (#76867, #100276, #99384, #100539, #102485, #104172, #104225, #104418, #104440, #104523, #104778, #104902, #98939, #100752, this PR). The pattern matches the closed family-tracker #66961 ( No overlap with this PR's changes: PR #104927 touches only Verdict: chronic flake on an unrelated test. A Fast-test re-run should clear it. The PR itself (KILL QUERY/MUTATION multi-block fix + bugfix-validation regression) is unchanged at |
…ailure The reproducer for issue ClickHouse#102705 runs `SELECT ... FROM system.detached_tables WHERE uuid = ...` against a table that has already been detached. Without the fix, the planner creates `uuid_column` (because the predicate filters on `uuid`) but the detached branch of `getFilteredTables` leaves it empty while `table_column` has one row. The mismatched column lengths trigger a `LOGICAL_ERROR` which fires `chassert` in debug / sanitizer builds and aborts the process. When run via `clickhouse-client` against the long-running test server, that abort lands inside the server. The runner's hung-check then terminates the test before any result is recorded, so the `Bugfix validation` framework cannot invert the result to `OK` and the check fails on the PR (as observed on PR ClickHouse#104979). The original `.sql` form of `04240_detached_tables_uuid_filter_102705` had this issue. Switching the test to `.sh` and running the reproducer in `clickhouse local` contains the `SIGABRT` to a short-lived subprocess: stdout is truncated before reaching `SELECT 'ok'`, the runner sees the diff against the `.reference`, and reports a normal `FAIL`. Bugfix validation can then invert it to `OK` on the PR branch. The `.sh` uses the same pattern as `04238_kill_mutation_subquery_multi_block.sh` (PR ClickHouse#104927) and `01600_detach_permanently.sh` (Atomic database under `--path` + `${CLICKHOUSE_TEST_UNIQUE_NAME}` working folder). Verified locally on the rebuilt debug binary: - with the fix → `1\nok` → matches reference (30 / 30 deterministic). - without the fix (line 118-119 reverted, rebuilt) → `Aborted (core dumped)` with empty stdout → does not match reference → `FAIL`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
@groeneai merge with master |
…uery-multi-block-104857-wt
|
@PedroTadim master merged in e44c095. CI will re-run automatically. |
There was a problem hiding this comment.
The change is valid, but I have one concern to @groeneai: helper is also used by KILL QUERY, KILL PART_MOVE_TO_SHARD, and KILL TRANSACTION, so those paths still rely on shared-code confidence rather than direct coverage. Let's add a separate PR with test coverage after this PR is merged.
|
@yariks5s thanks for the approval. Will open a follow-up PR after this merges, adding direct multi-block coverage for KILL QUERY (system.processes), KILL PART_MOVE_TO_SHARD (system.part_moves_between_shards), and KILL TRANSACTION (system.transactions). Same pattern as 04238: per-row scalar subquery + max_block_size=1. Session: cron:clickhouse-ci-task-worker:20260604-223800 |
|
@groeneai could you please re-run the failing job? if not please update with master once again |
|
@yariks5s merged master in For the record, neither stale failure was PR-caused: the |
|
CI fully finished on HEAD 22159b0. The remaining red checks are pre-existing chronic trunk failures, not caused by this PR (which only touches
Both are unrelated to the KILL QUERY/MUTATION multi-block fix. |
…/fix-kill-query-multi-block-104857
After master changed Blocks to VectorWithMemoryTracking<Block> and concatenateBlocks to take const Blocks &, the local accumulator in getSelectResult must use the Blocks typedef instead of a plain std::vector<Block> so it converts to the new signature. No behavior change. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
@yariks5s pushed 444a8ee. The merge-queue build broke because master changed the |
CI finish ledger — 444a8eeEvery failure below has an owner (a fixing PR). None is PR-caused: this PR's diff touches only
Session id: cron:our-pr-ci-monitor:20260625-233000 |
|
@groeneai, we have fixed these failed unrelated checks. Please update the branch. |
…uery-multi-block-104857
|
@alexey-milovidov done, master merged in |
|
@groeneai, fix the test |
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered: 11/11 (100.00%) · Uncovered code |

Fixes the
Expected one block from input streamLOGICAL_ERROR(STID 4569-3618) inInterpreterKillQueryQuery::getSelectResult.The interpreter executes an internal
SELECToversystem.processes/system.mutations/system.part_moves_between_shards/system.transactionsto find which queries / mutations to act on. The old code asserted that the resulting pipeline emits at most one block:That is wrong — the pipeline can legitimately produce several blocks. The AST fuzzer found a minimal reproducer (originally in @PedroTadim's fiddle https://fiddle.clickhouse.com/468380d6-18b1-4bea-be38-4186d9a9428f):
The per-row scalar subquery in
WHEREcauses the planner to emit one block per row ofsystem.mutations. A smallmax_block_sizefrom CI randomization produces the same shape. Either way the secondexecutor.pullreturns a non-empty block and the old code threw.Fix: collect every produced block via
concatenateBlocksfromCore/Block.h. The result set is bounded by the size of the correspondingsystem.*table (small, in-process state), so this stays cheap.materializeBlockInplaceis preserved so existingtypeid_cast<const ColumnString &>callers inexecuteremain valid.A regression test
tests/queries/0_stateless/04238_kill_mutation_subquery_multi_block.sqlpinsmax_block_size = 1and uses the per-row subquery shape, so the multi-block path is exercised deterministically.Issue: #104857
CI report (vehicle): https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=104419&sha=c8267c5c7e69922ec96a5837cc51e5bc710bf9cb&name_0=PR&name_1=AST%20fuzzer%20%28amd_tsan%29
Closes #104857
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix a
LOGICAL_ERROR"Expected one block from input stream" thrown byKILL QUERY/KILL MUTATION/KILL PART_MOVE_TO_SHARD/KILL TRANSACTIONwhen theirWHEREclause contains a per-row subquery, or whenmax_block_sizeis small enough that the internalSELECTover the relevantsystem.*table emits more than one block.Documentation entry for user-facing changes