Restore subquery plans after failed in-place set build#102308
Restore subquery plans after failed in-place set build#102308alexey-milovidov wants to merge 50 commits into
Conversation
The real bug is in : and consume the source before they know whether the set was actually created. If the in-place path stops early, the delayed build path loses the subquery and sees an empty set. Store a query-plan rebuild callback for analyzer and old-analyzer paths, use a temporary while building in place, and restore the source plan for the fallback path when the in-place attempt does not finish. Add a failpoint-based regression test that covers both and . Ref: ClickHouse#102192
When falls back after or , restoring the subquery from analyzer state can still diverge from the original plan. Keep an exact cloned for retry, and only use if plan cloning is unavailable. Also make preserve held resources so restored plans keep the same execution context, and add a CTE regression for the failpoint-driven fallback path.
When a set subquery is snapshotted for retry, `QueryPlan::clone` could build fresh steps through ordinary constructors. That advanced per-query step ids and changed `EXPLAIN` / processor ids, which broke `03269_explain_unique_ids`. The same retry path could also fail for builder-less plans from `JOIN -> IN` optimization when the source plan contained `ReadFromSystemOneBlock`, which broke `03355_join_to_in_optimization`. Fix it by preserving original step ids while cloning query plans, falling back to restoring the just-built plan when no retry snapshot is available, and adding `clone` for `ReadFromSystemOneBlock`. PR: ClickHouse#102308 CI: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=102308&sha=f28c5bae19e13979cef7801cd0264b78a4c9b578&name_0=PR&name_1=Fast%20test
…ause # Conflicts: # src/Common/FailPoint.cpp # src/Interpreters/PreparedSets.cpp
The `catch (...)` in `FutureSetFromSubquery::createQueryPlanForRetry` silently returned `nullptr` when `source->clone` threw and no `query_plan_builder` fallback was available. The style check (`catch_all`) flagged this as silently swallowing exceptions. Rethrow the original exception when there is no `query_plan_builder` fallback so real clone failures propagate, and document the fall-through to the builder path. Fixes style check failure at https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=102308&sha=747be12badb189f41ba65dc7737c12c6f9c4e751&name_0=PR&name_1=Style%20check Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…_union` parameter After merging master, two `GlobalPlannerContext` constructor invocations in the rebuilt subquery planner code paths were left with three arguments instead of four — the new `parallel_replicas_table_union_` parameter (introduced on master in the parallel-replicas-no-local-preference work) was not added. This broke the `Fast test` and `Build (arm_tidy)` builds with `no matching function for call to '__construct_at'` (the constructor substitution failure from `make_shared`). Pass `nullptr` as the third argument to match the rest of the codebase (see `Planner.cpp:1657`, `StorageMergeTreeAnalyzeIndexes.cpp:171`, etc.). CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=102308&sha=c38d67289313754ff37a1f5e80539ac7309f71f3 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Many `IQueryPlanStep` subclasses do not implement `clone` (`LimitStep`, `SortingStep`, etc). `createQueryPlanForRetry` was rethrowing the `NOT_IMPLEMENTED` exception when both `source->clone` failed and `query_plan_builder` was absent, even though the calling code in `buildSetInplace` and `buildOrderedSetInplace` already has a third fallback that recovers the source plan via `extractSubplan` on the executed plan. Catch only `NOT_IMPLEMENTED` from `clone`, return `nullptr`, and let the caller restore the source through `extractSubplan` after the in-place build (succeeded or not). Other exceptions still propagate. Fixes failures in: - 02949_parallel_replicas_in_subquery - 03525_distributed_product_mode_local_IN_cross_replication_analyzer_bug - 01103_distributed_product_mode_local_column_renames Reports: ClickHouse#102308
The `future_set_from_subquery_skip_inplace_build` failpoint added in this PR is `ONCE`-scoped, which means once enabled it fires on the next call to `fiu_do_on(...)` from any concurrent query. Tests `04093` and `04094` enable this failpoint to exercise the in-place set build fallback, but without `no-parallel` they can race with parallel tests and steal the failpoint trigger. This caused `03620_mergeTreeAnalyzeIndexes_sets` to flake: while running in parallel with `04093`, the index-analysis call into `buildOrderedSetInplace` for `key in keys_1` consumed the global ONCE failpoint, skipping the in-place build and producing the full mark range `[(0,12)]` instead of the expected `[(0,1)]`. CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=102308&sha=8ac5000171b9e60f91591e89471b09c0ba5456d9&name_0=PR&name_1=Stateless%20tests%20%28arm_asan_ubsan%2C%20targeted%29 PR: ClickHouse#102308
The local environment's gh CLI token lacks the `workflow` scope, so pushes that change `.github/workflows/*` are rejected. Master moved those files from the long form to the short `--timestamp` form; this commit restores the long form (byte-identical to the previous PR tip) so the merge can be pushed. The downside is the `Config Workflow -> Check Workflows` CI step will fail until someone with the workflow scope re-runs `python3 -m praktika yaml`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit reverted workflow file updates from a master merge to avoid push rejection caused by missing `workflow` token scope, but this made the praktika `Check Workflows` job fail because the workflow files no longer match what `python3 -m praktika yaml` produces from the current configuration. Restoring the workflow files to their current `master` state so the check passes.
When `buildSetInplace` and `buildOrderedSetInplace` could not snapshot `source` for retry (no `query_plan_builder` and `clone` failed), the fallback path used `QueryPlan::extractSubplan` to splice the original subquery node tree out of the executed plan. `extractSubplan` only moves the node tree — it does not carry over `interpreter_context`, table locks, storage holders, or `max_threads`/`concurrency_control`. That left the rebuilt source plan without the resources keeping its storage and contexts alive, which could resurface as `Context has expired` once the delayed build runs. This adds `QueryPlan::detachResources` and threads `interpreter_context`/table locks/storage holders/thread settings from the executed plan into the restored source in both fallback sites in `FutureSetFromSubquery`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`cloneSubplanAndReplace` is used by `materializeQueryPlanReferences` to duplicate the same subplan at multiple reference sites. Preserving `step_index` from the source step here would make every duplicated copy share the same `step_uniq_id`, breaking the uniqueness contract used by diagnostics and `system.processors_profile_log.step_uniq_id`. Restore the original behavior in `cloneSubplanAndReplace` (allocate fresh indices via the constructor) and keep `step_index` preservation only in `cloneInplace`, which is used by `QueryPlan::clone` for retry snapshots where identity must be stable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… sets The in-place build of a `GLOBAL IN` set writes the *deduplicated* set to the external table via `buildExternalTableFromInplaceSet` and then checks `max_rows_to_transfer` / `max_bytes_to_transfer` against that single deduplicated chunk. A subquery of many duplicate rows collapses to a few set elements, so it slips under the byte limit even though the streaming `CreatingSetsTransform` would ship (and reject) the full materialized payload. This bypassed the transfer limits and regressed `04204_global_in_join_max_rows_to_transfer_103333`, where a 1000-row constant `GLOBAL IN` subquery with `max_bytes_to_transfer = 100` stopped throwing `SET_SIZE_LIMIT_EXCEEDED`. When the network transfer limits are active, defer such external-table sets to the streaming path in both `buildSetInplace` and `buildOrderedSetInplace`, leaving `source` intact so the delayed build runs it. The streaming path checks the limits per source block on the raw payload, exactly as the non-in-place path does. The in-place dedup-write optimization is kept for the common unbounded case (`max_rows_to_transfer = max_bytes_to_transfer = 0`). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`04095_future_set_inplace_build_restore_global_in` uses
`remote('127.0.0.2', ...)`, so it must be skipped on environments where the
server cannot listen on `127.0.0.2` — most notably the macOS `arm_darwin`
fast-test build, where the test failed with a `connect timed out:
127.0.0.2:9000` (`SOCKET_TIMEOUT`) network error. `clickhouse-test` already
skips `shard` / `distributed` / `global` tagged tests when the server is not
configured for self-connections, which is exactly the convention the other 26
`remote('127.0.0.2', ...)` tests follow (and the master test
`04204_global_in_join_max_rows_to_transfer_103333` uses the `distributed` tag
for the same reason).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`cloneSubplanAndReplace` is used by `materializeQueryPlanReferences` to duplicate the same common subplan into every `CommonSubplanReferenceStep` site. Most `IQueryPlanStep::clone` implementations are copy-constructor based (e.g. `ExpressionStep::clone` returns `std::make_unique<ExpressionStep>(*this)`), and since `IQueryPlanStep`'s copy constructor copies `step_index`, every duplicated copy kept the source step's `step_index`. That makes `step_uniq_id` non-unique within a plan when a common subplan is referenced more than once, breaking the uniqueness contract relied on by `system.processors_profile_log.step_uniq_id`. Allocate a fresh `step_index` for each cloned step in `cloneSubplanAndReplace`, the same way the `IQueryPlanStep` constructor does. This runs during query plan optimization, so advancing the per-thread plan-step counter is consistent with how every other step gets its index. `cloneInplace` keeps preserving the source `step_index` for retry snapshots, which is a separate concern. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ause # Conflicts: # src/Processors/QueryPlan/QueryPlan.cpp
A `count()` over a distributed `MergeTree` table with a `GLOBAL IN` predicate could raise `Not-ready Set is passed as the second argument` (a `LOGICAL_ERROR`) when `max_rows_to_transfer` is set. For example `00059_shard_global_in_mergetree` failed in the stateless coverage build, where the CI config sets `max_rows_to_transfer = 1G`. With a transfer limit active, the in-place `GLOBAL IN` set build defers to the streaming path so the limit is enforced on the full payload, leaving the set not ready. The synchronous `count()` optimizations -- the min-max-count projection (`getMinMaxCountProjectionBlock`) and trivial count (`totalRowsByPartitionPredicate`) -- have no streaming fallback, so they executed `FunctionIn` against a not-ready set. Add `VirtualColumnUtils::hasUnbuiltSubquerySet` and have both optimizations decline (fall back to a normal read, which builds the set through the pipeline) when the filter references a subquery set that declined its in-place build. The normal read path is unaffected: it uses `buildSetsForDAGExcludingGlobalIn`, which leaves `GLOBAL IN` sets to `ReadFromRemote`. Regression test `05017_count_global_in_transfer_limit` covers both the projection and trivial-count paths under the old and new analyzers. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`04095_future_set_inplace_build_restore_global_in` and the new
`05017_count_global_in_transfer_limit` use `remote('127.0.0.2', ...)`. On
the macOS `arm_darwin` fast-test build the server cannot route `127.0.0.2`,
so the queries fail with `connect timed out 127.0.0.2:9000`. The `shard`
tag does not help: the fast test passes `--shard`, so shard-tagged tests
are not skipped. The macOS-specific list `ci/defs/darwin.skip` is the
mechanism (it already lists `00059_shard_global_in_mergetree`).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Fixed the Crash ( Fix: add
The one open thread ( |
| /// streaming `CreatingSetsTransform` would ship (and reject) the full payload. When these | ||
| /// limits are active, defer to the streaming path, which checks them per source block on | ||
| /// the raw payload. `source` is left intact so the delayed build can run it. | ||
| if (set_and_key->external_table && network_transfer_limits.hasLimits()) |
There was a problem hiding this comment.
buildSetsForDAG is still used by synchronous filter paths that immediately execute the DAG (ReadFromSystemOneBlock::initializePipeline, StorageSystemColumns, StorageFileSource::FilesIterator, etc.). With this new early return, a GLOBAL IN producer with max_rows_to_transfer / max_bytes_to_transfer set leaves the producer unbuilt; the dependent set then returns from the external_table_set && !external_table_set->get guard, but those callers still evaluate the predicate and hit Not-ready Set is passed ... instead of letting the delayed streaming path run. The count optimizations now check hasUnbuiltSubquerySet, but the invariant needs to hold for every synchronous buildSetsForDAG consumer: either make this path report "not built" to callers and skip evaluation, or have those consumers use buildSetsForDAGExcludingGlobalIn / hasUnbuiltSubquerySet before executing the filter.
The new test was numbered `05017`, leaving a gap from the current test number frontier (`Gap (4400, 5017) > 100` reported by `test_numbers_check` in the Style check). Renumber it to `04401`, the next available number, to close the gap. The three sibling tests `04093`/`04094`/`04095` are in a dense region and are left as-is. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Merged Style check ( The merge was textually clean. The core modified files ( The two AI-Review ❌ Block findings are left open as design decisions:
|
When the test was renumbered from 05017_count_global_in_transfer_limit to
04401_count_global_in_transfer_limit, the matching entry in
ci/defs/darwin.skip was not updated. clickhouse-test matches skip entries
by test name, so 04401_count_global_in_transfer_limit would still run on
arm_darwin and fail: it uses remote('127.0.0.2', ...) which the macOS
fast-test build cannot reach.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Triage notes (CI at
|
|
Continued work on this PR:
The single blocking item remains your design call: reconciling |
CI status at
|
|
Re-merged The branch was 620 commits behind (merge-base The merge was conflict-free and the net diff is unchanged (still 26 files, +618/-25; CI status is unchanged; both remaining red classes are known and not introduced by the merge:
Fresh CI is running on |
CI status at
|
…ause # Conflicts: # ci/defs/darwin.skip
Re-merged
|
LLVM Coverage ReportChanged lines: Changed C/C++ lines covered: 282/317 (88.96%) · Uncovered code |

This replaces the approach in #102192 with a source-level fix.
FutureSetFromSubqueryconsumed the sourceQueryPlanduringbuildSetInplaceandbuildOrderedSetInplacebefore it knew whether the in-place build would actually create the set. When the in-place attempt stopped early, the delayed fallback path lost the subquery plan andINcould read an empty set.This patch stores a query-plan rebuild callback for analyzer and old-analyzer paths, uses a temporary
Setduring in-place execution so partial state does not leak into the real set, and adds a failpoint-based regression test for bothenable_analyzer = 1andenable_analyzer = 0.Related: #102192
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
N/A
Documentation entry for user-facing changes