Restore subquery plans after failed in-place set build by alexey-milovidov · Pull Request #102308 · ClickHouse/ClickHouse · GitHub
Skip to content

Restore subquery plans after failed in-place set build#102308

Draft
alexey-milovidov wants to merge 50 commits into
ClickHouse:masterfrom
alexey-milovidov:fix-pr102192-root-cause
Draft

Restore subquery plans after failed in-place set build#102308
alexey-milovidov wants to merge 50 commits into
ClickHouse:masterfrom
alexey-milovidov:fix-pr102192-root-cause

Conversation

@alexey-milovidov

@alexey-milovidov alexey-milovidov commented Apr 10, 2026

Copy link
Copy Markdown
Member

This replaces the approach in #102192 with a source-level fix.

FutureSetFromSubquery consumed the source QueryPlan during buildSetInplace and buildOrderedSetInplace before it knew whether the in-place build would actually create the set. When the in-place attempt stopped early, the delayed fallback path lost the subquery plan and IN could read an empty set.

This patch stores a query-plan rebuild callback for analyzer and old-analyzer paths, uses a temporary Set during in-place execution so partial state does not leak into the real set, and adds a failpoint-based regression test for both enable_analyzer = 1 and enable_analyzer = 0.

Related: #102192

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

N/A

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

The real bug is in :  and  consume the source  before they know whether the set was actually created. If the in-place path stops early, the delayed build path loses the subquery and  sees an empty set.

Store a query-plan rebuild callback for analyzer and old-analyzer paths, use a temporary  while building in place, and restore the source plan for the fallback path when the in-place attempt does not finish.

Add a failpoint-based regression test that covers both  and .

Ref: ClickHouse#102192
@clickhouse-gh

clickhouse-gh Bot commented Apr 10, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Apr 10, 2026
@alexey-milovidov alexey-milovidov marked this pull request as draft April 10, 2026 02:40
alexey-milovidov and others added 5 commits April 10, 2026 07:34
When  falls back after  or , restoring the subquery from analyzer state can still diverge from the original plan. Keep an exact cloned  for retry, and only use  if plan cloning is unavailable.

Also make  preserve held resources so restored plans keep the same execution context, and add a CTE regression for the failpoint-driven fallback path.
When a set subquery is snapshotted for retry, `QueryPlan::clone`
could build fresh steps through ordinary constructors. That advanced
per-query step ids and changed `EXPLAIN` / processor ids, which broke
`03269_explain_unique_ids`.

The same retry path could also fail for builder-less plans from
`JOIN -> IN` optimization when the source plan contained
`ReadFromSystemOneBlock`, which broke `03355_join_to_in_optimization`.

Fix it by preserving original step ids while cloning query plans,
falling back to restoring the just-built plan when no retry snapshot
is available, and adding `clone` for `ReadFromSystemOneBlock`.

PR: ClickHouse#102308
CI: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=102308&sha=f28c5bae19e13979cef7801cd0264b78a4c9b578&name_0=PR&name_1=Fast%20test
…ause

# Conflicts:
#	src/Common/FailPoint.cpp
#	src/Interpreters/PreparedSets.cpp
The `catch (...)` in `FutureSetFromSubquery::createQueryPlanForRetry`
silently returned `nullptr` when `source->clone` threw and no
`query_plan_builder` fallback was available. The style check
(`catch_all`) flagged this as silently swallowing exceptions.

Rethrow the original exception when there is no `query_plan_builder`
fallback so real clone failures propagate, and document the fall-through
to the builder path.

Fixes style check failure at
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=102308&sha=747be12badb189f41ba65dc7737c12c6f9c4e751&name_0=PR&name_1=Style%20check

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/Storages/StorageMergeTreeAnalyzeIndexes.cpp Outdated
alexey-milovidov and others added 15 commits April 27, 2026 08:52
…_union` parameter

After merging master, two `GlobalPlannerContext` constructor invocations in
the rebuilt subquery planner code paths were left with three arguments
instead of four — the new `parallel_replicas_table_union_` parameter
(introduced on master in the parallel-replicas-no-local-preference work)
was not added.

This broke the `Fast test` and `Build (arm_tidy)` builds with
`no matching function for call to '__construct_at'` (the constructor
substitution failure from `make_shared`).

Pass `nullptr` as the third argument to match the rest of the codebase
(see `Planner.cpp:1657`, `StorageMergeTreeAnalyzeIndexes.cpp:171`, etc.).

CI report:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=102308&sha=c38d67289313754ff37a1f5e80539ac7309f71f3

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Many `IQueryPlanStep` subclasses do not implement `clone` (`LimitStep`,
`SortingStep`, etc). `createQueryPlanForRetry` was rethrowing the
`NOT_IMPLEMENTED` exception when both `source->clone` failed and
`query_plan_builder` was absent, even though the calling code in
`buildSetInplace` and `buildOrderedSetInplace` already has a third
fallback that recovers the source plan via `extractSubplan` on the
executed plan.

Catch only `NOT_IMPLEMENTED` from `clone`, return `nullptr`, and let
the caller restore the source through `extractSubplan` after the
in-place build (succeeded or not). Other exceptions still propagate.

Fixes failures in:
  - 02949_parallel_replicas_in_subquery
  - 03525_distributed_product_mode_local_IN_cross_replication_analyzer_bug
  - 01103_distributed_product_mode_local_column_renames

Reports:
  ClickHouse#102308
The `future_set_from_subquery_skip_inplace_build` failpoint added in this
PR is `ONCE`-scoped, which means once enabled it fires on the next call to
`fiu_do_on(...)` from any concurrent query. Tests `04093` and `04094`
enable this failpoint to exercise the in-place set build fallback, but
without `no-parallel` they can race with parallel tests and steal the
failpoint trigger.

This caused `03620_mergeTreeAnalyzeIndexes_sets` to flake: while running
in parallel with `04093`, the index-analysis call into
`buildOrderedSetInplace` for `key in keys_1` consumed the global ONCE
failpoint, skipping the in-place build and producing the full mark range
`[(0,12)]` instead of the expected `[(0,1)]`.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=102308&sha=8ac5000171b9e60f91591e89471b09c0ba5456d9&name_0=PR&name_1=Stateless%20tests%20%28arm_asan_ubsan%2C%20targeted%29
PR: ClickHouse#102308
The local environment's gh CLI token lacks the `workflow` scope, so
pushes that change `.github/workflows/*` are rejected. Master moved
those files from the long form to the short `--timestamp` form; this
commit restores the long form (byte-identical to the previous PR tip)
so the merge can be pushed. The downside is the `Config Workflow ->
Check Workflows` CI step will fail until someone with the workflow
scope re-runs `python3 -m praktika yaml`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit reverted workflow file updates from a master merge to
avoid push rejection caused by missing `workflow` token scope, but this
made the praktika `Check Workflows` job fail because the workflow files no
longer match what `python3 -m praktika yaml` produces from the current
configuration.

Restoring the workflow files to their current `master` state so the check
passes.
Comment thread src/Interpreters/PreparedSets.cpp Outdated
alexey-milovidov and others added 2 commits May 10, 2026 11:39
When `buildSetInplace` and `buildOrderedSetInplace` could not snapshot
`source` for retry (no `query_plan_builder` and `clone` failed), the
fallback path used `QueryPlan::extractSubplan` to splice the original
subquery node tree out of the executed plan. `extractSubplan` only
moves the node tree — it does not carry over `interpreter_context`,
table locks, storage holders, or `max_threads`/`concurrency_control`.
That left the rebuilt source plan without the resources keeping its
storage and contexts alive, which could resurface as `Context has
expired` once the delayed build runs.

This adds `QueryPlan::detachResources` and threads
`interpreter_context`/table locks/storage holders/thread settings
from the executed plan into the restored source in both fallback
sites in `FutureSetFromSubquery`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/Processors/QueryPlan/QueryPlan.cpp Outdated
`cloneSubplanAndReplace` is used by `materializeQueryPlanReferences` to
duplicate the same subplan at multiple reference sites. Preserving
`step_index` from the source step here would make every duplicated copy
share the same `step_uniq_id`, breaking the uniqueness contract used by
diagnostics and `system.processors_profile_log.step_uniq_id`.

Restore the original behavior in `cloneSubplanAndReplace` (allocate
fresh indices via the constructor) and keep `step_index` preservation
only in `cloneInplace`, which is used by `QueryPlan::clone` for retry
snapshots where identity must be stable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/Interpreters/PreparedSets.cpp
alexey-milovidov and others added 6 commits June 14, 2026 11:57
… sets

The in-place build of a `GLOBAL IN` set writes the *deduplicated* set to the
external table via `buildExternalTableFromInplaceSet` and then checks
`max_rows_to_transfer` / `max_bytes_to_transfer` against that single
deduplicated chunk. A subquery of many duplicate rows collapses to a few set
elements, so it slips under the byte limit even though the streaming
`CreatingSetsTransform` would ship (and reject) the full materialized payload.
This bypassed the transfer limits and regressed
`04204_global_in_join_max_rows_to_transfer_103333`, where a 1000-row constant
`GLOBAL IN` subquery with `max_bytes_to_transfer = 100` stopped throwing
`SET_SIZE_LIMIT_EXCEEDED`.

When the network transfer limits are active, defer such external-table sets to
the streaming path in both `buildSetInplace` and `buildOrderedSetInplace`,
leaving `source` intact so the delayed build runs it. The streaming path checks
the limits per source block on the raw payload, exactly as the non-in-place
path does. The in-place dedup-write optimization is kept for the common
unbounded case (`max_rows_to_transfer = max_bytes_to_transfer = 0`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`04095_future_set_inplace_build_restore_global_in` uses
`remote('127.0.0.2', ...)`, so it must be skipped on environments where the
server cannot listen on `127.0.0.2` — most notably the macOS `arm_darwin`
fast-test build, where the test failed with a `connect timed out:
127.0.0.2:9000` (`SOCKET_TIMEOUT`) network error. `clickhouse-test` already
skips `shard` / `distributed` / `global` tagged tests when the server is not
configured for self-connections, which is exactly the convention the other 26
`remote('127.0.0.2', ...)` tests follow (and the master test
`04204_global_in_join_max_rows_to_transfer_103333` uses the `distributed` tag
for the same reason).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`cloneSubplanAndReplace` is used by `materializeQueryPlanReferences` to duplicate
the same common subplan into every `CommonSubplanReferenceStep` site. Most
`IQueryPlanStep::clone` implementations are copy-constructor based (e.g.
`ExpressionStep::clone` returns `std::make_unique<ExpressionStep>(*this)`), and
since `IQueryPlanStep`'s copy constructor copies `step_index`, every duplicated
copy kept the source step's `step_index`. That makes `step_uniq_id` non-unique
within a plan when a common subplan is referenced more than once, breaking the
uniqueness contract relied on by `system.processors_profile_log.step_uniq_id`.

Allocate a fresh `step_index` for each cloned step in `cloneSubplanAndReplace`,
the same way the `IQueryPlanStep` constructor does. This runs during query plan
optimization, so advancing the per-thread plan-step counter is consistent with
how every other step gets its index. `cloneInplace` keeps preserving the source
`step_index` for retry snapshots, which is a separate concern.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ause

# Conflicts:
#	src/Processors/QueryPlan/QueryPlan.cpp
A `count()` over a distributed `MergeTree` table with a `GLOBAL IN`
predicate could raise `Not-ready Set is passed as the second argument`
(a `LOGICAL_ERROR`) when `max_rows_to_transfer` is set. For example
`00059_shard_global_in_mergetree` failed in the stateless coverage build,
where the CI config sets `max_rows_to_transfer = 1G`.

With a transfer limit active, the in-place `GLOBAL IN` set build defers to
the streaming path so the limit is enforced on the full payload, leaving
the set not ready. The synchronous `count()` optimizations -- the
min-max-count projection (`getMinMaxCountProjectionBlock`) and trivial
count (`totalRowsByPartitionPredicate`) -- have no streaming fallback, so
they executed `FunctionIn` against a not-ready set.

Add `VirtualColumnUtils::hasUnbuiltSubquerySet` and have both optimizations
decline (fall back to a normal read, which builds the set through the
pipeline) when the filter references a subquery set that declined its
in-place build. The normal read path is unaffected: it uses
`buildSetsForDAGExcludingGlobalIn`, which leaves `GLOBAL IN` sets to
`ReadFromRemote`.

Regression test `05017_count_global_in_transfer_limit` covers both the
projection and trivial-count paths under the old and new analyzers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`04095_future_set_inplace_build_restore_global_in` and the new
`05017_count_global_in_transfer_limit` use `remote('127.0.0.2', ...)`. On
the macOS `arm_darwin` fast-test build the server cannot route `127.0.0.2`,
so the queries fail with `connect timed out 127.0.0.2:9000`. The `shard`
tag does not help: the fast test passes `--shard`, so shard-tagged tests
are not skipped. The macOS-specific list `ci/defs/darwin.skip` is the
mechanism (it already lists `00059_shard_global_in_mergetree`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Fixed the Server died / Not-ready Set failure in the stateless coverage build (Stateless tests (amd_llvm_coverage, old analyzer, ...)) and the 04095 failure on Fast test (arm_darwin).

Crash (00059_shard_global_in_mergetree). A count() over a distributed MergeTree table with a GLOBAL IN predicate raised Not-ready Set is passed as the second argument for function 'globalIn' under the old analyzer when max_rows_to_transfer is set (the stateless config sets it to 1G). With a transfer limit active, the in-place GLOBAL IN set build defers to the streaming path (so the limit is enforced on the full payload), leaving the set not ready. The synchronous count() optimizations — the min-max-count projection (getMinMaxCountProjectionBlock, the actual crash site via optimizeUseAggregateProjections) and trivial count (totalRowsByPartitionPredicate) — have no streaming fallback, so they executed FunctionIn against the not-ready set. Master had no deferral, so the set was always built there.

Fix: add VirtualColumnUtils::hasUnbuiltSubquerySet and have both optimizations decline (fall back to a normal read, which builds the set through the pipeline) when the filter references a subquery set that declined its in-place build. The normal read path is unaffected — it uses buildSetsForDAGExcludingGlobalIn, which leaves GLOBAL IN sets to ReadFromRemote. Verified locally: 00059 returns the correct counts under both analyzers and with projections on/off; 04204 still throws SET_SIZE_LIMIT_EXCEEDED; 04093/04094/04095 pass. Added regression test 05017_count_global_in_transfer_limit.

04095 on macOS. It uses remote('127.0.0.2', ...); the macOS arm_darwin fast test cannot route 127.0.0.2. The shard tag does not skip it there (the fast test passes --shard); the macOS list ci/defs/darwin.skip is the mechanism. Added 04095_... and 05017_... to darwin.skip (it already lists 00059_shard_global_in_mergetree).

The one open thread (PreparedSets.cpp:338, max_elements_to_fill for external-table sets during index analysis) is unchanged and left for a separate decision.

/// streaming `CreatingSetsTransform` would ship (and reject) the full payload. When these
/// limits are active, defer to the streaming path, which checks them per source block on
/// the raw payload. `source` is left intact so the delayed build can run it.
if (set_and_key->external_table && network_transfer_limits.hasLimits())

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildSetsForDAG is still used by synchronous filter paths that immediately execute the DAG (ReadFromSystemOneBlock::initializePipeline, StorageSystemColumns, StorageFileSource::FilesIterator, etc.). With this new early return, a GLOBAL IN producer with max_rows_to_transfer / max_bytes_to_transfer set leaves the producer unbuilt; the dependent set then returns from the external_table_set && !external_table_set->get guard, but those callers still evaluate the predicate and hit Not-ready Set is passed ... instead of letting the delayed streaming path run. The count optimizations now check hasUnbuiltSubquerySet, but the invariant needs to hold for every synchronous buildSetsForDAG consumer: either make this path report "not built" to callers and skip evaluation, or have those consumers use buildSetsForDAGExcludingGlobalIn / hasUnbuiltSubquerySet before executing the filter.

alexey-milovidov and others added 2 commits June 19, 2026 16:11
The new test was numbered `05017`, leaving a gap from the current test
number frontier (`Gap (4400, 5017) > 100` reported by `test_numbers_check`
in the Style check). Renumber it to `04401`, the next available number, to
close the gap. The three sibling tests `04093`/`04094`/`04095` are in a dense
region and are left as-is.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Merged master (the branch was ~1282 commits behind and red) and renumbered the new test to clear the Style check.

Style check (test_numbers_check) — the failure was Gap (4400, 5017) > 100 from tests/queries/0_stateless/05017_count_global_in_transfer_limit.{sql,reference}. Renamed it to 04401_count_global_in_transfer_limit (the next available number per add-test), which closes the gap. The sibling tests 04093/04094/04095 are in a dense region and were left as-is. This was the only real Style-check failure; the PR and Mergeable Check failures were the umbrella checks gated on it, so the builds/tests that were DROPPED will now run.

The merge was textually clean. The core modified files (PreparedSets.cpp/.h, QueryPlan.cpp/.h, Set.h, CreatingSetsStep.h) had no master changes in the merged range; the two files master did touch (VirtualColumnUtils.cpp, MergeTreeData.cpp) auto-merged into coherent code, and every API the patch depends on (splitFilterDagForAllowedInputs, buildFilterExpression, GlobalPlannerContext's 4-arg ctor, QueryPlan::clone/extractSubplan/addResources, ColumnSet::getData) still matches.

The two AI-Review ❌ Block findings are left open as design decisions:

  1. PreparedSets.cpp — the new "an in-place GLOBAL IN set build may leave the set not-ready" contract is propagated via hasUnbuiltSubquerySet only to the two MergeTreeData count optimizations. Whether other synchronous buildSetsForDAG consumers (ReadFromSystemOneBlock, StorageSystemColumns, file/object-storage path filtering) can reach the same deferred state and raise Not-ready Set is passed ... is unverified.
  2. PreparedSets.cpp (createTemporarySetAndKeyForInplaceBuild) — max_elements_to_fill = 0 is forced for external-table sets (all RHS values are kept so the temporary GLOBAL IN table is complete), which bypasses use_index_for_in_with_subqueries_max_values during index analysis.

Comment thread ci/defs/darwin.skip Outdated
When the test was renumbered from 05017_count_global_in_transfer_limit to
04401_count_global_in_transfer_limit, the matching entry in
ci/defs/darwin.skip was not updated. clickhouse-test matches skip entries
by test name, so 04401_count_global_in_transfer_limit would still run on
arm_darwin and fail: it uses remote('127.0.0.2', ...) which the macOS
fast-test build cannot reach.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member Author

Triage notes (CI at bc91215049b)

CI hygiene fix pushed (fd42a914cb0): the ci/defs/darwin.skip entry for the renumbered test was stale (still 05017_count_global_in_transfer_limit); updated to 04401_count_global_in_transfer_limit so it is actually skipped on arm_darwin (it uses remote('127.0.0.2', ...)). Resolves the darwin.skip review thread.

The 01585 failures are the concrete, CI-verified manifestation of the open PreparedSets.cpp:536 thread (the "synchronous consumers see a not-ready GLOBAL IN set" concern), and they need a design decision:

  • 01585_use_index_for_global_in / 01585_use_index_for_global_in_with_null fail with TOO_MANY_ROWS on the coverage job because KeyCondition/index analysis is another synchronous consumer of the deferred set — exactly the case the bot thread flagged but whose reachability was unverified.
  • Mechanism: the CI stateless config (tests/config/users.d/limits.yaml) sets max_rows_to_transfer = 1G / max_bytes_to_transfer = 1G, so network_transfer_limits.hasLimits() is always true. The deferral in buildOrderedSetInplace (if (set_and_key->external_table && network_transfer_limits.hasLimits()) return nullptr;) then always fires, so the ordered set for the index is never built, the index cannot prune granules, the full table is read, and max_rows_to_read = 6 is exceeded.
  • This is not a flake: on master 01585 is green; on this PR it fails only at bc91215049b, and only because the earlier Style-gate fix finally let the coverage stateless job run (it was dropped before).
  • The tension is genuine, not mechanical: buildOrderedSetInplace also writes the external table via buildExternalTableFromInplaceSet at the end, so simply dropping the deferral there would reintroduce the 04204 dedup-vs-raw transfer under-enforcement that the deferral was added to fix. Resolving this (e.g. building the ordered set for the index while still letting the streaming path enforce transfer limits on the raw payload, vs. enforcing limits inside the in-place build, vs. accepting no index for GLOBAL IN under a transfer limit) is your call — I did not push a speculative fix.

The other two unresolved threads (PreparedSets.cpp:338 max_elements_to_fill cap, and the PreparedSets.cpp:536 contract above) remain open for the same design-decision reason.

@alexey-milovidov

Copy link
Copy Markdown
Member Author

Continued work on this PR:

  • Merged master (branch was 250 behind and red). Master touched none of the PR's C++ files since the merge-base, so the only conflict was ci/defs/darwin.skip (master rewrote it with comment groups and removed many stale entries). Resolved by taking master's version and re-inserting the PR's two skip entries — 04095_future_set_inplace_build_restore_global_in and 04401_count_global_in_transfer_limit (both use remote(...), so they must stay skipped on arm_darwin). New head 37f0fd1fadb, now 0 behind. Re-verified src/Interpreters/PreparedSets.cpp still compiles (-fsyntax-only) against the merged headers.

  • CI triage at d58abd60ed2 — the only PR-caused failures are the documented 01585 design tension; the rest are tracked, master-wide flakes:

    • 01585_use_index_for_global_in / 01585_use_index_for_global_in_with_null — the open design decision (see prior triage). The CI stateless config sets max_rows_to_transfer = 1G, so network_transfer_limits.hasLimits() is always true, the buildOrderedSetInplace deferral always fires, the ordered set for the index is never built, and the local scan is not pruned. Not fixed here — a speculative fix risks silent wrong GLOBAL IN results, and it converges with the three open AI-review blockers into one design choice.
    • Hung check failed, possible deadlock found (amd_tsan, arm_tsan) — chronic flake #107941.
    • ThreadSanitizer: data race (STID: 3336-2c6d) (BuzzHouse amd_tsan) — tracked flake #101319 (39 failures over 30 days, 7 on master, 30+ unrelated PRs; the race is in MergeTreeTransaction* swap during ConcurrentHashJoin/SpillingHashJoin construction — unrelated to GLOBAL IN).

The single blocking item remains your design call: reconciling GLOBAL IN index pruning / ready-set replay with raw-payload max_rows_to_transfer enforcement. The crux is that CreatingSetsTransform::consume enforces the transfer limit on the raw per-block payload while writing the external table in one pass, which the in-place dedup-then-write path (buildExternalTableFromInplaceSet) cannot match without giving up the temporary-set isolation that protects the real set on the retry path.

@alexey-milovidov

Copy link
Copy Markdown
Member Author

CI status at 37f0fd1fadb (post-merge run completed)

The master merge (37f0fd1fadb) has now finished its full CI run (the earlier triage was on the pre-merge head d58abd60ed2). The failure set is unchanged: the merge introduced nothing new and, as expected, fixed nothing — master touched none of this PR's files in the merged range.

The only PR-caused failures remain the 01585 design tension (01585_use_index_for_global_in / 01585_use_index_for_global_in_with_null, coverage job):

  • The CI stateless config (tests/config/users.d/limits.yaml) sets max_rows_to_transfer = 1G / max_bytes_to_transfer = 1G, so network_transfer_limits.hasLimits() is always true. The deferral in buildOrderedSetInplace (if (set_and_key->external_table && network_transfer_limits.hasLimits()) return nullptr;) then always fires, the ordered set for the index is never built in place, the index cannot prune granules, the full table is read, and max_rows_to_read = 6 is exceeded (TOO_MANY_ROWS). Pinning enable_parallel_replicas = 0 in the test does not change this — it is the index/KeyCondition path, not parallel replicas.
  • This is the same single design decision represented by the 3 open AI-review blockers (PreparedSets.cpp :463 truncated-set replay under set_overflow_mode = 'break'; :473 empty SizeLimits bypassing max_rows_to_transfer / max_bytes_to_transfer; :544 not-ready set reaching synchronous buildSetsForDAG consumers) and the 4 open inline threads. They all converge on one choice: whether to build the in-place ordered set (so the index can prune) when transfer limits are configured, at the cost of writing the external table from the deduplicated set rather than streaming the raw RHS payload. Not fixed here — a speculative fix risks silently wrong GLOBAL IN results, and it is your call.

All other failures are unrelated, tracked, master-wide flakes:

  • Hung check failed, possible deadlock found (Stress amd_tsan + arm_tsan) — Hung check failed, possible deadlock found #107941 (open). The full hung_check.log for both arches (1894 lines of thread stacks) contains zero frames from this PR's code (no PreparedSets, FutureSetFromSubquery, buildOrderedSetInplace, setExternalTable, buildSetsForDAG, …). The amd hang is the formatQuery($$SELECT [[[[…$$) deep-nested-literal recursion (Field::dispatch, elapsed ~1565s); the arm hang is in QueryAnalyzer::resolveGroupByNode / resolveQuery under the AST fuzzer.

No re-merge this round: the merge-base (d70da4f4137c, 2026-06-30 00:38Z) is under a day old, master touched none of the PR's files, and the remaining reds are not master-fixable.

@alexey-milovidov

Copy link
Copy Markdown
Member Author

Re-merged master (37f0fd1fadb6db212e32a4).

The branch was 620 commits behind (merge-base d70da4f4137, ~1.3 days) and red, and master had evolved several files this PR touches — most notably src/Storages/MergeTree/MergeTreeData.cpp (44 commits) and src/Storages/VirtualColumnUtils.{cpp,h} (24 commits), which is exactly where this PR adds the hasUnbuiltSubquerySet guards for the synchronous GLOBAL IN consumers. Merging now validates the PR against current master and gets a fresh CI run rather than letting the diff drift further.

The merge was conflict-free and the net diff is unchanged (still 26 files, +618/-25; PreparedSets.cpp +255/-14). The PR's MergeTreeData.cpp hunks (totalRowsByPartitionPredicateImpl and getMinMaxCountProjectionBlock declining the count/projection optimization on a not-ready set) reapplied cleanly, and the functions they call — splitFilterDagForAllowedInputs, buildFilterExpression, filterBlockWithExpression, hasUnbuiltSubquerySet — still match their merged signatures.

CI status is unchanged; both remaining red classes are known and not introduced by the merge:

  • 01585_use_index_for_global_in / _with_null — the deliberate index-pruning design decision (build the in-place ordered set so the primary index can prune GLOBAL IN, rather than shipping the raw payload through streaming for max_rows_to_transfer enforcement). This is the same trade-off the open AI-review threads on PreparedSets.cpp describe, and it is reserved as a design choice, not a regression.
  • Stress test (amd_tsan) / (arm_tsan)Hung check failed, possible deadlock found, the chronic TSan-only flake tracked in Hung check failed, possible deadlock found #107941 (OPEN; server-side TCPHandler::run / query-analyzer stacks, no frames from this PR's code, and it is TSan-specific while every other stress variant is green).

Fresh CI is running on 6db212e32a4.

@alexey-milovidov

Copy link
Copy Markdown
Member Author

CI status at 6db212e32a4 (fresh post-merge run completed)

The full CI run on the re-merged head has now finished (the earlier note was posted while it was still running). Reconciling the current red set:

Sole PR-caused failure — unchanged, the reserved GLOBAL IN design decision:

  • 01585_use_index_for_global_in / 01585_use_index_for_global_in_with_null (Stateless, coverage job). These fail only on this PR (deterministically since 06-19, never on master). The AI re-review re-ran on this exact head and returned ❌ Blocked, but both of its findings are marked [dismissed by author] — the Blocker (PreparedSets.cpp:544, synchronous buildSetsForDAG consumers vs. not-ready set) and the Major (:346, max_elements_to_fill = 0 bypassing use_index_for_in_with_subqueries_max_values). They are two facets of one choice: build the in-place ordered set so the primary index can prune GLOBAL IN, at the cost of writing the external table from the deduplicated set rather than streaming the raw RHS payload for max_rows_to_transfer / max_bytes_to_transfer enforcement. This remains a design decision reserved for a maintainer, not a regression — a speculative fix risks silently wrong GLOBAL IN results. No Hung check on this run (Stress is green here).

New this run, but unrelated — a known integration flake:

No re-merge this round: the head was re-merged ~10h ago, the PR is MERGEABLE, master touched none of this PR's files in the merged range, and neither remaining red is master-fixable.

The PR is therefore blocked solely on: (1) the GLOBAL IN transfer-limit vs. index-pruning design decision above; (2) human approval; (3) the private CH Inc sync.

@alexey-milovidov

Copy link
Copy Markdown
Member Author

Re-merged master to clear a new conflict (6db212e32a43f826ef04dd)

After the previous triage the branch went CONFLICTING/DIRTY again: master advanced ~804 commits (merge-base 61b644e0380, ~1.5 days) and re-introduced the recurring ci/defs/darwin.skip conflict.

Only one file conflicted — ci/defs/darwin.skip (the same file master keeps rewriting). Resolved by keeping both entries in numeric order:

04401_count_global_in_transfer_limit   <- this PR
04402_float_parsing_corner_cases       <- master

All C++ auto-merged cleanly. Verified the PR's contribution is intact after the merge (net 26 files, +619/-26): the failpoint future_set_from_subquery_skip_inplace_build (FailPoint.cpp), hasUnbuiltSubquerySet (VirtualColumnUtils + its two MergeTreeData.cpp call sites), and buildSetsForDAGExcludingGlobalIn (its ReadFromMergeTree / prepareReadingFromFormat / ReadFromObjectStorageStep consumers) all reapplied coherently against the evolved MergeTreeData.cpp / VirtualColumnUtils.{cpp,h}.

The PR is now MERGEABLE again; a fresh CI run will start on 3f826ef04dd.

No code change beyond the merge. The two open PreparedSets.cpp AI-review threads (:346 max_elements_to_fill, :544 synchronous buildSetsForDAG consumers) remain untouched — they are the reserved GLOBAL IN index-pruning vs. transfer-limit design decision (both [dismissed by author]), and a speculative fix risks silently wrong GLOBAL IN results. Still blocked on: (1) that design decision, (2) un-drafting + human approval, (3) the private CH Inc sync.

@clickhouse-gh

clickhouse-gh Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 85.50% 85.50% +0.00%
Functions 92.60% 92.60% +0.00%
Branches 77.60% 77.70% +0.10%

Changed lines: Changed C/C++ lines covered: 282/317 (88.96%) · Uncovered code

Full report · Diff report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant