Randomize parallel_replicas_min_number_of_rows_per_replica by devcrafter · Pull Request #71028 · ClickHouse/ClickHouse · GitHub

Randomize parallel_replicas_min_number_of_rows_per_replica#71028

Open
devcrafter wants to merge 19 commits into master from pr-randomize-rows-to-read-setting

Conversation

@devcrafter

@devcrafter devcrafter commented Oct 24, 2024

Member

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Details

Enabling the setting executes code paths that can trigger hidden bugs, in particular in GLOBAL JOINs with parallel replicas. I discovered one such bug while working on #70658, in the 02967_parallel_replicas_joins_and_analyzer test.

@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-not-for-changelog This PR should not be mentioned in the changelog label Oct 24, 2024
@antaljanosbenjamin antaljanosbenjamin self-assigned this Oct 25, 2024
Comment thread tests/clickhouse-test Outdated
@clickhouse-gh

clickhouse-gh Bot commented Dec 31, 2024

Contributor

@clickhouse-gh

clickhouse-gh Bot commented Feb 11, 2025

Contributor

Dear @antaljanosbenjamin, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.

@alexey-milovidov alexey-milovidov self-assigned this Jul 27, 2025
@clickhouse-gh

clickhouse-gh Bot commented Jul 27, 2025

Contributor

Workflow [PR], commit [62da6d9]

Summary:

| job_name | test_name | status | info |
|---|---|---|---|
| Stateless tests (arm_asan_ubsan, targeted) | | FAIL | |
| | 03452_array_join_global_right_join_parallel_replicas | FAIL | cidb |
| | 01344_min_bytes_to_use_mmap_io_index | FAIL | cidb |
| | 03279_pr_3_way_joins_right_first | FAIL | cidb |
| | 03560_parallel_replicas_projection | FAIL | cidb |
| | 03452_array_join_global_right_join_parallel_replicas | FAIL | cidb |
| | 04071_global_in_dia_no_explicit_set_elements | FAIL | cidb |
| | 04052_distributed_index_analysis_in_subquery_no_quadratic | FAIL | cidb |
| | 03560_parallel_replicas_projection | FAIL | cidb |
| | 03801_autopr_input_bytes_estimation_query_with_subqueries | FAIL | cidb |
| | Too many test failures | FAIL | cidb |
| Stateless tests (amd_asan_ubsan, distributed plan, parallel, 1/2) | | FAIL | |
| | 02915_input_table_function_in_subquery | FAIL | cidb |
| | 03031_filter_float64_logical_error | FAIL | cidb |
| Stateless tests (amd_asan_ubsan, distributed plan, parallel, 2/2) | | FAIL | |
| | 03231_pr_duplicate_announcement | FAIL | cidb |
| | 03254_pr_join_on_dups | FAIL | cidb |
| Stateless tests (amd_debug, parallel) | | FAIL | |
| | 04266_text_index_tokens_cardinality_order | FAIL | cidb |
| | 01169_old_alter_partition_isolation_stress | FAIL | cidb |
| | 04052_distributed_index_analysis_in_subquery_no_quadratic | FAIL | cidb |
| | 03275_pr_any_join | FAIL | cidb |
| Stateless tests (amd_tsan, parallel, 1/2) | | FAIL | |
| | 00945_bloom_filter_index | FAIL | cidb |
| | 03275_pr_any_join | FAIL | cidb |
| | 03560_parallel_replicas_projection | FAIL | cidb |
| Stateless tests (amd_tsan, parallel, 2/2) | | FAIL | |
| | 01168_mutations_isolation | FAIL | cidb |
| | 04071_global_in_dia_no_explicit_set_elements | FAIL | cidb |
| | 03800_autopr_reuse_index_analysis | FAIL | cidb |
| | 01169_old_alter_partition_isolation_stress | FAIL | cidb |
| Stateless tests (arm_binary, parallel) | | FAIL | |
| | 03560_parallel_replicas_projection | FAIL | cidb |
| | 03275_pr_any_join | FAIL | cidb |
| | 02731_parallel_replicas_join_subquery | FAIL | cidb |
| | 01585_use_index_for_global_in_with_null | FAIL | cidb |
| | 03261_pr_semi_anti_join | FAIL | cidb |
| | 03279_pr_3_way_joins_right_first | FAIL | cidb |
| | 01169_alter_partition_isolation_stress | FAIL | cidb |
| | 04052_distributed_index_analysis_in_subquery_no_quadratic | FAIL | cidb |
| | 03031_filter_float64_logical_error | FAIL | cidb |
| | 01171_mv_select_insert_isolation_long | FAIL | cidb |
| Fast test (arm_darwin) | | DROPPED | |
| Build (arm_release) | | DROPPED | |
| Build (arm_darwin) | | DROPPED | |

AI Review

Summary

This PR still makes the stateless randomized test harness generate parallel_replicas_min_number_of_rows_per_replica = 1 for any test in the suite (random.randint(0, 1) draws 1 about half the time). In the current code that value still combines with automatic_parallel_replicas_mode = 2 to force real parallel-replicas execution in normal randomized CI, so the change remains blocked until that path is shown to be green for the full suite or is gated away from normal runs.

Missing context / blind spots
  • ⚠️ The latest Praktika PR report for commit 62da6d95 is still pending/empty on July 4, 2026, so there is no fresh full-suite evidence after the latest master merge. A completed randomized PR run on this commit (or a descendant with the same behavior) would close that gap.
Findings

❌ Blockers

  • [dismissed by author -- https://github.com/ClickHouse/ClickHouse/pull/71028#discussion_r3382610973] tests/clickhouse-test:1470, tests/clickhouse-test:1528, tests/clickhouse-test:1605-1618: The randomized-test contract is still violated. adjust_settings_for_autopr forces enable_parallel_replicas, cluster_for_parallel_replicas, and parallel_replicas_local_plan whenever automatic_parallel_replicas_mode randomizes to 2, and this PR now also feeds a nonzero parallel_replicas_min_number_of_rows_per_replica into that path. As a result, ordinary randomized stateless runs still exercise the real parallel-replicas planner across the whole suite, yet the branch has no green PR report for commit 62da6d95, and the latest July 4, 2026 triage still expects unresolved wrong-result / WITH ROLLUP / index-analysis / max_rows_to_read failures from exactly this combination. I still consider this a real blocker because the code path is unchanged and the required proof that the suite stays green is still missing.
    Suggested fix: keep this generator at 0 for normal randomized CI, or gate it behind a dedicated opt-in / targeted exclusions until the remaining parallel-replicas correctness gaps are fixed and a full randomized PR run is green.
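The suggested gating could take roughly the following shape. This is a minimal sketch, assuming the harness consults an explicit opt-in switch; the environment variable `CH_TEST_RANDOMIZE_PR_MIN_ROWS` and the helper `min_rows_per_replica_generator` are hypothetical names, not part of tests/clickhouse-test.

```python
import os
import random

# Hypothetical opt-in switch; the real harness might use a CLI flag or a test tag instead.
OPT_IN_ENV = "CH_TEST_RANDOMIZE_PR_MIN_ROWS"


def min_rows_per_replica_generator(rng: random.Random) -> int:
    """Return a value for parallel_replicas_min_number_of_rows_per_replica.

    Normal randomized CI keeps the setting at 0; only an explicitly
    opted-in run draws from {0, 1}.
    """
    if os.environ.get(OPT_IN_ENV) == "1":
        return rng.randint(0, 1)
    return 0
```

Keeping the default at 0 means normal randomized CI never sees the forced path, while a dedicated job can still set the switch to exercise it.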
Final Verdict
  • Status: ⚠️ Request changes
  • Minimum required actions: either gate this randomization away from normal randomized CI, or land the remaining parallel-replicas fixes and show a green randomized PR report for this behavior before merge.

@alexey-milovidov

Member

@devcrafter, it shows errors.

@clickhouse-gh

clickhouse-gh Bot commented Sep 2, 2025

Contributor

Dear @alexey-milovidov, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.

@alexey-milovidov alexey-milovidov self-assigned this Oct 17, 2025

@alexey-milovidov alexey-milovidov left a comment

Member

A very good change!

@alexey-milovidov

Member

@devcrafter, it failed.

@clickhouse-gh

clickhouse-gh Bot commented Nov 18, 2025

Contributor

Dear @alexey-milovidov, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.

@alexey-milovidov

Member

@devcrafter, more changes are needed.

@clickhouse-gh

clickhouse-gh Bot commented Jun 1, 2026

Contributor

Dear @alexey-milovidov, you haven't been active on this PR for 30 days. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.

…o-read-setting

# Conflicts:
#	tests/clickhouse-test
@alexey-milovidov

Member

Merged current master into the branch (it was ~16k commits / 2 months behind, last updated 2026-03-28). The only conflict was a trivial one in tests/clickhouse-test (master fixed the "harmfull"→"harmful" typo on the adjacent line). Net change vs. master is unchanged in intent — the single randomizer line:

"parallel_replicas_min_number_of_rows_per_replica": lambda: random.randint(0, 1),
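For readers unfamiliar with the harness, the randomizer is a table mapping setting names to zero-argument generators, and each randomized run draws one value per entry. The sketch below is a simplified model under that assumption: the four entries are copied from the tests/clickhouse-test context shown in the review thread, while `SETTINGS_RANDOMIZER` and `draw_settings` are hypothetical names, and the real harness may seed and apply the draws differently.

```python
import random

# Entries copied from the review-thread context; the last one is the line this PR adds.
SETTINGS_RANDOMIZER = {
    "max_parsing_threads": lambda: random.choice([0, 1, 10]),
    "optimize_functions_to_subcolumns": lambda: random.randint(0, 1),
    "parallel_replicas_local_plan": lambda: random.randint(0, 1),
    "parallel_replicas_min_number_of_rows_per_replica": lambda: random.randint(0, 1),
}


def draw_settings(seed: int) -> dict:
    """Draw one randomized settings set (hypothetical helper).

    The generators use the module-level `random`, so seeding it makes a
    draw reproducible for a given seed.
    """
    random.seed(seed)
    return {name: generate() for name, generate in SETTINGS_RANDOMIZER.items()}
```

With `random.randint(0, 1)`, each draw yields 1 with probability 1/2, so about half of randomized runs get parallel_replicas_min_number_of_rows_per_replica = 1.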

Pushed ace3dcab56f..28db0b7a689 to refresh CI on current master.

Note on the prior (2026-03-28) CI failures: I spot-checked two of them against a fresh master (26.6.1.1) and they still reproduce, so they are not stale:

  • 02155_read_in_order_max_rows_to_read — with enable_parallel_replicas=1 + parallel_replicas_min_number_of_rows_per_replica=1, SELECT a FROM t_max_rows_to_read ORDER BY a LIMIT 5 SETTINGS max_rows_to_read = 12 throws TOO_MANY_ROWS (reads all 100 rows). The read-in-order + LIMIT row-count bound isn't applied on the parallel-replicas path.
  • 03261_pr_semi_anti_join: the semi right / anti join results contain different rows, not just the same rows in a different order, when parallel_replicas_min_number_of_rows_per_replica=1 changes the number of replicas used. This looks like a real correctness issue in semi/anti joins on the parallel-replicas path.
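The distinction between "different data" and "just reordered" can be made mechanical by comparing the two results as multisets of rows. A minimal sketch, assuming each result is available as a list of row tuples; `classify_result_diff` is a hypothetical helper, not something the test suite provides.

```python
from collections import Counter


def classify_result_diff(expected: list[tuple], actual: list[tuple]) -> str:
    """Classify how two query results differ.

    "same"      - identical rows in identical order
    "reordered" - same multiset of rows, different order (benign without ORDER BY)
    "different" - the multisets differ: rows were added, lost, or changed
    """
    if expected == actual:
        return "same"
    if Counter(expected) == Counter(actual):
        return "reordered"
    return "different"
```

Comparing `Counter`s rather than sets keeps duplicate rows significant, which matters for join results where a lost or extra duplicate is itself a correctness bug.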

The randomizer line behaves as intended: a value of 1 forces real parallel-replica execution (overriding the statistics-only behavior of automatic_parallel_replicas_mode=2), which is exactly what surfaces these latent bugs. I deliberately did not tag/blacklist the affected tests, since that would hide the very bugs this change is meant to expose. The remaining failures look like genuine product issues for the parallel-replicas path rather than something to fix in this PR — flagging for your call on sequencing (fix-then-randomize vs. land-and-track).

@alexey-milovidov

Member

Merged master (the branch was only ~2 days behind, but red) to get a fresh CI signal on today's master.

The remaining CI red is not caused by the one-line diff itself — it is the parallel-replicas bugs this PR is meant to surface. The CI randomized-settings diagnosis points at parallel_replicas_min_number_of_rows_per_replica 1 (frequently together with the already-randomized automatic_parallel_replicas_mode 2). The failures split into two kinds:

A. Genuine server bugs (block merge, outside the scope of this test-only change):

  • 03031_filter_float64_logical_error: WITH ROLLUP over an empty filtered set loses the totals row with parallel replicas (one 0\t7 row is dropped).
  • 03279_pr_3_way_joins_right_first — a 3-way RIGHT … INNER JOIN returns different results with enable_parallel_replicas = 1 vs 0 (the test EXCEPTs the two and expects them equal).
  • considerEnablingParallelReplicas.cpp:359 chassert(local_replica_plan_reading_step->getAnalyzedResult() == nullptr) fires → server abort ("Server died") in the debug build when the manual min-rows path and the automatic-parallel-replicas optimizer both apply to the same plan.
  • Several GLOBAL JOIN / array-join / projection cases in the tsan/arm reports (02731_parallel_replicas_join_subquery, 03452_array_join_global_right_join_parallel_replicas, 03560_parallel_replicas_projection, 04071_global_in_dia_no_explicit_set_elements, 01585_use_index_for_global_in, …) — exactly the class described in the PR motivation.

B. Single-node "measurement" tests that just need parallel replicas pinned off (contained test fixes):

  • 04051_pk_analysis_stats and 04052_distributed_index_analysis_in_subquery_no_quadratic assert mark / index-analysis accounting that only holds for single-node reads; with parallel replicas the work is split across replicas and the accounting differs.

The (B) tests can be made robust by pinning enable_parallel_replicas = 0 in them, but that alone will not turn CI green — the (A) correctness/stability bugs remain and need fixing in the parallel-replicas code itself before this randomization can be enabled. Leaving those for the parallel-replicas owners since they are the purpose of this bug-finding PR rather than something to mask here.
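Pinning works because a setting fixed by the test itself overrides whatever value the harness randomized. A minimal model, assuming the test pins `enable_parallel_replicas = 0` in its own queries (for example with a SETTINGS clause) and that those values take precedence over the harness's randomized session settings; `effective_settings` is a hypothetical helper.

```python
def effective_settings(randomized: dict, pinned: dict) -> dict:
    """Merge harness-randomized settings with test-pinned ones; pinned values win."""
    return {**randomized, **pinned}
```

Under this model the randomized min-rows value is still present but no longer reaches a parallel-replicas plan for the pinned test, which is why pinning also removes that test's bug-finding signal.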

@alexey-milovidov

Member

Re-merged current master into the branch (it was 392 commits behind, last pushed 2026-06-06) to refresh the CI signal on today's master. The diff vs. master is still the single randomizer line — no functional change.

I re-triaged the latest CI red and it matches the prior categorization; nothing new is regressed by this PR, the failures are exactly the latent parallel-replicas bugs that parallel_replicas_min_number_of_rows_per_replica 1 (forcing real parallel-replica execution) is meant to surface:

Genuine server bugs (block merge, owned by the parallel-replicas team, out of scope for this test-only change):

  • 03031_filter_float64_logical_error: WITH ROLLUP over an empty filtered set loses the totals row (one 0\t7 row dropped). Culprit minimized to --parallel_replicas_min_number_of_rows_per_replica 1.
  • 03279_pr_3_way_joins_right_first, 03275_pr_any_join, 03254_pr_join_on_dups — different results with parallel replicas (correctness in semi/anti/any/right joins on the PR path).
  • considerEnablingParallelReplicas.cpp:359 chassert(local_replica_plan_reading_step->getAnalyzedResult() == nullptr) → server abort in debug/tsan ("Server died").
  • 03560_parallel_replicas_projection, 03452_array_join_global_right_join_parallel_replicas — the GLOBAL JOIN / array-join / projection class described in the PR motivation.
  • The transaction-stress tests (01169/01171/01173/01174) fail as a cascade of the server aborts above.
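The "culprit minimized to" step can be modeled as a greedy reduction over the failing run's randomized settings: drop one override at a time and keep the drop whenever the test still fails. This is a sketch of one-at-a-time reduction (a simplified relative of delta debugging), assuming an oracle that re-runs the test with a given set of overrides; `minimize_culprit` and the oracle are hypothetical, and the CI's actual minimizer may work differently.

```python
from typing import Callable, Dict

Settings = Dict[str, object]


def minimize_culprit(failing: Settings, fails: Callable[[Settings], bool]) -> Settings:
    """Greedily drop randomized overrides that are not needed to reproduce a failure.

    `failing` is the randomized settings set of a failed run; `fails` re-runs
    the test with the given overrides (everything else at defaults) and
    reports whether it still fails.
    """
    assert fails(failing), "the starting settings must reproduce the failure"
    current = dict(failing)
    changed = True
    while changed:
        changed = False
        for name in list(current):
            candidate = {k: v for k, v in current.items() if k != name}
            if fails(candidate):
                current = candidate  # this override is irrelevant to the failure
                changed = True
    return current
```

The result is 1-minimal: removing any single remaining override makes the failure disappear. When a bug needs two settings together (for example parallel_replicas_min_number_of_rows_per_replica 1 with automatic_parallel_replicas_mode 2), both survive the reduction.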

Index-analysis / "measurement" tests — divergence is itself likely a real behavioral gap, so NOT safe to pin:

  • 04052_distributed_index_analysis_in_subquery_no_quadratic — "Expected 3-4 queries, got 5". With parallel_replicas_min_number_of_rows_per_replica 1 the nested index-analysis subquery (mergeTreeAnalyzeIndexesUUID(..., in(key, (SELECT ...)))) itself goes distributed, which distributed_index_analysis_only_on_coordinator is supposed to suppress. This looks like the coordinator-only restriction not fully disabling parallel replicas for nested index analysis — i.e. a real finding, not a test artifact.
  • 03800_autopr_reuse_index_analysis, 03801_autopr_input_bytes_estimation_query_with_subqueries — same family; forcing the PR path perturbs the index-analysis-reuse / byte-estimation assertions.

Deliberately did not tag/blacklist or pin parallel replicas off in any of these — pinning the index-analysis tests would mask exactly the kind of gap this PR is designed to expose. None of the remaining red is fixable inside this one-line test change; it needs the underlying parallel-replicas bugs fixed first. Flagging for your sequencing call (fix-then-randomize vs. land-and-track) — the failures are product bugs for the parallel-replicas owners rather than something to mask here.

Comment thread tests/clickhouse-test
"max_parsing_threads": lambda: random.choice([0, 1, 10]),
"optimize_functions_to_subcolumns": lambda: random.randint(0, 1),
"parallel_replicas_local_plan": lambda: random.randint(0, 1),
"parallel_replicas_min_number_of_rows_per_replica": lambda: random.randint(0, 1),

Contributor

Randomizing this globally violates the randomized-test invariant that generated settings must still keep the stateless suite green. With automatic_parallel_replicas_mode = 2, adjust_settings_for_autopr forces enable_parallel_replicas, cluster_for_parallel_replicas, and parallel_replicas_local_plan; when this new setting is 1, the analyzer still builds the real parallel-replicas plan because buildQueryPlanForAutomaticParallelReplicas keeps this setting and only clears automatic_parallel_replicas_mode inside the candidate plan. The PR discussion already lists failures from that combination, including wrong results and the considerEnablingParallelReplicas assertion. Please keep this at 0 until those parallel-replica bugs are fixed, or gate the randomization behind a dedicated opt-in or affected-test exclusions so normal randomized CI stays green.
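The interaction the reviewer describes can be summarized as a small truth table. The sketch below models that description only: `adjust_settings_for_autopr_model` stands in for the harness's adjust_settings_for_autopr, the forced values (1, and a placeholder cluster name) are assumptions, and `builds_real_pr_plan` encodes the thread's claim that automatic_parallel_replicas_mode = 2 alone stays statistics-only unless the min-rows setting is nonzero.

```python
# Hypothetical placeholder: the real cluster name used by the harness is not given here.
CLUSTER_PLACEHOLDER = "parallel_replicas_cluster_placeholder"


def adjust_settings_for_autopr_model(settings: dict) -> dict:
    """Model of the harness adjustment: mode 2 forces the parallel-replicas settings on."""
    adjusted = dict(settings)
    if adjusted.get("automatic_parallel_replicas_mode") == 2:
        adjusted["enable_parallel_replicas"] = 1  # assumed forced value
        adjusted["cluster_for_parallel_replicas"] = CLUSTER_PLACEHOLDER
        adjusted["parallel_replicas_local_plan"] = 1  # assumed forced value
    return adjusted


def builds_real_pr_plan(settings: dict) -> bool:
    """True if, per the review description, the real parallel-replicas plan is built."""
    if not settings.get("enable_parallel_replicas"):
        return False
    statistics_only = settings.get("automatic_parallel_replicas_mode") == 2
    min_rows = settings.get("parallel_replicas_min_number_of_rows_per_replica", 0)
    # Mode 2 alone only evaluates a candidate plan; a nonzero min-rows value
    # makes the analyzer build the real parallel-replicas plan anyway.
    return not statistics_only or min_rows > 0
```

In this model, the only randomized combination that reaches the real planner in otherwise-default runs is mode 2 together with a min-rows value of 1, which is exactly the combination this PR introduces.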

Member

Acknowledged — this is exactly the open sequencing question discussed in the PR-level comments (fix-then-randomize vs. land-and-track). The red CI here is deliberate: the failures are the latent parallel-replicas bugs this randomization is designed to surface (see the triage comments for the categorized list, e.g. lost WITH ROLLUP totals, wrong semi/anti/any join results, the considerEnablingParallelReplicas.cpp:359 assertion). Gating or excluding the affected tests would mask the very signal the PR exists to produce, so the deliberate choice so far has been not to do that. The PR will not be merged while these product bugs are unfixed — it stays open as the tracking/bug-finding vehicle until the parallel-replicas owners fix them, or until we decide to gate it.

@alexey-milovidov

Member

@groeneai, investigate the failure: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=71028&sha=849e3ff013f70628c8daad3a4fda355753bc580b&name_0=PR&name_1=Stateless%20tests%20%28arm_binary%2C%20parallel%29 and provide a fix in a separate PR. If the fix is already in progress, link it here.

The failure is Logical error: Expected 3 to 10 arguments in table function azureBlobStorage, got 1 (STID: 3574-4812, also 2508-4994 in the amd_asan_ubsan run). It is not caused by this PR: CIDB shows it failing across dozens of unrelated PRs and on master itself every day for at least the last week. The stack goes through buildQueryPlanForAutomaticParallelReplicas → considerEnablingParallelReplicas, i.e. the already-default-randomized automatic_parallel_replicas_mode path. It looks like table-function argument rewriting (likely secret masking) corrupts the azureBlobStorage AST when the automatic-parallel-replicas candidate plan is built. I did not find an open issue tracking it.

@alexey-milovidov

Member

Re-triaged the latest CI red (commit 849e3ff, run of 2026-06-09). No re-merge of master this time: the branch is only 2 days behind, and none of the underlying parallel-replicas bugs have been fixed on master since (no changes to considerEnablingParallelReplicas.cpp or the related planner code), so a refresh would reproduce the same red. The diff vs. master is still the single randomizer line.

The failures match the prior categorization, with three new members of the same classes (all reproduce only with parallel_replicas_min_number_of_rows_per_replica 1 in the randomized settings):

  • 00945_bloom_filter_index: TOO_MANY_ROWS (max rows: 7, current rows: 8). The max_rows_to_read bound is not honored on the parallel-replicas path; same class as the previously reported 02155_read_in_order_max_rows_to_read.
  • 04065_direct_join_column_const_mismatch: NOT_IMPLEMENTED "Can't execute any of specified algorithms for specified strictness/kind and right storage type" from JoinStepLogicalLookup::optimize when parallel replicas are forced onto a direct join.
  • 04098_asterisk_include_virtual_columns_mergetree: NOT_FOUND_COLUMN_IN_BLOCK for _part, already tracked in #106561 (NOT_FOUND_COLUMN_IN_BLOCK with asterisk_include_virtual_columns and parallel replicas).

Independent confirmation that these are latent product bugs rather than an artifact of this diff: #106734 ("More settings to randomize") hits the exact same set of tests (03031, 03275_pr_any_join, 02155, 04065, 04098, 01171, 03801) in its CI.

Still present from the previous rounds: 03031_filter_float64_logical_error (lost WITH ROLLUP totals row), 03275_pr_any_join (wrong results), 04052 / 03801 (index-analysis divergence — likely a real behavioral gap, not pinned on purpose), the Server died aborts (consistent with the considerEnablingParallelReplicas.cpp:359 chassert), and the 01171 transaction-test cascade.

One failure is unrelated to this PR — the azureBlobStorage "Expected 3 to 10 arguments" logical error fails across many PRs and on master daily; pinged @groeneai above.

Status unchanged: the remaining red is the latent parallel-replicas bugs this PR is designed to surface, and the sequencing call (fix-then-randomize vs. land-and-track) is still open.

@groeneai

Contributor

Investigated. This Logical error: Expected 3 to 10 arguments in table function azureBlobStorage, got 1 (STID 3574-4812 / 2508-4994) is a CI scanner false positive, not a server bug, and it is not caused by this PR. Fix in a separate PR: #107220.

Root cause: the stateless test 04119_object_storage_bad_arg_count_no_logical_error (added by #103544, merged 2026-06-09) is a regression test whose queries throw on purpose. On a thrown query, executeQuery logs the full query text at Error level via toOneLineQuery, which keeps comment tokens verbatim, so the test's leading comment lands in clickhouse-server.err.log. That comment quoted the original pre-fix crash message word for word. ci/jobs/scripts/log_parser.py greps the server log with rg --text -A 10 -o 'Logical error.*', matches the comment, takes its first line as the failure title and a nearby unrelated stack for the STID (hence the buildQueryPlanForAutomaticParallelReplicas stack you saw, which has nothing to do with azureBlobStorage arg parsing).

It started exactly on 2026-06-09 because #103544 made the server throw NUMBER_OF_ARGUMENTS_DOESNT_MATCH instead of LOGICAL_ERROR, so the comment became the only remaining source of the string, on master and across every PR that runs the test. So your secret-masking hypothesis isn't it; the AST is fine. (log_parser.py already documents this SQL-comment false-match class in a comment near the grep.)

Reproduced locally: feeding the comment + SELECT * FROM azureBlobStorage();, the query correctly throws Code 42 NUMBER_OF_ARGUMENTS_DOESNT_MATCH, yet the scanner grep matches the comment. After rewording the comment to drop the verbatim message, the grep no longer matches and the test still passes. #107220 is test-only.
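The false-positive mechanism can be reproduced with an ordinary regular expression. A minimal sketch, assuming the scanner's pattern behaves like Python's `re` for this input (ripgrep's `.` likewise does not cross newlines by default); the log lines and comment wordings below are made up for illustration, and only the `-o 'Logical error.*'` part of the grep is modeled, not the `-A 10` context or the STID lookup.

```python
import re

# Same pattern as the grep in ci/jobs/scripts/log_parser.py (rg -o 'Logical error.*').
LOGICAL_ERROR_RE = re.compile(r"Logical error.*")

# Illustrative one-line error-log entries: the query text, including its leading
# comment, is logged verbatim. All wording here is hypothetical.
LOG_BEFORE = (
    "<Error> executeQuery: Code: 42. DB::Exception: ... (NUMBER_OF_ARGUMENTS_DOESNT_MATCH) "
    "(query: -- Used to fail with Logical error: Expected 3 to 10 arguments in table "
    "function azureBlobStorage, got 1 SELECT * FROM azureBlobStorage())"
)
LOG_AFTER = (
    "<Error> executeQuery: Code: 42. DB::Exception: ... (NUMBER_OF_ARGUMENTS_DOESNT_MATCH) "
    "(query: -- A wrong argument count must raise a normal exception, not a server bug "
    "SELECT * FROM azureBlobStorage())"
)


def failure_title(log_text: str):
    """Return what the scanner would report as the failure title, or None if nothing matches."""
    match = LOGICAL_ERROR_RE.search(log_text)
    return match.group(0) if match else None
```

The server raises the same correct exception in both lines; only the comment text decides whether the scanner fires, which is why a test-only rewording is enough.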

@alexey-milovidov

Member

Re-merged current master into the branch (it was 1333 commits behind, last pushed 2026-06-09 / red). The diff vs. master is still the single randomizer line — no functional change.

The reason to re-merge now (vs. the 2026-06-11/12 triage, where re-merging would have just re-rolled the same red): the parallel-replicas owners have started landing fixes for the bugs this randomizer surfaces. Two directly relevant commits authored by @devcrafter are now on master and in this branch:

  • 39d95a60cea: Test + fix for parallel_replicas_min_number_of_rows_per_replica > 0 (the exact setting this PR randomizes), plus regression test 04327_pr_view_union_empty_branch.
  • 817d10bbadd: "Fix: PR over view with UNION with empty table".

Re-running CI on today's master will refresh the signal and show which of the previously-red tests these fixes have cleared, narrowing the remaining set of genuine product bugs that still block merge. The PR remains intentionally open as the bug-finding / tracking vehicle until the remaining parallel-replicas bugs are fixed or a sequencing decision is made — no pinning/blacklisting.

@alexey-milovidov

Member

Re-merged current master into the branch (it was 2374 commits behind, last pushed 2026-06-13 / red). The diff vs. master is still the single randomizer line — no functional change.

The reason to re-merge now (vs. the 2026-06-14..18 triage, where re-merging would have just re-rolled the same red): another directly relevant parallel-replicas JOIN fix has landed on master and is now in this branch:

  • #106338: Restrict parallel-replicas join absorption to the leftmost leaf (merged 2026-06-19). It reworks the parallel-replicas + JOIN planner contract in PlannerJoinTree.cpp (the same file as @devcrafter's earlier fixes) so that the leftmost leaf explicitly drives join absorption, fixing the empty-left-leaf case that produced LOGICAL_ERROR: Left and right columns have same names.

Re-running CI on today's master will refresh the signal and show which of the previously-red join tests (03254_pr_join_on_dups, 03261_pr_semi_anti_join, 03275_pr_any_join, …) this fix has cleared, narrowing the remaining set of genuine product bugs that still block merge.

The server-abort path (considerEnablingParallelReplicas.cpp chassert(local_replica_plan_reading_step->getAnalyzedResult() == nullptr) → "Server died") is unchanged on master, so that one and the index-analysis / WITH ROLLUP divergences are expected to remain red.

The PR remains intentionally open as the bug-finding / tracking vehicle until the remaining parallel-replicas bugs are fixed or a sequencing decision is made — no pinning/blacklisting.

…o-read-setting

The branch was ~11.7k commits behind (last merge 2026-06-20) and red. The
diff vs. `master` is still the single randomizer line — no functional change:

    "parallel_replicas_min_number_of_rows_per_replica": lambda: random.randint(0, 1),

Reason to re-merge now: the parallel-replicas owners have landed several
directly relevant fixes since 2026-06-20, so refreshing CI on today's
`master` will narrow the remaining set of genuine product bugs this PR is
meant to surface:
- #108451 (`Fix NOT_FOUND_COLUMN_IN_BLOCK for virtual columns under parallel
  replicas`, closes #106561) — should clear the tracked
  `04098_asterisk_include_virtual_columns_mergetree` failure.
- #101434 (`Reimplement reading in order for parallel replicas`) — bears
  directly on the `max_rows_to_read`-not-honored class
  (`02155_read_in_order_max_rows_to_read`, `00945_bloom_filter_index`).
- #109003 (`Fix server abort on GROUPING SETS in a set operation with
  parallel replicas`) — a "Server died" class fix.
- Flaky-test fix for `04051_pk_analysis_stats`.

Conflicts were all in files the branch does not intentionally change (its
only intended change is the one `tests/clickhouse-test` line); they were
resolved by taking `master`'s version. No pinning/blacklisting of the
affected parallel-replicas tests — that would mask the very signal this PR
exists to produce.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Member

Re-merged current master into the branch (it was ~11.7k commits behind, last pushed 2026-06-20 / red). The diff vs. master is still the single randomizer line — no functional change.

Reason to re-merge now (vs. just re-rolling the same red): the parallel-replicas owners have landed several directly relevant fixes since 2026-06-20, so refreshing CI on today's master will narrow the remaining set of genuine product bugs that still block merge:

  • #108451: Fix NOT_FOUND_COLUMN_IN_BLOCK for virtual columns under parallel replicas (Closes #106561). This is the exact failure previously triaged as 04098_asterisk_include_virtual_columns_mergetree (_part virtual column), so that one should now clear.
  • #101434: Reimplement reading in order for parallel replicas (nickitat). This directly reworks the read-in-order + parallel-replicas path, i.e. the max_rows_to_read-not-honored class previously reported as 02155_read_in_order_max_rows_to_read and 00945_bloom_filter_index (TOO_MANY_ROWS); the refreshed run will show whether the row-count bound is now applied.
  • #109003: Fix server abort on GROUPING SETS in a set operation with parallel replicas. This should clear that "Server died" class.
  • Flaky-test fix for 04051_pk_analysis_stats.

Expected to remain red (no relevant change on master):

  • The considerEnablingParallelReplicas.cpp:359 chassert(local_replica_plan_reading_step->getAnalyzedResult() == nullptr) → "Server died" path is unchanged on master (the assertion is still there), so that abort and its transaction-stress cascade (01169/01171/01173/01174) should persist.
  • 03031_filter_float64_logical_error (lost WITH ROLLUP totals row) and the semi/anti/any/right-join correctness cases (03254_pr_join_on_dups, 03261_pr_semi_anti_join, 03275_pr_any_join, 03279_pr_3_way_joins_right_first).
  • The index-analysis divergences (04052_distributed_index_analysis_in_subquery_no_quadratic, 03801_autopr_input_bytes_estimation_query_with_subqueries) — likely a real behavioral gap, deliberately not pinned.

As before, I did not tag/blacklist or pin parallel replicas off in any test — pinning would mask exactly the signal this PR is designed to expose. The PR remains intentionally open as the bug-finding / tracking vehicle until the remaining parallel-replicas bugs are fixed or a sequencing decision is made.

@alexey-milovidov

Member


Labels

pr-not-for-changelog This PR should not be mentioned in the changelog

Projects

None yet


6 participants