More fuzzing in CI#102197
Conversation
|
The The bug was reverted by PR #102247 (merged at ~14:48 UTC). All nessie/lakekeeper test failures between these two merges are unrelated to this PR. |
|
The flaky check failure is fixed in #102148, let's update the branch. |
`test_parallel_replicas_with_view` (added in ClickHouse#100958, merged 2026-04-19) asserts strict equality on the `ParallelReplicasUnavailableCount` and `ParallelReplicasUsedCount` ProfileEvents collected from `system.query_log`. The strict equality on `ParallelReplicasUnavailableCount` is racy. Old replicas are filtered at connection time inside `RemoteQueryExecutor`'s `create_connections` lambda: the version check disconnects entries whose `parallel_replicas_version` is below `DBMS_PARALLEL_REPLICAS_MIN_VERSION_WITH_STREAM_ID`. The `ParallelReplicasUnavailableCount` ProfileEvent is only incremented later in `RemoteQueryExecutor::sendQueryUnlocked`, when it observes the resulting empty `MultiplexedConnections`. With `parallel_replicas_local_plan = 1` (the default) the local plan can serve all required data quickly and cancel the pipeline, in which case the `RemoteSource` for some old replicas may be cancelled BEFORE `tryGenerate` ever calls `sendQuery` (see `RemoteSource::prepare` short-circuit on `isCancelled`). For those replicas the version check never runs and `ParallelReplicasUnavailableCount` is not incremented. CIDB confirms the racy pattern. Last 14 days on master + PRs: * 2026-04-29 PR ClickHouse#100412 — line 129 `parallel_replicas_allow_view_over_mergetree=0`, expected 4,2 got 0,2 * 2026-04-29 PR ClickHouse#100399 — line 129, expected 4,2 got 0,2 * 2026-04-27 master — line 129, expected 4,2 got 1,2 * 2026-04-27 PR ClickHouse#102197 — line 125 `parallel_replicas_local_plan=1`, expected 2,1 got 0,1 In every case the sum result was correct and the `ParallelReplicasUsedCount` matched the expected value — only the `ParallelReplicasUnavailableCount` was below the strict-equality target. This commit relaxes the assertion to: * `ParallelReplicasUsedCount == expected_used` (kept strict — this is deterministic, and a higher-than-expected value would mean an old replica was actually used, which is the real bug we want to catch). * `0 <= ParallelReplicasUnavailableCount <= expected_unavailable` — bounded above by the maximum possible (catches spurious unavailability) but tolerant of the cancel-before-`sendQuery` race. The primary correctness check — `SELECT sum(value) FROM v` matches the expected sum — is unchanged. If any old replica were ever actually used as a replica, the sum would be wrong (or `ParallelReplicasUsedCount` would exceed `expected_used`). Refs: - Test added in ClickHouse#100958 - Reports on which the analysis is based: * https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100412&sha=614dbdeb3cb9742d3f48e0f9bdb3b37a7146890e&name_0=PR&name_1=Integration%20tests%20%28arm_binary%2C%20distributed%20plan%2C%203%2F4%29 * https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=529f4ede&name_0=MasterCI Verified locally: the modified test passes (1 passed in 43.90s) under `Integration tests (amd_tsan, 1/6)`.
LLVM Coverage ReportChanged lines: 7.38% (9/122) | lost baseline coverage: 70 line(s) · Uncovered code |

Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
...
Documentation entry for user-facing changes
Improve codec generation in BuzzHouse based on the column type. New settings.
Missing features in AST fuzzer: TTL definitions, codecs, type casts, table constraints.
Make 03167_base64_url_functions_sh.sh faster.
Documentation is written (mandatory for new features)
Version info
26.5.1.226