More fuzzing in CI by PedroTadim · Pull Request #102197 · ClickHouse/ClickHouse · GitHub
Skip to content

More fuzzing in CI#102197

Merged
PedroTadim merged 15 commits into
masterfrom
fuzz-upt
May 1, 2026
Merged

More fuzzing in CI#102197
PedroTadim merged 15 commits into
masterfrom
fuzz-upt

Conversation

@PedroTadim

@PedroTadim PedroTadim commented Apr 9, 2026

Copy link
Copy Markdown
Member

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Documentation entry for user-facing changes

  • Improve codec generation in BuzzHouse based on the column type. New settings.

  • Missing features in AST fuzzer: TTL definitions, codecs, type casts, table constraints.

  • Make 03167_base64_url_functions_sh.sh faster.

  • Documentation is written (mandatory for new features)

Version info

  • Merged into: 26.5.1.226

@clickhouse-gh

clickhouse-gh Bot commented Apr 9, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-ci label Apr 9, 2026
Comment thread src/Client/BuzzHouse/Generator/SessionSettings.cpp Outdated
Comment thread src/Common/QueryFuzzer.h Outdated
@PedroTadim PedroTadim enabled auto-merge April 9, 2026 11:17
@alexey-milovidov

Copy link
Copy Markdown
Member

The test_database_iceberg_nessie_catalog and test_database_iceberg_lakekeeper_catalog failures in this PR's CI are false positives caused by a bug in PR #100583 (merged to master on 2026-04-09 at 11:19 UTC). That PR broke the metadata file path resolution in getMetadataLocation for Iceberg catalog tests — it produced a doubled S3 key like table_path/warehouse-rest/table_path/metadata/... instead of table_path/metadata/..., causing MinIO to return 403.

The bug was reverted by PR #102247 (merged at ~14:48 UTC). All nessie/lakekeeper test failures between these two merges are unrelated to this PR.

@alexey-milovidov

Copy link
Copy Markdown
Member

The flaky check failure is fixed in #102148, let's update the branch.

pull Bot pushed a commit to AnotherGenZ/ClickHouse that referenced this pull request Apr 30, 2026
`test_parallel_replicas_with_view` (added in ClickHouse#100958, merged 2026-04-19)
asserts strict equality on the `ParallelReplicasUnavailableCount` and
`ParallelReplicasUsedCount` ProfileEvents collected from `system.query_log`.
The strict equality on `ParallelReplicasUnavailableCount` is racy.

Old replicas are filtered at connection time inside `RemoteQueryExecutor`'s
`create_connections` lambda: the version check disconnects entries whose
`parallel_replicas_version` is below
`DBMS_PARALLEL_REPLICAS_MIN_VERSION_WITH_STREAM_ID`. The
`ParallelReplicasUnavailableCount` ProfileEvent is only incremented later in
`RemoteQueryExecutor::sendQueryUnlocked`, when it observes the resulting empty
`MultiplexedConnections`. With `parallel_replicas_local_plan = 1` (the default)
the local plan can serve all required data quickly and cancel the pipeline,
in which case the `RemoteSource` for some old replicas may be cancelled
BEFORE `tryGenerate` ever calls `sendQuery` (see `RemoteSource::prepare`
short-circuit on `isCancelled`). For those replicas the version check never
runs and `ParallelReplicasUnavailableCount` is not incremented.

CIDB confirms the racy pattern. Last 14 days on master + PRs:
* 2026-04-29 PR ClickHouse#100412 — line 129 `parallel_replicas_allow_view_over_mergetree=0`,
  expected 4,2 got 0,2
* 2026-04-29 PR ClickHouse#100399 — line 129, expected 4,2 got 0,2
* 2026-04-27 master       — line 129, expected 4,2 got 1,2
* 2026-04-27 PR ClickHouse#102197   — line 125 `parallel_replicas_local_plan=1`, expected 2,1 got 0,1

In every case the sum result was correct and the `ParallelReplicasUsedCount`
matched the expected value — only the `ParallelReplicasUnavailableCount`
was below the strict-equality target.

This commit relaxes the assertion to:
* `ParallelReplicasUsedCount == expected_used` (kept strict — this is
  deterministic, and a higher-than-expected value would mean an old replica
  was actually used, which is the real bug we want to catch).
* `0 <= ParallelReplicasUnavailableCount <= expected_unavailable` — bounded
  above by the maximum possible (catches spurious unavailability) but
  tolerant of the cancel-before-`sendQuery` race.

The primary correctness check — `SELECT sum(value) FROM v` matches the
expected sum — is unchanged. If any old replica were ever actually used as
a replica, the sum would be wrong (or `ParallelReplicasUsedCount` would
exceed `expected_used`).

Refs:
- Test added in ClickHouse#100958
- Reports on which the analysis is based:
  * https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100412&sha=614dbdeb3cb9742d3f48e0f9bdb3b37a7146890e&name_0=PR&name_1=Integration%20tests%20%28arm_binary%2C%20distributed%20plan%2C%203%2F4%29
  * https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=529f4ede&name_0=MasterCI

Verified locally: the modified test passes (1 passed in 43.90s) under
`Integration tests (amd_tsan, 1/6)`.
Comment thread src/Common/QueryFuzzer.cpp
@clickhouse-gh

clickhouse-gh Bot commented May 1, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.10% 84.10% +0.00%
Functions 91.10% 91.10% +0.00%
Branches 76.60% 76.60% +0.00%

Changed lines: 7.38% (9/122) | lost baseline coverage: 70 line(s) · Uncovered code

Full report · Diff report

@PedroTadim PedroTadim added this pull request to the merge queue May 1, 2026
Merged via the queue into master with commit 03d0559 May 1, 2026
165 checks passed
@PedroTadim PedroTadim deleted the fuzz-upt branch May 1, 2026 05:59
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label May 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-ci pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants