Fix flaky test 02770_async_buffer_ignore by groeneai · Pull Request #107298 · ClickHouse/ClickHouse · GitHub

Fix flaky test 02770_async_buffer_ignore #107298

Open
groeneai wants to merge 1 commit into ClickHouse:master from groeneai:groeneai/fix-flaky-02770-async-buffer-ignore

Conversation

@groeneai

Contributor

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Description

Fixes the flaky test 02770_async_buffer_ignore. Investigated at the request of @alexey-milovidov on #100391.

The test asserted exact S3 read profile-event counts for a narrow-range read:

ProfileEvents['S3ReadRequestsCount'], ProfileEvents['ReadBufferFromS3Bytes'], ProfileEvents['ReadCompressedBytes']

against the reference 4 66446 66446. Those exact counts are non-deterministic across CI jobs, and the test cannot stabilize them by pinning settings (it is already tagged no-random-settings). Over the last 90 days, CIDB shows 212 failures across 12 unrelated PRs (only 1 on master), first seen 2026-03-26.

Observed failure modes (vs reference 4 66446 66446):

  • 4 830 66446 x174 (82%) - filesystem cache hit, ReadBufferFromS3Bytes drops to ~830
  • 6 67909 67909 x22 (10%) - parallel replicas
  • 4 7223260 66446 x5 - S3 read-ahead reads most of the file
  • 4 66446 679xx x8 - async read-ahead pulls one extra compressed block
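
As a quick arithmetic check on the breakdown above, the following Python sketch recomputes each mode's share. It assumes the quoted percentages are taken relative to the 212 total failures; the dictionary keys are descriptive labels chosen for illustration, not identifiers from CIDB.

```python
# Failure counts per observed mode, as quoted in the list above.
TOTAL_FAILURES = 212  # total reported by CIDB over the last 90 days

MODE_COUNTS = {
    "filesystem cache hit": 174,
    "parallel replicas": 22,
    "S3 read-ahead": 5,
    "extra compressed block": 8,
}


def share_percent(count: int, total: int = TOTAL_FAILURES) -> int:
    """Return the share of `count` in `total`, rounded to a whole percent."""
    return round(100 * count / total)


if __name__ == "__main__":
    for mode, count in MODE_COUNTS.items():
        print(f"{mode}: x{count} ({share_percent(count)}%)")
    # The four listed modes together cover this many of the failures.
    print("listed:", sum(MODE_COUNTS.values()), "of", TOTAL_FAILURES)
```

Under that assumption the two largest modes reproduce the 82% and 10% quoted above, and the four listed modes sum to 209 of the 212 failures.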

The test verifies that a narrow read (2001 of 1000000 rows) does not pull the whole part. The fix keeps that intent but asserts bounds instead of exact values:

ProfileEvents['S3ReadRequestsCount'] < 100, ProfileEvents['ReadCompressedBytes'] < 1000000

A narrow read uses few requests and decompresses ~66 KB; the full column is ~4 MB compressed, so an over-read regression still trips ReadCompressedBytes < 1000000. ReadBufferFromS3Bytes is dropped because it is the unstable raw-transfer count (830 on cache hit, 7.2 MB on read-ahead). This mirrors 03164_s3_settings_for_queries_and_merges, which asserts ratios for the same reason.

Verified locally over MinIO s3_disk: the narrow read passes 15/15 with fresh DBs; warm re-reads that broke the old exact assertion (3 66031 66031, 2 65616 65616) pass the new one; a full-column scan (4 MB) is still caught (the query returns 1 0, i.e. the request bound holds but the ReadCompressedBytes bound fails).
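
The difference between the old exact assertion and the new bounds can be sketched in Python using the triples quoted in this description. This is an illustration only, not the test itself: the function names are hypothetical, the "4 66446 679xx" mode is left out because its last value is elided, and the full-column triple is an assumed stand-in based on the "~4 MB compressed" figure.

```python
# Each triple is (S3ReadRequestsCount, ReadBufferFromS3Bytes, ReadCompressedBytes).
REFERENCE = (4, 66446, 66446)

# Triples quoted in the description (failure modes and local warm re-reads).
OBSERVED = {
    "reference": (4, 66446, 66446),
    "filesystem cache hit": (4, 830, 66446),
    "parallel replicas": (6, 67909, 67909),
    "S3 read-ahead": (4, 7223260, 66446),
    "warm re-read A": (3, 66031, 66031),
    "warm re-read B": (2, 65616, 65616),
}

# Assumption: a full-column scan reads roughly 4 MB compressed. The request
# count and exact byte values here are illustrative, not measured.
FULL_SCAN = (10, 4_000_000, 4_000_000)


def old_assertion(events: tuple) -> bool:
    """Old behaviour: the triple must equal the reference exactly."""
    return events == REFERENCE


def new_assertion(events: tuple) -> tuple:
    """New behaviour: bound the request count and the compressed bytes.

    ReadBufferFromS3Bytes is ignored, matching the column dropped by the fix.
    Returns (requests_ok, compressed_ok) as 0/1 flags.
    """
    requests, _s3_bytes, compressed = events
    return (int(requests < 100), int(compressed < 1_000_000))


if __name__ == "__main__":
    for name, events in {**OBSERVED, "full scan (assumed)": FULL_SCAN}.items():
        print(name, "old:", old_assertion(events), "new:", new_assertion(events))
```

In this sketch every observed triple except the reference fails the exact comparison but satisfies both bounds, while the assumed full-column scan still fails the ReadCompressedBytes bound.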

This flake is unrelated to #100391: the query has no ORDER BY, so the read-in-order path that PR changes is not exercised.

The test asserted exact S3 read profile-event counts (S3ReadRequestsCount,
ReadBufferFromS3Bytes, ReadCompressedBytes) for a narrow-range read. Those
counts are non-deterministic across CI jobs: filesystem cache hits drop
ReadBufferFromS3Bytes to ~830, parallel replicas raise the request count,
and S3 read-ahead inflates the byte counts. Over 90 days this produced 212
failures across 12 unrelated PRs.

Assert bounds instead: a narrow read of 2001 of 1000000 rows must not pull
the whole part (~4 MB compressed per column), so S3ReadRequestsCount stays
small and ReadCompressedBytes stays far below the full-column size. This
still catches the over-read regression the test guards against while
tolerating cache, read-ahead, and parallel-replica variance. Mirrors the
approach in 03164_s3_settings_for_queries_and_merges.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Contributor Author

cc @kssenii — could you review this? It is a test-only fix for the flaky 02770_async_buffer_ignore: the test asserted exact S3 read profile-event counts that vary with filesystem-cache warmth, S3 read-ahead, and parallel replicas (212 fails / 12 unrelated PRs / 90d). It now asserts bounds (S3ReadRequestsCount < 100, ReadCompressedBytes < 1000000), keeping the over-read guard while tolerating the variance, mirroring your 03164_s3_settings_for_queries_and_merges.

@alexey-milovidov added the "can be tested" label (allows running workflows for external contributors) on Jun 12, 2026
@clickhouse-gh

clickhouse-gh Bot commented Jun 12, 2026

Contributor

Workflow [PR], commit [6ce2bae]

Summary:

| job_name | test_name | status | info | comment |
|---|---|---|---|---|
| Stateless tests (amd_llvm_coverage, ParallelReplicas, s3 storage, parallel) | | FAIL | | |
| | 04322_106474_session_timezone_lowcardinality_datetime_csv_roundtrip | FAIL | cidb | |
| | 03707_statistics_cache | NOT_FAILED | cidb | |
| | 02345_implicit_transaction | NOT_FAILED | cidb | |
| Stateless tests (amd_msan, WasmEdge, parallel, 1/2) | | FAIL | | |
| | 04051_pk_analysis_stats | FAIL | cidb | |

AI Review

Summary
  • This PR is a test-only flake fix for 02770_async_buffer_ignore: it replaces exact S3 profile-event values with bounded assertions that preserve the test contract that a narrow range read must not decompress the whole column. I did not find any unresolved correctness or review issues in the changed lines.
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh (bot) added the pr-ci label on Jun 12, 2026
@clickhouse-gh

clickhouse-gh Bot commented Jun 12, 2026

Contributor

LLVM Coverage Report

| Metric | Baseline | Current | Δ |
|---|---|---|---|
| Lines | 84.60% | 84.60% | +0.00% |
| Functions | 92.40% | 92.40% | +0.00% |
| Branches | 77.30% | 77.30% | +0.00% |

Changed lines: No C/C++ source files changed — skipping uncovered code analysis.

Newly covered by added/modified tests: 279 line(s), 45 function(s) across 113 file(s) · Details

Top files
  • src/Storages/MaterializedView/RefreshTask.cpp: 15 line(s)
  • src/AggregateFunctions/TimeSeries/AggregateFunctionTimeseriesChanges.h: 14 line(s), 32 function(s)
  • src/IO/S3/copyS3File.cpp: 10 line(s)
  • src/Functions/FunctionsNumericIndexedVector.h: 9 line(s), 4 function(s)
  • src/IO/AzureBlobStorage/PocoHTTPClient.cpp: 9 line(s)

Full report


Labels

can be tested (allows running workflows for external contributors), pr-ci


2 participants