Fix flaky 01710_projections by pinning index_granularity by groeneai · Pull Request #107698 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix flaky 01710_projections by pinning index_granularity#107698

Merged
Algunenano merged 2 commits into
ClickHouse:masterfrom
groeneai:groeneai/fix-01710-projections-pin-index-granularity
Jun 17, 2026
Merged

Fix flaky 01710_projections by pinning index_granularity#107698
Algunenano merged 2 commits into
ClickHouse:masterfrom
groeneai:groeneai/fix-01710-projections-pin-index-granularity

Conversation

@groeneai

@groeneai groeneai commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

No open issue found specifically for 01710_projections. Related (different test, closed): #104219.

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Description

Fixes flaky 01710_projections on Stateless tests (amd_tsan, s3 storage, parallel). The table is created without an explicit index_granularity, so --random-merge-tree-settings injects a small value (the CI minimizer isolated --index_granularity 3). Small granules let the projection part's pre-aggregated states be read across multiple concurrent streams (frequent on s3), changing the summation order of float aggregates like avg(block_count / duration), which yields a trailing-digit difference from the reference (1.2352100180287104 vs ...118).

Pinning index_granularity = 8192 keeps the projection in a single granule so the read/merge order is deterministic. The randomizer skips settings already present in the CREATE, so the pin wins.

0 master hits in 30d; flaky across 5 unrelated PRs. CI report for the exact failure: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=107634&sha=5a342b69fe28bf9c7cedb4b0a5ace9fa8160c469&name_0=PR&name_1=Stateless%20tests%20%28amd_tsan%2C%20s3%20storage%2C%20parallel%2C%202%2F2%29

Version info

  • Merged into: 26.6.1.929

The test creates projection_test without an explicit index_granularity,
so CI's --random-merge-tree-settings injects a small value (e.g. 3). With
many small granules the projection part's pre-aggregated states are read
across multiple concurrent streams (notably on s3 storage), so the float
aggregates avg(block_count / duration) etc. are summed in a different
order, producing a trailing-digit difference from the reference
(1.2352100180287104 vs ...118). Pinning index_granularity = 8192 keeps the
data in a single granule, making the read and merge order deterministic.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@groeneai

Copy link
Copy Markdown
Contributor Author

@groeneai

Copy link
Copy Markdown
Contributor Author

cc @CurtizJ — could you review this? It pins index_granularity = 8192 in the 01710_projections CREATE so the random-merge-tree-settings injection of a small granularity can't split the projection part across concurrent read streams, which was causing trailing-digit float differences in the aggregates (flaky on the amd_tsan s3 shard, 0 master hits).

@Algunenano Algunenano self-assigned this Jun 17, 2026
@Algunenano Algunenano added the can be tested Allows running workflows for external contributors label Jun 17, 2026
@clickhouse-gh

clickhouse-gh Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [3c9ef2a]

Summary:

job_name test_name status info comment
Stateless tests (amd_tsan, s3 storage, parallel, 2/2) FAIL
04337_quorum_insert_honors_time_limit FAIL cidb IGNORED

AI Review

Summary
  • This PR pins index_granularity = 8192 in 01710_projections and adds an inline comment explaining why: random merge tree settings can otherwise shrink granules, split projection aggregate states across concurrent read streams, and perturb trailing digits in float aggregate output. I found no remaining correctness, safety, metadata, or test-contract issue in this test-only change.
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-ci label Jun 17, 2026

@Algunenano Algunenano left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@groeneai When linking failures, please provide full links. The one you shared is invalid: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=107634 does not resolve to a html page, and even if it did it would require finding the exact commit and check that triggered the fail:

Instead share https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=107634&sha=5a342b69fe28bf9c7cedb4b0a5ace9fa8160c469&name_0=PR&name_1=Stateless%20tests%20%28amd_tsan%2C%20s3%20storage%2C%20parallel%2C%202%2F2%29 This points to the exact failure.

drop table if exists projection_test;

create table projection_test (`sum(block_count)` UInt64, domain_alias UInt64 alias length(domain), datetime DateTime, domain LowCardinality(String), x_id String, y_id String, block_count Int64, retry_count Int64, duration Int64, kbytes Int64, buffer_time Int64, first_time Int64, total_bytes Nullable(UInt64), valid_bytes Nullable(UInt64), completed_bytes Nullable(UInt64), fixed_bytes Nullable(UInt64), force_bytes Nullable(UInt64), projection p (select toStartOfMinute(datetime) dt_m, countIf(first_time = 0) / count(), avg((kbytes * 8) / duration), count(), sum(block_count) / sum(duration), avg(block_count / duration), sum(buffer_time) / sum(duration), avg(buffer_time / duration), sum(valid_bytes) / sum(total_bytes), sum(completed_bytes) / sum(total_bytes), sum(fixed_bytes) / sum(total_bytes), sum(force_bytes) / sum(total_bytes), sum(valid_bytes) / sum(total_bytes), sum(retry_count) / sum(duration), avg(retry_count / duration), countIf(block_count > 0) / count(), countIf(first_time = 0) / count(), uniqHLL12(x_id), uniqHLL12(y_id) group by dt_m, domain)) engine MergeTree partition by toDate(datetime) order by toStartOfTenMinutes(datetime) settings index_granularity_bytes = 10000000, add_minmax_index_for_numeric_columns=0;
create table projection_test (`sum(block_count)` UInt64, domain_alias UInt64 alias length(domain), datetime DateTime, domain LowCardinality(String), x_id String, y_id String, block_count Int64, retry_count Int64, duration Int64, kbytes Int64, buffer_time Int64, first_time Int64, total_bytes Nullable(UInt64), valid_bytes Nullable(UInt64), completed_bytes Nullable(UInt64), fixed_bytes Nullable(UInt64), force_bytes Nullable(UInt64), projection p (select toStartOfMinute(datetime) dt_m, countIf(first_time = 0) / count(), avg((kbytes * 8) / duration), count(), sum(block_count) / sum(duration), avg(block_count / duration), sum(buffer_time) / sum(duration), avg(buffer_time / duration), sum(valid_bytes) / sum(total_bytes), sum(completed_bytes) / sum(total_bytes), sum(fixed_bytes) / sum(total_bytes), sum(force_bytes) / sum(total_bytes), sum(valid_bytes) / sum(total_bytes), sum(retry_count) / sum(duration), avg(retry_count / duration), countIf(block_count > 0) / count(), countIf(first_time = 0) / count(), uniqHLL12(x_id), uniqHLL12(y_id) group by dt_m, domain)) engine MergeTree partition by toDate(datetime) order by toStartOfTenMinutes(datetime) settings index_granularity = 8192, index_granularity_bytes = 10000000, add_minmax_index_for_numeric_columns=0;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please leave a comment when you add settings, explaining why you did it. See the example add_minmax_index_for_numeric_columns above

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@groeneai This is still missing

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 3c9ef2a. Added an inline comment above the setting matching the add_minmax_index_for_numeric_columns style: index_granularity = 8192 is pinned because --random-merge-tree-settings otherwise injects a small value that splits the projection part across concurrent read streams, changing float accumulation order and making the result flaky.

@groeneai

Copy link
Copy Markdown
Contributor Author

Add an inline comment next to the index_granularity = 8192 setting,
matching the existing add_minmax_index_for_numeric_columns note, per
review feedback on PR ClickHouse#107698.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 85.20% 85.20% +0.00%
Functions 92.30% 92.30% +0.00%
Branches 77.50% 77.50% +0.00%

Changed lines: No C/C++ source files changed — skipping uncovered code analysis.

Newly covered by added/modified tests: 415 line(s), 42 function(s) across 123 file(s) · Details

Top files
  • src/Storages/StorageReplicatedMergeTree.cpp: 73 line(s), 1 function(s)
  • src/AggregateFunctions/TimeSeries/AggregateFunctionTimeseriesChanges.h: 14 line(s), 32 function(s)
  • src/AggregateFunctions/AggregateFunctionWindowFunnel.cpp: 14 line(s), 6 function(s)
  • src/AggregateFunctions/AggregateFunctionQuantileGK.cpp: 14 line(s)
  • src/Interpreters/HashJoin/HashJoin.cpp: 10 line(s)

Full report

@groeneai

Copy link
Copy Markdown
Contributor Author

@Algunenano Algunenano enabled auto-merge June 17, 2026 16:41
@Algunenano Algunenano added this pull request to the merge queue Jun 17, 2026
Merged via the queue into ClickHouse:master with commit 229abc8 Jun 17, 2026
163 of 165 checks passed
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-ci pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants