Remove redundant checksum calculation during vertical merge of parts with projections by amosbird · Pull Request #92866 · ClickHouse/ClickHouse · GitHub
Skip to content

Remove redundant checksum calculation during vertical merge of parts with projections#92866

Merged
tiandiwonder merged 11 commits into
ClickHouse:masterfrom
amosbird:remove-useless-code-in-projection-1
Jun 8, 2026
Merged

Remove redundant checksum calculation during vertical merge of parts with projections#92866
tiandiwonder merged 11 commits into
ClickHouse:masterfrom
amosbird:remove-useless-code-in-projection-1

Conversation

@amosbird

@amosbird amosbird commented Dec 23, 2025

Copy link
Copy Markdown
Collaborator

Changelog category (leave one):

  • Not for changelog (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Note

Medium Risk
Touches MergeTree part checksum generation around projections, which is integrity-critical; a missed or duplicated checksum could surface as corrupted parts during mutation/merge workflows.

Overview
Removes redundant projection checksum bookkeeping during column-only writes. MergedColumnOnlyOutputStream::fillChecksums no longer adds projection-part entries to the part checksums, avoiding duplicate work during vertical merges where projections are accounted for later.

Preserves mutation correctness by adding projection checksums at mutation finalize time. MutateTask now explicitly appends checksums for rebuilt projection parts after fillChecksums, ensuring mutations that rebuild projections still persist the required *.proj checksum records.

Reviewed by Cursor Bugbot for commit debe999. Bugbot is set up for automated code reviews on this repo. Configure here.

Version info

  • Merged into: 26.6.1.491

@clickhouse-gh

clickhouse-gh Bot commented Dec 23, 2025

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-not-for-changelog This PR should not be mentioned in the changelog label Dec 23, 2025

@tiandiwonder tiandiwonder left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix test failure: 01710_projection_fetch_long

Comment thread src/Storages/MergeTree/MergedColumnOnlyOutputStream.cpp
…-in-projection-1

# Conflicts:
#	src/Storages/MergeTree/MergedColumnOnlyOutputStream.cpp
The previous commit removed the projection-checksums loop from
`MergedColumnOnlyOutputStream::fillChecksums`. That removal is correct
for the vertical merge path (where `MergedBlockOutputStream::finalizePartAsync`
re-adds projection checksums later), but for `MutateSomePartColumnsTask`
there is no subsequent step that adds them, so the resulting
`checksums.txt` was missing `*.proj` entries for projections rebuilt
during a mutation (e.g. `MATERIALIZE PROJECTION`).

This caused stateless test `01710_projection_fetch_long` to fail because
the replicated part's checksums no longer contained the projection entry
after `materialize projection`, and replicas observed inconsistent state.

Restore the loop in the mutation finalize path so that projection
checksums end up in `new_data_part->checksums` before
`finalizeMutatedPart` writes them to disk.

CI: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=92866&sha=0e81c92c1f24f056d3672f0e1f34603dffb55596&name_0=PR
PR: ClickHouse#92866
@alexey-milovidov

Copy link
Copy Markdown
Member

@groeneai, provide a fix for FileCacheTest.

@groeneai

Copy link
Copy Markdown
Contributor

@alexey-milovidov @amosbirdFileCacheTest.SLRUFreeSpaceKeepingProtectedOnly is not caused by this PR, it's a chronic flake hitting many unrelated PRs.

Cross-PR data (CIDB, last 30 days)

The same test fails on 21 distinct PRs plus 3 master commits in just 2 days:

Where Hits Build
Master (74e1c5ed, 407d6cca, 6d2c8bd3) 3 Unit tests (amd_llvm_coverage)
20 unrelated PRs (this one + 19 others) 20 mostly amd_llvm_coverage, also one asan_ubsan, one tsan

This PR (#92866) modifies only MergedColumnOnlyOutputStream.cpp and MutateTask.cpp — neither touches FileCache code paths. The test failure is a timing race on slow builds, not a regression here.

Root cause

The test was added in #104313 to cover #104307. It drives the bug condition through FileCache::freeSpaceRatioKeepingThreadFunc with keep_free_space_elements_ratio = 1.0, which causes cache->initialize() at src/Interpreters/FileCache/FileCache.cpp:448 to schedule the asynchronous background eviction task. That task races with the test's read_and_check and intermediate assertProbationary / assertProtected calls. On slow builds (notably Unit tests (amd_llvm_coverage)) the background thread evicts entries between the populate and assert steps, so getFileSegmentsNum() reports 3 instead of the expected 0 at line 1626.

Fix is already in flight

PR #104418 ("Fix flaky FileCacheTest.SLRUFreeSpaceKeepingProtectedOnly", open since 2026-05-08) replaces the racy integration test with a focused unit-level test that calls SLRUFileCachePriority::collectEvictionInfo directly with three protected-only entries via addForRestore(... QueueEntryType::SLRU_Protected ...). That reproduces the exact chassert precondition (probationary empty, protected populated, is_total_space_cleanup=true) without involving FileCache's asynchronous machinery, making it deterministic on every build flavor.

@kssenii was already cc'd as the filesystem-cache owner. PR is MERGEABLE and currently green except for an unrelated Fast test flake on 02932_refreshable_materialized_views_2 (separate chronic timeout, 11 failures / 10 PRs in 30d, not introduced by #104418).

TL;DR for #92866

Re-running Unit tests (amd_llvm_coverage) here will pass intermittently. The test will become deterministic once #104418 merges. No action needed from @amosbird.

CI report for the failure on this PR:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=92866&sha=b82648604f3e517985d455be11e7939bfe265704&name_0=PR&name_1=Unit%20tests%20%28amd_llvm_coverage%29

(session: cron:clickhouse-ci-task-worker:20260510-004500)

@alexey-milovidov

Copy link
Copy Markdown
Member

CI status note for the two non-success checks on commit fccb42d:

  • AST fuzzer (amd_debug, targeted, old_compatibility)Inconsistent AST formatting (STID: 1941-1bfa). This is an existing fuzzer-found bug unrelated to this PR (the failing query is ALTER TABLE ... ADD PROJECTION ... GROUP BY ... AS alias26) where the alias placement inside a function call mis-prints). It is being fixed in Fix inconsistent AST formatting for aliased IN inside a function call (STID 1941-1bfa) #105091.
  • Code Review — the copilot subprocess exited non-OK (Copilot review failed after 3 attempts); this is a transient infra failure and is what currently makes "Mergeable Check" fail.

@alexey-milovidov

Copy link
Copy Markdown
Member

Merged master into the branch to re-run CI. The two failures look unrelated to this PR:

  1. Code Review — Copilot review subprocess infra failure (ERROR: Copilot review failed after 3 attempts: copilot subprocess exited with non-OK status).
  2. AST fuzzer (amd_debug, targeted, old_compatibility)Logical error: 'Inconsistent AST formatting: the query: ...' — a roundtrip formatting issue for an ADD PROJECTION expression containing <=> (isNotDistinctFrom) with an AS alias. The same error reproduces on master and several unrelated PRs (queried from play.clickhouse.com):
pull_request_number  date
0                    2026-05-12   ← master
92866                2026-05-12
103234               2026-05-11
90043                2026-05-10
92340                2026-04-27

The PR itself only touches projection-checksum bookkeeping in vertical merge / mutation, with no parser/formatter changes — it cannot affect AST roundtrip behavior.

Workflow files in this branch were stale or had been deleted as a
workaround for GitHub's workflow-scope check. Restoring them to match
`master` so the PR diff no longer contains those unrelated changes.
Comment thread src/Storages/MergeTree/MutateTask.cpp
@tiandiwonder tiandiwonder enabled auto-merge June 3, 2026 00:25
@alexey-milovidov

Copy link
Copy Markdown
Member

Merged the latest master (the branch was ~1400 commits behind). The merge was clean — only MutateTask.cpp auto-merged, and the PR's two changes are intact. I re-verified the correctness invariant on current master: the vertical-merge path still adds the projection checksums via MergedBlockOutputStream::finalizePartAsync (MergedBlockOutputStream.cpp:222), so removing the duplicate addition in MergedColumnOnlyOutputStream::fillChecksums remains correct, and the mutation path is covered by the new loop in MutateSomePartColumnsTask::finalize. Both changed files compile cleanly against current master. CI was green before the merge; all review threads are resolved.

@clickhouse-gh

clickhouse-gh Bot commented Jun 6, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.50% 84.50% +0.00%
Functions 92.30% 92.30% +0.00%
Branches 77.00% 77.10% +0.10%

Changed lines: Changed C/C++ lines covered by tests: 14/16 (87.50%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 1 line(s) · Uncovered code

Full report · Diff report

@alexey-milovidov

Copy link
Copy Markdown
Member

@tiandiwonder tiandiwonder added this pull request to the merge queue Jun 8, 2026
Merged via the queue into ClickHouse:master with commit 38129a8 Jun 8, 2026
166 checks passed
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-not-for-changelog This PR should not be mentioned in the changelog pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants