Fix several optimizations after lightweight deletes by CurtizJ · Pull Request #101212 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix several optimizations after lightweight deletes#101212

Merged
alesapin merged 5 commits into
ClickHouse:masterfrom
CurtizJ:fix-lwd-optimizations
Apr 8, 2026
Merged

Fix several optimizations after lightweight deletes#101212
alesapin merged 5 commits into
ClickHouse:masterfrom
CurtizJ:fix-lwd-optimizations

Conversation

@CurtizJ

@CurtizJ CurtizJ commented Mar 30, 2026

Copy link
Copy Markdown
Member

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fix minmax_count_projection and trivial COUNT(*) optimizations being permanently disabled after a lightweight delete, even after all parts with a mask of lightweight delete were merged away.

Version info

  • Merged into: 26.4.1.645

@clickhouse-gh

clickhouse-gh Bot commented Mar 30, 2026

Copy link
Copy Markdown
Contributor

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Mar 30, 2026
Comment thread src/Storages/MergeTree/MergeTreeData.cpp
…-free readers

Reorder counter increments before decrements so that `hasLightweightDeletedMask`
(which reads the counter with `memory_order_relaxed`) never sees a transient
false-negative. Previously, covered parts' counters were decremented before the
new part's counter was incremented, creating a window where the counter could be
zero while an active masked part already existed.

ClickHouse#101212 (comment)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@CurtizJ CurtizJ force-pushed the fix-lwd-optimizations branch from 018be70 to 4cd012b Compare March 30, 2026 16:23
Comment thread tests/queries/0_stateless/04068_lwd_minmax_projection_recovery.sql
Comment thread src/Storages/MergeTree/MergeTreeData.cpp Outdated
Comment thread src/Storages/MergeTree/MergeTreeData.cpp
…lete` counter

In `removePartsFromWorkingSet` and `forcefullyMovePartToDetachedAndRemoveFromMemory`,
the counter was decremented while the part was still in `Active` state, creating a
false-negative window for lock-free readers of `hasLightweightDeletedMask`.
In `swapActivePart`, the counter was incremented after the part was set to `Active`,
creating the same kind of false-negative window.

Fix: transition state before decrementing counters (deactivation paths) and
increment counters before transitioning to `Active` (activation paths).

Also add `SELECT count()` assertions to the test to verify that trivial count
optimization is also correctly disabled/re-enabled across the lightweight delete
lifecycle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@alexey-milovidov

Copy link
Copy Markdown
Member

The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix.

@CurtizJ CurtizJ changed the title Fix several optimization after lightweight deletes Fix several optimizations after lightweight deletes Apr 7, 2026
@alexey-milovidov

Copy link
Copy Markdown
Member

The test 02859_replicated_db_name_zookeeper is fixed in #101952

{
addPartContributionToDataVolume(res.part);
addPartContributionToUncompressedBytesInPatches(res.part);
addPartContributionToTableCounters(res.part);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hasLightweightDeletedMask is now used as a correctness gate for minmax_count_projection / trivial COUNT(*), but loadDataPart still makes an active part visible before updating total_parts_with_lightweight_delete.

In this function, res.part->setState(to_state) runs before insertion, so an Active part with a lightweight-delete mask can be observable while the counter is still 0 until addPartContributionToTableCounters runs. That is a transient false-negative for lock-free readers and can re-enable these optimizations too early.

Please keep activation monotonic for lock-free readers on this path as well (no temporary false-negatives), e.g. by ensuring the counter reflects the part before it can be observed as Active, or by validating under readLockParts when the fast-path counter says 0.

@clickhouse-gh

clickhouse-gh Bot commented Apr 8, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 83.90% 84.00% +0.10%
Functions 90.90% 90.90% +0.00%
Branches 76.40% 76.40% +0.00%

Changed lines: 94.77% (145/153) | lost baseline coverage: 100 line(s) · Uncovered code

Full report · Diff report

@alesapin alesapin self-assigned this Apr 8, 2026
@alesapin alesapin added this pull request to the merge queue Apr 8, 2026
Merged via the queue into ClickHouse:master with commit 04cd81c Apr 8, 2026
163 checks passed
@alesapin alesapin deleted the fix-lwd-optimizations branch April 8, 2026 12:28
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label Apr 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants