Fix `LOGICAL_ERROR` "Part X intersects previous part Y" during ReplicatedMergeTree startup by groeneai · Pull Request #103537 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix LOGICAL_ERROR "Part X intersects previous part Y" during ReplicatedMergeTree startup#103537

Merged
tuanpach merged 2 commits into
ClickHouse:masterfrom
groeneai:fix-empty-unexpected-parts-intersect-checkpartsimpl-2352-49be
May 1, 2026
Merged

Fix LOGICAL_ERROR "Part X intersects previous part Y" during ReplicatedMergeTree startup#103537
tuanpach merged 2 commits into
ClickHouse:masterfrom
groeneai:fix-empty-unexpected-parts-intersect-checkpartsimpl-2352-49be

Conversation

@groeneai

@groeneai groeneai commented Apr 25, 2026

Copy link
Copy Markdown
Contributor

Fix a rare LOGICAL_ERROR "Part X intersects previous part Y" that aborts
ReplicatedMergeTree startup when two empty unexpected parts on disk have
intersecting block ranges without one fully containing the other.

Failure

Logical error: 'Part all_2_11_3 intersects previous part all_0_5_2.
It is a bug or a result of manual intervention in the ZooKeeper data.'
(STID: 2352-49be)

Stack trace:

DB::ActiveDataPartSet::add (ActiveDataPartSet.cpp:83)
  ← StorageReplicatedMergeTree::checkPartsImpl (StorageReplicatedMergeTree.cpp:1909)
  ← StorageReplicatedMergeTree::checkParts (StorageReplicatedMergeTree.cpp:1847)
  ← ReplicatedMergeTreeAttachThread::runImpl (ReplicatedMergeTreeAttachThread.cpp:195)

Hit by the BuzzHouse fuzzer twice in 76 days (rare chronic trunk bug):

Maintainer directive to file a separate fix PR for this finding:
#100943 (comment)

Root cause

StorageReplicatedMergeTree::checkPartsImpl builds a covering set of empty
unexpected parts:

ActiveDataPartSet set_of_empty_unexpected_parts(format_version);
for (const auto & load_state : unexpected_data_parts)
{
    if (load_state.is_broken || load_state.part->rows_count || !load_state.uncovered)
        continue;

    set_of_empty_unexpected_parts.add(load_state.part->name);  // <-- throws
}

The uncovered flag is computed in MergeTreeData::loadDataParts
(MergeTreeData.cpp:2479) as: "no other unexpected part's
MergeTreePartInfo contains this one". Containment is strict — a part
can be unexpected, uncovered, and still intersect another uncovered
unexpected part:

  • all_0_5_2: blocks 0..5, level 2
  • all_2_11_3: blocks 2..11, level 3

Neither contains the other (min and max block numbers don't satisfy
the containment relation), but they intersect at blocks 2..5. Both pass
the uncovered == true filter; ActiveDataPartSet::add rejects the
second one with LOGICAL_ERROR and aborts the table-startup attach
thread. This typically arises in fuzzer workloads (concurrent
DROP/merges/INSERT) that leave overlapping empty drop-marker parts
on disk.

Fix

Use ActiveDataPartSet::tryAdd and log a warning instead of throwing.
The set is only consulted as a heuristic — getContainingPart against
unexpected parts that look like leftovers of dropped ranges. If two
empty markers intersect, either one is an equally valid covering
candidate; any unexpected part that the skipped marker would have
covered falls through to the existing handling path further down in
the same loop, where every entry of unexpected_data_parts is
ultimately renamed to detached/ignored after the sanity check.

A unit test (gtest_active_data_part_set) is added to document the
contract: add throws on intersection without containment while
tryAdd returns HasIntersectingPart without throwing.

Relation to PR #100992

PR #100992 by @tuanpach addresses the analogous "Part intersects" issue
in MergeTreeData::PartLoadingTree::add — a different code path
(loadDataParts rather than the replication-startup checkPartsImpl).
That fix builds a transaction-aware enum (RollbackStatus) to decide
which intersecting transactional part to keep. That sophistication is
appropriate for PartLoadingTree, which feeds the live data_parts
set; the empty unexpected parts examined here, by contrast, are already
non-transactional drop markers slated to be detached as ignored, so
choosing arbitrarily between two intersecting candidates is sound.

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixed a rare LOGICAL_ERROR "Part X intersects previous part Y" raised during ReplicatedMergeTree table startup when two empty unexpected parts on disk had overlapping but non-containing block ranges. The exception aborted the table-attach thread and prevented the table from coming up.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Version info

  • Merged into: 26.5.1.204

…atedMergeTree startup

`StorageReplicatedMergeTree::checkPartsImpl` builds a covering set of empty
unexpected parts (`set_of_empty_unexpected_parts`) by adding every empty
unexpected part whose `uncovered` flag is `true`. The `uncovered` flag is set
in `MergeTreeData::loadDataParts` to `true` when no other unexpected part's
`MergeTreePartInfo` *contains* the current one.

Two empty unexpected parts on disk can have intersecting block ranges without
one fully containing the other (for example, `all_0_5_2` and `all_2_11_3`).
Both parts pass the `uncovered = true` check, but `ActiveDataPartSet::add`
calls `addImpl`, which throws `LOGICAL_ERROR` with the "Part X intersects
previous part Y" message when the second part is added. The exception is
thrown on the `BackgroundSchedulePool` thread running
`ReplicatedMergeTreeAttachThread::runImpl`, aborting startup of the table.

Replace `add` with `tryAdd` and log a warning when the conflicting empty
marker is skipped. The set is only consulted later as a heuristic — when an
unexpected part is matched against `getContainingPart` to decide whether it
looks like a leftover of a dropped part — so keeping just one of two
intersecting empty markers is sufficient. Any unexpected part that would
have been covered by the skipped marker simply falls through to the regular
handling path that detaches it as `ignored` further down in the same loop.

Add a small `gtest_active_data_part_set` unit test that documents the
contract: `add` throws on intersection without containment while `tryAdd`
returns `HasIntersectingPart` without throwing.

Reproduction comes from the BuzzHouse fuzzer (STID `2352-49be`):
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100943&sha=3308ce056bdfd5672c3c300ccae03968beac9846&name_0=PR&name_1=BuzzHouse%20%28arm_asan_ubsan%29

Original PR with the BuzzHouse finding: ClickHouse#100943
@groeneai

Copy link
Copy Markdown
Contributor Author

@groeneai

Copy link
Copy Markdown
Contributor Author

Pre-PR validation gate (cron task-worker session cron:clickhouse-ci-task-worker:20260425-171500)

# Question Status
a Deterministic repro? ✅ The original test version (add instead of tryAdd) deterministically aborts the process with the exact CI error message — Logical error: 'Part all_2_11_3 intersects previous part all_0_5_2. ...' — and the same stack trace (ActiveDataPartSet::add at line 83). Reproduces in-process in < 1 second.
b Root cause explained? ✅ The uncovered flag in MergeTreeData::loadDataParts (MergeTreeData.cpp:2479) is computed as "no other unexpected part contains this one". Containment and intersection are independent relations: all_0_5_2 and all_2_11_3 neither contains the other, but they intersect at blocks 2..5. Both pass the uncovered == true filter, then ActiveDataPartSet::add throws LOGICAL_ERROR on the second insert and aborts the table-startup attach thread.
c Fix matches root cause? tryAdd returns HasIntersectingPart cleanly instead of throwing. The set is only consulted as a heuristic by getContainingPart; either of two intersecting empty markers is an equally valid covering candidate, and any unexpected part that the skipped marker would have covered falls through to the regular handling path that detaches it as ignored further down in the same loop.
d Test intent preserved? ✅ The covering-set heuristic continues to function — only the exception path is removed. Every entry of unexpected_data_parts is ultimately renamed to detached/ignored after the sanity check, so skipping one of two intersecting empty markers does not change the final outcome. Added 4 new unit tests documenting the contract.
e Demonstrated in both directions? ✅ Without fix (using add): test deterministically aborts with the exact CI error/stack. With fix (using tryAdd): all 4 unit tests pass cleanly (PASSED 4 tests in 0 ms).
f Fix is general, not a narrow patch? ✅ Three call sites that follow the same ActiveDataPartSet::add(part->name) pattern have been mapped: PartLoadingTree::add (loadDataParts; covered by PR #100992 with transaction-aware logic), tryLoadPartsToAttach (ALTER ATTACH PARTITION; uses add with a covered-parts list, different semantics), checkPartsImpl (replication startup, this PR). Each path has different semantics and requires a different fix; this PR is the targeted fix for the empty-unexpected-parts heuristic, which is the only call site where intersection without containment is a legitimate state that should not abort startup.

Local builds: clickhouse and unit_tests_dbms link cleanly with the change. gtest_active_data_part_set reports 4/4 tests passing.

@alexey-milovidov alexey-milovidov added the can be tested Allows running workflows for external contributors label Apr 26, 2026
@clickhouse-gh

clickhouse-gh Bot commented Apr 26, 2026

Copy link
Copy Markdown
Contributor

Workflow [PR], commit [6b1f736]

Summary:


AI Review

Summary

This PR fixes a real startup exception path in ReplicatedMergeTree (Part X intersects previous part Y) by replacing the fragile single-set handling of empty unexpected parts with OverlappingPartCovering, which preserves coverage checks even when empty markers intersect without containment. The change is targeted, includes focused unit tests for both ActiveDataPartSet behavior and the new overlapping-covering logic, and I did not find blocker/major correctness, safety, or performance issues in the patch.

ClickHouse Rules
Item Status Notes
Deletion logging
Serialization versioning
Core-area scrutiny
No test removal
Experimental gate
No magic constants
Backward compatibility
SettingsChangesHistory.cpp
PR metadata quality
Safe rollout
Compilation time
No large/binary files
Final Verdict
  • Status: ✅ Approve

@clickhouse-gh clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Apr 26, 2026
/// covering candidate is sufficient — any unexpected part that the skipped marker
/// would have covered will fall through to the regular handling path below.
String reason;
if (set_of_empty_unexpected_parts.tryAdd(load_state.part->name, &reason)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switching from add to tryAdd avoids the immediate LOGICAL_ERROR, but skipping one intersecting empty marker can still change startup behavior in a harmful way.

Concrete case:

  • markers: all_0_5_2 and all_2_11_3 (intersect, neither contains the other),
  • unexpected part: all_9_10_1.

If all_2_11_3 is the one skipped, getContainingPart(all_9_10_1) no longer finds an empty covering marker, so this part is reclassified as uncovered and contributes to uncovered_unexpected_parts_rows, which can still trip the later sanity-check exception (TOO_MANY_UNEXPECTED_DATA_PARTS) during startup.

So the intersecting markers are not always interchangeable as covering candidates for downstream checks.

…ates

The first round used `ActiveDataPartSet::tryAdd` and silently skipped
the second of two intersecting empty markers. Bot review on PR
[ClickHouse#103537](ClickHouse#103537) flagged
a downstream regression: an unexpected non-empty part covered ONLY by
the skipped marker is reclassified as uncovered and contributes to
`uncovered_unexpected_parts_rows`, which can still trip the later
`TOO_MANY_UNEXPECTED_DATA_PARTS` exception during startup.

Concrete failure mode (from the bot's example):
- markers: `all_0_5_2` (blocks 0..5) and `all_2_11_3` (blocks 2..11)
- unexpected part: `all_9_10_1` (blocks 9..10)

`all_9_10_1` is contained ONLY by `all_2_11_3`. If `all_2_11_3` is the
marker that lands in the auxiliary list, `getContainingPart(all_9_10_1)`
must still find it — otherwise startup that previously failed with
`LOGICAL_ERROR` now fails with `TOO_MANY_UNEXPECTED_DATA_PARTS` instead.

Fix: introduce `OverlappingPartCovering` (header-only). It holds a primary
`ActiveDataPartSet` for fast lookup of non-overlapping markers and a
small auxiliary list for the rest. `getContainingPart` consults both,
so coverage detection is order-independent and every marker contributes
to coverage regardless of the order in which it was added. Replaces the
inline `tryAdd` in `StorageReplicatedMergeTree::checkPartsImpl` with this
helper.

Unit tests in `gtest_active_data_part_set.cpp` extended with the bot's
exact scenario (`all_0_5_2` + `all_2_11_3` covering `all_9_10_1`) in
both insertion orders, plus mixed-overlap, disjoint, contained, and
empty-state cases. The existing `ActiveDataPartSet` contract tests are
preserved.
@groeneai

Copy link
Copy Markdown
Contributor Author

@clickhouse-gh — thanks for the catch. Pushed 6b1f736e addressing the downstream concern.

The original tryAdd + skip approach was incomplete. As you noted, dropping the second of two intersecting markers can still trip TOO_MANY_UNEXPECTED_DATA_PARTS for any unexpected part that was covered ONLY by the skipped marker.

Reworked fix: new header-only helper OverlappingPartCovering (in src/Storages/MergeTree/OverlappingPartCovering.h). It keeps a primary ActiveDataPartSet for the fast non-overlapping case and stores intersecting markers in a small auxiliary list. getContainingPart consults both, so every marker contributes to coverage regardless of insertion order.

Walking through your concrete example with the new helper:

  • markers all_0_5_2 and all_2_11_3 — one lands in the primary set, the other in the auxiliary list (which one depends on iteration order, doesn't matter)
  • unexpected part all_9_10_1getContainingPart returns all_2_11_3 regardless, because all_2_11_3 is consulted whether it's in the primary set or the auxiliary list
  • result: all_9_10_1 is correctly classified as covered, no uncovered_unexpected_parts_rows increment, no downstream TOO_MANY_UNEXPECTED_DATA_PARTS

Unit tests in gtest_active_data_part_set.cpp extended with your exact scenario in both insertion orders (IntersectingMarkersStillCoverDownstreamParts + IntersectingMarkersOrderIndependent), plus mixed-overlap, three-marker, contained, and disjoint cases. All 12 tests pass.

Pre-PR validation gate (rework):

(a) Deterministic repro? Yes — OverlappingPartCovering.IntersectingMarkersStillCoverDownstreamParts deterministically reproduces the bot scenario.
(b) Root cause explained? Yes — the previous tryAdd fix dropped one marker from the covering set, so any unexpected part covered ONLY by that marker was silently reclassified as uncovered, inflating uncovered_unexpected_parts_rows and tripping TOO_MANY_UNEXPECTED_DATA_PARTS.
(c) Fix matches root cause? Yes — the helper keeps every marker as a covering candidate via the primary set + auxiliary list, so no marker can be silently dropped from coverage detection. Not a band-aid; the new structure is correct by construction for the overlap-without-containment case.
(d) Test intent preserved? Yes — original ActiveDataPartSet contract tests retained. New tests cover the bot's scenario in both orders, plus three-marker mixed-overlap, contained, disjoint, and empty cases.
(e) Both directions demonstrated? Yes — without the auxiliary list, IntersectingMarkersOrderIndependent fails order-dependently (returns "" for all_9_10_1 when all_2_11_3 is added second). With the auxiliary list, all 12 tests pass deterministically.
(f) Fix is general, not a narrow patch? Yes — checked all ActiveDataPartSet callers in src/Storages. Only set_of_empty_unexpected_parts operates on parts with arbitrary overlap (drop markers from concurrent traffic); other callers (local_expected_parts_set, active_parts, independent_ranges_set, current_parts/virtual_parts in queue) work with parts that maintain non-overlap by ZK validation or merge-selection invariants.

Build + test:

  • ninja -C build clickhouse → clean
  • ninja -C build unit_tests_dbms → clean
  • unit_tests_dbms --gtest_filter=\"ActiveDataPartSet.*:OverlappingPartCovering.*\" → 12/12 PASSED

Session: cron:clickhouse-ci-task-worker:20260426-224500

@clickhouse-gh

clickhouse-gh Bot commented Apr 27, 2026

Copy link
Copy Markdown
Contributor

LLVM Coverage Report

Metric Baseline Current Δ
Lines 84.00% 84.00% +0.00%
Functions 91.10% 91.10% +0.00%
Branches 76.50% 76.50% +0.00%

Changed lines: 99.24% (130/131) | lost baseline coverage: 25 line(s) · Uncovered code

Full report · Diff report

@tuanpach tuanpach self-assigned this Apr 30, 2026
@tuanpach tuanpach added this pull request to the merge queue May 1, 2026
Merged via the queue into ClickHouse:master with commit 4adfb71 May 1, 2026
165 checks passed
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label May 1, 2026
@clickgapai

Copy link
Copy Markdown
Contributor

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

can be tested Allows running workflows for external contributors pr-bugfix Pull request with bugfix, not backported by default pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants