
groeneai · 2026-06-26T03:01:18Z

Related: #42120
Related: #28848
Related: #56524

Changelog category (leave one):

Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixed a LOGICAL_ERROR exception ("Part with name ... is already written by concurrent request") on INSERT into a ReplicatedMergeTree table when the ZooKeeper block-number counter is reset while a local part at a higher block number still exists (for example after Keeper metadata loss or replica re-creation). This LOGICAL_ERROR aborts the server in debug and sanitizer builds and is a catchable exception in release builds. The insert now fails with a regular DUPLICATE_DATA_PART error and the conflicting part is enqueued for a check.

Description

ReplicatedMergeTreeSink::commitPart allocates a new block number, builds the part name from it, and adds the part to the working set via renameTempPartAndAdd. If that throws DUPLICATE_DATA_PART/PART_IS_TEMPORARILY_LOCKED (the freshly allocated name already exists locally), the sink treated it as an impossible condition and threw LOGICAL_ERROR. In debug and sanitizer builds (abort_on_logical_error) this aborts the server; in release builds it is a catchable exception.

This is reachable without any bug: the ZooKeeper block-number sequential counter can be reset out from under a still-present local part (Keeper metadata loss, replica re-creation, or a DROP/lost-replica race). The counter then restarts and hands out a number whose part survived locally, so the next INSERT collides with it. SYSTEM RESTORE REPLICA avoids this by moving all parts to detached/ before recreating ZooKeeper metadata (see the comment in StorageReplicatedMergeTree::restoreMetadataInZooKeeper), but the collision is still reachable through other paths, and it was reported historically (#28848, #42120, #56524) as a "race with DROP".
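The collision described above can be sketched with a toy model. This is not ClickHouse code: `ToyReplica`, `allocate_block`, `insert`, and `reset_counter` are hypothetical names introduced only for illustration, the integer `counter` stands in for the ZooKeeper sequential counter, and the `set` stands in for the local working set.

```python
class ToyReplica:
    """Toy stand-in for one partition of a replicated table (illustrative only)."""

    def __init__(self):
        self.counter = 0          # stands in for the ZooKeeper block-number counter
        self.local_parts = set()  # stands in for the local working set of parts

    def allocate_block(self):
        # Hand out the current counter value, then advance the counter.
        block = self.counter
        self.counter += 1
        return block

    def insert(self):
        # Build the part name from the freshly allocated block number.
        block = self.allocate_block()
        name = f"all_{block}_{block}_0"
        if name in self.local_parts:
            # The name already exists locally: the duplicate-part case.
            raise RuntimeError(f"DUPLICATE_DATA_PART: {name}")
        self.local_parts.add(name)
        return name

    def reset_counter(self):
        # Models the counter being reset while local parts survive.
        self.counter = 0


replica = ToyReplica()
first = replica.insert()   # occupies block 0
replica.reset_counter()    # counter restarts under the surviving part
try:
    replica.insert()
    outcome = "no collision"
except RuntimeError as e:
    outcome = str(e)
print(first, "->", outcome)
```

In this model the second insert is handed block 0 again and collides with the surviving part, which is the situation the sink previously reported as LOGICAL_ERROR.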

The fix handles it as a recoverable replica inconsistency: it enqueues a part check to reconcile the surviving part and fails the INSERT with a regular DUPLICATE_DATA_PART error instead of raising the LOGICAL_ERROR.

Found by the BuzzHouse fuzzer (amd_msan), STID 3744-4a1b:
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=108549&sha=09e554ccedbce6d2dced89414ae256d516fb7c72&name_0=PR&name_1=BuzzHouse%20%28amd_msan%29

The crash was reproduced locally on a single-node ReplicatedMergeTree by inserting a row, recreating the partition's /block_numbers/<partition> ZooKeeper node (which resets its sequential counter) while the part stays in the working set, and inserting again. The added stateless test 04411_replicated_insert_block_number_counter_reset reproduces this and asserts the INSERT fails cleanly while the server stays alive. Verified that the test triggers the LOGICAL_ERROR (aborting the server in debug/sanitizer builds) before the fix and passes after it.

🤖 Generated with Claude Code

groeneai · 2026-06-26T03:01:57Z

Pre-PR validation gate (click to expand)

#	Question	Answer
a	Deterministic repro?	Yes. On a single-node ReplicatedMergeTree: insert 6 rows (parts `all_0_0_0`..`all_5_5_0`), then `rmr` + `touch` the partition's `/block_numbers/all` ZooKeeper node via `keeper-client` to reset its sequential counter, then `INSERT` once more. Crashes the server every time on the unfixed binary.
b	Root cause explained?	`commitPart` allocates a block number from the (now reset) ZK counter → gets `0` again → builds name `all_0_0_0` → `renameTempPartAndAdd` → `checkPartDuplicate` finds the surviving in-memory `all_0_0_0` → throws `DUPLICATE_DATA_PART` → the catch block rethrows `LOGICAL_ERROR`, which aborts the server in debug/sanitizer builds. The counter reset (Keeper metadata loss / replica re-creation / DROP-race) under a surviving local part is the trigger.
c	Fix matches root cause?	Yes. The collision is a recoverable replica inconsistency, not an impossible bug. The fix stops aborting: it enqueues a part check to reconcile the stale part and fails the INSERT with a normal `DUPLICATE_DATA_PART` instead of `LOGICAL_ERROR`.
d	Test intent preserved / new tests added?	New stateless test `04411_replicated_insert_block_number_counter_reset` added, reproducing the crash and asserting the INSERT fails cleanly (`DUPLICATE_DATA_PART`) while the server stays alive and data is intact.
e	Both directions demonstrated?	Yes. Unfixed binary (Build ID `84462c90…`): test FAILS — server crashes (`Connection refused`), output `NO_EXPECTED_ERROR`. Fixed binary (Build ID `e58fd7c4…`): test PASSES — output `DUPLICATE_DATA_PART` then `6`, server alive.
f	Fix is general across code paths?	The throw site is the single place in `ReplicatedMergeTreeSink::commitPart` that converts this collision into `LOGICAL_ERROR`. The plain-`MergeTree` insert path (`MergeTreeSink`) does not abort on a duplicate; only this replicated path did. No sibling replicated-sink path has the same abort.
g	Fix generalizes across inputs (params/datatypes/wrappers)?	N/A — the change is in control flow on the part-collision error path; it is independent of column types, partition expressions, and insert settings (the repro and behavior do not depend on data types or wrappers).
h	Backward compatible? (maintainer-approved exception only)	Yes. No setting, on-disk, wire, or metadata format change. Behavior changes only on the previously-fatal path: a server abort becomes a regular query error. No `SettingsChangesHistory.cpp` change needed.
i	Invariants and contracts preserved?	Yes. The `MergeTreeData::Transaction` still rolls the new part out of the working set on the thrown exception (unchanged); `enqueuePartForCheck` only appends to a mutex-guarded in-memory queue (no synchronous ZK ops, does not throw). The retry loop's `catch (DB::Exception &)` still routes the thrown `DUPLICATE_DATA_PART` to the ERROR stage.

Session id: cron:clickhouse-worker-slot-1:20260626-015400

groeneai · 2026-06-26T03:02:50Z

cc @tavplubix @CheSema — could you review this? It stops ReplicatedMergeTreeSink::commitPart from aborting the server with a LOGICAL_ERROR ("already written by concurrent request") when the ZooKeeper block-number counter is reset under a surviving local part (Keeper metadata loss / replica re-creation / DROP-race); the INSERT now fails with a regular DUPLICATE_DATA_PART and the conflicting part is enqueued for a check. @tavplubix, this is the same class you diagnosed on #42120 as a "race with DROP".

clickhouse-gh · 2026-06-26T09:35:39Z

Workflow [PR], commit [de9d136]

Summary: ❌

Performance Comparison: Performance dashboard

job_name	test_name	status	info
Stress test (amd_debug)		FAIL
	Cannot start clickhouse-server	FAIL	cidb
	Logical error: 'Unexpected exception in refresh scheduling' (STID: 2508-34af)	FAIL	cidb
	Check failed	FAIL	cidb
Stress test (amd_asan_ubsan)		FAIL
	Hung check failed, possible deadlock found	FAIL	cidb, issue
Stress test (amd_tsan)		FAIL
	Hung check failed, possible deadlock found	FAIL	cidb, issue

AI Review

Summary

This PR changes ReplicatedMergeTreeSink::commitPart so a final-name collision after allocating a new replicated block number is handled as a recoverable replica inconsistency: the conflicting part is queued for checking and the insert fails with DUPLICATE_DATA_PART instead of rethrowing LOGICAL_ERROR. The regression test now rewinds the ZooKeeper counter to the surviving part's actual block number and disables insert Keeper fault injection on the two inserts that must be deterministic. I did not find correctness issues requiring new inline review comments.

Missing context / blind spots

⚠️ There is no local build* directory in this checkout, so I did not run 04411_replicated_insert_block_number_counter_reset locally. At review time, the Praktika report had no failed checks, but several GitHub/Praktika jobs were still pending.

Final Verdict

Status: ✅ Approve
No code changes requested from this review; wait for the pending CI jobs before merge.

ReplicatedMergeTreeSink::commitPart allocates a new block number, builds the part name from it, and adds the part to the working set via renameTempPartAndAdd. If that throws DUPLICATE_DATA_PART/PART_IS_TEMPORARILY_LOCKED (the freshly allocated name already exists locally), the sink treated it as an impossible condition and threw LOGICAL_ERROR, which aborts the server in debug and sanitizer builds.

This is reachable without any bug: the ZooKeeper block-number counter can be reset out from under a still-present local part - Keeper metadata loss, replica re-creation, or a DROP/lost-replica race. The counter then restarts and hands out a number whose part survived locally, so the next INSERT collides with it. SYSTEM RESTORE REPLICA avoids this by moving all parts to detached/ before recreating ZooKeeper metadata (see the comment in restoreMetadataInZooKeeper), but the collision is still reachable through other paths, and it was reported historically (issues ClickHouse#28848, ClickHouse#42120, ClickHouse#56524) as a "race with DROP".

Handle it as a recoverable replica inconsistency: enqueue a part check to reconcile the surviving part and fail the INSERT with a normal DUPLICATE_DATA_PART error instead of aborting the server.

Found by the BuzzHouse fuzzer (amd_msan), STID 3744-4a1b. The added stateless test resets the partition block-number counter under a surviving local part and asserts the INSERT fails cleanly while the server stays alive.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

groeneai · 2026-06-26T14:08:02Z

Fixed the Style check failure: the regression test built the ReplicatedMergeTree zookeeper path through a shell variable, so the check requiring $CLICKHOUSE_TEST_ZOOKEEPER_PREFIX on the ReplicatedMergeTree(...) line did not match. The prefix is now inlined on the CREATE line (parallel-safe), and the keeper-client commands derive the path from system.replicas.zookeeper_path so there is one source of truth. No C++ change; the fix and the reset-counter repro are unchanged.

The test inserted six parts (expecting blocks 0..5), then reset the ZK block-number counter and relied on the next insert re-issuing block 0 to collide with the surviving all_0_0_0. Under CI load this was flaky: a background merge could fire during the six inserts, and block-number allocation was not guaranteed to start at 0, so block 0 was sometimes never occupied by a surviving part and the forced collision missed (NO_EXPECTED_ERROR, 7 rows).

Insert a single row (deterministically all_0_0_0) with SYSTEM STOP MERGES so the part keeps block 0 and cannot be merged away, guard the precondition by printing the surviving part name, then reset and re-insert. The collision is now deterministic.

Verified: crashes the pre-fix server at commitPart, passes 150/150 flaky-check iterations with the fix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

groeneai · 2026-06-26T15:53:47Z

Fixed the flaky check failure on the test this PR adds (amd_msan/amd_tsan, 4+1 FAILs).

Root cause (from the failing run's part_log): the test inserted six parts expecting block numbers 0..5, reset the ZK counter, and relied on the next insert re-issuing block 0 to collide with the surviving all_0_0_0. Under CI load that was racy:

- a background merge fired during the six inserts (all_1_5_1 merging blocks 1..5), and
- block-number allocation was not guaranteed to start at 0: in the failing iteration the six setup inserts produced all_1_1_0..all_6_6_0, so block 0 was never occupied by a surviving part. After the reset the 7th insert got block 0, hit no collision, and succeeded → NO_EXPECTED_ERROR, 7 rows.

Fix (test-only): insert a single row (deterministically all_0_0_0) with SYSTEM STOP MERGES so the part keeps block 0 and can't be merged away, assert the surviving part name as a precondition guard, then reset and re-insert. The forced collision is now deterministic.

Verified locally on a debug build:

- crashes the pre-fix server (LOGICAL_ERROR at ReplicatedMergeTreeSink::commitPart), and
- passes 150/150 --flaky-check iterations with randomized settings on the fixed binary.

The source fix and its intent (must crash before, must fail with DUPLICATE_DATA_PART after) are unchanged.

…izer builds

Reword the test description and inline comment so they do not claim a generic server abort/crash. The LOGICAL_ERROR is a catchable exception in release builds and only aborts the server in debug/sanitizer builds (abort_on_logical_error). Comment-only change; no logic or behavior change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…sion

The test resets the partition block-number counter (rmr + touch of block_numbers/all) so the next INSERT re-issues block number 0 and collides with the surviving local part. keeper-client can transiently fail under parallel CI load, in which case the single-shot reset silently does not take effect (its errors are redirected to /dev/null): the counter stays advanced, the next INSERT gets block 1, no collision happens, and the test fails with NO_EXPECTED_ERROR (observed ~1/29 in the flaky check).

Retry the reset until system.zookeeper confirms the counter cversion is back at zero before the colliding INSERT. The block number handed out is derived from that cversion, so verifying it is 0 makes the collision deterministic without weakening the assertion (still asserts DUPLICATE_DATA_PART and that the server stays alive). Reading cversion does not bump it, and with merges stopped on a single quiescent part nothing else touches the node.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

groeneai · 2026-06-26T19:31:28Z

Made the regression test reliable. The flaky-check failures (NO_EXPECTED_ERROR, ~1/29) came from the counter reset itself: keeper-client can transiently fail under parallel load, the single-shot rmr+touch silently did not take effect (errors went to /dev/null), so the next INSERT got block 1 and did not collide. Now the reset retries until system.zookeeper confirms block_numbers/all cversion is back at 0 before the colliding INSERT. The block number is derived from that cversion, so the collision is deterministic; the assertion is unchanged (DUPLICATE_DATA_PART + server stays alive). Validated against a locally rebuilt unfixed binary (test FAILS, server aborts) and the fixed binary (180/180 runs pass with randomization on).

… block

The previous fix reset the counter to 0 and verified cversion == 0, on the assumption that the single setup INSERT always produces all_0_0_0 (block 0). It does not: block-number allocation on a fresh ReplicatedMergeTree is not guaranteed to start at 0 under load, so ~3% of runs the surviving part lands on block 1 (all_1_1_0). Resetting the counter to 0 then made the colliding INSERT get block 0, which does NOT name-collide with all_1_1_0, so no DUPLICATE_DATA_PART was raised and the test failed with NO_EXPECTED_ERROR / 2 rows. CIDB confirmed this: the failing runs' first output line itself diffed all_0_0_0 -> all_1_1_0, with max_insert_threads=1.

Read the surviving part's actual block number and advance the recreated counter to it (cversion == BLOCK) so the next allocation re-issues exactly that block number and the collision is deterministic regardless of where the setup INSERT landed. The cversion-verify retry loop (handling transient keeper-client failures) is kept. The non-deterministic part-name print is dropped from the output (it was the line that diffed); the test still asserts DUPLICATE_DATA_PART and that the server stays alive with no data loss.

Verified locally: deterministic block-0 and forced-block-1 scenarios both collide (DUPLICATE_DATA_PART), and 50/50 runs pass with full randomization + thread fuzzer.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

groeneai · 2026-06-26T20:18:51Z

Pushed ec42119 correcting the test fix. The previous commit's root-cause was incomplete.

Corrected root cause (proven deterministically locally): the single setup INSERT is NOT guaranteed to produce all_0_0_0. Block-number allocation on a fresh ReplicatedMergeTree does not always start at 0 under load, so ~3% of runs the surviving part lands on block 1 (all_1_1_0). The previous fix reset the counter to 0 (verifying cversion == 0), so the colliding INSERT got block 0, which does NOT name-collide with all_1_1_0 → NO_EXPECTED_ERROR / 2 rows. CIDB confirms it: the failing runs' first output line itself diffed all_0_0_0 → all_1_1_0, with max_insert_threads 1 (so this is not an insert-thread split). Verifying cversion == 0 could never fix this — when the survivor is at block 1 you need the counter at 1, not 0.

New fix: read the surviving part's actual min_block_number B, then advance the recreated counter to B (cversion == B) so the next allocation re-issues exactly B and collides regardless of where the setup INSERT landed. The cversion-verify retry loop (for transient keeper-client failures) is kept. The non-deterministic part-name print is dropped from the output (it was the diffing line); the test still asserts DUPLICATE_DATA_PART + server-alive + no data loss, and still exercises the commitPart fix (server log shows the new DUPLICATE_DATA_PART path + enqueuePartForCheck for both all_0_0_0 and all_1_1_0 survivors).

Validated on a local debug build (external keeper, thread fuzzer on): deterministic block-0 and forced-block-1 scenarios both collide; 50/50 runs pass with full randomization.
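The difference between the reset-to-0 and align-to-B strategies can be sketched as a small check. This is a toy model, not the test script: `part_name` and `next_insert_collides` are hypothetical helpers, and the model assumes (as stated in this thread) that the next allocated block number equals the counter value the test leaves behind.

```python
def part_name(block):
    # Name of a single-block, level-0 part in partition "all".
    return f"all_{block}_{block}_0"


def next_insert_collides(survivor_block, counter_value):
    # The colliding INSERT is handed `counter_value` as its block number
    # (assumption of this toy model) and collides only on an exact name match.
    return part_name(counter_value) == part_name(survivor_block)


results = {}
for survivor_block in (0, 1):
    results[survivor_block] = {
        "reset_to_0": next_insert_collides(survivor_block, 0),
        "align_to_B": next_insert_collides(survivor_block, survivor_block),
    }

for survivor_block, outcome in results.items():
    print(part_name(survivor_block), outcome)
```

With the survivor on block 1, resetting to 0 misses the collision while aligning to B still hits it, which is the asymmetry the corrected test relies on.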

groeneai · 2026-06-26T20:19:08Z

Pre-PR validation gate (click to expand)

#	Question	Answer
a	Deterministic repro?	Yes. Forcing the surviving part to block 1 (insert twice, `DROP PART all_0_0_0`) + the old reset-to-0 logic reproduces `NO_EXPECTED_ERROR` / count=2 every run.
b	Root cause explained?	The single setup INSERT is not guaranteed to land on block 0; under load a background allocation consumes block 0 first, so the survivor is `all_1_1_0`. Resetting the counter to 0 then makes the colliding INSERT get block 0 (`all_0_0_0`), which does not name-collide with `all_1_1_0`, so no `DUPLICATE_DATA_PART`.
c	Fix matches root cause?	Yes. Reads the survivor's actual block B and advances the recreated counter to B so the next allocation re-issues exactly B and collides regardless of where the setup INSERT landed.
d	Test intent preserved / new tests added?	Yes. Still asserts `DUPLICATE_DATA_PART` + server-alive + no data loss, and still drives `commitPart`'s new path (verified via server log: new error msg + `enqueuePartForCheck`). Only the non-deterministic part-name print (the diffing line) was removed. Assertion not weakened.
e	Both directions demonstrated?	Yes. Old reset-to-0 logic with survivor at block 1 → `NO_EXPECTED_ERROR`/2 (the flake); new align-to-B logic → `DUPLICATE_DATA_PART`/1. Pre-fix sink binary aborts on the same path (LOGICAL_ERROR).
f	Fix is general across code paths?	N/A (test-only change; the source fix in `ReplicatedMergeTreeSink.cpp` is unchanged in this commit).
g	Fix generalizes across inputs?	Yes for the test: it no longer assumes a fixed block number, so it is correct whether the survivor lands on block 0, 1, or higher.
h	Backward compatible?	N/A (test-only change, no behavior/format/setting change).
i	Invariants and contracts preserved?	N/A (test-only change). The collision uses the documented invariant that the next allocated block number equals the `cversion` of `block_numbers/<partition>`.

Session id: cron:clickhouse-worker-slot-0:20260626-181100

…laky check

The test forces a block-number collision: insert one row (surviving part), roll the ZooKeeper block-number counter back, then insert again so the new part re-issues an already-used block number and fails with DUPLICATE_DATA_PART.

It flaked ~1-2% because stateless tests run with insert_keeper_fault_injection_probability=0.01 by default (tests/config/users.d/insert_keeper_retries.xml). A simulated Keeper fault during an insert's block-number allocation triggers a retry, and the retry allocates a HIGHER block number. That breaks the forced collision two ways: a fault on the setup insert moves the surviving part off its expected block, and a fault on the colliding insert skips it past the surviving block. Either way no exact-name collision happens, the insert succeeds, and the test sees NO_EXPECTED_ERROR with an extra row.

Pin the setting to 0 on both inserts so block-number allocation is deterministic. This targets exactly the setting responsible rather than disabling all randomization. The surviving part's block is still read and the counter aligned to it, so the collision holds even if that block is non-zero. The source fix and the .reference are unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

groeneai · 2026-06-26T22:36:54Z

The flaky check kept recurring (last on ec421199) because the earlier fixes targeted the wrong cause. Found and proved the real one.

Root cause: stateless tests run with insert_keeper_fault_injection_probability=0.01 by default (tests/config/users.d/insert_keeper_retries.xml, installed unconditionally). A simulated Keeper fault during an INSERT's block-number allocation triggers a retry, and the retry allocates a higher block number (the number = the parent node's sequential counter, bumped on every child-create). That breaks the forced collision two ways: a fault on the setup INSERT moves the surviving part off its expected block (the all_1_1_0 survivor seen in failing reports), and a fault on the colliding INSERT skips it past the surviving block. Either way no exact-name collision -> the INSERT succeeds -> NO_EXPECTED_ERROR + extra row.

The previous "background allocator consumes block 0" theory was refuted locally: with fault injection off, the setup INSERT lands on block 0 in 40/40 runs.

Fix (test-only, source unchanged): pin insert_keeper_fault_injection_probability=0 on both INSERTs. This targets exactly the responsible setting (not a blanket no-random-settings); the surviving part's block is still read and the counter aligned to it, so the collision holds even if that block is non-zero. .reference unchanged.
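The effect of a faulted and retried allocation can be sketched as follows. This is a toy model under stated assumptions, not Keeper or test code: `allocate` and `run_test` are hypothetical names, and each simulated fault is assumed to consume one sequential number before the retry succeeds, matching the "retry allocates a higher block number" behavior described above.

```python
def allocate(counter, faults):
    """Return (block, next_counter); each simulated fault burns one number."""
    counter += faults            # every faulted attempt still advanced the counter
    return counter, counter + 1  # the successful retry gets the next number


def run_test(setup_faults, colliding_faults):
    counter = 0
    # Setup INSERT creates the surviving part.
    survivor, counter = allocate(counter, setup_faults)
    # The test reads the survivor's block and aligns the recreated counter to it.
    counter = survivor
    # Colliding INSERT: collides only if it is handed exactly the survivor's block.
    block, counter = allocate(counter, colliding_faults)
    return "DUPLICATE_DATA_PART" if block == survivor else "NO_EXPECTED_ERROR"


no_faults = run_test(setup_faults=0, colliding_faults=0)
setup_fault = run_test(setup_faults=1, colliding_faults=0)
colliding_fault = run_test(setup_faults=0, colliding_faults=1)
print(no_faults, setup_fault, colliding_fault)
```

In this model the align-to-B step absorbs a fault on the setup INSERT, but a fault on the colliding INSERT still skips past the survivor, which is consistent with the repro in row (a) below that injects faults only on the colliding INSERT.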

Pre-PR validation gate (click to expand)

#	Question	Answer
a	Deterministic repro?	Yes. With `--insert_keeper_fault_injection_probability 0.1` on the colliding INSERT: 7/40 fail, signature `surviving=all_0_0_0 BLOCK=0 res=NO_EXPECTED_ERROR cnt=2 parts=all_0_0_0,all_1_1_0` (matches CI).
b	Root cause explained?	Yes. Block number = parent seqNum() (KeeperStorage.cpp:1681), bumped per child-create. A faulted+retried allocation creates a new ephemeral-sequential node -> higher seqNum -> higher block -> no collision.
c	Fix matches root cause?	Yes. Disabling fault injection on both INSERTs removes the retry-to-higher-block, so allocation is deterministic.
d	Test intent preserved?	Yes. Still forces the collision, still asserts DUPLICATE_DATA_PART + server-alive + 1 row. No assertion weakened; the align-to-B reset logic is kept.
e	Both directions demonstrated?	Yes. Pre-fix binary aborts at commitPart (verified earlier on this PR). Fixed test: real `.sh` file 50/50 PASS with CI-default fault 0.01 active; 29/29 PASS at stressed fault 0.2 (manual server then externally stopped, no crash).
f	Fix general across code paths?	N/A (flaky test setting fix, not a code bug).
g	Fix generalizes across inputs?	N/A (flaky test setting fix, not a code bug).
h	Backward compatible?	N/A (flaky test setting fix, not a code bug).
i	Invariants and contracts preserved?	N/A (flaky test setting fix, not a code bug).

Session id: cron:clickhouse-worker-slot-3:20260626-213900

clickhouse-gh · 2026-06-27T02:00:11Z

LLVM Coverage Report

Metric	Baseline	Current	Δ
Lines	85.40%	85.40%	+0.00%
Functions	92.60%	92.60%	+0.00%
Branches	77.60%	77.60%	+0.00%

Changed lines: Changed C/C++ lines covered by tests: 8/8 (100.00%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

groeneai · 2026-06-27T02:55:56Z

CI finish ledger - `de9d136`

Every failure below has an owner: a fixing PR (ours or external). Only CH Inc sync is exempt. None are PR-caused: our new test 04411_replicated_insert_block_number_counter_reset is GREEN on this head (the insert_keeper_fault_injection_probability=0 pin held); the Stress failures are pre-existing chronic trunk crashes, and this branch was forked at 474a3c12c67 and predates the merged refresh-scheduling fix.

Check / test	Reason	Owner / fixing PR
Stress test (amd_tsan) / Hung check, possible deadlock	deadlock (chronic)	#105905 / #101680 (ours, open)
Stress test (amd_asan_ubsan) / Hung check, possible deadlock	deadlock (chronic)	#105905 / #101680 (ours, open)
Stress test (amd_debug) / Unexpected exception in refresh scheduling (STID 2508-34af)	crash (chronic RefreshTask family, 143 hits / 116 PRs / 11 master, 30d)	#105588 (merged 2026-06-26; this build predates it - diverged base) / #108441 (ours, open)
Stress test (amd_debug) / Cannot start clickhouse-server, Check failed	server died from the 2508-34af LOGICAL_ERROR above (collateral)	#105588 / #108441
CH Inc sync / Mergeable Check / PR	rollups	CH Inc sync (private, not actionable)

Session id: cron:our-pr-ci-monitor:20260627-023000

tavplubix · 2026-06-29T11:27:43Z

+                                "Part with name {} already exists locally for a newly allocated block number. "
+                                "The ZooKeeper block-number counter is inconsistent with local parts "
+                                "(it may have been reset by Keeper metadata loss or replica re-creation). "
+                                "The conflicting part will be checked and reconciled; retry the insert",


@groeneai this should not be possible, and the LOGICAL_ERROR is supposed to catch bugs like this. Please always try to find the root cause instead of removing assertions that check invariants. If it's a "Keeper metadata loss" - you must investigate the metadata loss.

groeneai · 2026-06-29

You are right, I will stop softening the assertion (you added it in 00c9e50). I traced the BuzzHouse repro and it is a real metadata inconsistency, not external Keeper loss. Single node, one table UUID e061cf95, in-server SQL only:

1. ENGINE=MergeTree (plain), local parts accumulate; all_74_74_0 written on disk.

2. ATTACH TABLE ... AS REPLICATED -> read-only (No metadata in ZooKeeper), old parts incl. all_74_74_0 present.

3. SYSTEM RESTORE REPLICA -> detaches all parts, resets the counter, re-attaches them compacted to all_0_0_0 .. all_7_7_0 (the old all_74_74_0 is re-attached as all_7_7_0).

4. Background ReplicatedMergeTreePartCheckThread: Part all_74_74_0 in zookeeper: false, locally: false ... No replica has part covering all_74_74_0 ... Going to replace lost part all_74_74_0 with empty part -> creates an empty all_74_74_0 (block 74).

5. Later INSERT allocates block 74 from the reset counter, renames its part to all_74_74_0, and hits your assertion.

So RESTORE REPLICA leaves a stale reference to a part name (all_74_74_0) it just renamed away; the part-check thread then "restores" that name as a lost part at a block the reset counter will re-hand-out -> collision. The inconsistency originates in the restore/attach path not reconciling those block references (and, incidentally, my enqueuePartForCheck change fed the very thread that re-creates the phantom part). I will drop the sink change and look at fixing the restore path so the counter and ZK part references can't diverge like this. Does that match your expectation, or would you prefer the duplicate-name lost-part restoration be blocked instead?
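The traced sequence can be sketched as a toy model to show why the collision surfaces only after many later inserts. This is an illustration under assumptions, not a reproduction: the starting counter value of 8 (one past all_7_7_0) is assumed rather than taken from the trace, merges are ignored, and every insert is assumed to consume exactly one block number.

```python
# Parts re-attached by RESTORE REPLICA in the trace: all_0_0_0 .. all_7_7_0.
local_parts = {f"all_{b}_{b}_0" for b in range(8)}
# Phantom empty part created by the part-check thread for the stale name.
local_parts.add("all_74_74_0")

counter = 8  # assumption: the reset counter continues right after all_7_7_0

successful_inserts = 0
collision = None
while collision is None:
    block = counter
    counter += 1
    name = f"all_{block}_{block}_0"
    if name in local_parts:
        collision = name          # the freshly allocated name already exists
    else:
        local_parts.add(name)
        successful_inserts += 1

print(successful_inserts, collision)
```

Under these assumptions the inserts for blocks 8 through 73 succeed and the one that is handed block 74 collides with the phantom part, so the failure appears well after the restore that caused it.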

groeneai mentioned this pull request Jun 26, 2026

Fix "Part X intersects part Y" in plain MergeTree merge selector #108549

Closed

CheSema self-assigned this Jun 26, 2026

clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Jun 26, 2026

nikitamikhaylov added the can be tested Allows running workflows for external contributors label Jun 26, 2026

groeneai force-pushed the groeneai/fix-rmt-insert-block-number-counter-reset-crash branch from b0d09f2 to 0076e74 Compare June 26, 2026 14:07

clickhouse-gh Bot reviewed Jun 26, 2026


Comment thread tests/queries/0_stateless/04411_replicated_insert_block_number_counter_reset.sh Outdated

groeneai and others added 2 commits June 26, 2026 16:29

tavplubix reviewed Jun 29, 2026

