Fix logical error exception in getConsistentMetadataSnapshotImpl when log pointer goes backwards#100651
Fix logical error exception in getConsistentMetadataSnapshotImpl when log pointer goes backwards#100651alexey-milovidov wants to merge 85 commits into
Conversation
…en log pointer goes backwards The `chassert(max_log_ptr == new_max_log_ptr)` assertion in `getConsistentMetadataSnapshotImpl` fires when the database is recreated between the initial `max_log_ptr` read and a subsequent loop iteration, causing the log pointer to go backwards. Handle this case gracefully by resetting `max_log_ptr` to the new (smaller) value and retrying. https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100641&sha=c7c8447120affec1e60b8c7c6380dab230aa9439&name_0=PR&name_1=Stateless%20tests%20%28amd_debug%2C%20parallel%29 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test concurrently runs backups and drops/recreates a `DatabaseReplicated`, triggering the code path where `max_log_ptr` goes backwards in `getConsistentMetadataSnapshotImpl`. Before the fix this caused a logical error exception in debug builds. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test was failing because `BACKUP ... ASYNC` outputs a result table (id, status) to stdout, and `SYSTEM UNFREEZE` can also produce output. Only stderr was being suppressed, so the extra stdout was mismatching the reference file which expects only "1". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…p IDs per worker The `CREATE DATABASE ... ENGINE = Replicated(...)` outputs shard/replica info to stdout, which leaked into the test output causing a reference mismatch. Redirect both stdout and stderr to /dev/null for all DDL commands. Also include `$BASHPID` in backup IDs to avoid collisions between the two concurrent `do_backups` workers. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The test now checks that `system.backups` contains at least one entry matching the test prefix, ensuring the stress scenario actually exercises the target race condition rather than silently passing when all backup attempts fail. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… failure Under TSan, the concurrent `DROP DATABASE` prevents all backups from being recorded in `system.backups` within the timeout. The real assertion of this test is server liveness (no logical error exception), not that backups succeeded, so downgrade the "no backups attempted" check to a warning. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…trying When `max_log_ptr` moves backwards in `getConsistentMetadataSnapshotImpl`, it means the database was dropped and recreated as another instance. Silently retrying with the new pointer would back up a different database than the caller asked for, which is not what we want. Fail the operation cleanly with `CANNOT_GET_REPLICATED_DATABASE_SNAPSHOT`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ilure The test wrote "Backups attempted: $BACKUP_COUNT" to stderr, which causes `clickhouse-test` to mark the test as failed regardless of exit code. Since the real assertion of this test is server liveness (via the final `SELECT 1`), the informational echo is not needed. CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100651&sha=696e74894edee8cefebb95f235644482350ef933&name_0=PR&name_1=Stateless%20tests%20%28arm_asan_ubsan%2C%20flaky%20check%29 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…n backups The test previously only asserted server liveness (`SELECT 1`), so a regression of the original `chassert(max_log_ptr == new_max_log_ptr)` failure could go undetected: the `chassert` raises a `LOGICAL_ERROR` exception that is silently swallowed by `BACKUP ... ASYNC`, and the test could pass even if the regression path was never exercised at all. Now the test: - counts successful `DROP DATABASE` cycles in `drop_and_recreate` and asserts that at least one full recreate completed, otherwise `max_log_ptr` cannot have moved backwards and the regression path is untested - queries `system.backups` for any backup whose `error` contains `LOGICAL_ERROR` and asserts the count is `0` Addresses review feedback on #100651
…reate, wait for backups Under randomized settings (flaky check), the previous concurrent `drop_and_recreate` background loop never completed a full `DROP DATABASE`+recreate cycle within the 10 second timeout, so the test deterministically printed `not recreated: 0` instead of the expected `recreated`. Restructure the test: - Drive recreates in the foreground, synchronously. Backup workers stay in the background, so the race against `BACKUP ... ASYNC` is preserved while recreate cycles are guaranteed to make progress. - Drop the 5-table `create_and_populate` loop down to a single table; one table is enough to advance `max_log_ptr`. - Bump `TIMEOUT` from 10s to 30s. - After the stress loops, poll `system.backups` until no row for this test's prefix is in `CREATING_BACKUP` before checking for `LOGICAL_ERROR`. `BACKUP ... ASYNC` only waits for submission, so the previous code could read `system.backups` while in-flight backups still had a chance to hit the regression path. Addresses reviewer feedback. CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100651&sha=cfdb1b3242dcfd23a58aaf77cd2eeb22e3cfab82&name_0=PR&name_1=Stateless%20tests%20%28amd_asan_ubsan%2C%20flaky%20check%29 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Merged current The 4 reds on the previous run (
None are related to this change, which touches only |
|
The only red on this PR is Root cause is benign teardown noise: on macOS the This is a low-frequency,
Failed job: https://github.com/ClickHouse/ClickHouse/actions/runs/27466341242/job/81199164563 @groeneai, could you investigate this |
|
@alexey-milovidov Confirmed and fixed in a separate PR: #107438. Root cause is exactly as you described. CIDB shows the same stack on all three Fix downgrades only |
|
Merged current
Verified: all three changed translation units ( The previous run on |
test_concurrent_reads runs 25 concurrent SELECT * queries against one accumulating Iceberg table while 15 Spark writers commit in parallel, for 10 rounds. By default each SELECT fans out to ~max_threads (number of CPU cores) internal read threads, so 25 simultaneous readers request hundreds of threads at once. On a loaded CI shard this oversubscribes the global thread pool: a single SELECT either gets starved past the 300s client receive_timeout (Return code 159, seen on the asan/db-disk/old-analyzer 5/6 shard for PRs ClickHouse#100651 and ClickHouse#108049 and on master) or the server rejects the query outright with CANNOT_SCHEDULE_TASK (Code 439, threads=1000, jobs=1000). Pin max_threads=4 on the concurrent SELECT so the readers' aggregate thread demand stays bounded. This does not change what the test verifies: all 25 readers still race all 15 writers across every round and the exact row-count assertion is unchanged. Only the per-query intra-read parallelism is capped, which is irrelevant to concurrent-read snapshot consistency. Locally reproduced both directions on a debug build: pristine test fails at round 2 with CANNOT_SCHEDULE_TASK; with the cap it passes all 10 rounds (~150s).
…ptr-assertion # Conflicts: # src/Databases/DatabaseReplicated.cpp
|
Merged current Build: all three changed translation units ( CI failure (unrelated): The only red check is A fresh CI run is now in progress on |
|
Merged current Build: all three changed translation units ( CI failures (unrelated): the previous run ( Resolved the remaining AI Review verdict on |
|
Re-checked CI and reviews on CI: No AI Review verdict (
Full walk-through posted on the inline thread. The AI verdict on the byte-identical net diff one commit earlier ( No code pushed. |
|
Re-checked CI and reviews on No The single open review thread ( |
The AI review flagged a residual window in `getTablesForBackup`: if the backup collector retained a stale `DatabaseReplicated` object and a `DROP DATABASE`+recreate completes before the first `/max_log_ptr` read, the captured `czxid` belongs to the new database and all in-function identity checks compare new-to-new. This window cannot make the *written* backup substitute a different database, because `BackupEntriesCollector` bounds it across attempts: it clears `database_infos`/`table_infos` and re-fetches a fresh `DatabasePtr` from `DatabaseCatalog` on every attempt (so a stale object never carries across attempts), and `compareWithPrevious` requires two consecutive attempts to agree on both the database `CREATE` text and the full table set. A transitional attempt mixing the stale object's old `CREATE` (different UUID) with the recreated subtree's new tables disagrees with the next all-new attempt and is discarded; the loop converges to a consistent snapshot or fails with `INCONSISTENT_METADATA_FOR_BACKUP` at the deadline. Add a code comment capturing this invariant so the `czxid` pinning's scope (the narrower intra-operation window) is clear to future readers. No behavioral change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Addressed the AI review's " The new comment in CI: the only red was |
LLVM Coverage Report
Changed lines: Changed C/C++ lines covered: 108/163 (66.26%) · Uncovered code |

The
chassert(max_log_ptr == new_max_log_ptr)ingetConsistentMetadataSnapshotImplfires when aDatabaseReplicatedis dropped and recreated during a backup operation. The initialmax_log_ptris read from the old database, but after recreation the new database has a smaller log pointer, causing the assertion to fail.Throw
CANNOT_GET_REPLICATED_DATABASE_SNAPSHOTinstead, since the database that was asked to back up no longer exists; the new database (with the same name but different identity) should not be silently substituted.CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100641&sha=c7c8447120affec1e60b8c7c6380dab230aa9439&name_0=PR&name_1=Stateless%20tests%20%28amd_debug%2C%20parallel%29
#100641
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fixed a
LOGICAL_ERRORexception during backup of aReplicateddatabase that is being dropped and recreated concurrently; such backups now fail cleanly withCANNOT_GET_REPLICATED_DATABASE_SNAPSHOT.Documentation entry for user-facing changes
Note
Medium Risk
Changes snapshotting logic used by
DatabaseReplicatedbackup and recovery to detect concurrentDROP DATABASE+recreate via Keeperczxid/log-pointer checks; incorrect handling could cause backup/recovery failures or wrong metadata selection, so it needs careful concurrency coverage but is scoped to this race.Overview
Prevents
DatabaseReplicatedmetadata snapshots (used by backups and lost-replica recovery) from crashing or silently reading from a recreated database instance when a concurrentDROP DATABASEremoves and re-creates the same Keeper subtree.getConsistentMetadataSnapshotImplnow pins database identity using/max_log_ptr’sczxid, converts multiple KeeperZNONODE/rollback situations into a consistentCANNOT_GET_REPLICATED_DATABASE_SNAPSHOTerror (replacing the previouschassert/retry-exhaustion behavior), and forwards an optional expectedczxidfrom callers (getTablesForBackup,DatabaseReplicatedWorker::initializeReplication). A newPAUSEABLE_ONCEfailpoint plus a stateless test were added to deterministically reproduce the drop+recreate race during backup.Reviewed by Cursor Bugbot for commit 3439122. Bugbot is set up for automated code reviews on this repo. Configure here.