Fix flaky 03518_alter_logical_race: unthrottle the MODIFY COLUMN mutation#108060
Conversation
…tion
The test stresses concurrent `ALTER`/`INSERT` against a `ReplicatedMergeTree`
table. Its `MODIFY COLUMN ... UInt64` step is a data-rewriting mutation, and on a
busy CI host that mutation gets starved by the two MergeTree throttles that defer
mutations in favour of merges: `max_replicated_mutations_in_queue` (default 8) and
`number_of_free_entries_in_pool_to_execute_mutation` (default 20). Under the test's
heavy concurrent `INSERT` load there are always merges to do, so the data part of
the very first `MODIFY` never gets a turn.
Replicated metadata alters must finish in strict version order, and an alter does
not leave the alter-version chain until its data mutation completes. So a starved
data mutation at the head of the chain blocks every later `MODIFY`'s metadata
change, which then waits (with the default `alter_sync = 1`) until the table is
dropped at teardown. The `timeout` wrapping each statement fires first and fails
the test with a spurious `TIMEOUT` / `NO ALTER PROGRESS`, even though nothing is
hung - the mutation just never gets scheduled.
The server log of the failing run shows this exactly: mutation `0000000000` (alter
version 3) is created but never executes ("won't select new parts to merge or
mutate" because queued mutations reached `max_replicated_mutations_in_queue (8)`),
and the next `MODIFY`'s metadata entry is rejected 309044 times with "Cannot
execute alter metadata ... because another alter 3 must be executed before".
Raise `max_replicated_mutations_in_queue` and set
`number_of_free_entries_in_pool_to_execute_mutation = 0` on the test table so the
data mutation is no longer deferred. The two `ALTER` workers issue their `MODIFY`s
synchronously, so in-flight mutations stay naturally low; this only removes the
head-of-chain stall. Verified by running the test repeatedly under saturating CPU
load: it now completes real `ALTER` cycles in ~12-15s instead of hanging to the
180s hard cap.
CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=2a3a502d2ed0af65bb0a5c91223c91c1b2e28047&name_0=MasterCI&name_1=Stateless%20tests%20%28arm_binary%2C%20parallel%29
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
| $CLICKHOUSE_CLIENT -q "DROP TABLE IF EXISTS alter_table" | ||
| $CLICKHOUSE_CLIENT -q "CREATE TABLE alter_table (a UInt8, b UInt8, c UInt8, d UInt8, e UInt8, f UInt8, g UInt8) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{database}/test_03518/alter_table', 'r1') ORDER BY a PARTITION BY b % 10 SETTINGS old_parts_lifetime = 1" | ||
| # `max_replicated_mutations_in_queue` and `number_of_free_entries_in_pool_to_execute_mutation` | ||
| # both defer the data-rewriting mutation that `MODIFY COLUMN ... UInt64` creates, in favour of |
There was a problem hiding this comment.
It's a comment pretty much from the AI to AI.
A short description would be more useful. Nobody reads such comments.
There was a problem hiding this comment.
Done in 823f09f — replaced the long explanation with a short description (kept the CI report link).
| # `MODIFY`s synchronously, so in-flight mutations stay naturally low; disabling these throttles only | ||
| # removes the head-of-chain stall, it does not let mutations pile up. See | ||
| # https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=2a3a502d2ed0af65bb0a5c91223c91c1b2e28047&name_0=MasterCI&name_1=Stateless%20tests%20%28arm_binary%2C%20parallel%29 | ||
| $CLICKHOUSE_CLIENT -q "CREATE TABLE alter_table (a UInt8, b UInt8, c UInt8, d UInt8, e UInt8, f UInt8, g UInt8) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{database}/test_03518/alter_table', 'r1') ORDER BY a PARTITION BY b % 10 SETTINGS old_parts_lifetime = 1, max_replicated_mutations_in_queue = 100000, number_of_free_entries_in_pool_to_execute_mutation = 0" |
There was a problem hiding this comment.
A nice trick is structured SQL heredoc
Easier to understand diffs in long queries
| $CLICKHOUSE_CLIENT -q "CREATE TABLE alter_table (a UInt8, b UInt8, c UInt8, d UInt8, e UInt8, f UInt8, g UInt8) ENGINE = ReplicatedMergeTree('/clickhouse/tables/{database}/test_03518/alter_table', 'r1') ORDER BY a PARTITION BY b % 10 SETTINGS old_parts_lifetime = 1, max_replicated_mutations_in_queue = 100000, number_of_free_entries_in_pool_to_execute_mutation = 0" | |
| $CLICKHOUSE_CLIENT << SQL | |
| CREATE TABLE alter_table (a UInt8, b UInt8, c UInt8, d UInt8, e UInt8, f UInt8, g UInt8) | |
| ENGINE = ReplicatedMergeTree('/clickhouse/tables/{database}/test_03518/alter_table', 'r1') | |
| ORDER BY a | |
| PARTITION BY b % 10 | |
| SETTINGS old_parts_lifetime = 1, | |
| max_replicated_mutations_in_queue = 100000, | |
| number_of_free_entries_in_pool_to_execute_mutation = 0 | |
| SQL |
There was a problem hiding this comment.
Done in 823f09f — rewrote the CREATE TABLE as a structured SQL heredoc as suggested. Thanks!
Per @Felixoid's review on PR #108060: - Replace the long explanatory comment above the `CREATE TABLE` with a short description (the verbose version read as "AI to AI"; nobody reads it). - Rewrite the `CREATE TABLE` as a structured SQL heredoc so the settings and clauses produce a readable diff. No behavior change: the `ReplicatedMergeTree` definition and the `max_replicated_mutations_in_queue` / `number_of_free_entries_in_pool_to_execute_mutation` settings are identical. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
LLVM Coverage ReportChanged lines: No C/C++ source files changed — skipping uncovered code analysis. Newly covered by added/modified tests: 523 line(s), 30 function(s) across 135 file(s) · Details Top files
|

The flaky test
03518_alter_logical_racestresses concurrentALTER/INSERTagainst aReplicatedMergeTreetable. ItsMODIFY COLUMN ... UInt64step is a data-rewriting mutation. On a busy CI host (it fails onStateless tests (arm_binary, parallel), ~7 times/day on master) that mutation gets starved by the two MergeTree throttles that defer mutations in favour of merges:max_replicated_mutations_in_queue(default 8) andnumber_of_free_entries_in_pool_to_execute_mutation(default 20). Under the test's heavy concurrentINSERTload there are always merges to run, so the data part of the very firstMODIFYnever gets a turn.Replicated metadata alters must finish in strict version order, and an alter does not leave the alter-version chain until its data mutation completes. So a starved data mutation at the head of the chain blocks every later
MODIFY's metadata change, which then waits (with the defaultalter_sync = 1) until the table is dropped at teardown. Thetimeoutwrapping each statement fires first and fails the test with a spuriousTIMEOUT/NO ALTER PROGRESS, even though nothing is hung — the mutation just never gets scheduled.The server log of the failing run shows this exactly: mutation
0000000000(alter version 3) is created but never executes (won't select new parts to merge or mutatebecause queued mutations reachedmax_replicated_mutations_in_queue (8)), and the nextMODIFY's metadata entry is rejected 309044 times withCannot execute alter metadata ... because another alter 3 must be executed before.The fix raises
max_replicated_mutations_in_queueand setsnumber_of_free_entries_in_pool_to_execute_mutation = 0on the test table so the data mutation is no longer deferred. The twoALTERworkers issue theirMODIFYs synchronously, so in-flight mutations stay naturally low; this only removes the head-of-chain stall, it does not let mutations pile up. Verified by running the test repeatedly under saturating CPU load: it now completes realALTERcycles in ~12–15s instead of hanging to the 180s hard cap.CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=2a3a502d2ed0af65bb0a5c91223c91c1b2e28047&name_0=MasterCI&name_1=Stateless%20tests%20%28arm_binary%2C%20parallel%29
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Not for changelog — fixes flakiness of the stateless test
03518_alter_logical_race.Documentation entry for user-facing changes
🤖 Generated with Claude Code