Fix flaky test_startup_without_zookeeper: retry the recursive ZooKeeper delete #108791
Merged
`test_replication_without_zookeeper/test.py::test_startup_without_zookeeper` intermittently fails with `kazoo.exceptions.NotEmptyError` inside `drop_zk`, which calls `zk.delete(path="/clickhouse", recursive=True)`.

The drop happens while `node1` (ClickHouse) is still running, so its `ReplicatedMergeTree` background threads keep re-creating ZooKeeper nodes under `/clickhouse`. kazoo's recursive delete races with that: between `get_children` and `delete` a new child node can appear, and kazoo raises `NotEmptyError` without retrying.

The test already used `cluster.run_kazoo_commands_with_retries`, but with the default `repeats=1`. In that helper the retry loop is `for i in range(repeats - 1)`, so `repeats=1` performs zero retries: the callback runs exactly once with no exception handling. Pass `repeats=5` (matching the existing usages in `helpers/cluster.py`) so the drop is retried on the transient `NotEmptyError`.

This is a pre-existing flaky test (same failure seen on unrelated PRs, e.g. #103540 and #101757), not a regression.

CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=d40ea5d0da4103b54971ac01a9960ef9153242bb&name_0=MasterCI&name_1=Integration%20tests%20%28amd_asan_ubsan%2C%20db%20disk%2C%20old%20analyzer%2C%203%2F6%29

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Contributor
LLVM Coverage Report

Changed lines: No C/C++ source files changed; skipping uncovered code analysis.

Newly covered by added/modified tests: 1022 line(s), 97 function(s) across 155 file(s).
Algunenano
approved these changes
Jun 29, 2026

Fixes a flaky integration test.
`test_replication_without_zookeeper/test.py::test_startup_without_zookeeper` intermittently fails with `kazoo.exceptions.NotEmptyError` while executing `zk.delete(path="/clickhouse", recursive=True)` in `drop_zk`.

The recursive delete runs while ClickHouse (`node1`) is still alive, so its `ReplicatedMergeTree` background threads keep re-creating ZooKeeper nodes under `/clickhouse`. kazoo's recursive delete races with that: between `get_children` and `delete` a new child node can appear, and kazoo raises `NotEmptyError` without retrying.

The test already called `cluster.run_kazoo_commands_with_retries`, but with the default `repeats=1`. In that helper the retry loop is `for i in range(repeats - 1)`, so `repeats=1` performs zero retries: the callback runs exactly once with no exception handling. Pass `repeats=5` (matching the existing usages in `helpers/cluster.py`) so the drop is retried on the transient `NotEmptyError`.

This is a pre-existing flaky test, not a regression: the same `NotEmptyError` in `drop_zk` has been observed on unrelated pull requests.

Related: #103540
Related: #101757
CI report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?REF=master&sha=d40ea5d0da4103b54971ac01a9960ef9153242bb&name_0=MasterCI&name_1=Integration%20tests%20%28amd_asan_ubsan%2C%20db%20disk%2C%20old%20analyzer%2C%203%2F6%29
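
To illustrate why `repeats=1` gives no retries, here is a minimal standalone sketch of the retry pattern described above. It is not the actual helper from `helpers/cluster.py`: the names `run_with_retries`, `TransientError`, and `make_flaky_drop` are hypothetical, and `TransientError` stands in for `kazoo.exceptions.NotEmptyError` so the sketch runs without kazoo or ZooKeeper.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for kazoo.exceptions.NotEmptyError."""


def run_with_retries(callback, repeats=1, sleep_for=0.0):
    """Simplified model of a `for i in range(repeats - 1)` retry loop.

    The first `repeats - 1` attempts catch the error and retry;
    the final attempt runs unguarded, so its exception propagates.
    """
    for _ in range(repeats - 1):
        try:
            return callback()
        except TransientError:
            time.sleep(sleep_for)
    return callback()


def make_flaky_drop(failures):
    """Return a callback that raises `failures` times, then returns its call count."""
    state = {"calls": 0}

    def drop():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientError("node not empty")
        return state["calls"]

    return drop


# repeats=1: range(0) is empty, so the callback runs once, unguarded.
try:
    run_with_retries(make_flaky_drop(failures=2), repeats=1)
except TransientError:
    print("repeats=1: error propagates after a single attempt")

# repeats=5: up to four guarded attempts plus one final unguarded attempt.
print(run_with_retries(make_flaky_drop(failures=2), repeats=5))
```

In this model, a drop that fails twice before succeeding raises immediately with `repeats=1` but completes on its third attempt with `repeats=5`, which is the behavior the fix relies on.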
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Not required.
🤖 Generated with Claude Code
Version info
26.7.1.240