Fix possible 'associated promise has been destructed' in TestKeeper by vdimir · Pull Request #73570 · ClickHouse/ClickHouse · GitHub
Skip to content

Fix possible 'associated promise has been destructed' in TestKeeper#73570

Merged
vdimir merged 3 commits into
masterfrom
vdimir/test_keeper_promise
Dec 27, 2024
Merged

Fix possible 'associated promise has been destructed' in TestKeeper#73570
vdimir merged 3 commits into
masterfrom
vdimir/test_keeper_promise

Conversation

@vdimir

@vdimir vdimir commented Dec 18, 2024

Copy link
Copy Markdown
Member

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

...

@robot-clickhouse

robot-clickhouse commented Dec 18, 2024

Copy link
Copy Markdown
Member

@yariks5s yariks5s self-assigned this Dec 18, 2024
@vdimir vdimir force-pushed the vdimir/test_keeper_promise branch 3 times, most recently from 0dce045 to 07f7d94 Compare December 19, 2024 15:50
@vdimir vdimir enabled auto-merge December 20, 2024 12:42
@vdimir

vdimir commented Dec 20, 2024

Copy link
Copy Markdown
Member Author

I’ve made some updates to the patch since review, instead of skipping the check for expired, we will now call callback with Error::ZSESSIONEXPIRED, as is done in the finalizer for the queue.

@vdimir vdimir force-pushed the vdimir/test_keeper_promise branch from d9f0745 to 704be53 Compare December 20, 2024 15:20
@vdimir

vdimir commented Dec 23, 2024

Copy link
Copy Markdown
Member Author

@vdimir vdimir added this pull request to the merge queue Dec 27, 2024
Merged via the queue into master with commit 368186d Dec 27, 2024
@vdimir vdimir deleted the vdimir/test_keeper_promise branch December 27, 2024 16:40
@robot-ch-test-poll2 robot-ch-test-poll2 added the pr-synced-to-cloud The PR is synced to the cloud repo label Dec 27, 2024
pull Bot pushed a commit to AKJUS/ClickHouse that referenced this pull request Jun 25, 2026
In Coordination::ZooKeeper::sendThread, after a request is popped from the
queue the local RequestInfo is the sole owner of its callback until the request
is inserted into operations (from which receiveEvent()/finalize() later satisfy
it). Async callers such as ZooKeeper::asyncTryExistsNoThrow capture a
shared_ptr<std::promise<...>> in that callback and hand the std::future to a
waiting caller before the request is processed.

If anything in the pop->insert window throws (the OpenTelemetry span finalize,
addRootPath, or the operations map insert, all of which can allocate and throw
under memory pressure), the local RequestInfo is destroyed while unwinding and
the callback never runs. The captured std::promise is then destroyed unsatisfied,
so the waiter's future.get() observes a std::future_error (broken promise). That
future_error is a std::logic_error, which the server reports as a LOGICAL_ERROR
and aborts on (see PR ClickHouse#51907). This matches the reported crash (STID 2508-34fb,
ZooKeeper-client variant): ~promise<ExistsResponse> running from sendThread's
teardown.

Arm a SCOPE_EXIT guard over that window: if the request is dropped before its
callback ownership transfers to operations, satisfy the callback with an error
response (ZCONNECTIONLOSS if it was probably sent, otherwise ZSESSIONEXPIRED)
instead of abandoning the promise. This mirrors how finalize() drains
outstanding operations and the already-merged sibling fixes for the same error
class (TestKeeper, PR ClickHouse#73570; the thread-pool callback runner, PR ClickHouse#107383). The
fix is at the send layer, so it covers every async request type uniformly.

Add a unit test reproducing the window contract: without the guard the future
carries a broken-promise future_error; with the guard it carries a normal
ZSESSIONEXPIRED response and nothing aborts.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-ci pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants