Fix use-after-free of workload storage mutex during server shutdown#101447
Conversation
During server shutdown, `ContextSharedPart` destroys `workload_entity_storage` via `unique_ptr::reset` while a query pipeline may still be running `PipelineExecutor::allocateCPU`, which holds a raw reference to the storage obtained from `getWorkloadEntityStorage`. This caused a use-after-free of the `recursive_mutex` inside `WorkloadEntityStorageBase`, detected by TSan as a data race in `TraceEvent<EventUnlock>` / `TraceMutexUnlock`. Fix: - Change `workload_entity_storage` in `ContextSharedPart` from `unique_ptr` to `shared_ptr`. - Add `Context::getWorkloadEntityStoragePtr` returning a `shared_ptr` copy. - In `PipelineExecutor::allocateCPU`, hold the `shared_ptr` for the duration of the call so the storage object cannot be destroyed underneath it. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
`getWorkloadEntityStoragePtr` returned `shared->workload_entity_storage` without holding `shared->mutex`, while `shutdown` moved from that same field under the mutex at the same time. Reading and writing one `std::shared_ptr` object concurrently from two threads is a data race. Fix: acquire `shared->mutex` (shared/read lock) before copying the `shared_ptr` out, matching the exclusive lock held by the write side in `shutdown`. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous fix only guarded the read of `shared->workload_entity_storage` but the write inside `callOnce` was still unprotected. If `shutdown` ran concurrently with the first call to `getWorkloadEntityStoragePtr`, the `callOnce` lambda could write to `workload_entity_storage` at the same time as `shutdown` moved from it under `shared->mutex` — another data race on the same `std::shared_ptr` object. Fix: create the storage object in a local variable first (outside the mutex, since construction can be slow), then assign to `shared->workload_entity_storage` while holding `shared->mutex`. Both read and write of the field are now guarded by `shared->mutex`. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
The Stress test (arm_msan) failure is fixed by #101239, which should be merged first. After it is merged, please update the branch to include the fix. |
|
The failures of "Flaky check" in "functions_bad_arguments" will be fixed by #101994. |
|
@groeneai, make the test |
The test creates users and roles with hardcoded global names (u_02015_allowed, u_02015_denied, role_02015). When the flaky check runs this test multiple times with parallel workers, one instance's DROP USER races with another instance's REVOKE/CREATE, causing UNKNOWN_ROLE errors. Fix: suffix user and role names with $CLICKHOUSE_DATABASE (unique per test invocation) so parallel instances operate on independent global objects. Reproduction: --test-runs 10 -j 4 fails 6/10 without fix, 50/50 passes with fix at -j 8. Ref: ClickHouse#101447
|
@alexey-milovidov — fix PR created: #102019 Root cause: The test uses hardcoded global user/role names ( Fix: Suffix all user/role names with Verified locally: 6/10 fail without fix ( |
|
The flaky test failures in this PR ( |
Split flaky check and targeted check into separate roles: Flaky check — runs only new/modified tests from the PR with thread fuzzer enabled. Tests run in parallel with themselves (shared queue) to ensure new tests are parallel-safe under randomized thread ordering. Targeted check — runs previously failed and coverage-relevant tests with per-worker partitioning (--no-self-parallel) so the same test never runs concurrently on multiple workers, avoiding conflicts on shared resources (hardcoded users, roles, settings profiles, etc.). Both use --test-runs 50. Changes: - Add --no-self-parallel flag to clickhouse-test: partitions tests by name hash so all copies go to the same worker - Flaky check: only changed tests, thread fuzzer, shared queue - Targeted check: relevant tests, no thread fuzzer, per-worker queues, 50 runs, single clickhouse-test invocation - Restore original thread fuzzer parameters (reverts reductions from 06c945a and 9790435) - Add amd_binary to flaky check, use AMD_LARGE for targeted debug - Targeted check uses --flaky-check --no-self-parallel PRs affected by tests running in parallel with themselves: #102015 #101093 #99653 #101503 #101108 #101884 #100746 #101447 #100941 #101099 #100203 #96130 #102038
Split flaky check and targeted check into separate roles: Flaky check — runs only new/modified tests from the PR with thread fuzzer enabled. Tests can run in parallel with themselves (shared queue) to ensure new tests are parallel-safe. Targeted check — runs previously failed and coverage-relevant tests with per-worker partitioning (--no-self-parallel) so the same test never runs concurrently on multiple workers, avoiding conflicts on shared resources (hardcoded users, roles, settings profiles, etc.). Both use --test-runs 50. Changes: - Add --no-self-parallel flag to clickhouse-test: partitions tests by name hash so all copies go to the same worker - Flaky check: only changed tests, thread fuzzer, shared queue - Targeted check: relevant tests, no thread fuzzer, per-worker partitioning, 50 runs, single clickhouse-test invocation - Restore original thread fuzzer parameters - Change targeted check to arm_asan_ubsan on ARM_LARGE PRs affected by tests running in parallel with themselves: #102015 #101093 #99653 #101503 #101108 #101884 #100746 #101447 #100941 #101099 #100203 #96130
The `SHUTDOWN(... workload_entity_storage, stopWatching())` path read the `shared->workload_entity_storage` `shared_ptr` without holding `shared->mutex`, while `getWorkloadEntityStoragePtr` may concurrently initialize it under the same mutex. A concurrent read/write of the same `std::shared_ptr` object is a data race. Copy the pointer under a shared lock and call `stopWatching` on the local copy. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
`getWorkloadEntityStorage` returns a raw reference whose owning `shared_ptr` is discarded immediately, so callers running concurrently with shutdown could still observe a dangling reference once the storage is destroyed. Migrate the callers that run on query-execution paths (`ProcessList::insert`, the `CREATE`/`DROP WORKLOAD`/`RESOURCE` interpreters, `system.workloads`, `system.resources`, and the `DiskObjectStorage` subscription) to `getWorkloadEntityStoragePtr`, holding the `shared_ptr` for the duration of the access. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Continued work on this PR:
Verification: built on ARM (clean, The previously-reported CI failures ( |
The previous fix protected `PipelineExecutor::allocateCPU` and the query-execution callers, but the lazy initialization of the resource manager still observed the same shutdown race. `Context::getWorkloadClassifier` initializes the resource manager lazily through `getResourceManager`, which calls `createResourceManager`. That passed `getWorkloadEntityStorage()` (a raw reference whose owning `shared_ptr` temporary is destroyed at the end of the full expression), so no owner was held across `WorkloadResourceManager`'s constructor while it runs `getAllEntitiesAndSubscribe`. If this first lazy init overlaps shutdown's reset of `workload_entity_storage`, the storage can be destroyed while the constructor is still using it — the same `recursive_mutex` use-after-free this PR set out to fix. Make `WorkloadResourceManager` take and retain a `shared_ptr<IWorkloadEntityStorage>`, and have `createResourceManager` pass `getWorkloadEntityStoragePtr`. The storage is only touched in the constructor, so holding the owner for the manager's lifetime is more than sufficient. Also remove the now-unused reference accessor `Context::getWorkloadEntityStorage` (the last caller in `Server.cpp`'s startup `loadEntities` is migrated to `getWorkloadEntityStoragePtr`), so the shutdown-unsafe accessor can no longer be reintroduced. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Merged latest
The only CI red is the |
LLVM Coverage ReportChanged lines: Changed C/C++ lines covered by tests: 104/117 (88.89%) | Lost baseline coverage (was covered on master, now uncovered in this PR): 4 line(s) · Uncovered code |

During server shutdown,
ContextSharedPartdestroysworkload_entity_storageviaunique_ptr::resetwhile a query pipeline may still be runningPipelineExecutor::allocateCPU, which holds a raw reference to the storage obtained fromgetWorkloadEntityStorage. This caused a use-after-free of therecursive_mutexinsideWorkloadEntityStorageBase, detected by TSan as a data race inTraceEvent<EventUnlock>/TraceMutexUnlock.Fix:
workload_entity_storageinContextSharedPartfromunique_ptrtoshared_ptr.Context::getWorkloadEntityStoragePtrreturning ashared_ptrcopy.PipelineExecutor::allocateCPU, hold theshared_ptrfor the duration of the call so the storage object cannot be destroyed underneath it.Test: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=101060&sha=34f9537c4f02bdf3a862b3cdcad591081052235c&name_0=PR&name_1=Stress+test+%28amd_debug%29
#101060
Job log: https://pastila.clickhouse.com/?0000168c/a152326ece24275e5b4fe9dd6ab13982#ZZO9TJyd92smEO3/A1NiNQ==GCM
Failure: https://pastila.clickhouse.com/?001fdf85/ffb73f64f70fed2915e833076e3ffa75#JCRYrJgw6/Sz2vqG4S5AUw==GCM
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
...
Documentation entry for user-facing changes