iframe-proxy

alexey-milovidov · 2026-06-07T22:15:07Z

ResourceGuard::Request is reused per thread via the Request::local() thread-local instance. Its exception member is set by failed() but was never cleared on reuse, so a thread that had previously run a failed resource request kept a stale exception. A subsequent request scheduled on that thread could be granted via execute() (no exception set), yet Request::wait() would still observe the stale exception and throw RESOURCE_ACCESS_DENIED ("Resource request failed: ... Scheduler queue with resource request is about to be destructed"), even though that request had actually been granted.

This can surface to users as a query spuriously failing with RESOURCE_ACCESS_DENIED when restructuring/dropping a workload's queue (e.g. CREATE WORKLOAD/ALTER/DROP) fails one request and the same worker thread is then reused for another request that is in fact granted.

The fix clears exception in Request::enqueue() (the reuse reset point) so a granted request can never observe a previous request's failure. Genuinely failed requests still throw: failed() sets exception and wait() throws as before; the value is only reset when the request is reused for a new enqueue.

This was found while investigating a reliable deadlock in SchedulerWorkloadResourceManager.DropNotEmptyQueueLong exposed by the LIFO worker-thread wake order; it is a pre-existing latent bug that makes sense to fix on its own. Verified against SchedulerWorkloadResourceManager.*: passes under ThreadSanitizer and under heavy parallel stress (previously a reliable hang/spurious failure).

Related: #100177

Changelog category (leave one):

Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Fixed a rare spurious RESOURCE_ACCESS_DENIED error ("Scheduler queue with resource request is about to be destructed") for a query that was actually granted access to a workload resource, caused by a reused thread-local request object retaining a previous request's failure.

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

Version info

Merged into: 26.6.1.760

…uest` `ResourceGuard::Request` is reused per thread via the `Request::local()` thread-local instance. Its `exception` member is set by `failed()` but was never cleared on reuse, so a thread that had previously run a failed resource request kept a stale `exception`. A subsequent request scheduled on that thread could be granted via `execute()` (no exception set), yet `Request::wait()` would still observe the stale `exception` and throw `RESOURCE_ACCESS_DENIED` ("Resource request failed: ... Scheduler queue with resource request is about to be destructed"), even though that request had actually been granted. This can surface to users as a query spuriously failing with `RESOURCE_ACCESS_DENIED` when a workload's queue is restructured/dropped (e.g. `CREATE WORKLOAD`/`ALTER`/`DROP`) fails one request and the same worker thread is then reused for another request that is in fact granted. Clear `exception` in `Request::enqueue()` (the reuse reset point) so a granted request can never observe a previous request's failure. Genuinely failed requests still throw: `failed()` sets `exception` and `wait()` throws as before; the value is only reset when the request is reused for a new enqueue. Related: #100177 Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

clickhouse-gh · 2026-06-07T22:15:41Z

…tale-exception

…d regression tests The default-lock `ResourceGuard` constructor enqueues the request and then calls `Request::wait()` inside the constructor. If either `enqueue()` or `wait()` throws (the request failed because its queue is being destructed), the constructor propagates the exception and `~ResourceGuard()` never runs, so the request is never finished. The reused thread-local `Request::local()` instance is then left in the `Enqueued` or `Dequeued` state, and the next request enqueued on that thread hits `chassert(state == Finished)` in `Request::enqueue()` (an exception in debug/sanitizer builds), even though this pull request is specifically making that same-thread reuse valid. Wrap the constructor's `enqueue`/`wait` in a `try`/`catch` and restore the reused request to the `Finished` state before rethrowing, via the new `Request::recoverAfterConstructorFailure`. For a request failed in `wait()` (state `Dequeued`) this mirrors what `~ResourceGuard()` does on the `Lock::Defer` path: `finish` is a documented no-op for a failed request. For a request that failed in `enqueue()` (state `Enqueued`) the request never entered the scheduler, so only the state is reset. Add two regression tests in `gtest_workload_resource_manager`: - `ReuseRequestAfterFailedDeferRequestIsGranted` covers the original stale `exception` reset: a `Lock::Defer` request fails (queue destroyed) and a subsequent granted request on the same thread must not observe the previous failure. Without clearing `exception` in `enqueue()`, the granted request spuriously throws `RESOURCE_ACCESS_DENIED`. - `ReuseRequestAfterFailedDefaultRequestIsGranted` covers the constructor failure path above: a `Lock::Default` request fails and a granted request is then issued on the same thread. An uncaught exception in a worker thread is swallowed by the thread pool, so the granted reuse is wrapped in `EXPECT_NO_THROW` to surface failures. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

clickhouse-gh · 2026-06-09T02:37:40Z

📊 Cloud Performance Report

✅ AI verdict: no_change — no significant changes across 38 queries analysed

This PR only fixes an exception-recovery bug in the workload I/O scheduler's ResourceGuard (clearing a stale error on a reused request) plus adds two regression tests; it does not touch the normal query-execution path. The two flagged improvements on tpch_adapted_1_official (Q7 -32%, Q20 -44%) therefore cannot plausibly be caused by this change, and the source side had few runs with noisy measurements (Q20 especially). Both have been downgraded to not-sure as likely run-to-run variance rather than a real PR effect.

clickbench

🟢 No significant changes

tpch_adapted_1_official

⚠️ 2 inconclusive

Flagged queries (2 of 22)

	Query	Verdict	Baseline med (ms)	PR med (ms)	Change	q-value	Hint
⚠️	7	not_sure	112	76	-32.1%	<0.0001	PR only touches the I/O scheduler ResourceGuard error-recovery path; cannot speed up a TPC-H join. Off-path noise.
⚠️	20	not_sure	241	135	-44.0%	<0.0001	Same off-path change; source runs noisy (cv 36%) on only 6 runs. -44% is run-to-run variance, not this PR.

_{q-value = BH-FDR adjusted p; smaller is stronger evidence. MIRAI flags a query when q < fdr_q (default 0.10) — the value the verdict is based on.}

Debug info

StressHouse run: 41f7e294-9ad8-44f5-a7aa-3f44ca378e84
MIRAI run: 9f9e9087-3ed7-427f-a635-f6a6d9078c4e
PR check IDs:
- clickbench_67596_1780971423
- clickbench_67694_1780971521
- clickbench_67803_1780971558
- tpch_adapted_1_official_67934_1780971781
- tpch_adapted_1_official_68019_1780971811
- tpch_adapted_1_official_68115_1780971845

…sourceTest.h` The style check flags `ErrorCodes::INVALID_SCHEDULER_NODE` as defined but not used in `ResourceTest.h`: the check excludes `tests/*.cpp` files but not headers, so the declaration must live in `gtest_workload_resource_manager.cpp` where the code is actually used. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Address review: `enqueueRequestUsingBudget` commits `budget.ask` before calling the concrete `enqueueRequest`, which can throw before linking the request into the queue (`INVALID_SCHEDULER_NODE` when the queue is being destructed, or `SERVER_OVERLOADED` when `max_waiting_queries` is reached). The request never enters the scheduler and `finish` never runs for it, but the queue budget had already been changed assuming the request would be granted `request->cost` and consume `estimated_cost`. In particular, when the queue carries a debt from a previous over-consumption, `ask` raises the request's cost to repay it, and a failed enqueue would silently forget the debt, skewing costs of subsequent requests. Wrap `enqueueRequest` in `try`/`catch` and roll the transaction back via `budget.adjust(estimated_cost, request->cost)` before rethrowing. Add a regression test `SchedulerRoot.BudgetAfterFailedEnqueue`: it creates a negative budget, purges the queue so `enqueueRequest` throws, and checks the budget is unchanged after the failed enqueue (without the rollback the budget observably jumps from the debt value back to zero). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

alexey-milovidov · 2026-06-10T13:41:29Z

Pushed two commits:

ef0dd5d733c fixes the style check failure: ErrorCodes::INVALID_SCHEDULER_NODE was declared in ResourceTest.h but used only in gtest_workload_resource_manager.cpp (the style check excludes tests/*.cpp but not headers), so the declaration moved to the test file.
e19c2c166bd addresses the review about budget exception safety: enqueueRequestUsingBudget now rolls back the budget.ask transaction if enqueueRequest throws before the request is linked, with a regression test SchedulerRoot.BudgetAfterFailedEnqueue (verified to fail without the rollback).

SchedulerRoot.* (4) and SchedulerWorkloadResourceManager.* (23) gtests pass locally; the cpp style check is clean.

`ResourceGuardSessionDataHooks` in `HTTPConnectionPool.cpp` is the second owner of a `ResourceGuard::Request` (besides the thread-local instance used by `ResourceGuard`). Its `atStart` does `request.enqueue` followed by `request.wait`, and `HTTPSession::write`/`HTTPSession::receive` catch any exception thrown from `atStart` (note that `DB::Exception` is a `Poco::Exception`) and unconditionally call `atFail`. If `enqueue` throws before the request is linked into the scheduler (for example `INVALID_SCHEDULER_NODE` from a purged queue, or `SERVER_OVERLOADED` from `max_waiting_queries`), the request is left in the `Enqueued` state. The subsequent `atFail` then calls `request.finish(0, link)` on a non-`Dequeued` request, tripping `chassert(state == Dequeued)` in debug/sanitizer builds and adjusting the queue budget with a stale `estimated_cost` in release builds. Wrap `atStart` in a try/catch that calls `recoverAfterConstructorFailure` on failure (mirroring the `ResourceGuard` constructor), and gate `atFinish`/`atFail` behind an `active` flag set only after `wait` succeeds, so the unconditional `atFail` during unwinding does not finish the request twice. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

…tale-exception

alexey-milovidov · 2026-06-13T02:55:00Z

The sole CI red on the previous head (e19c2c166bd) was 01710_projection_aggregation_in_order in Stateless tests (arm_asan_ubsan, azure, parallel), which failed with Code: 159. DB::Exception: Timeout exceeded: elapsed 60032 ms, maximum: 60000 ms: While executing MergeTreeSelect(...) — a 60s timeout on the heavy ASan+UBSan+azure+parallel job. This is unrelated to the scheduler changes in this PR: the same test has timed out the same way on unrelated PRs over the last 30 days (#106734, #105246) and there is no open tracking issue. It is a slow-runner flake.

@groeneai, could you take a look at 01710_projection_aggregation_in_order timing out on arm_asan_ubsan, azure, parallel? Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106690&sha=e19c2c166bdefa72b341a736015a546b292cf9fd&name_0=PR&name_1=Stateless%20tests%20%28arm_asan_ubsan%2C%20azure%2C%20parallel%29

clickhouse-gh · 2026-06-13T03:01:15Z

+            // call `atFail()` while unwinding, so recover the request to the `Finished` state now and keep
+            // `active` false. Otherwise `atFail()` would call `request.finish()` on a non-`Dequeued` request,
+            // tripping `chassert(state == Dequeued)` in debug builds and corrupting the queue budget in release.
+            request.recoverAfterConstructorFailure(link);


This recovery path needs a focused regression test for the ResourceGuardSessionDataHooks owner as well. The new tests cover the thread-local ResourceGuard::Request::local() owner and the budget rollback, but they do not exercise the Poco control flow that makes this line necessary: atStart throws, HTTPSession::write/receive catches it, and then unconditionally calls atFail.

Without a test around this hook path, dropping either this recovery call or the active guard below would leave all current regressions green while reintroducing the debug chassert / release budget-drift failure for scheduled HTTP socket I/O. Please add the smallest test that drives an HTTP session with an attached read or write ResourceLink, makes the scheduler request fail during atStart (for example by purging/filling the queue), and verifies the hook unwinds cleanly and the request can be reused/finished correctly.

groeneai · 2026-06-13T03:23:56Z

@alexey-milovidov Investigated. The recorded red for 01710_projection_aggregation_in_order on e19c2c166bd is not a 60s timeout, it is the runner-wide OvercommitTracker OOM on the arm_asan_ubsan, azure, parallel shard. Same host-level event we analyzed for 04033_tpc_ds_q04 on your #106182.

Evidence (CIDB + the result_pr.json for the run you linked):

The failing test log is Code: 241 ... (total) memory limit exceeded ... current RSS: 54.24 GiB, maximum: 53.69 GiB. OvercommitTracker decision: Query was selected to stop ... While executing ProjectionDataSink, on the trivial INSERT INTO agg SELECT 1, ... FROM numbers(100000). It ran 1.46s before the host-cap kill, and the reproducibility check passed 0 out of 3 reruns.
There is no Code: 159 / Timeout exceeded row for this test in 90 days, and none for any test at e19c2c166bd. The 60032 ms MergeTreeSelect string is not in CIDB for this test or this PR.
01710 is fast on this shard: 5691 OK runs, p50 1.8s, p95 4.6s, max 36.7s, never near the 60s budget. It is not a slow runner.
On 2026-06-10 (your run) 304 distinct tests on this shard were OvercommitTracker-killed at the identical 53.69 GiB cap (559 on 06-09). 01710 is one collateral victim, picked because OvercommitTracker had to free memory while other parallel tests held RSS at the host cap.

The contention self-resolved: shard OOM-kills dropped 304 (06-10) -> 59 (06-11) -> 12 (06-12), and 01710 has had zero failures since 06-10. This tracks the revert #107122 of the multistage-distributed-queries change (#106020) that drove the heap-corruption and thread-teardown storm on the asan shards from 06-08 to 06-10.

So no per-test fix is warranted. A long tag would not help (the kill is at 1.46s, not a duration issue), and pinning a setting or shrinking the insert would only change which query OvercommitTracker selects, not prevent the shard OOM. The only real fix is infra-level (lower per-shard test parallelism or raise the server memory budget on this runner), the same conclusion as #106182. If you are seeing a genuine Code: 159 60s MergeTreeSelect timeout on a head I am not looking at, point me at it and I will dig further.

Report: https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=106690&sha=e19c2c166bdefa72b341a736015a546b292cf9fd&name_0=PR&name_1=Stateless%20tests%20%28arm_asan_ubsan%2C%20azure%2C%20parallel%29

clickhouse-gh · 2026-06-13T15:39:18Z

LLVM Coverage Report

Changed lines: Changed C/C++ lines covered by tests: 120/127 (94.49%) | Lost baseline coverage: none · Uncovered code

Full report · Diff report

clickhouse-gh Bot added the pr-bugfix Pull request with bugfix, not backported by default label Jun 7, 2026

clickhouse-gh Bot reviewed Jun 7, 2026

View reviewed changes

Comment thread src/Common/Scheduler/ResourceGuard.h

alexey-milovidov and others added 2 commits June 8, 2026 23:00

Merge remote-tracking branch 'origin/master' into fix-resourceguard-s…

01f5bd6

…tale-exception

clickhouse-gh Bot reviewed Jun 9, 2026

View reviewed changes

Comment thread src/Common/Scheduler/ResourceGuard.h

serxa approved these changes Jun 9, 2026

View reviewed changes

clickhouse-gh Bot assigned serxa Jun 9, 2026

alexey-milovidov and others added 2 commits June 10, 2026 13:40

clickhouse-gh Bot reviewed Jun 10, 2026

View reviewed changes

Comment thread src/Common/Scheduler/ResourceGuard.h

alexey-milovidov and others added 2 commits June 13, 2026 02:53

Merge remote-tracking branch 'origin/master' into fix-resourceguard-s…

badcad2

…tale-exception

clickhouse-gh Bot reviewed Jun 13, 2026

View reviewed changes

Merge branch 'master' into fix-resourceguard-stale-exception

da47f28

alexey-milovidov mentioned this pull request Jun 13, 2026

Skip macOS-incompatible distributed tests 04327 and 04336 #107376

Merged

alexey-milovidov added this pull request to the merge queue Jun 13, 2026

Merged via the queue into master with commit 97d86a3 Jun 13, 2026
166 checks passed

alexey-milovidov deleted the fix-resourceguard-stale-exception branch June 13, 2026 20:28

robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Jun 13, 2026

Sunbelt Computer Software

PL/B Language Development and Support

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix spurious resource request failure from reused `ResourceGuard::Request`#106690

Fix spurious resource request failure from reused `ResourceGuard::Request`#106690
alexey-milovidov merged 8 commits into
masterfrom
fix-resourceguard-stale-exception

alexey-milovidov commented Jun 7, 2026 •

edited by robot-clickhouse

Loading

Uh oh!

clickhouse-gh Bot commented Jun 7, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

clickhouse-gh Bot commented Jun 9, 2026

Uh oh!

alexey-milovidov commented Jun 10, 2026

Uh oh!

Uh oh!

alexey-milovidov commented Jun 13, 2026

Uh oh!

clickhouse-gh Bot Jun 13, 2026

Uh oh!

groeneai commented Jun 13, 2026

Uh oh!

clickhouse-gh Bot commented Jun 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Sunbelt Computer Software

PL/B Language Development and Support

Uh oh!

Conversation

alexey-milovidov commented Jun 7, 2026 • edited by robot-clickhouse Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Documentation entry for user-facing changes

Version info

Uh oh!

clickhouse-gh Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

AI Review

Summary

Tests

Final Verdict

Uh oh!

Uh oh!

Uh oh!

clickhouse-gh Bot commented Jun 9, 2026

clickbench

tpch_adapted_1_official

Uh oh!

alexey-milovidov commented Jun 10, 2026

Uh oh!

Uh oh!

alexey-milovidov commented Jun 13, 2026

Uh oh!

clickhouse-gh Bot Jun 13, 2026

Choose a reason for hiding this comment

Uh oh!

groeneai commented Jun 13, 2026

Uh oh!

clickhouse-gh Bot commented Jun 13, 2026

LLVM Coverage Report

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

alexey-milovidov commented Jun 7, 2026 •

edited by robot-clickhouse

Loading

clickhouse-gh Bot commented Jun 7, 2026 •

edited

Loading