gh-119592: gh-152967: Fix ProcessPoolExecutor stranding submitted work when a max_tasks_per_child worker exits #152978
Open
gpshead wants to merge 1 commit into
…n a max_tasks_per_child worker exits

Worker replacement went through the executor object: the manager thread read executor attributes that shutdown(wait=False) clears concurrently, and could not replace workers at all once the executor was garbage collected. A worker exiting at its max_tasks_per_child limit in those states left the remaining submitted work permanently unexecuted and hung interpreter exit; the racing case could crash the manager thread.

Replace workers from the executor manager thread using its own state plus configuration read through the live executor weakref, which shutdown() never clears:

- After shutdown(wait=False) with the executor still referenced, a replacement is spawned and the remaining work is executed as documented.
- Once the executor has been garbage collected (pythongh-152967), or a replacement worker cannot be started and no workers remain, the remaining futures now fail with BrokenProcessPool instead of never resolving.
- A new _force_shutting_down flag stops both spawn paths from starting workers that would escape terminate_workers()/kill_workers().

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
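The approach described in the commit message can be sketched with a toy model. This is not the code from the PR: `ToyExecutor`, `ToyManager`, and every attribute and method on them are hypothetical names made up for illustration, and no real processes are started. It only shows the shape of the idea: the manager keeps its own worker bookkeeping, reads configuration through a weak reference to the executor, and fails pending futures with `BrokenProcessPool` when no replacement can be started and no workers remain. Treating "executor garbage collected" as just another reason a spawn fails is a simplifying assumption of this sketch.

```python
import gc
import weakref
from concurrent.futures import Future
from concurrent.futures.process import BrokenProcessPool


class ToyExecutor:
    """Hypothetical stand-in for the executor: holds configuration only."""

    def __init__(self, max_workers):
        self.max_workers = max_workers


class ToyManager:
    """Hypothetical stand-in for the manager thread's bookkeeping."""

    def __init__(self, executor):
        # Weak reference: the manager never keeps the executor alive.
        self.executor_ref = weakref.ref(executor)
        self.workers = set()   # manager-owned state, not read from the executor
        self.pending = []      # futures for submitted but unfinished work
        self.force_shutting_down = False
        self._next_id = 0

    def spawn_worker(self):
        """Record a new (simulated) worker; return True if one was started."""
        if self.force_shutting_down:
            return False       # never start workers during a forced shutdown
        executor = self.executor_ref()
        if executor is None or len(self.workers) >= executor.max_workers:
            return False
        self._next_id += 1
        self.workers.add(self._next_id)
        return True

    def on_worker_exit(self, worker_id):
        """Handle a worker leaving, e.g. at its task limit."""
        self.workers.discard(worker_id)
        if self.spawn_worker() or self.workers:
            return             # someone can still run the remaining work
        # No replacement and no workers left: fail instead of hanging.
        for future in self.pending:
            if not future.done():
                future.set_exception(
                    BrokenProcessPool("no worker left to run this task"))
        self.pending.clear()


if __name__ == "__main__":
    executor = ToyExecutor(max_workers=1)
    manager = ToyManager(executor)
    manager.spawn_worker()
    future = Future()
    manager.pending.append(future)

    manager.on_worker_exit(1)      # executor alive: worker is replaced
    print(future.done())           # False

    del executor                   # executor goes away
    gc.collect()
    manager.on_worker_exit(2)      # nothing can replace the worker now
    print(type(future.exception()).__name__)   # BrokenProcessPool
```

In this toy, a caller blocked on `future.result()` would get an exception rather than waiting forever, which is the observable behavior the second bullet describes.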
gpshead
commented
Jul 3, 2026
Member
Author
While this might look like a public API change in the diff, it's on the internal-use-only _ExecutorManagerThread class, so it's fine to backport.

Drafted and investigated entirely by Claude Fable 5 based on the issues. I'm putting this up as a draft to better iterate on review to see what shape this should take and how feasible backporting this further as a bugfix could even be. edit: Looks to be in good shape. Undrafting.