{{ message }}
perf(wsq): remove UnboundedWSQ::pop() and relax steal() ordering;#795
Closed
wicyn wants to merge 1 commit into
Closed
perf(wsq): remove UnboundedWSQ::pop() and relax steal() ordering;#795wicyn wants to merge 1 commit into
wicyn wants to merge 1 commit into
Conversation
downgrade BoundedWSQ::steal() top load and pop() CAS to relaxed/acq_rel
UnboundedWSQ is used only as the executor's overflow region: the owner only
pushes and thieves only steal — the owner never pops. With pop() gone as dead
code, the cross-variable Dekker mutual-exclusion structure between top and
bottom disappears, and that structure was the sole reason steal() needed a
seq_cst fence and a seq_cst CAS. The downgrades follow from this:
UnboundedWSQ::steal()
- Remove the seq_cst fence: it existed only to pair with pop()'s fence in the
single total order S and arbitrate the "last element" race. With no pop(),
the fence has nothing to pair with.
- CAS success order seq_cst -> acq_rel: the only remaining contention is
thief-vs-thief on the single variable top. CAS RMW atomicity guarantees
exactly one thief wins, and release/acquire is sufficient to chain top's
modification order; no cross-variable total order is required.
- _top.load acquire -> relaxed: top increases monotonically, so relaxed can
at worst read a smaller, stale value, making the t<b check conservative
(one missed steal, never out of bounds); the CAS then re-validates against
the latest top and returns empty on failure. Slot-data visibility is
carried by _bottom.load(acquire) paired with push()'s release, independent
of how top is read.
BoundedWSQ::steal()
- Only _top.load acquire -> relaxed. BoundedWSQ still has pop(), so the Dekker
structure remains and both the seq_cst fence and the seq_cst CAS are kept
unchanged. Relaxing the top load is safe: the seq_cst fence immediately
following it is a full barrier that already prevents the top load from being
reordered with the subsequent bottom load and slot load — the LoadLoad
reordering that acquire was meant to block is already covered by the fence.
Slot visibility likewise depends on bottom's acquire chain, not on how top
is read.
BoundedWSQ::pop()
- CAS success order seq_cst -> acq_rel. The seq_cst fence in pop() is retained,
and so is steal()'s seq_cst fence and steal()'s seq_cst CAS. The argument
that pop()'s CAS need not enter S:
* The Dekker / SB exclusion that prevents fast-path double-take is carried
entirely by the two seq_cst fences plus steal()'s seq_cst CAS anchor.
pop()'s own side is cleanly SB-shaped (store bottom; seq_cst fence;
load top), so its fence does the SB duty on its own — pop()'s CAS is not
the anchor for either fence.
* pop()'s CAS executes only on the t==b (last-element) path, i.e. never on
the fast path. That path is single-variable contention: owner-CAS vs
thief-CAS both move top from t to t+1, and RMW atomicity guarantees
exactly one winner regardless of memory order. acq_rel provides the
acquire needed to observe the loser/winner state and the release to
publish the claim; entry into the global total order S is not required.
Net effect: pop() keeps its seq_cst fence (load-bearing) but drops one
seq_cst RMW on the cold last-element path.
Tests (wsq_test.cpp)
- UnboundedWSQ tests converted to steal-only; all UnboundedWSQ::pop() calls
removed. BoundedWSQ tests are untouched (BoundedWSQ still has pop()).
- UnboundedWSQ.Resize: drain loop pop() -> steal().
- UnboundedWSQ.Owner / ValueType: dropped the LIFO pop() verification; the
pre-existing steal() FIFO check now covers the single-thread path. Removed
the now-duplicated second push round.
- UnboundedWSQ.*Consumers / *.BulkPush: dropped the "master/bulk push and pop"
LIFO blocks; in the concurrent phase the owner only pushes and waits
(while(consumed != N) yield()) for thieves to drain, since a steal-only
owner cannot pop. `items` is now collected solely from `stolens`, and the
trailing pop()==nullptr assertion is removed (steal()==nullptr kept).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

UnboundedWSQ is used only as the executor's overflow region: the owner only pushes and thieves only steal — the owner never pops. With pop() gone as dead code, the cross-variable Dekker mutual-exclusion structure between top and bottom disappears, and that structure was the sole reason steal() needed a seq_cst fence and a seq_cst CAS. The downgrades follow from this:
UnboundedWSQ::steal()
BoundedWSQ::steal()
BoundedWSQ::pop()
Tests (wsq_test.cpp)
itemsis now collected solely fromstolens, and the trailing pop()==nullptr assertion is removed (steal()==nullptr kept).