{{ message }}
perf(wsq): remove UnboundedWSQ::pop() and relax steal() ordering;#796
Open
wicyn wants to merge 2 commits into
Open
perf(wsq): remove UnboundedWSQ::pop() and relax steal() ordering;#796wicyn wants to merge 2 commits into
wicyn wants to merge 2 commits into
Conversation
downgrade BoundedWSQ::steal() top load and pop() CAS to relaxed/acq_rel
UnboundedWSQ is used only as the executor's overflow region: the owner only
pushes and thieves only steal — the owner never pops. With pop() gone as dead
code, the cross-variable Dekker mutual-exclusion structure between top and
bottom disappears, and that structure was the sole reason steal() needed a
seq_cst fence and a seq_cst CAS. The downgrades follow from this:
UnboundedWSQ::steal()
- Remove the seq_cst fence: it existed only to pair with pop()'s fence in the
single total order S and arbitrate the "last element" race. With no pop(),
the fence has nothing to pair with.
- CAS success order seq_cst -> acq_rel: the only remaining contention is
thief-vs-thief on the single variable top. CAS RMW atomicity guarantees
exactly one thief wins, and release/acquire is sufficient to chain top's
modification order; no cross-variable total order is required.
- _top.load acquire -> relaxed: top increases monotonically, so relaxed can
at worst read a smaller, stale value, making the t<b check conservative
(one missed steal, never out of bounds); the CAS then re-validates against
the latest top and returns empty on failure. Slot-data visibility is
carried by _bottom.load(acquire) paired with push()'s release, independent
of how top is read.
BoundedWSQ::steal()
- Only _top.load acquire -> relaxed. BoundedWSQ still has pop(), so the Dekker
structure remains and both the seq_cst fence and the seq_cst CAS are kept
unchanged. Relaxing the top load is safe: the seq_cst fence immediately
following it is a full barrier that already prevents the top load from being
reordered with the subsequent bottom load and slot load — the LoadLoad
reordering that acquire was meant to block is already covered by the fence.
Slot visibility likewise depends on bottom's acquire chain, not on how top
is read.
BoundedWSQ::pop()
- CAS success order seq_cst -> acq_rel. The seq_cst fence in pop() is retained,
and so is steal()'s seq_cst fence and steal()'s seq_cst CAS. The argument
that pop()'s CAS need not enter S:
* The Dekker / SB exclusion that prevents fast-path double-take is carried
entirely by the two seq_cst fences plus steal()'s seq_cst CAS anchor.
pop()'s own side is cleanly SB-shaped (store bottom; seq_cst fence;
load top), so its fence does the SB duty on its own — pop()'s CAS is not
the anchor for either fence.
* pop()'s CAS executes only on the t==b (last-element) path, i.e. never on
the fast path. That path is single-variable contention: owner-CAS vs
thief-CAS both move top from t to t+1, and RMW atomicity guarantees
exactly one winner regardless of memory order. acq_rel provides the
acquire needed to observe the loser/winner state and the release to
publish the claim; entry into the global total order S is not required.
Net effect: pop() keeps its seq_cst fence (load-bearing) but drops one
seq_cst RMW on the cold last-element path.
Tests (wsq_test.cpp)
- UnboundedWSQ tests converted to steal-only; all UnboundedWSQ::pop() calls
removed. BoundedWSQ tests are untouched (BoundedWSQ still has pop()).
- UnboundedWSQ.Resize: drain loop pop() -> steal().
- UnboundedWSQ.Owner / ValueType: dropped the LIFO pop() verification; the
pre-existing steal() FIFO check now covers the single-thread path. Removed
the now-duplicated second push round.
- UnboundedWSQ.*Consumers / *.BulkPush: dropped the "master/bulk push and pop"
LIFO blocks; in the concurrent phase the owner only pushes and waits
(while(consumed != N) yield()) for thieves to drain, since a steal-only
owner cannot pop. `items` is now collected solely from `stolens`, and the
trailing pop()==nullptr assertion is removed (steal()==nullptr kept).
Member
d5ddd24 to
af2908a
Compare
…tor overflow
OverflowWSQ
- Steal-only: owner has push()/bulk_push() but no pop(). The executor's
overflow region never pops from the owner end.
- With no pop(), the cross-variable Dekker mutual-exclusion between top and
bottom disappears, which is the sole reason a seq_cst fence + seq_cst CAS
were required in steal(). steal() is therefore lighter than UnboundedWSQ's:
* no seq_cst fence (it only existed to pair with pop()'s fence);
* CAS success order acq_rel instead of seq_cst — remaining contention is
thief-vs-thief on the single variable top, where RMW atomicity picks
the unique winner and release/acquire suffices to chain top's
modification order; no entry into the global total order S is needed;
* _top.load relaxed (top is monotonic; a stale-small read only makes the
t<b check conservative, never out of bounds; the CAS re-validates).
Slot-data visibility is carried by _bottom.load(acquire) paired with
push()'s release, independent of how top is read.
- Buffer lifetime identical to UnboundedWSQ: resize() parks the old Array in
_garbage, released only at destruction — a thief holding a stale _array
pointer still reads live, value-correct memory. No use-after-free.
UnboundedWSQ
- Untouched. pop() and the original seq_cst orderings are preserved for any
current or future use case that needs owner-side LIFO pop.
Tests
- UnboundedWSQ / BoundedWSQ tests untouched.
- Add dedicated OverflowWSQ tests.
Contributor
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

UnboundedWSQ is used only as the executor's overflow region: the owner only pushes and thieves only steal — the owner never pops. With pop() gone as dead code, the cross-variable Dekker mutual-exclusion structure between top and bottom disappears, and that structure was the sole reason steal() needed a seq_cst fence and a seq_cst CAS. The downgrades follow from this:
UnboundedWSQ::steal()
BoundedWSQ::steal()
BoundedWSQ::pop()
Tests (wsq_test.cpp)
itemsis now collected solely fromstolens, and the trailing pop()==nullptr assertion is removed (steal()==nullptr kept).