perf(wsq): remove UnboundedWSQ::pop() and relax steal() ordering; dow… by wicyn · Pull Request #793 · taskflow/taskflow · GitHub
Skip to content

perf(wsq): remove UnboundedWSQ::pop() and relax steal() ordering; dow…#793

Closed
wicyn wants to merge 0 commit into
taskflow:masterfrom
wicyn:master
Closed

perf(wsq): remove UnboundedWSQ::pop() and relax steal() ordering; dow…#793
wicyn wants to merge 0 commit into
taskflow:masterfrom
wicyn:master

Conversation

@wicyn

@wicyn wicyn commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

…ngrade BoundedWSQ::steal() top load to relaxed

UnboundedWSQ is used only as the executor's overflow region: the owner only pushes and thieves only steal — the owner never pops. With pop() gone as dead code, the cross-variable Dekker mutual-exclusion structure between top and bottom disappears, and that structure was the sole reason steal() needed a seq_cst fence and a seq_cst CAS. The downgrades follow from this:

UnboundedWSQ::steal()

  • Remove the seq_cst fence: it existed only to pair with pop()'s fence in the single total order S and arbitrate the "last element" race. With no pop(), the fence has nothing to pair with.
  • CAS success order seq_cst -> acq_rel: the only remaining contention is thief-vs-thief on the single variable top. CAS RMW atomicity guarantees exactly one thief wins, and release/acquire is sufficient to chain top's modification order; no cross-variable total order is required.
  • _top.load acquire -> relaxed: top increases monotonically, so relaxed can at worst read a smaller, stale value, making the t<b check conservative (one missed steal, never out of bounds); the CAS then re-validates against the latest top and returns empty on failure. Slot-data visibility is carried by _bottom.load(acquire) paired with push()'s release, independent of how top is read.

BoundedWSQ::steal()

  • Only _top.load acquire -> relaxed. BoundedWSQ still has pop(), so the Dekker structure remains and both the seq_cst fence and the seq_cst CAS are kept unchanged. Relaxing the top load is safe: the seq_cst fence immediately following it is a full barrier that already prevents the top load from being reordered with the subsequent bottom load and slot load — the LoadLoad reordering that acquire was meant to block is already covered by the fence. Slot visibility likewise depends on bottom's acquire chain, not on how top is read.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant