Add SVE implementation of `replace` by hazzlim · Pull Request #6195 · microsoft/STL · GitHub

Add SVE implementation of `replace` #6195

Open · hazzlim wants to merge 11 commits into microsoft:main from hazzlim:replace-sve-pr


hazzlim (Contributor) commented Mar 31, 2026

This PR adds an SVE implementation of `replace`. This algorithm was previously not vectorized with Neon because the Neon instruction set has no masked stores. See #4433 for why this is an issue.
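Masked stores matter here because `std::replace` assigns only to the elements equal to `old_value`. Without them, a vector loop has to write back a whole vector, including lanes it did not change; a masked store leaves those lanes untouched. The portable sketch below emulates such a predicated loop in scalar C++ so it runs on any target. The lane count `kLanes` and the function name `emulated_masked_replace` are hypothetical, and this is not the PR's code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical lane count: how many uint8_t elements one 128-bit vector holds.
constexpr std::size_t kLanes = 16;

// Scalar emulation of a predicated (masked) replace over [first, last).
// Each chunk builds a per-lane predicate, then performs a "masked store":
// lanes whose predicate is false are never written, not even with their
// unchanged value. Returns the number of element writes performed.
std::size_t emulated_masked_replace(std::uint8_t* first, std::uint8_t* last,
                                    std::uint8_t old_value, std::uint8_t new_value) {
    assert(first <= last);
    const std::size_t n = static_cast<std::size_t>(last - first);
    std::size_t writes = 0;
    for (std::size_t base = 0; base < n; base += kLanes) {
        std::array<bool, kLanes> pred{}; // all lanes start inactive
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            const std::size_t i = base + lane;
            // In bounds (like an SVE whilelt predicate) AND equal to old_value
            // (like a predicated compare); short-circuit avoids reading past the end.
            pred[lane] = i < n && first[i] == old_value;
        }
        for (std::size_t lane = 0; lane < kLanes; ++lane) {
            if (pred[lane]) { // masked store: only active lanes are touched
                first[base + lane] = new_value;
                ++writes;
            }
        }
    }
    return writes;
}
```

Because only matching lanes are written, the returned count equals the number of occurrences of `old_value`; an unmasked full-width store would instead write every in-bounds element of each chunk.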

Benchmark results ⏲️

Results are speedups relative to the existing scalar code as the baseline; higher is better. Benchmarks were run on a Neoverse N2 machine.

| Benchmark | MSVC speedup | Clang speedup |
| --- | --- | --- |
| `r<std::uint8_t>` | 17.03 | 7.024 |
| `r<std::uint16_t>` | 10.17 | 3.767 |
| `r<std::uint32_t>` | 4.592 | 2.109 |
| `r<std::uint64_t>` | 2.475 | 1.23 |
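Each entry is a ratio of run times, so, for example, 17.03 in the MSVC column means the SVE version of `r<std::uint8_t>` ran in about 1/17.03 ≈ 5.9% of the baseline's time. The speedups fall as elements get wider, which is consistent with fewer elements fitting in each vector. Assuming the 128-bit SVE vector length that Neoverse N2 implements, the lane counts work out as follows (an interpretation of the table, not something the benchmark measures):

```math
\text{speedup} = \frac{t_{\text{baseline}}}{t_{\text{SVE}}}, \qquad
\text{lanes}(T) = \frac{\mathrm{VL}}{8 \cdot \mathrm{sizeof}(T)} = \frac{128}{8 \cdot \mathrm{sizeof}(T)} \in \{16, 8, 4, 2\} \ \text{for} \ \mathrm{sizeof}(T) \in \{1, 2, 4, 8\}
```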

hazzlim requested a review from a team as a code owner on Mar 31, 2026, 22:18
github-project-automation (bot) moved this to Initial Review in STL Code Reviews on Mar 31, 2026
Review thread on stl/inc/algorithm (outdated)
StephanTLavavej added the performance, ARM64, and ARM64EC labels on Mar 31, 2026
StephanTLavavej self-assigned this on Apr 2, 2026
Three review threads on stl/src/vector_algorithms.cpp (outdated)
StephanTLavavej removed their assignment on Apr 3, 2026
StephanTLavavej moved this from Initial Review to Ready To Merge in STL Code Reviews on Apr 3, 2026
StephanTLavavej moved this from Ready To Merge to Merging in STL Code Reviews on Apr 15, 2026
StephanTLavavej commented (comment hidden; marked as outdated).

StephanTLavavej moved this from Merging to Blocked in STL Code Reviews on Apr 16, 2026
StephanTLavavej added the blocked label on Apr 16, 2026
StephanTLavavej commented (comment hidden; marked as resolved).

Copilot AI left a comment:

Pull request overview

Adds an SVE-backed implementation of `std::replace` for ARM64/ARM64EC. Because SVE provides masked stores, this enables vectorization for every element size, including 1- and 2-byte elements.

Changes:

  • Enable _VECTORIZED_REPLACE on ARM64/ARM64EC and add 1- and 2-byte replace entry points.
  • Implement SVE-based masked-load/compare/masked-store __std_replace_{1,2,4,8} in vector_algorithms.cpp.
  • Extend replace benchmarks to include uint8_t and uint16_t.
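The masked-load/compare/masked-store loop named in the second bullet can be sketched for 1-byte elements with the SVE intrinsics from the Arm C Language Extensions (ACLE). This is a hedged illustration, not the PR's `__std_replace_1`: the name `sve_replace_u8` is hypothetical, the SVE path requires a compiler targeting SVE (which defines `__ARM_FEATURE_SVE`), and other targets fall back to a scalar loop with the same semantics.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#endif

// Hypothetical 1-byte replace over [first, last); illustrates the technique only.
void sve_replace_u8(std::uint8_t* first, std::uint8_t* last,
                    std::uint8_t old_value, std::uint8_t new_value) {
    assert(first <= last);
    const std::uint64_t n = static_cast<std::uint64_t>(last - first);
#if defined(__ARM_FEATURE_SVE)
    svuint8_t replacement = svdup_n_u8(new_value); // new_value broadcast to every lane
    // svcntb() is the number of bytes per SVE vector, so the loop is
    // vector-length agnostic.
    for (std::uint64_t i = 0; i < n; i += svcntb()) {
        // Lanes with index i + k < n are active; this also handles the tail.
        svbool_t in_bounds = svwhilelt_b8_u64(i, n);
        // Masked load: inactive lanes do not access memory.
        svuint8_t values = svld1_u8(in_bounds, first + i);
        // Predicated compare: active lanes equal to old_value.
        svbool_t matches = svcmpeq_n_u8(in_bounds, values, old_value);
        // Masked store: only matching lanes are written.
        svst1_u8(matches, first + i, replacement);
    }
#else
    // Scalar fallback with the same semantics for targets without SVE.
    for (std::uint64_t i = 0; i < n; ++i) {
        if (first[i] == old_value) {
            first[i] = new_value;
        }
    }
#endif
}
```

The `svwhilelt` predicate covers the partial final vector, so no separate scalar epilogue is needed. The 2-, 4-, and 8-byte variants would follow the same pattern using the `_b16`/`_u16`, `_b32`/`_u32`, and `_b64`/`_u64` forms together with `svcnth`, `svcntw`, and `svcntd`.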
Summary per file:

| File | Description |
| --- | --- |
| tests/std/tests/VSO_0000000_vector_algorithms/test.cpp | Adjusts which vector algorithm tests are run under the ARM64EC “call all x64” configuration. |
| stl/src/vector_algorithms.cpp | Adds the SVE include and introduces SVE-based replace implementations for 1-, 2-, 4-, and 8-byte elements on ARM64/ARM64EC. |
| stl/inc/xutility | Enables replace vectorization for ARM64/ARM64EC and introduces _VECTORIZED_REPLACE_1_2. |
| stl/inc/algorithm | Declares the new __std_replace_1/2 and updates the dispatch/safety logic to allow 1- and 2-byte vectorized replace on ARM. |
| benchmarks/src/replace.cpp | Adds replace benchmarks for uint8_t and uint16_t. |
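Per the summary, the 1- and 2-byte vectorized paths appear to be gated by their own switch (`_VECTORIZED_REPLACE_1_2`) in addition to the general replace switch. A minimal compile-time dispatch in that spirit is sketched here. The macros `DEMO_VECTORIZED_REPLACE` and `DEMO_VECTORIZED_REPLACE_1_2`, the `replace_path` enum, and `select_replace_path` are all hypothetical, and the "non-bool integral only" rule is a simplification of whatever safety checks `<algorithm>` actually performs.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical configuration switches modeled on _VECTORIZED_REPLACE and
// _VECTORIZED_REPLACE_1_2; names and defaults are illustrative only.
#ifndef DEMO_VECTORIZED_REPLACE
#define DEMO_VECTORIZED_REPLACE 1
#endif
#ifndef DEMO_VECTORIZED_REPLACE_1_2
#define DEMO_VECTORIZED_REPLACE_1_2 1 // e.g. enabled where 1-/2-byte masked stores exist
#endif

// Which (hypothetical) kernel a call to replace would be routed to.
enum class replace_path { scalar, vector_1, vector_2, vector_4, vector_8 };

// Chooses a path from the element type at compile time. Only non-bool
// integral types are treated as safe here (a deliberate simplification).
template <class T>
constexpr replace_path select_replace_path() {
    if constexpr (!DEMO_VECTORIZED_REPLACE || !std::is_integral_v<T> || std::is_same_v<T, bool>) {
        return replace_path::scalar;
    } else if constexpr (sizeof(T) == 8) {
        return replace_path::vector_8;
    } else if constexpr (sizeof(T) == 4) {
        return replace_path::vector_4;
    } else if constexpr (DEMO_VECTORIZED_REPLACE_1_2 && sizeof(T) == 2) {
        return replace_path::vector_2;
    } else if constexpr (DEMO_VECTORIZED_REPLACE_1_2 && sizeof(T) == 1) {
        return replace_path::vector_1;
    } else {
        return replace_path::scalar;
    }
}

// These hold under any setting of DEMO_VECTORIZED_REPLACE_1_2.
static_assert(select_replace_path<std::uint32_t>() == replace_path::vector_4);
static_assert(select_replace_path<double>() == replace_path::scalar);
```

Building with `-DDEMO_VECTORIZED_REPLACE_1_2=0` would model a target without the 1- and 2-byte entry points: `std::uint8_t` and `std::uint16_t` then fall through to the scalar path while 4- and 8-byte types stay vectorized.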

Copilot's findings


  • Files reviewed: 5/5 changed files
  • Comments generated: 2

Review threads on stl/inc/algorithm and stl/src/vector_algorithms.cpp
StephanTLavavej removed the blocked label on Jun 22, 2026
StephanTLavavej moved this from Blocked to Ready To Merge in STL Code Reviews on Jun 22, 2026
StephanTLavavej (Member) commented


Labels

ARM64EC (I can't believe it's not x64!) · ARM64 (Related to the ARM64 architecture) · performance (Must go faster)

Projects

Status: Ready To Merge


4 participants