Unpoison SVE scalar reductions for MemorySanitizer#19

Merged
alexey-milovidov merged 283 commits into main from fix-msan-unpoison-sve-reductions
Apr 6, 2026
Conversation

@alexey-milovidov

LLVM's MSan does not instrument ARM SVE intrinsics — there is zero handling of aarch64_sve_* in MemorySanitizer.cpp (as of LLVM 20), only a file-level FIXME: This sanitizer does not yet handle scalable vectors. This causes false positives when svaddv produces scalar results that MSan considers uninitialized.

Add SIMSIMD_UNPOISON calls after every svaddv reduction in SVE function bodies across spatial.h, dot.h, binary.h, and sparse.h. Define the macro in types.h with __msan_unpoison when MSan is active.

See also: llvm/llvm-project#165028
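
A minimal sketch of the pattern described above, not the actual contents of types.h. The two-argument form of `SIMSIMD_UNPOISON` and the function `example_sum_f32` are assumptions made for illustration; only `__msan_unpoison`, `__has_feature(memory_sanitizer)`, and the SVE intrinsics are known APIs. A scalar fallback keeps the sketch buildable on non-SVE targets.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed macro shape: forward to __msan_unpoison only when MSan is active. */
#if defined(__has_feature)
#if __has_feature(memory_sanitizer)
#include <sanitizer/msan_interface.h>
#define SIMSIMD_UNPOISON(ptr, size) __msan_unpoison((ptr), (size))
#endif
#endif
#ifndef SIMSIMD_UNPOISON
#define SIMSIMD_UNPOISON(ptr, size) ((void)(ptr), (void)(size)) /* no-op without MSan */
#endif

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
/* Hypothetical kernel: sums `n` floats with SVE and unpoisons the scalar result. */
static float example_sum_f32(float const *a, size_t n) {
    svfloat32_t acc = svdup_n_f32(0.0f);
    for (size_t i = 0; i < n; i += svcntw()) {
        svbool_t pg = svwhilelt_b32_u64((uint64_t)i, (uint64_t)n);
        acc = svadd_f32_m(pg, acc, svld1_f32(pg, a + i)); /* inactive lanes keep acc */
    }
    float result = svaddv_f32(svptrue_b32(), acc);
    /* MSan has no shadow propagation for svaddv, so mark the scalar as initialized. */
    SIMSIMD_UNPOISON(&result, sizeof(result));
    return result;
}
#else
/* Scalar stand-in so the sketch also builds on targets without SVE. */
static float example_sum_f32(float const *a, size_t n) {
    float result = 0.0f;
    for (size_t i = 0; i != n; ++i) result += a[i];
    SIMSIMD_UNPOISON(&result, sizeof(result));
    return result;
}
#endif
```

Placing the call right after the reduction, inside the function body, matters because MSan reports the error at the first use of the scalar, which may occur before any caller gets a chance to unpoison it.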

ashvardanian and others added 30 commits November 18, 2024 13:53
Previous AVX-512 implementation of complex products
used an extra ZMM register for `swap_adjacent_vec`.
Moreover, they used the `vpshufb` instruction available
only with the Ice Lake capability and newer.

The replacement uses the `_mm512_permute_ps` and
its double-precision variant.
ashvardanian and others added 23 commits December 20, 2025 23:21
The previous cast rounded towards zero.
When the true value is close to an integer boundary (e.g., 8.0):
 - SciPy (float64): might get 8.0000000001 → truncates to 8
 - SimSIMD (float32): might get 7.9999998 → truncates to 7
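
The difference can be sketched in a few lines of C. The helper names are hypothetical and the rounding shown (add 0.5, then truncate, valid for non-negative distances only) is an assumption for illustration, not necessarily what commit c487b55 does.

```c
/* Truncation: a plain C cast drops the fractional part (rounds toward zero). */
static int example_truncate(float d) { return (int)d; }

/* Round-to-nearest for non-negative values: shift by 0.5 before truncating. */
static int example_round_nonneg(float d) { return (int)(d + 0.5f); }

/* The largest float32 strictly below 8.0, i.e. 8 - 2^-21, written as a hex float. */
static float example_just_below_eight(void) { return 0x1.fffffep+2f; }
```

With the value just below 8.0, truncation yields 7 while rounding yields 8, which matches the float64 result.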
### Patch

- Improve: Round integer distances (c487b55)
- Fix: Absolute tolerance bound for integers (73a9ff7)
- Make: Skip flaky Arm failures (6be67bb)
- Fix: NEON guard for u8 dot dispatch (2c5876d)
### Patch

- Make: Same upload/download CI versions (ae9e567)
…n#296)

GCC and Clang do not recognize the `avx2vnni` target attribute.
The correct flag for AVX-VNNI (AVX2-VNNI) instructions is `avxvnni`.

This fix enables SIMSIMD_TARGET_SIERRA to compile successfully with
both GCC and Clang, allowing runtime dispatch to select optimal
SIERRA kernels on compatible CPUs.

Changes:
- include/simsimd/dot.h: avx2vnni -> avxvnni in pragma directives
- include/simsimd/spatial.h: avx2vnni -> avxvnni in pragma directives

Tested with:
- GCC 15.2.0
- Clang 20.1.8

Co-authored-by: Seungwon Yang <seungwon.yang@navercorp.com>
### Patch

- Fix: Replace `avx2vnni` with `avxvnni` for Sierra Forest (ashvardanian#296) (a8bb232)
- Make: Remove `NPM_TOKEN` for OIDC publishing (13cd5bc)
- Make: Sign rebase with GitHub Actions bot (e7b89b5)
- Fix: Revert to `atol=1` for test integer outputs vs SciPy (b75bdbd)
Co-authored-by: Markus Graf <24669860+markusalbertgraf@users.noreply.github.com>
### Patch

- Fix: Wrong predicate width in BF16 SVE L2 kernel (ashvardanian#301) (87ae846)
- Improve: FreeBSD comp-time target selection (ashvardanian#300) (cb11f8b)
…n#302)

`simsimd_capabilities` probes SIMD instructions with an uninitialized
`dummy_input` buffer and `n=0`. SVE implementations use `do { ... } while`
loops that always execute the body once. MemorySanitizer doesn't understand
SVE predicated loads and reports use-of-uninitialized-value.

Initialize the buffer to zero to silence the false positive.

Co-authored-by: Alexey Milovidov <18581488+alexey-milovidov@users.noreply.github.com>
### Patch

- Fix: Initialize `dummy_input` to fix MSan false positive (ashvardanian#302) (c2ad842)
MSan cannot track initialization through SIMD intrinsics (SVE, NEON,
SSE, AVX), causing false-positive "use-of-uninitialized-value" reports.
Add `__msan_unpoison` calls after every dispatch macro to mark results
as initialized.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
MemorySanitizer cannot track data flow through SIMD intrinsics
(SVE, NEON, SSE, AVX), causing false-positive "use-of-uninitialized-value"
reports. For example, SVE predicated loads only access memory for active
lanes, but MSan sees the full vector width as a memory read, flagging
tail elements as uninitialized.

Add `__attribute__((no_sanitize("memory")))` to `SIMSIMD_PUBLIC` and
`SIMSIMD_DYNAMIC` macros when MSan is detected via `__has_feature`.
This disables MSan instrumentation for all SimSIMD functions, which is
appropriate since they are entirely SIMD code that MSan cannot analyze.

The previous approach of unpoisoning results after dispatch (in lib.c)
was insufficient because MSan aborts inside the function body before
the dispatch wrapper can unpoison the output.
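
A rough sketch of what attaching the attribute to a visibility macro could look like. `EXAMPLE_NO_MSAN`, `EXAMPLE_PUBLIC`, and `example_dot_f32` are hypothetical names; the real `SIMSIMD_PUBLIC` and `SIMSIMD_DYNAMIC` definitions may differ.

```c
#include <stddef.h>

/* Expand to the attribute only when the compiler reports MSan as enabled. */
#if defined(__has_feature)
#if __has_feature(memory_sanitizer)
#define EXAMPLE_NO_MSAN __attribute__((no_sanitize("memory")))
#endif
#endif
#ifndef EXAMPLE_NO_MSAN
#define EXAMPLE_NO_MSAN /* empty: no MSan, or a compiler without __has_feature */
#endif

/* Hypothetical stand-in for a visibility macro such as SIMSIMD_PUBLIC. */
#define EXAMPLE_PUBLIC EXAMPLE_NO_MSAN static inline

/* Every function declared through the macro is excluded from MSan checks. */
EXAMPLE_PUBLIC float example_dot_f32(float const *a, float const *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i != n; ++i) sum += a[i] * b[i];
    return sum;
}
```

The nested `#if` keeps the sketch portable: compilers that lack `__has_feature` never evaluate the inner condition.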

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
`simsimd_capabilities` probes SIMD functions with n=0 to pre-initialize
dispatch function pointers. SVE implementations use `do { } while (i < n)`
loops that always execute the body once, even with n=0. MemorySanitizer
instruments SVE predicated loads (`svld1_f32` etc.) as full-width vector
reads regardless of the predicate mask, so it reports use-of-uninitialized
memory when the buffer is smaller than the SIMD register width.

Increase the dummy buffer from 8 bytes (`double[1]`) to 256 bytes
(`double[32]`) to cover the widest possible SVE vector (2048 bits).
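
The sizing and the loop behaviour can be checked with a small sketch. All names here are hypothetical; the loop only counts iterations and stands in for an SVE kernel body.

```c
#include <stddef.h>

/* Widest architecturally allowed SVE vector is 2048 bits = 256 bytes. */
#define EXAMPLE_MAX_SVE_BITS 2048
#define EXAMPLE_DUMMY_DOUBLES (EXAMPLE_MAX_SVE_BITS / 8 / sizeof(double))

/* A do-while loop runs its body before testing the condition,
 * so it executes once even when n == 0. */
static size_t example_loop_iterations(size_t n, size_t lanes) {
    size_t i = 0, iterations = 0;
    do {
        ++iterations; /* a real kernel would issue its predicated load here */
        i += lanes;
    } while (i < n);
    return iterations;
}

/* Zero-initialized probe buffer large enough for one full-width vector read. */
static size_t example_dummy_bytes(void) {
    double dummy_input[EXAMPLE_DUMMY_DOUBLES] = {0};
    return sizeof(dummy_input);
}
```

With 8-byte doubles this gives 32 elements and 256 bytes, so a single full-width read at any SVE vector length stays inside initialized memory.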

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix MSan false positive: enlarge dummy buffer for SVE predicated loads
Include `<unistd.h>` for ARM POSIX capability detection
LLVM's MSan does not instrument ARM SVE intrinsics — there is zero
handling of `aarch64_sve_*` in MemorySanitizer.cpp (as of LLVM 20),
only a file-level "FIXME: This sanitizer does not yet handle scalable
vectors". This causes false positives when `svaddv` produces scalar
results that MSan considers uninitialized.

Add `SIMSIMD_UNPOISON` calls after every `svaddv` reduction in SVE
function bodies across spatial.h, dot.h, binary.h, and sparse.h.
Define the macro in types.h with `__msan_unpoison` when MSan is active.

See also: llvm/llvm-project#165028
@alexey-milovidov alexey-milovidov merged commit 541a4cd into main Apr 6, 2026
alexey-milovidov added a commit to ClickHouse/ClickHouse that referenced this pull request Apr 6, 2026
LLVM's MSan does not instrument ARM SVE intrinsics (no handling of
`aarch64_sve_*` in MemorySanitizer.cpp as of LLVM 20). This causes
false positives when `svaddv` produces scalar results that MSan
considers uninitialized.

Add `SIMSIMD_UNPOISON` calls after every `svaddv` reduction in SVE
function bodies.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=101239&sha=8abafa3f0d5fe90ff3bae820f1a3da291249d26e&name_0=PR&name_1=Stress%20test%20%28arm_msan%29

Contrib PR: ClickHouse/SimSIMD#19
Upstream PR: ashvardanian/NumKong#342