Fix MSan use-of-uninitialized-value in SimSIMD SVE functions#101239
Conversation
Update SimSIMD submodule to include fix for `svmla_*_x` / `svmls_*_x` in SVE accumulator operations. The `_x` (don't-care) variant left inactive lanes undefined, but `svaddv(svptrue, ...)` summed all lanes including those undefined ones. Changed to `_m` (merge) which preserves the accumulator value for inactive lanes. Fixes #101232 https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100177&sha=a7031a78dd031fc0e90fbeebc0c95df386c5c1a7&name_0=PR&name_1=Stress%20test%20%28arm_msan%29 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
Thanks @nikitamikhaylov — agreed, the submodule update fixing the |
|
@groeneai The issue still reproduces. Do you know what's wrong here? |
The previous approach (unpoisoning `*result` after the dispatch wrapper returns in lib.c) did not help because MSan reports the error inside the SVE function body before it returns to the dispatch layer. Add `SIMSIMD_UNPOISON` calls after every `svaddv` scalar reduction inside SVE function implementations. This prevents MSan-tainted values from propagating to normalize functions and result stores. https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=101239&sha=a43cadd697dc5f4dbe13fd8481164dfac8be525e&name_0=PR&name_1=Stress%20test%20%28arm_msan%29 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@nikitamikhaylov The PR has never actually built — every CI run fails at Checkout Submodules because the SimSIMD submodule pointer ( The fix commit exists at a different hash — it was likely force-pushed. The correct commit is That commit has the right fix: changes To fix: update the submodule pointer to |
The fix (ClickHouse/SimSIMD#18) was merged on the ClickHouse/v6.5.15 fork branch, replacing the temporary local patches with the proper upstream solution that uses `_m` (merge) instead of `_x` (don't-care) for SVE multiply-accumulate operations. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
|
@groeneai this failure in the last CI run https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=101239&sha=8abafa3f0d5fe90ff3bae820f1a3da291249d26e&name_0=PR&name_1=Stress+test+%28arm_msan%29 seems to be another issue with this submodule? Is there anything else to be fixed there? |
|
@PedroTadim Yes, this is a separate issue from the What the What this new crash is: Root cause — a lost fix: #if defined(__has_feature) && __has_feature(memory_sanitizer)
#define _SIMSIMD_NO_SANITIZE_MEMORY __attribute__((no_sanitize("memory")))
#else
#define _SIMSIMD_NO_SANITIZE_MEMORY
#endif
#define SIMSIMD_PUBLIC __attribute__((unused, nonnull)) _SIMSIMD_NO_SANITIZE_MEMORY inline staticHowever, this change was lost during merge conflict resolution in merge commit 1ab7d5e0. Both the current master submodule ( Fix: The |
Ok, can you add the _SIMSIMD_NO_SANITIZE_MEMORY macro back in this PR or a new one? Currently, this error happens very often in our CI. |
|
@PedroTadim, that won't help. |
| target_compile_definitions(_simsimd PUBLIC SIMSIMD_DYNAMIC_DISPATCH) | ||
| # Disable SVE: LLVM's MemorySanitizer cannot instrument SVE scalable vector types, | ||
| # emitting unconditional `__msan_warning_noreturn` at every SVE function entry. | ||
| target_compile_definitions(_simsimd PRIVATE SIMSIMD_TARGET_SVE=0 SIMSIMD_TARGET_SVE2=0) |
There was a problem hiding this comment.
Maybe we can disable it only for MSan?
The `read_stream_count_was_reduced` flag (added in 5f270da) is set when ReadFromMergeTree produces fewer streams than requested. This causes AggregatingStep to cap post-aggregation resize to the actual stream count, which is correct for small data. However, for read-in-order queries, the stream count is determined by the number of parts and mark ranges, not by data size. When `max_streams_to_max_threads_ratio` increases `requested_num_streams` beyond the number of parts, the flag is incorrectly set. After merge-sort reduces the pipeline to 1 stream, AggregatingStep then refuses to expand to `max_threads`, losing parallelism. Fix: skip setting the flag when `reader_settings.read_in_order` is true. https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=101239&sha=8abafa3f0d5fe90ff3bae820f1a3da291249d26e&name_0=PR&name_1=Stateless%20tests%20%28amd_debug%2C%20parallel%29 #101239 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…lickHouse#101994, ClickHouse#102158, ClickHouse#101952, ClickHouse#102148, ClickHouse#102008, ClickHouse#102010) # Conflicts: # src/Common/AsynchronousMetrics.cpp

Disable SimSIMD SVE to fix MSan use-of-uninitialized-value on ARM.
LLVM's MSan cannot instrument ARM SVE scalable vector types — it emits unconditional
__msan_warning_noreturnat every SVE function entry, making all SVE code paths instant-abort under MSan. Disable SVE/SVE2 compilation in SimSIMD when building with MSan. The NEON fallback handles everything correctly.Unsuccessful attempts
The following submodule-level fixes were tried but proved ineffective because MSan aborts at SVE function entry before any user code executes:
_x→_mfor SVE accumulators (contrib PRs #18, upstream #331): Changedsvmla_*_xtosvmla_*_mso inactive lanes preserve the accumulator value. Semantically correct but does not help MSan.SIMSIMD_UNPOISONaftersvaddvreductions (contrib PRs #19, #21, upstream #342): Added__msan_unpoisoncalls after everysvaddvscalar reduction. The functions never reached these calls because MSan aborts at entry.Fixes #101232
https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100177&sha=a7031a78dd031fc0e90fbeebc0c95df386c5c1a7&name_0=PR&name_1=Stress%20test%20%28arm_msan%29
#100177
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
...
Documentation entry for user-facing changes
Version info
26.4.1.578