Fix MSan use-of-uninitialized-value in SimSIMD SVE functions by alexey-milovidov · Pull Request #101239 · ClickHouse/ClickHouse · GitHub

Fix MSan use-of-uninitialized-value in SimSIMD SVE functions #101239

Merged: alexey-milovidov merged 18 commits into master from fix-simsimd-sve-msan on Apr 7, 2026

@alexey-milovidov

alexey-milovidov commented Mar 30, 2026

Member

Disable SimSIMD SVE to fix MSan use-of-uninitialized-value on ARM.

LLVM's MSan cannot instrument ARM SVE scalable vector types — it emits unconditional __msan_warning_noreturn at every SVE function entry, making all SVE code paths instant-abort under MSan. Disable SVE/SVE2 compilation in SimSIMD when building with MSan. The NEON fallback handles everything correctly.

Unsuccessful attempts

The following submodule-level fixes were tried but proved ineffective because MSan aborts at SVE function entry before any user code executes:

  1. _x → _m for SVE accumulators (contrib PR #18, upstream #331): Changed svmla_*_x to svmla_*_m so inactive lanes preserve the accumulator value. Semantically correct but does not help MSan.

  2. SIMSIMD_UNPOISON after svaddv reductions (contrib PRs #19, #21, upstream #342): Added __msan_unpoison calls after every svaddv scalar reduction. The functions never reached these calls because MSan aborts at entry.
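
The second attempt can be pictured with a minimal sketch. The real definition of SIMSIMD_UNPOISON in the fork is not shown in this thread, so the macro below (EXAMPLE_UNPOISON) and the function example_dot are hypothetical stand-ins, and a plain scalar loop takes the place of the SVE kernel and its svaddv reduction. The only real API used is __msan_unpoison from <sanitizer/msan_interface.h>.

```c
#include <assert.h>
#include <stddef.h>

/* __has_feature is a Clang extension, so test for it before using it. */
#if defined(__has_feature)
#if __has_feature(memory_sanitizer)
#include <sanitizer/msan_interface.h>
/* Under MSan: mark `size` bytes at `ptr` as initialized in shadow memory. */
#define EXAMPLE_UNPOISON(ptr, size) __msan_unpoison((ptr), (size))
#endif
#endif

/* Without MSan the hypothetical macro evaluates its arguments and does nothing. */
#ifndef EXAMPLE_UNPOISON
#define EXAMPLE_UNPOISON(ptr, size) ((void)(ptr), (void)(size))
#endif

/* Scalar stand-in for an SVE kernel: the loop plays the role of the
 * multiply-accumulate plus the final svaddv reduction. */
static float example_dot(const float *a, const float *b, size_t n) {
    assert(a != NULL && b != NULL);
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) sum += a[i] * b[i];
    /* The attempted fix: unpoison the scalar right after the reduction. */
    EXAMPLE_UNPOISON(&sum, sizeof(sum));
    return sum;
}
```

As the description above notes, a call placed like this can only help if execution reaches it; when MSan aborts at function entry, the unpoison never runs.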

Fixes #101232

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100177&sha=a7031a78dd031fc0e90fbeebc0c95df386c5c1a7&name_0=PR&name_1=Stress%20test%20%28arm_msan%29
#100177

Changelog category (leave one):

  • CI Fix or Improvement (changelog entry is not required)

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

...

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Version info

  • Merged into: 26.4.1.578

Update SimSIMD submodule to include fix for `svmla_*_x` / `svmls_*_x`
in SVE accumulator operations. The `_x` (don't-care) variant left
inactive lanes undefined, but `svaddv(svptrue, ...)` summed all lanes
including those undefined ones. Changed to `_m` (merge) which preserves
the accumulator value for inactive lanes.

Fixes #101232

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=100177&sha=a7031a78dd031fc0e90fbeebc0c95df386c5c1a7&name_0=PR&name_1=Stress%20test%20%28arm_msan%29

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@clickhouse-gh

clickhouse-gh Bot commented Mar 30, 2026

Contributor

clickhouse-gh Bot added the labels pr-ci and submodule changed (at least one submodule changed in this PR) on Mar 30, 2026
@nikitamikhaylov

Member

@groeneai It was right to update the submodule rather than disable MSan in SimSIMD, as you did in #100862.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@groeneai

Contributor

Thanks @nikitamikhaylov, agreed: the submodule update that switches the SVE accumulator intrinsics from _x to _m is the proper approach, since it addresses the root cause in SimSIMD directly. Our #100862 was a workaround (disable SVE under MSAN) that we already closed in favor of this.

@nikitamikhaylov

Member

@groeneai The issue still reproduces. Do you know what's wrong here?

The previous approach (unpoisoning `*result` after the dispatch wrapper
returns in lib.c) did not help because MSan reports the error inside the
SVE function body before it returns to the dispatch layer.

Add `SIMSIMD_UNPOISON` calls after every `svaddv` scalar reduction inside
SVE function implementations. This prevents MSan-tainted values from
propagating to normalize functions and result stores.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=101239&sha=a43cadd697dc5f4dbe13fd8481164dfac8be525e&name_0=PR&name_1=Stress%20test%20%28arm_msan%29

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@groeneai

groeneai commented Apr 1, 2026

Contributor

@nikitamikhaylov The PR has never actually built — every CI run fails at Checkout Submodules because the SimSIMD submodule pointer (324864d39356) references a commit that no longer exists on ClickHouse/SimSIMD:

gh api repos/ClickHouse/SimSIMD/commits/324864d393566df939e845a78a81902a25ee2a11
→ 422: "No commit found for SHA: 324864d393566df939e845a78a81902a25ee2a11"

The fix commit exists at a different hash — it was likely force-pushed. The correct commit is 5a0677af82f7e77afae43045ae7c01debf21625a on the ClickHouse/v6.5.15 branch:

gh api repos/ClickHouse/SimSIMD/commits/5a0677af
→ "Fix use-of-uninitialized-value in SVE accumulator operations (#18)"

That commit has the right fix: it changes svmla_f32_x → svmla_f32_m (and similarly for f64, f16, bf16) in all SVE accumulator operations in spatial.h and dot.h. The _x (don't-care) suffix leaves inactive lanes undefined; the subsequent svaddv(svptrue, ...) sums all lanes including undefined ones, triggering MSan. The _m (merge) variant preserves the accumulator's initialized-zero value for inactive lanes, so the reduction is clean.
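
The difference between the two suffixes can be illustrated with a portable scalar model. This is not SVE code: the names model_mla_m, model_mla_x, model_addv_all and model_tail_dot are hypothetical, the vector width is fixed at four lanes for illustration, and the "undefined" content of an inactive lane is represented by an explicit junk value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_LANES 4 /* hypothetical fixed width; real SVE widths are set by hardware */

/* Model of svmla_*_m: inactive lanes keep the accumulator value. */
static void model_mla_m(float acc[MODEL_LANES], const bool pg[MODEL_LANES],
                        const float a[MODEL_LANES], const float b[MODEL_LANES]) {
    for (size_t i = 0; i < MODEL_LANES; ++i)
        if (pg[i]) acc[i] += a[i] * b[i];
}

/* Model of svmla_*_x: inactive lanes may hold anything; `junk` stands in for that. */
static void model_mla_x(float acc[MODEL_LANES], const bool pg[MODEL_LANES],
                        const float a[MODEL_LANES], const float b[MODEL_LANES],
                        float junk) {
    for (size_t i = 0; i < MODEL_LANES; ++i)
        acc[i] = pg[i] ? acc[i] + a[i] * b[i] : junk;
}

/* Model of svaddv(svptrue, acc): sums every lane, active or not. */
static float model_addv_all(const float acc[MODEL_LANES]) {
    assert(acc != NULL);
    float sum = 0.0f;
    for (size_t i = 0; i < MODEL_LANES; ++i) sum += acc[i];
    return sum;
}

/* Dot product of a 3-element tail in a 4-lane vector: lane 3 is inactive. */
static float model_tail_dot(bool use_merge, float junk) {
    const float a[MODEL_LANES] = {1.0f, 2.0f, 3.0f, 0.0f};
    const float b[MODEL_LANES] = {4.0f, 5.0f, 6.0f, 0.0f};
    const bool pg[MODEL_LANES] = {true, true, true, false};
    float acc[MODEL_LANES] = {0.0f, 0.0f, 0.0f, 0.0f};
    if (use_merge)
        model_mla_m(acc, pg, a, b);
    else
        model_mla_x(acc, pg, a, b, junk);
    return model_addv_all(acc);
}
```

In this model, model_tail_dot(true, junk) returns 32 for any junk value, while model_tail_dot(false, junk) returns 32 plus junk, which is the lane-3 leakage into the full-width reduction described above.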

To fix: update the submodule pointer to 5a0677af82f7e77afae43045ae7c01debf21625a. That alone should resolve all the STID 1003-358c/326e/2410-543a/47e0 family (574 hits across 346 PRs in the last 30 days). The SIMSIMD_UNPOISON calls in the latest commit would then be unnecessary since the root cause is addressed.

alexey-milovidov and others added 3 commits April 4, 2026 18:40
The fix (ClickHouse/SimSIMD#18) was merged on the ClickHouse/v6.5.15
fork branch, replacing the temporary local patches with the proper
upstream solution that uses `_m` (merge) instead of `_x` (don't-care)
for SVE multiply-accumulate operations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@PedroTadim

Member

@groeneai this failure in the last CI run https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=101239&sha=8abafa3f0d5fe90ff3bae820f1a3da291249d26e&name_0=PR&name_1=Stress+test+%28arm_msan%29 seems to be another issue with this submodule? Is there anything else to be fixed there?

WARNING: MemorySanitizer: use-of-uninitialized-value
---

Stack trace:
#0 0xab2b236c33a4 in simsimd_cos_f32_sve ci/tmp/build/./contrib/SimSIMD/include/simsimd/spatial.h:818:26
#1 0xab2b236bd250 in simsimd_cos_f32 ci/tmp/build/./contrib/SimSIMD/c/lib.c:185:1
#2 0xab2b236bd250 in simsimd_capabilities ci/tmp/build/./contrib/SimSIMD/c/lib.c:322:5
#3 0xab2b1527b2a0 in unum::usearch::metric_punned_t::configure_with_simsimd() ci/tmp/build/./contrib/usearch/include/usearch/index_plugins.hpp:1929:59
#4 0xab2b1527b2a0 in unum::usearch::metric_punned_t::builtin(unsigned long, unum::usearch::metric_kind_t, unum::usearch::scalar_kind_t) ci/tmp/build/./contrib/usearch/include/usearch/index_plugins.hpp:1779:21
#5 0xab2b1527b2a0 in unum::usearch::metric_punned_t::metric_punned_t(unsigned long, unum::usearch::metric_kind_t, unum::usearch::scalar_kind_t) ci/tmp/build/./contrib/usearch/include/usearch/index_plugins.hpp:1752:27
#6 0xab2b1527b2a0 in DB::USearchIndexWithSerialization::USearchIndexWithSerialization(unsigned long, unum::usearch::metric_kind_t, unum::usearch::scalar_kind_t, DB::UsearchHnswParams) ci/tmp/build/./src/Storages/MergeTree/MergeTreeIndexVectorSimilarity.cpp:117:28
#7 0xab2b152977a0 in DB::USearchIndexWithSerialization* std::__1::construct_at[abi:fe210105]<DB::USearchIndexWithSerialization, unsigned long const&, unum::usearch::metric_kind_t const&, unum::usearch::scalar_kind_t const&, DB::UsearchHnswParams const&, DB::USearchIndexWithSerialization*>(DB::USearchIndexWithSerialization*, unsigned long const&, unum::usearch::metric_kind_t const&, unum::usearch::scalar_kind_t const&, DB::UsearchHnswParams const&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__memory/construct_at.h:38:49
#8 0xab2b152977a0 in DB::USearchIndexWithSerialization* std::__1::__construct_at[abi:fe210105]<DB::USearchIndexWithSerialization, unsigned long const&, unum::usearch::metric_kind_t const&, unum::usearch::scalar_kind_t const&, DB::UsearchHnswParams const&, DB::USearchIndexWithSerialization*>(DB::USearchIndexWithSerialization*, unsigned long const&, unum::usearch::metric_kind_t const&, unum::usearch::scalar_kind_t const&, DB::UsearchHnswParams const&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__memory/construct_at.h:46:10
#9 0xab2b152977a0 in void std::__1::allocator_traits<std::__1::allocator<DB::USearchIndexWithSerialization>>::construct[abi:fe210105]<DB::USearchIndexWithSerialization, unsigned long const&, unum::usearch::metric_kind_t const&, unum::usearch::scalar_kind_t const&, DB::UsearchHnswParams const&, 0>(std::__1::allocator<DB::USearchIndexWithSerialization>&, DB::USearchIndexWithSerialization*, unsigned long const&, unum::usearch::metric_kind_t const&, unum::usearch::scalar_kind_t const&, DB::UsearchHnswParams const&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__memory/allocator_traits.h:302:5
#10 0xab2b152977a0 in std::__1::__shared_ptr_emplace<DB::USearchIndexWithSerialization, std::__1::allocator<DB::USearchIndexWithSerialization>>::__shared_ptr_emplace[abi:fe210105]<unsigned long const&, unum::usearch::metric_kind_t const&, unum::usearch::scalar_kind_t const&, DB::UsearchHnswParams const&, std::__1::allocator<DB::USearchIndexWithSerialization>, 0>(std::__1::allocator<DB::USearchIndexWithSerialization>, unsigned long const&, unum::usearch::metric_kind_t const&, unum::usearch::scalar_kind_t const&, DB::UsearchHnswParams const&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__memory/shared_ptr.h:162:5
#11 0xab2b152977a0 in std::__1::shared_ptr<DB::USearchIndexWithSerialization> std::__1::allocate_shared[abi:fe210105]<DB::USearchIndexWithSerialization, std::__1::allocator<DB::USearchIndexWithSerialization>, unsigned long const&, unum::usearch::metric_kind_t const&, unum::usearch::scalar_kind_t const&, DB::UsearchHnswParams const&, 0>(std::__1::allocator<DB::USearchIndexWithSerialization> const&, unsigned long const&, unum::usearch::metric_kind_t const&, unum::usearch::scalar_kind_t const&, DB::UsearchHnswParams const&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__memory/shared_ptr.h:736:51
#12 0xab2b152977a0 in std::__1::shared_ptr<DB::USearchIndexWithSerialization> std::__1::make_shared[abi:fe210105]<DB::USearchIndexWithSerialization, unsigned long const&, unum::usearch::metric_kind_t const&, unum::usearch::scalar_kind_t const&, DB::UsearchHnswParams const&, 0>(unsigned long const&, unum::usearch::metric_kind_t const&, unum::usearch::scalar_kind_t const&, DB::UsearchHnswParams const&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__memory/shared_ptr.h:744:10
#13 0xab2b1528726c in DB::MergeTreeIndexAggregatorVectorSimilarity::update(DB::Block const&, unsigned long*, unsigned long) ci/tmp/build/./src/Storages/MergeTree/MergeTreeIndexVectorSimilarity.cpp:411:17
#14 0xab2b14ff4c80 in DB::MergeTreeDataPartWriterOnDisk::calculateAndSerializeSkipIndices(DB::Block const&, std::__1::vector<DB::Granule, std::__1::allocator<DB::Granule>> const&) ci/tmp/build/./src/Storages/MergeTree/MergeTreeDataPartWriterOnDisk.cpp:260:42
#15 0xab2b14fdecf8 in DB::MergeTreeDataPartWriterCompact::writeDataBlockPrimaryIndexAndSkipIndices(DB::Block const&, std::__1::vector<DB::Granule, std::__1::allocator<DB::Granule>> const&) ci/tmp/build/./src/Storages/MergeTree/MergeTreeDataPartWriterCompact.cpp:267:5
#16 0xab2b14fe0a30 in DB::MergeTreeDataPartWriterCompact::finalizeIndexGranularity() ci/tmp/build/./src/Storages/MergeTree/MergeTreeDataPartWriterCompact.cpp:354:9
#17 0xab2b150ac9bc in DB::MergeTreeDataWriter::writeTempPartImpl(DB::BlockWithPartition&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>, DB::SourcePartsSetForPatch, std::__1::shared_ptr<DB::Context const>, unsigned long) ci/tmp/build/./src/Storages/MergeTree/MergeTreeDataWriter.cpp:950:10
#18 0xab2b150a135c in DB::MergeTreeDataWriter::writeTempPart(DB::BlockWithPartition&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const>, std::__1::shared_ptr<DB::Context const>) ci/tmp/build/./src/Storages/MergeTree/MergeTreeDataWriter.cpp:588:12
#19 0xab2b1553f0a4 in DB::MergeTreeSink::consume(DB::Chunk&) ci/tmp/build/./src/Storages/MergeTree/MergeTreeSink.cpp:144:25
#20 0xab2b17428fec in DB::SinkToStorage::onConsume(DB::Chunk) ci/tmp/build/./src/Processors/Sinks/SinkToStorage.cpp:10:5
#21 0xab2b1704f314 in DB::ExceptionKeepingTransform::work()::$_1::operator()() const ci/tmp/build/./src/Processors/Transforms/ExceptionKeepingTransform.cpp:136:51
#22 0xab2b1704f314 in std::__1::__invoke_result_impl<void, DB::ExceptionKeepingTransform::work()::$_1&>::type std::__1::__invoke[abi:fe210105]<DB::ExceptionKeepingTransform::work()::$_1&>(DB::ExceptionKeepingTransform::work()::$_1&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__type_traits/invoke.h:87:27
#23 0xab2b1704f314 in void std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:fe210105]<DB::ExceptionKeepingTransform::work()::$_1&>(DB::ExceptionKeepingTransform::work()::$_1&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__type_traits/invoke.h:342:5
#24 0xab2b1704f314 in void std::__1::__invoke_r[abi:fe210105]<void, DB::ExceptionKeepingTransform::work()::$_1&>(DB::ExceptionKeepingTransform::work()::$_1&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__type_traits/invoke.h:348:10
#25 0xab2b1704f314 in void std::__1::__function::__policy_func<void ()>::__call_func[abi:fe210105]<DB::ExceptionKeepingTransform::work()::$_1>(std::__1::__function::__policy_storage const*) ci/tmp/build/./contrib/llvm-project/libcxx/include/__functional/function.h:450:12
#26 0xab2b1704ec2c in std::__1::__function::__policy_func<void ()>::operator()[abi:fe210105]() const ci/tmp/build/./contrib/llvm-project/libcxx/include/__functional/function.h:508:12
#27 0xab2b1704ec2c in std::__1::function<void ()>::operator()() const ci/tmp/build/./contrib/llvm-project/libcxx/include/__functional/function.h:772:10
#28 0xab2b1704ec2c in DB::runStep(std::__1::function<void ()>, std::__1::shared_ptr<DB::ThreadGroup>&) ci/tmp/build/./src/Processors/Transforms/ExceptionKeepingTransform.cpp:105:9
#29 0xab2b1704ddf4 in DB::ExceptionKeepingTransform::work() ci/tmp/build/./src/Processors/Transforms/ExceptionKeepingTransform.cpp:136:34
#30 0xab2b167ad0f4 in DB::executeJob(DB::ExecutingGraph::Node*, DB::ReadProgressCallback*) ci/tmp/build/./src/Processors/Executors/ExecutionThreadContext.cpp:53:26
#31 0xab2b167ad0f4 in DB::ExecutionThreadContext::executeTask() ci/tmp/build/./src/Processors/Executors/ExecutionThreadContext.cpp:102:9
#32 0xab2b1677f850 in DB::PipelineExecutor::executeStepImpl(unsigned long, DB::IAcquiredSlot*, std::__1::atomic<bool>*) ci/tmp/build/./src/Processors/Executors/PipelineExecutor.cpp:351:26
#33 0xab2b1677d9c4 in DB::PipelineExecutor::executeSingleThread(unsigned long, DB::IAcquiredSlot*) ci/tmp/build/./src/Processors/Executors/PipelineExecutor.cpp:279:5
#34 0xab2b1677d9c4 in DB::PipelineExecutor::executeImpl(unsigned long, bool) ci/tmp/build/./src/Processors/Executors/PipelineExecutor.cpp:602:13
#35 0xab2b1677d0d0 in DB::PipelineExecutor::execute(unsigned long, bool) ci/tmp/build/./src/Processors/Executors/PipelineExecutor.cpp:136:9
#36 0xab2b16779b34 in DB::CompletedPipelineExecutor::execute() ci/tmp/build/./src/Processors/Executors/CompletedPipelineExecutor.cpp:105:18
#37 0xab2b084f1f00 in DB::AsynchronousInsertQueue::processData(DB::AsynchronousInsertQueue::InsertQuery, std::__1::unique_ptr<DB::AsynchronousInsertQueue::InsertData, std::__1::default_delete<DB::AsynchronousInsertQueue::InsertData>>, std::__1::shared_ptr<DB::Context const>, std::__1::shared_ptr<DB::ThreadGroup>, DB::AsynchronousInsertQueue::QueueShardFlushTimeHistory&) ci/tmp/build/./src/Interpreters/AsynchronousInsertQueue.cpp:1207:28
#38 0xab2b084fdd4c in DB::AsynchronousInsertQueue::scheduleDataProcessingJob(DB::AsynchronousInsertQueue::InsertQuery const&, std::__1::unique_ptr<DB::AsynchronousInsertQueue::InsertData, std::__1::default_delete<DB::AsynchronousInsertQueue::InsertData>>, std::__1::shared_ptr<DB::Context const>, unsigned long, std::__1::shared_ptr<DB::ThreadGroup>)::$_0::operator()() ci/tmp/build/./src/Interpreters/AsynchronousInsertQueue.cpp:370:17
#39 0xab2b084fdd4c in std::__1::__invoke_result_impl<void, DB::AsynchronousInsertQueue::scheduleDataProcessingJob(DB::AsynchronousInsertQueue::InsertQuery const&, std::__1::unique_ptr<DB::AsynchronousInsertQueue::InsertData, std::__1::default_delete<DB::AsynchronousInsertQueue::InsertData>>, std::__1::shared_ptr<DB::Context const>, unsigned long, std::__1::shared_ptr<DB::ThreadGroup>)::$_0&>::type std::__1::__invoke[abi:fe210105]<DB::AsynchronousInsertQueue::scheduleDataProcessingJob(DB::AsynchronousInsertQueue::InsertQuery const&, std::__1::unique_ptr<DB::AsynchronousInsertQueue::InsertData, std::__1::default_delete<DB::AsynchronousInsertQueue::InsertData>>, std::__1::shared_ptr<DB::Context const>, unsigned long, std::__1::shared_ptr<DB::ThreadGroup>)::$_0&>(DB::AsynchronousInsertQueue::scheduleDataProcessingJob(DB::AsynchronousInsertQueue::InsertQuery const&, std::__1::unique_ptr<DB::AsynchronousInsertQueue::InsertData, std::__1::default_delete<DB::AsynchronousInsertQueue::InsertData>>, std::__1::shared_ptr<DB::Context const>, unsigned long, std::__1::shared_ptr<DB::ThreadGroup>)::$_0&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__type_traits/invoke.h:87:27
#40 0xab2b084fdd4c in void std::__1::__invoke_void_return_wrapper<void, true>::__call[abi:fe210105]<DB::AsynchronousInsertQueue::scheduleDataProcessingJob(DB::AsynchronousInsertQueue::InsertQuery const&, std::__1::unique_ptr<DB::AsynchronousInsertQueue::InsertData, std::__1::default_delete<DB::AsynchronousInsertQueue::InsertData>>, std::__1::shared_ptr<DB::Context const>, unsigned long, std::__1::shared_ptr<DB::ThreadGroup>)::$_0&>(DB::AsynchronousInsertQueue::scheduleDataProcessingJob(DB::AsynchronousInsertQueue::InsertQuery const&, std::__1::unique_ptr<DB::AsynchronousInsertQueue::InsertData, std::__1::default_delete<DB::AsynchronousInsertQueue::InsertData>>, std::__1::shared_ptr<DB::Context const>, unsigned long, std::__1::shared_ptr<DB::ThreadGroup>)::$_0&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__type_traits/invoke.h:342:5
#41 0xab2b084fdd4c in void std::__1::__invoke_r[abi:fe210105]<void, DB::AsynchronousInsertQueue::scheduleDataProcessingJob(DB::AsynchronousInsertQueue::InsertQuery const&, std::__1::unique_ptr<DB::AsynchronousInsertQueue::InsertData, std::__1::default_delete<DB::AsynchronousInsertQueue::InsertData>>, std::__1::shared_ptr<DB::Context const>, unsigned long, std::__1::shared_ptr<DB::ThreadGroup>)::$_0&>(DB::AsynchronousInsertQueue::scheduleDataProcessingJob(DB::AsynchronousInsertQueue::InsertQuery const&, std::__1::unique_ptr<DB::AsynchronousInsertQueue::InsertData, std::__1::default_delete<DB::AsynchronousInsertQueue::InsertData>>, std::__1::shared_ptr<DB::Context const>, unsigned long, std::__1::shared_ptr<DB::ThreadGroup>)::$_0&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__type_traits/invoke.h:348:10
#42 0xab2b084fdd4c in void std::__1::__function::__policy_func<void ()>::__call_func[abi:fe210105]<DB::AsynchronousInsertQueue::scheduleDataProcessingJob(DB::AsynchronousInsertQueue::InsertQuery const&, std::__1::unique_ptr<DB::AsynchronousInsertQueue::InsertData, std::__1::default_delete<DB::AsynchronousInsertQueue::InsertData>>, std::__1::shared_ptr<DB::Context const>, unsigned long, std::__1::shared_ptr<DB::ThreadGroup>)::$_0>(std::__1::__function::__policy_storage const*) ci/tmp/build/./contrib/llvm-project/libcxx/include/__functional/function.h:450:12
#43 0xab2af5b909f4 in std::__1::__function::__policy_func<void ()>::operator()[abi:fe210105]() const ci/tmp/build/./contrib/llvm-project/libcxx/include/__functional/function.h:508:12
#44 0xab2af5b909f4 in std::__1::function<void ()>::operator()() const ci/tmp/build/./contrib/llvm-project/libcxx/include/__functional/function.h:772:10
#45 0xab2af5b909f4 in ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::worker() ci/tmp/build/./src/Common/ThreadPool.cpp:799:17
#46 0xab2af5ba1298 in std::__1::__invoke_result_impl<void, void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&>::type std::__1::__invoke[abi:fe210105]<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__type_traits/invoke.h
#47 0xab2af5ba1298 in decltype(auto) std::__1::__apply_tuple_impl[abi:fe210105]<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&)(), std::__1::tuple<ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>&, 0ul>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&)(), std::__1::tuple<ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>&, std::__1::__tuple_indices<0ul>) ci/tmp/build/./contrib/llvm-project/libcxx/include/tuple:1380:5
#48 0xab2af5ba1298 in decltype(auto) std::__1::apply[abi:fe210105]<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&)(), std::__1::tuple<ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>&>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&)(), std::__1::tuple<ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>&) ci/tmp/build/./contrib/llvm-project/libcxx/include/tuple:1384:5
#49 0xab2af5ba1298 in ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*>(void (ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<ThreadFromGlobalPoolImpl<false, true>>::ThreadFromThreadPool*&&)::'lambda'()::operator()() ci/tmp/build/./src/Common/ThreadPool.h:312:13
#50 0xab2af5b89e74 in std::__1::__function::__policy_func<void ()>::operator()[abi:fe210105]() const ci/tmp/build/./contrib/llvm-project/libcxx/include/__functional/function.h:508:12
#51 0xab2af5b89e74 in std::__1::function<void ()>::operator()() const ci/tmp/build/./contrib/llvm-project/libcxx/include/__functional/function.h:772:10
#52 0xab2af5b89e74 in ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::worker() ci/tmp/build/./src/Common/ThreadPool.cpp:809:17
#53 0xab2af5b9b338 in std::__1::__invoke_result_impl<void, void (ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool*>::type std::__1::__invoke[abi:fe210105]<void (ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool*>(void (ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::*&&)(), ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool*&&) ci/tmp/build/./contrib/llvm-project/libcxx/include/__type_traits/invoke.h
#54 0xab2af5b9b338 in void std::__1::__thread_execute[abi:fe210105]<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool*, 2ul>(std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool*>&, std::__1::__tuple_indices<2ul>) ci/tmp/build/./contrib/llvm-project/libcxx/include/__thread/thread.h:159:3
#55 0xab2af5b9b338 in void* std::__1::__thread_proxy[abi:fe210105]<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct>>, void (ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool::*)(), ThreadPoolImpl<std::__1::thread>::ThreadFromThreadPool*>>(void*) ci/tmp/build/./contrib/llvm-project/libcxx/include/__thread/thread.h:168:3
#56 0xff7fa76f0394 in start_thread nptl/pthread_create.c:442:8
#57 0xff7fa7759e98  misc/../sysdeps/unix/sysv/linux/aarch64/clone.S:79

@groeneai

groeneai commented Apr 6, 2026

Contributor

@PedroTadim Yes, this is a separate issue from the _x → _m accumulator fix.

What the _x → _m fix (in this PR) addresses:
The svmla_f32_x intrinsics used "don't-care" for inactive SVE lanes, meaning the final svaddv with svptrue summed garbage from those lanes. That was a real bug — fixed in commit 5a0677af by switching to svmla_f32_m.

What this new crash is:
The failure at spatial.h:818 is on svdupq_n_f32(0.f, 0.f, 0.f, 0.f) — this is an SVE vector initialization to all-zeros. It's a false positive: MSAN lacks SVE-specific shadow propagation, so it can't track that the output of svdupq_n_f32 is actually initialized. It flags the entire SVE vector as "uninitialized" and then reports it when downstream operations touch it.

Root cause — a lost fix:
This was previously addressed in the fork by commit 4a6beec6 which added __attribute__((no_sanitize("memory"))) to the SIMSIMD_PUBLIC and SIMSIMD_DYNAMIC macros in types.h:

#if defined(__has_feature) && __has_feature(memory_sanitizer)
#define _SIMSIMD_NO_SANITIZE_MEMORY __attribute__((no_sanitize("memory")))
#else
#define _SIMSIMD_NO_SANITIZE_MEMORY
#endif

#define SIMSIMD_PUBLIC __attribute__((unused, nonnull)) _SIMSIMD_NO_SANITIZE_MEMORY inline static

However, this change was lost during merge conflict resolution in merge commit 1ab7d5e0. Both the current master submodule (e3714957) and this PR's target (5a0677af) are missing it.

Fix: The _SIMSIMD_NO_SANITIZE_MEMORY attribute needs to be re-added to types.h in the SimSIMD fork. This is the only complete solution — MSAN fundamentally cannot track SVE (or any SIMD) intrinsics, so suppressing instrumentation for these functions is correct.
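
For illustration only, here is a minimal sketch of how an attribute macro of this kind is attached to a function. The names EXAMPLE_NO_SANITIZE_MEMORY and example_l2sq are hypothetical and the kernel is scalar, not SVE. The feature test is nested so that compilers without __has_feature never see the call-like form. Whether this attribute actually silences the SVE reports is disputed later in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Define the attribute only when Clang reports MemorySanitizer is active. */
#if defined(__has_feature)
#if __has_feature(memory_sanitizer)
#define EXAMPLE_NO_SANITIZE_MEMORY __attribute__((no_sanitize("memory")))
#endif
#endif

/* Everywhere else the hypothetical macro expands to nothing. */
#ifndef EXAMPLE_NO_SANITIZE_MEMORY
#define EXAMPLE_NO_SANITIZE_MEMORY
#endif

/* Squared Euclidean distance; MSan checks are not inserted into this
 * function body when the attribute is in effect. */
EXAMPLE_NO_SANITIZE_MEMORY
static float example_l2sq(const float *a, const float *b, size_t n) {
    assert(a != NULL && b != NULL);
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}
```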

@PedroTadim

Member

Ok, can you add the _SIMSIMD_NO_SANITIZE_MEMORY macro back in this PR or a new one? Currently, this error happens very often in our CI.

@alexey-milovidov

Member Author

@PedroTadim, that won't help.

groeneai added a commit to groeneai/ClickHouse that referenced this pull request Apr 7, 2026
target_compile_definitions(_simsimd PUBLIC SIMSIMD_DYNAMIC_DISPATCH)
# Disable SVE: LLVM's MemorySanitizer cannot instrument SVE scalable vector types,
# emitting unconditional `__msan_warning_noreturn` at every SVE function entry.
target_compile_definitions(_simsimd PRIVATE SIMSIMD_TARGET_SVE=0 SIMSIMD_TARGET_SVE2=0)

Member

Maybe we can disable it only for MSan?
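
One way to picture an MSan-only switch is at the preprocessor level rather than in CMake. The sketch below is not the change made in this PR: EXAMPLE_UNDER_MSAN and example_sve_forced_off are hypothetical names, and it assumes that SIMSIMD_TARGET_SVE and SIMSIMD_TARGET_SVE2 are honored when defined before the SimSIMD headers are included.

```c
#include <assert.h>

/* Detect MemorySanitizer; __has_feature is a Clang extension, so guard it. */
#if defined(__has_feature)
#if __has_feature(memory_sanitizer)
#define EXAMPLE_UNDER_MSAN 1
#endif
#endif
#ifndef EXAMPLE_UNDER_MSAN
#define EXAMPLE_UNDER_MSAN 0
#endif

/* Force SVE off only under MSan, and only if the build system
 * has not already made a choice. */
#if EXAMPLE_UNDER_MSAN && !defined(SIMSIMD_TARGET_SVE)
#define SIMSIMD_TARGET_SVE 0
#endif
#if EXAMPLE_UNDER_MSAN && !defined(SIMSIMD_TARGET_SVE2)
#define SIMSIMD_TARGET_SVE2 0
#endif

/* Hypothetical helper that reports the compile-time decision. */
static int example_sve_forced_off(void) {
#if defined(SIMSIMD_TARGET_SVE) && SIMSIMD_TARGET_SVE == 0
    return 1;
#else
    return 0;
#endif
}
```

In a regular build this leaves both macros untouched, so SVE kernels stay available; in an MSan build both are set to 0, matching the definitions in the snippet under review.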

@vitlibar

vitlibar commented Apr 7, 2026

Member

alexey-milovidov added a commit that referenced this pull request Apr 11, 2026
The `read_stream_count_was_reduced` flag (added in 5f270da) is set
when ReadFromMergeTree produces fewer streams than requested. This
causes AggregatingStep to cap post-aggregation resize to the actual
stream count, which is correct for small data.

However, for read-in-order queries, the stream count is determined by
the number of parts and mark ranges, not by data size. When
`max_streams_to_max_threads_ratio` increases `requested_num_streams`
beyond the number of parts, the flag is incorrectly set. After
merge-sort reduces the pipeline to 1 stream, AggregatingStep then
refuses to expand to `max_threads`, losing parallelism.

Fix: skip setting the flag when `reader_settings.read_in_order` is
true.

https://s3.amazonaws.com/clickhouse-test-reports/json.html?PR=101239&sha=8abafa3f0d5fe90ff3bae820f1a3da291249d26e&name_0=PR&name_1=Stateless%20tests%20%28amd_debug%2C%20parallel%29
#101239

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Onyx2406 added a commit to Onyx2406/ClickHouse that referenced this pull request Apr 17, 2026

Labels

blocker (This issue / pr blocks a new release), major, pr-ci, pr-synced-to-cloud (The PR is synced to the cloud repo)

Development

Successfully merging this pull request may close these issues.

MemorySanitizer: use-of-uninitialized-value (STID: 1003-358c)

8 participants