Add deterministic cuDNN max pooling by duncanriach · Pull Request #25269 · tensorflow/tensorflow · GitHub

Add deterministic cuDNN max pooling #25269

Merged
tensorflow-copybara merged 1 commit into tensorflow:master from duncanriach:cudnn_deterministic_max_pooling
Jan 30, 2019
Conversation

@duncanriach duncanriach commented Jan 29, 2019

Contributor

When the TF_CUDNN_DETERMINISTIC environment variable is set to 1/true, cuDNN max pooling is performed using a deterministic algorithm.

This pull request follows on from pull request 24747, which dealt only with the convolution algorithms.

It completes the functionality controlled by TF_CUDNN_DETERMINISTIC, so the TODO left by the previous pull request is also removed.
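
As a rough illustration of the behavior described above, the sketch below maps the environment variable to a pooling mode. It is a minimal sketch under stated assumptions, not the code in this pull request: `EnvVarIsTrue`, `SelectMaxPoolingMode`, and the `MaxPoolingMode` enum are hypothetical names, and the enum only stands in for the cuDNN constants `CUDNN_POOLING_MAX` and `CUDNN_POOLING_MAX_DETERMINISTIC`.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for the two cuDNN max pooling modes; real code would
// use CUDNN_POOLING_MAX and CUDNN_POOLING_MAX_DETERMINISTIC from cudnn.h.
enum class MaxPoolingMode { kDefault, kDeterministic };

// Returns true when the variable is set to "1" or "true" (case-insensitive).
// Hypothetical helper, not TensorFlow's actual parsing code.
bool EnvVarIsTrue(const char* name) {
  assert(name != nullptr);
  const char* raw = std::getenv(name);
  if (raw == nullptr) return false;  // Unset means "not requested".
  std::string value(raw);
  std::transform(value.begin(), value.end(), value.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return value == "1" || value == "true";
}

// Picks the max pooling mode from the current environment.
MaxPoolingMode SelectMaxPoolingMode() {
  return EnvVarIsTrue("TF_CUDNN_DETERMINISTIC") ? MaxPoolingMode::kDeterministic
                                                 : MaxPoolingMode::kDefault;
}
```

This sketch re-reads the environment on every call for simplicity; a production implementation would more likely read the variable once and cache the result.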

@rthadur rthadur self-assigned this Jan 29, 2019
@rthadur rthadur requested a review from yifeif January 29, 2019 02:00
@rthadur rthadur added the size:XS CL Change Size: Extra Small label Jan 29, 2019
@tensorflow-copybara tensorflow-copybara requested review from timshen91 and removed request for yifeif January 29, 2019 02:28
@rthadur rthadur added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Jan 29, 2019

Contributor

perhaps const auto?

Contributor Author

Sorry I didn't get a chance to address this before the merge. I agree that adding the const would have been preferable.

Member

I'm fine with either way. Not having const is shorter, and happens to be consistent with other places. The variable is used only once, and it's immediately below, so const doesn't actually help much.

Contributor

It is better code to have const. I can fix all of them if you can guide me to them.
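
For readers following the thread, the trade-off under discussion can be shown with a small sketch. The function and variable names here are hypothetical and are not the line under review; the example only shows a local that is initialized once and used once, immediately below.

```cpp
#include <cassert>

// Hypothetical example, not the reviewed code.
int ScaledArea(int width, int height) {
  assert(width >= 0 && height >= 0);
  // `const auto` documents that the value never changes after initialization;
  // a later assignment such as `area = 0;` would fail to compile.
  const auto area = width * height;
  return area * 4;  // The only use, directly below the declaration.
}
```

With a single use on the next line, plain `auto` behaves identically here, which is the point made in the reply above; `const` mainly guards against accidental reassignment if the function grows later.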

@rthadur rthadur added kokoro:force-run Tests on submitted change and removed kokoro:force-run Tests on submitted change labels Jan 30, 2019
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Jan 30, 2019
@tensorflow-copybara tensorflow-copybara merged commit 9e78e8a into tensorflow:master Jan 30, 2019
tensorflow-copybara pushed a commit that referenced this pull request Jan 30, 2019
…_max_pooling

PiperOrigin-RevId: 231661364
@duncanriach duncanriach deleted the cudnn_deterministic_max_pooling branch October 17, 2019 20:21
copybara-service Bot pushed a commit that referenced this pull request Apr 15, 2025
Imported from GitHub PR openxla/xla#25269

Reported issue:
```
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //xla/service:compiler_test_gpu_amd_any
-----------------------------------------------------------------------------
Running test /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-dbg/bin/xla/service/compiler_test_gpu_amd_any.runfiles/xla/xla/service/compiler_test_gpu_amd_any --gtest_shuffle --gtest_fail_if_no_test_linked on GPU 0
=================================================================
==168009==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x50400002c1c0 at pc 0x7f59e50b52e7 bp 0x7ffc8c2358d0 sp 0x7ffc8c2358c8
READ of size 8 at 0x50400002c1c0 thread T0
    #0 0x7f59e50b52e6 in absl::lts_20230802::container_internal::CommonFields::capacity() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:990:36
    #1 0x7f59e50b52e6 in absl::lts_20230802::container_internal::probe(absl::lts_20230802::container_internal::CommonFields const&, unsigned long) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:1298:41
    #2 0x7f59e50b52e6 in std::pair<unsigned long, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::find_or_prepare_insert<std::tuple<std::type_index, void*>>(std::tuple<std::type_index, void*> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2645:16
    #3 0x7f59e50af8a8 in std::pair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::iterator, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable::operator()<std::tuple<std::type_index, void*>, std::piecewise_construct_t const&, std::tuple<std::tuple<std::type_index, void*>&&>, std::tuple<stream_executor::MultiKernelLoaderSpec&&>>(std::tuple<std::type_index, void*> const&, std::piecewise_construct_t const&, std::tuple<std::tuple<std::type_index, void*>&&>&&, std::tuple<stream_executor::MultiKernelLoaderSpec&&>&&) const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2459:20
    #4 0x7f59e50af8a8 in decltype(std::declval<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable>()(std::declval<std::tuple<std::type_index, void*>&& const&>(), std::piecewise_construct, std::declval<std::tuple<std::tuple<std::type_index, void*>&&>>(), std::declval<std::tuple<stream_executor::MultiKernelLoaderSpec&&>>())) absl::lts_20230802::container_internal::memory_internal::DecomposePairImpl<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable, std::tuple<std::type_index, void*>&&, std::tuple<stream_executor::MultiKernelLoaderSpec&&>>(absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable&&, std::pair<std::tuple<std::tuple<std::type_index, void*>&&>, std::tuple<stream_executor::MultiKernelLoaderSpec&&>>) 
/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/container_memory.h:140:10
    #5 0x7f59e50af8a8 in decltype(memory_internal::DecomposePairImpl(std::forward<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable>(fp), PairArgs(std::forward<std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(fp0)))) absl::lts_20230802::container_internal::DecomposePair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable&&, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/container_memory.h:207:10
    #6 0x7f59e50af8a8 in decltype(absl::container_internal::DecomposePair(std::declval<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable>(), std::declval<std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>())) absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>::apply<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable&&, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/flat_hash_map.h:591:12
    #7 0x7f59e50af8a8 in decltype(absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>::apply(std::forward<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable>(fp), std::forward<std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(fp0))) absl::lts_20230802::container_internal::hash_policy_traits<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, void>::apply<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> 
const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable&&, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/hash_policy_traits.h:134:12
    #8 0x7f59e50af8a8 in std::pair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::iterator, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::emplace<std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, 0>(std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2064:12
    #9 0x7f59e50af8a8 in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::insert(std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:1991:12
    #10 0x7f59e50af8a8 in stream_executor::gpu::GpuKernelRegistry::RegisterKernel(std::type_info const&, void*, stream_executor::MultiKernelLoaderSpec const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/gpu/gpu_kernel_registry.cc:67:45
    #11 0x7f59e50d1982 in absl::lts_20230802::Status stream_executor::gpu::GpuKernelRegistry::RegisterKernel<stream_executor::gpu::MakeBatchPointersKernel>(void*, stream_executor::MultiKernelLoaderSpec const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/stream_executor/gpu/gpu_kernel_registry.h:86:12
    #12 0x7f59e50d1982 in RegisterKernelMakeBatchPointersKernelRocmImpl() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc:35:1
    #13 0x7f59e50d1982 in 'lambda'()::operator()() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc:35:1
    #14 0x7f59e50d1982 in 'lambda'()::__invoke() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc:35:1
    #15 0x7f59e50d1982 in stream_executor::port::Initializer::Initializer(void (*)()) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/stream_executor/platform/default/initialize.h:26:42
    #16 0x7f59e50d1982 in __cxx_global_var_init.1 /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc:35:1
    #17 0x7f59e50d1982 in _GLOBAL__sub_I_make_batch_pointers_kernel_rocm.cu.cc /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc
    #18 0x7f5a5b27a47d in call_init elf/dl-init.c:70:3
    #19 0x7f5a5b27a567 in call_init elf/dl-init.c:33:6
    #20 0x7f5a5b27a567 in _dl_init elf/dl-init.c:117:5
    #21 0x7f5a5b2942c9  (/lib64/ld-linux-x86-64.so.2+0x202c9) (BuildId: e4de036b19e4768e7591b596c4be9f9015f2d28a)

0x50400002c1c0 is located 8 bytes after 40-byte region [0x50400002c190,0x50400002c1b8)
allocated by thread T0 here:
    #0 0x557d0f77fcdf in malloc (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-dbg/bin/xla/service/compiler_test_gpu_amd_any+0x1e8cdf) (BuildId: e96972f8c7f880083ff6ad5985d3c06d)
    #1 0x7f59d733098b in operator new(unsigned long) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xae98b) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2)

SUMMARY: AddressSanitizer: heap-buffer-overflow /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:990:36 in absl::lts_20230802::container_internal::CommonFields::capacity() const
Shadow bytes around the buggy address:
  0x50400002bf00: fa fa fd fd fd fd fd fd fa fa fd fd fd fd fd fa
  0x50400002bf80: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x50400002c000: fa fa fd fd fd fd fd fa fa fa 00 00 00 00 00 fa
  0x50400002c080: fa fa fd fd fd fd fd fd fa fa 00 00 00 00 00 00
  0x50400002c100: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
=>0x50400002c180: fa fa 00 00 00 00 00 fa[fa]fa fa fa fa fa fa fa
  0x50400002c200: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x50400002c280: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x50400002c300: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x50400002c380: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x50400002c400: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==168009==ABORTING
```

Why this fixes the issue:
* Consider compiling this class into several different .so files: the function gets inlined into each of them, so we end up with different instances even though we still want a single singleton.
* The ROCm compiler wrapper script does not yet support sanitizer flags, so our cu.cc files are not instrumented while our normal cc files are. This might cause a memory misalignment when running with ASan (theory).
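
The first point can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual change: `Registry` is a hypothetical class standing in for a header-defined singleton, and the header/source split is shown with comments inside a single block. The idea is that the accessor is only declared in the header and defined in one source file, so the function-local static lives in exactly one translation unit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// ---- registry.h (hypothetical header) ----
class Registry {
 public:
  // Declared only. Because the body is not in the header, it cannot be
  // inlined into every library that includes this file.
  static Registry& Get();

  void Register(const std::string& name) {
    assert(!name.empty());
    names_.push_back(name);
  }
  std::size_t size() const { return names_.size(); }

 private:
  Registry() = default;
  std::vector<std::string> names_;
};

// ---- registry.cc (hypothetical source file, built into one library) ----
Registry& Registry::Get() {
  // Function-local static: constructed once, on first use.
  static Registry instance;
  return instance;
}
```

Every caller that links against the library containing `registry.cc` then reaches the same object, for example `Registry::Get().Register("kernel")` from any translation unit.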

Copybara import of the project:

--
ffcd58918137191cdba6db571e0e5af0e57de2e1 by alekstheod <atheodor@amd.com>:

Fix asan issue due to a singleton in header file

Merging this change closes #25269

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#25269 from ROCm:ci_fix_singleton_in_header_file ffcd58918137191cdba6db571e0e5af0e57de2e1
PiperOrigin-RevId: 747815699
copybara-service Bot pushed a commit that referenced this pull request Apr 15, 2025
Imported from GitHub PR openxla/xla#25269

Reported issue:
```
exec ${PAGER:-/usr/bin/less} "$0" || exit 1
Executing tests from //xla/service:compiler_test_gpu_amd_any
-----------------------------------------------------------------------------
Running test /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-dbg/bin/xla/service/compiler_test_gpu_amd_any.runfiles/xla/xla/service/compiler_test_gpu_amd_any --gtest_shuffle --gtest_fail_if_no_test_linked on GPU 0
=================================================================
==168009==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x50400002c1c0 at pc 0x7f59e50b52e7 bp 0x7ffc8c2358d0 sp 0x7ffc8c2358c8
READ of size 8 at 0x50400002c1c0 thread T0
    #0 0x7f59e50b52e6 in absl::lts_20230802::container_internal::CommonFields::capacity() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:990:36
    #1 0x7f59e50b52e6 in absl::lts_20230802::container_internal::probe(absl::lts_20230802::container_internal::CommonFields const&, unsigned long) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:1298:41
    #2 0x7f59e50b52e6 in std::pair<unsigned long, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::find_or_prepare_insert<std::tuple<std::type_index, void*>>(std::tuple<std::type_index, void*> const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2645:16
    #3 0x7f59e50af8a8 in std::pair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::iterator, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable::operator()<std::tuple<std::type_index, void*>, std::piecewise_construct_t const&, std::tuple<std::tuple<std::type_index, void*>&&>, std::tuple<stream_executor::MultiKernelLoaderSpec&&>>(std::tuple<std::type_index, void*> const&, std::piecewise_construct_t const&, std::tuple<std::tuple<std::type_index, void*>&&>&&, std::tuple<stream_executor::MultiKernelLoaderSpec&&>&&) const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2459:20
    #4 0x7f59e50af8a8 in decltype(std::declval<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable>()(std::declval<std::tuple<std::type_index, void*>&& const&>(), std::piecewise_construct, std::declval<std::tuple<std::tuple<std::type_index, void*>&&>>(), std::declval<std::tuple<stream_executor::MultiKernelLoaderSpec&&>>())) absl::lts_20230802::container_internal::memory_internal::DecomposePairImpl<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable, std::tuple<std::type_index, void*>&&, std::tuple<stream_executor::MultiKernelLoaderSpec&&>>(absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable&&, std::pair<std::tuple<std::tuple<std::type_index, void*>&&>, std::tuple<stream_executor::MultiKernelLoaderSpec&&>>) 
/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/container_memory.h:140:10
    #5 0x7f59e50af8a8 in decltype(memory_internal::DecomposePairImpl(std::forward<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable>(fp), PairArgs(std::forward<std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(fp0)))) absl::lts_20230802::container_internal::DecomposePair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable&&, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/container_memory.h:207:10
    #6 0x7f59e50af8a8 in decltype(absl::container_internal::DecomposePair(std::declval<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable>(), std::declval<std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>())) absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>::apply<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable&&, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/flat_hash_map.h:591:12
    #7 0x7f59e50af8a8 in decltype(absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>::apply(std::forward<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable>(fp), std::forward<std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(fp0))) absl::lts_20230802::container_internal::hash_policy_traits<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, void>::apply<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>>(absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> 
const, stream_executor::MultiKernelLoaderSpec>>>::EmplaceDecomposable&&, std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/hash_policy_traits.h:134:12
    #8 0x7f59e50af8a8 in std::pair<absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::iterator, bool> absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::emplace<std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, 0>(std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:2064:12
    #9 0x7f59e50af8a8 in absl::lts_20230802::container_internal::raw_hash_set<absl::lts_20230802::container_internal::FlatHashMapPolicy<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>, absl::lts_20230802::hash_internal::Hash<std::tuple<std::type_index, void*>>, std::equal_to<std::tuple<std::type_index, void*>>, std::allocator<std::pair<std::tuple<std::type_index, void*> const, stream_executor::MultiKernelLoaderSpec>>>::insert(std::pair<std::tuple<std::type_index, void*>, stream_executor::MultiKernelLoaderSpec>&&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:1991:12
    #10 0x7f59e50af8a8 in stream_executor::gpu::GpuKernelRegistry::RegisterKernel(std::type_info const&, void*, stream_executor::MultiKernelLoaderSpec const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/gpu/gpu_kernel_registry.cc:67:45
    #11 0x7f59e50d1982 in absl::lts_20230802::Status stream_executor::gpu::GpuKernelRegistry::RegisterKernel<stream_executor::gpu::MakeBatchPointersKernel>(void*, stream_executor::MultiKernelLoaderSpec const&) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/stream_executor/gpu/gpu_kernel_registry.h:86:12
    #12 0x7f59e50d1982 in RegisterKernelMakeBatchPointersKernelRocmImpl() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc:35:1
    #13 0x7f59e50d1982 in 'lambda'()::operator()() const /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc:35:1
    #14 0x7f59e50d1982 in 'lambda'()::__invoke() /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc:35:1
    #15 0x7f59e50d1982 in stream_executor::port::Initializer::Initializer(void (*)()) /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/./xla/stream_executor/platform/default/initialize.h:26:42
    #16 0x7f59e50d1982 in __cxx_global_var_init.1 /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc:35:1
    #17 0x7f59e50d1982 in _GLOBAL__sub_I_make_batch_pointers_kernel_rocm.cu.cc /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/xla/stream_executor/rocm/make_batch_pointers_kernel_rocm.cu.cc
    #18 0x7f5a5b27a47d in call_init elf/dl-init.c:70:3
    #19 0x7f5a5b27a567 in call_init elf/dl-init.c:33:6
    #20 0x7f5a5b27a567 in _dl_init elf/dl-init.c:117:5
    #21 0x7f5a5b2942c9  (/lib64/ld-linux-x86-64.so.2+0x202c9) (BuildId: e4de036b19e4768e7591b596c4be9f9015f2d28a)

0x50400002c1c0 is located 8 bytes after 40-byte region [0x50400002c190,0x50400002c1b8)
allocated by thread T0 here:
    #0 0x557d0f77fcdf in malloc (/root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/bazel-out/k8-dbg/bin/xla/service/compiler_test_gpu_amd_any+0x1e8cdf) (BuildId: e96972f8c7f880083ff6ad5985d3c06d)
    #1 0x7f59d733098b in operator new(unsigned long) (/lib/x86_64-linux-gnu/libstdc++.so.6+0xae98b) (BuildId: e37fe1a879783838de78cbc8c80621fa685d58a2)

SUMMARY: AddressSanitizer: heap-buffer-overflow /root/.cache/bazel/_bazel_root/f367074f9120c6f1a67d35844ac058a3/execroot/xla/external/com_google_absl/absl/container/internal/raw_hash_set.h:990:36 in absl::lts_20230802::container_internal::CommonFields::capacity() const
Shadow bytes around the buggy address:
  0x50400002bf00: fa fa fd fd fd fd fd fd fa fa fd fd fd fd fd fa
  0x50400002bf80: fa fa fd fd fd fd fd fa fa fa fd fd fd fd fd fa
  0x50400002c000: fa fa fd fd fd fd fd fa fa fa 00 00 00 00 00 fa
  0x50400002c080: fa fa fd fd fd fd fd fd fa fa 00 00 00 00 00 00
  0x50400002c100: fa fa 00 00 00 00 00 00 fa fa 00 00 00 00 00 00
=>0x50400002c180: fa fa 00 00 00 00 00 fa[fa]fa fa fa fa fa fa fa
  0x50400002c200: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x50400002c280: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x50400002c300: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x50400002c380: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x50400002c400: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==168009==ABORTING
```

Why this fixes the issue:
* Consider compiling this class into different .so files: the function gets inlined in each of them, so we end up with different instances even though we still want a single singleton.
* The ROCm compiler wrapper script does not yet support sanitizer flags, so our .cu.cc files are not instrumented while our regular .cc files are. This might cause a memory misalignment when running with ASan (theory).
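
The first point can be illustrated with a minimal sketch. It assumes the fix amounts to declaring the singleton accessor in the header and defining it in exactly one .cc file; `KernelRegistry`, `Instance()` and the file names in the comments are hypothetical stand-ins, not the actual XLA code.

```cpp
// Sketch only: KernelRegistry is a hypothetical stand-in, not the XLA class.
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// --- kernel_registry.h (hypothetical) ---
class KernelRegistry {
 public:
  // Declared only. With no inline body in the header, the accessor cannot be
  // inlined into each library, so every caller links against one definition.
  static KernelRegistry& Instance();

  // Returns true if the name was newly registered, false if already present.
  bool Register(const std::string& name, int spec) {
    return kernels_.emplace(name, spec).second;
  }

  std::size_t size() const { return kernels_.size(); }

 private:
  KernelRegistry() = default;
  std::map<std::string, int> kernels_;
};

// --- kernel_registry.cc (hypothetical) ---
// The single out-of-line definition owns the one static instance.
KernelRegistry& KernelRegistry::Instance() {
  static KernelRegistry* registry = new KernelRegistry();  // intentionally leaked
  return *registry;
}

// Two call sites (think: static initializers in two different source files)
// observe the same object and the same underlying map.
std::size_t RegisterFromTwoCallSites() {
  KernelRegistry& a = KernelRegistry::Instance();
  KernelRegistry& b = KernelRegistry::Instance();
  assert(&a == &b);
  a.Register("make_batch_pointers", 1);
  b.Register("another_kernel", 2);
  return KernelRegistry::Instance().size();
}
```

With the accessor body kept out of the header, the map is only ever constructed and accessed through code from one translation unit, which is presumably why the instrumented/non-instrumented mix stops mattering for this object.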

Copybara import of the project:

--
ffcd58918137191cdba6db571e0e5af0e57de2e1 by alekstheod <atheodor@amd.com>:

Fix asan issue due to a singleton in header file

Merging this change closes #25269

PiperOrigin-RevId: 747823659
copybara-service Bot pushed a commit that referenced this pull request Apr 15, 2025
Imported from GitHub PR openxla/xla#25269

FUTURE_COPYBARA_INTEGRATE_REVIEW=openxla/xla#25269 from ROCm:ci_fix_singleton_in_header_file ffcd58918137191cdba6db571e0e5af0e57de2e1
PiperOrigin-RevId: 747833213
Labels

cla: yes ready to pull PR ready for merge process size:XS CL Change Size: Extra Small
