
copybara-service · 2026-06-25T05:04:14Z

Update XNNPACK in XLA

Commit history for google/XNNPACK (76de1380 -> 37e59cdb):

70889f15 Vaisakh K V: Added SME1 support for qp8-f32-qc4w gemm
ab8714dd Vaisakh K V: Merge remote-tracking branch 'google/master' into sme1/qp8-f32-qc4w-gemm
8649fdd1 aizu-m: validate reduction axes before nchw remap in static reduce
383840f8 Frank Barchard: Update XNNPACK elementwise benchmarks to use consistent N elements.
cced44b5 Dillon Sharlet: Add fp16 and bf16 implementations of exp, expm1, log, log1p, erf, tanh
6062dc08 waris ): fix-concat-oob-write
d4312603 Frank Barchard: Fix undeclared identifier XNN_SIMD_NUM_RCP_ITER_F16 in wasm relaxed simd fp16
2fef1fcf Alexander Shaposhnikov: Change cache key schema for bazel builds.
a1b0202e Frank Barchard: Add 2 bit SSE GEMM microkernels
be7bb974 Dillon Sharlet: Don't rewrite sum(a * b) => dot(a, b) if the dot would be a vector-vector multiply
85a6530d Misha Gutman: Added f32_qc8w operator level support for batch matrix multiply.
932ea64e Quentin Khan: Make hardware configuration and initialization guards (re)setable.
207dab62 Misha Gutman: Enabled f32_qc8w bmm on the subgraph level.
4bc0e6ab Quentin Khan: Remove useless test main function.
799de5c9 XNNPACK Team: Merge pull request #10373 from sin99xx:master
859e1a51 XNNPACK Team: Merge pull request #10369 from aizu-m:reduce-axis-bounds
6f2d84ec Quentin Khan: Fix isomorphic matcher.
51af7f59 Quentin Khan: Make external tensor order follow topological traversal of the graph.
5fa627d5 Frank Barchard: f16 vlog switch from rational-3-3 to rational-1-3
eb93f24d Quentin Khan: Correctly zero-initialize xnn_unary_params.
98930bab Dillon Sharlet: Test reduce kernels with pi summation in both ascending and descending order
cf112cb8 Dillon Sharlet: Minor loop fusion improvements
68373b6a Dillon Sharlet: Disable F32QC8W tests when using YNNPACK
ce3b7c1b Frank Barchard: Fix XNNPACK compilation failure on Windows ARM64.
1f4e631e Misha Gutman: Rewrote bmm(f32, dequant(qint8)) -> f32 to bmm(f32, qint8 -> qcint8) -> f32.
43c4f7b3 Quentin Khan: Add fine-grain detection of unsupported fp16 ops when falling back to fp32.
3ed73f57 Mohammadreza Heydary: Rewrote bmm(f32, dequant(qint8)) -> f32 to bmm(f32, qint8 -> qcint8) -> f32.
12f3b6e9 Alexander Shaposhnikov: On Hexagon std::int32_t is defined as long, which is distinct from int. Because slinky::thread_pool uses int in its base class signature, using int32_t in the derived class causes a type mismatch.
dc0d8821 Alexander Shaposhnikov: Fix wrong types in hexagon_hvx.h
ce684eb2 Misha Gutman: Rewrote bmm(f32, dequant(qint8)) -> f32 to bmm(f32, qint8 -> qcint8) -> f32.
3e5f1c7e Byungchul Kim: Internal changes only
5be8df86 Alexander Shaposhnikov: Adjust define_transpose_a declaration.
e504f4e0 Dillon Sharlet: Add rewrite for add(square(x), y) => multiply_add(x, x, y)
32df615f Dillon Sharlet: Move sum_kn kernels from neondot to neon
f4271c2a Dillon Sharlet: Remove concatenating constructor from vec types
f6486e3e Dillon Sharlet: Explicitly number enumerations in ynnpack.h for ABI stability
bd205c6b Dillon Sharlet: Refactor precision for unary ops
00f16e11 Dillon Sharlet: Clean up target suffix for elementwise ops
95a93bc1 Dillon Sharlet: Add isnan, isinf, isfinite for YNNPACK float types
d1576216 Dillon Sharlet: Refactor precision for unary ops
f5d61330 XNNPACK Team: Merge pull request #10230 from qualcomm:sme1/qp8-f32-qc4w-gemm
8d8fa1fe aizu-m: reject overflowing output dimensions in constant pad reshape
a9a5a662 aizu-m: Guard indirection buffer size against overflow in igemm reshape
a9c19098 aizu-m: Validate expand-dims axes in resize_expand_dims_output_tensor
337236cb aizu-m: bound rope token count to weights tensor size in reshape
463c3e99 aizu-m: validate input rank in softmax and qp8 convert reshape
a505fe2a Dillon Sharlet: Only define isinf, isnan, isfinite if not on MSVC
0e469aba Byungchul Kim: Internal changes only
4bd2e73b XNNPACK Team: Merge pull request #10357 from aizu-m:constant-pad-dim-overflow
494ddd06 Misha Gutman: Fixed underallocating memory for qint8->qcint8 conversion which only could have been seen in combination with weight_cache.
13c32f5e Frank Barchard: Add AVX512SKX F16-F32ACC GEMM/IGEMM microkernels to XNNPACK
9d225af7 XNNPACK Team: Merge pull request #10389 from aizu-m:igemm-indirection-overflow
843528da XNNPACK Team: Merge pull request #10424 from aizu-m:softmax-convert-input-rank
a80e3c43 XNNPACK Team: Merge pull request #10404 from aizu-m:expand-dims-axis-validation
85115d25 XNNPACK Team: Merge pull request #10412 from aizu-m:rope-tokens-weights-bound
c6f7e32e Dillon Sharlet: Relax tolerance of batch_matrix_multiply test
71b2a887 Dillon Sharlet: Add tracing to tsl::profiler traces in tensorflow
f42612c6 Frank Barchard: Fix SSE2 gemm being used when it should not be AVX/AVX512
463600be Dillon Sharlet: Add avx512 fp16 convert kernels
3912146d XNNPACK Team: Add tracing to tsl::profiler traces in tensorflow
508ca2d6 Dillon Sharlet: Fix new tests in YNNPACK
1f742eb0 Volodymyr Kysenko: Remove packing output buffer scheduling.
1e14f367 Dillon Sharlet: Add tracing to tsl::profiler traces in tensorflow
87730c12 Dillon Sharlet: Fix bf16 gemm config when there are no kernels
11852443 Dillon Sharlet: Add fp8 types and reference convert kernels to YNNPACK
f0e859a4 Byungchul Kim: XNNPACK runner allows external tensor to grow even when it was added without copy
e71d1ae4 Byungchul Kim: Fix memory leak
199b0e5c Byungchul Kim: Internal changes only
f774d42b Dillon Sharlet: Add ynn_define_dynamic_quantization
ff27209a Frank Barchard: f16-f32acc-approxgelu for WebAssembly and native
c6fc8008 foodlook: softmax: reject input/output datatype mismatch
afa8f9c0 Gerardo Carranza: Add BatchMatmul Generator to ATS.
cc3f6ba9 Dillon Sharlet: Add ynn_define_gather and remove ynn_define_lut
15e4a6e6 Dillon Sharlet: Fix attention benchmark to be more realistic and test longer sequences
7f2b83e6 Gerardo Carranza: Add FullyConnected Generator to ATS.
8084a7f9 Frank Barchard: Update ISA guards for F16-VLOG, F16-VSIN, and F16-VCOS in XNNPACK.
0a6f23d2 Gerardo Carranza: Add support for asymmetric quantization and weights format in FullyConnected.
f0b3a299 Richard Townsend: [gn] Experimental CI for x64 Windows
7c92728e aizu-m: size both argmax pooling outputs before returning reallocation
4d7d0a38 Dillon Sharlet: Improve precision of compute_qd8_params
9c202cae Dillon Sharlet: Add YNN_NODE_FLAG_UNIQUE_DIMS
79ae5230 Dillon Sharlet: Don't return negative dimension indices from axis_to_slinky_dim
37099ac2 XNNPACK Team: Merge pull request #10471 from aizu-m:argmax-pooling-index-realloc
c6e5f040 XNNPACK Team: Merge pull request #10467 from foodlook:fix-softmax-datatype-mismatch
a51b41a4 Gregory Comer: Add bf16-qd8 convert operator support
210d8cee Gregory Comer: Add bf16-qu8 convert operator support
b92d2986 Dillon Sharlet: Add bf16 and fp16 dequantize_dot kernels
78beab26 Frank Barchard: Fix build error in unary when AVX is disabled
7a54dd8b Frank Barchard: Support FP16 in Convolution 2D, Depthwise Convolution 2D, and Fully Connected subgraphs.
7d47dd87 Dillon Sharlet: Fix FP16-to-FP32 fallback rewrite for convert nodes in XNNPACK.
c69adf79 aizu-m: free temporary f32 buffers in pf32_f16 conv create error path
f24b7d89 aizu-m: validate index tensor shape in unpooling reshape
25b9c41d XNNPACK Team: Merge pull request #10022 from GregoryComer:bf16-qd8-operator
71708b85 Dillon Sharlet: SIMD wrapper implementation improvements
7f16ce87 aizu-m: validate input_id before values lookup in global pooling define
a62374fa Volodymyr Kysenko: Use backward propagated extents to decide when loops should be fused.
cba17d66 Dillon Sharlet: Change sme2 docker image to a version of qemu that supports fp8
2278f2c9 Dillon Sharlet: Attempt to fix docker QEMU build
9b5fee0b Frank Barchard: Improve precision of f16-f32acc vapproxgelu microkernels using NR division.
19655824 Dillon Sharlet: Add patch to allow enable FPMR register access in our qemu build
e99ee83 Dillon Sharlet: Apparently ubuntu-latest-16core doesn't exist
5f4cf8b5 Frank Barchard: 4-bit weight packw microkernel Scalar, SSE2 and AVX2
ad262b84 Dillon Sharlet: Add neonfp8 cast support
9c2b0e11 Frank Barchard: vtanh f16-f32acc kernels across multiple architectures
e8b818d9 aizu-m: reject overflowing axis in fuse_dims and split_dim define
54143bc4 XNNPACK Team: Add Slice, ResizeBilinear, and DepthwiseConv2D ops for WebGpu TensorAPI backend
b936b7ac Dillon Sharlet: Fix GitHub Actions cache configuration for XNNPACK.
f5a3ba01 Richard Townsend: [gn] add standalone modules
d4ea4e0b Dillon Sharlet: Use input shape instead of output shape to determine the number of channels
82601d27 Dillon Sharlet: Add SIMD types using GNU vector extension types
0f219961 Dillon Sharlet: Implement dot tests by widening sub-32-bit floats to float
4b7c368f Dillon Sharlet: Add ARM fp8 dot kernels
702d9390 Volodymyr Kysenko: Allow fusing loops with different required steps by using their least common multiple.
964559ae XNNPACK Team: Merge pull request #10489 from aizu-m:unpooling-index-shape-check
ea34996a XNNPACK Team: Merge pull request #10495 from aizu-m:global-pooling-input-id-bounds
48b9ad77 XNNPACK Team: Merge pull request #10487 from aizu-m:pf32-f16-conv-bias-leak
987df4b9 Dillon Sharlet: Cleanup/simplification pass of dot subgraph and tests
d8fe82d3 Frank Barchard: GIO weight packing ukernels for KR=1 x8/x16/x32
b1e3c597 Richard Townsend: [gn] DEPS update for June, 2026
a5fe7ac0 Dillon Sharlet: Catch out of bounds indices in gather
1eb6eb37 Dillon Sharlet: Add CMake build for YNNPACK
e25ed4b2 Volodymyr Kysenko: Update slinky dependency to a newer commit.
a50d9b38 Volodymyr Kysenko: Add XNN_FLAG_NO_BROADCAST to batch matrix multiply in attention benchmark.
e53f11a4 XNNPACK Team: Merge pull request #10520 from aizu-m:fuse-split-axis-overflow
4bfa3bb8 Volodymyr Kysenko: Change the order of the dot loops in schedule_info.
d0aa6a4 Dillon Sharlet: Fix required dot tiling
697543f6 Dillon Sharlet: Fix split-fuse in YNNPACK
44b9b41f aizu-m: goto error on convolution_op alloc failure in deconv create
3f731006 aizu-m: free partial operator on alloc failure in create paths
d6e4de13 Minh Vu: Fix benchmark minmax params order
fb4e3d2d aizu-m: reject input rank mismatch in transpose reshape
a9c1b094 aizu-m: Guard transpose rank-mismatch test with XNNPACK_USE_YNNPACK
d49f39ee XNNPACK Team: Merge pull request #10543 from fallintoplace:fix/f16-benchmark-minmax-params
a27ef624 XNNPACK Team: Merge pull request #10542 from aizu-m:free-partial-op-on-alloc-failure
b0e7c33f XNNPACK Team: Merge pull request #10545 from aizu-m:transpose-rank-mismatch
737a5564 XNNPACK Team: Merge pull request #10541 from aizu-m:deconv-create-oom-leak
608aa8b9 Dillon Sharlet: LUT kernel improvements
f9e04bf2 Dillon Sharlet: Improve ynn_define_gather to support gathering multiple dimensions
23718f1f Dillon Sharlet: Fix static_transpose when running with --define xnnpack_use_ynnpack=true
a2b924c2 Frank Barchard: Update generated scalar ukernel for qs8-qc4w-packw x8c8 gemm goi
3ce6c680 Frank Barchard: Enable SSSE3 GEMM microkernels for qs8_qc2w and qdu8_f32_qc2w in XNNPACK.
7359ebdc Frank Barchard: Add f16-vexp FP32 accumulation kernels for performance on AVX512F/F16C/NEON/Wasm
617815ff Alexander Shaposhnikov: Add missing f32x32 operators and reduce kernels on Hexagon.
108e8348 Ping Yu: Update Gemma3 example with slicing and layernorm adjustments.
944466f5 Dillon Sharlet: Remove some excessive axis validation in split/fuse/broadcast ops
08712133 Dillon Sharlet: Change ynn_define_unary_polynomial arguments from float to double
ce360caa Byungchul Kim: Allow weights share among xnnpack runners.
7d99d465 Frank Barchard: Add AVX512SKX microkernels for BF16/F16/F32 to QS8/QU8 vcvt.
74e718e8 Dillon Sharlet: Don't let arithmetic folding change the type of an operation
671aee6b Dillon Sharlet: Add some safety checks
0882ef3d Volodymyr Kysenko: Add memory usage tracking to XNNPACK subgraph benchmarks.
3a991096 Dillon Sharlet: Fix bug reinterpreting xnn_status as bool
bf52497e Dillon Sharlet: Add YNN_VALUE_FLAG_NO_EXCESS_PRECISION
9a7ac40a Volodymyr Kysenko: Extract source region inference into a helper function.
68c99f5c Dillon Sharlet: Add composite op library in YNNPACK
75d2c7d4 Frank Barchard: Fix ASan buffer overflow in AVX packw microkernel.
890ecdd4 Frank Barchard: Add NEON and AVX2 GEMM packing micro-kernels for QS8 and X8.
96ee1373 Dillon Sharlet: Move more ops to composites library from XNNPACK compatibility layer
653cbf90 Dillon Sharlet: Reduce max rank of tests for reduce
37e59cdb Dillon Sharlet: Add define_dot_quantization to the composites library

PiperOrigin-RevId: 929008850
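Commit 12f3b6e9 explains why spelling a parameter as `int32_t` in a derived class broke the Hexagon build. The sketch below reproduces the general C++ pitfall under stated assumptions: `PoolBase`, `MismatchedPool`, `MatchingPool`, `dispatch`, and `hexagon_int32_t` are hypothetical names, `PoolBase` is a simplified stand-in rather than `slinky::thread_pool`, and `long` is used to mimic a target where `std::int32_t` is `long`.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for a base class whose virtual method takes plain
// `int`. This is NOT slinky::thread_pool's real interface.
struct PoolBase {
  virtual ~PoolBase() = default;
  virtual std::string run(int n) { return "base:" + std::to_string(n); }
};

// Simulate a target where std::int32_t is a typedef for `long`.
using hexagon_int32_t = long;
static_assert(!std::is_same<hexagon_int32_t, int>::value,
              "long and int are distinct types, whatever their widths");

// Using the int32_t-style alias declares a *new* function that hides
// PoolBase::run(int) instead of overriding it. Adding `override` here
// would turn this silent mismatch into a compile error.
struct MismatchedPool : PoolBase {
  std::string run(hexagon_int32_t n) { return "derived:" + std::to_string(n); }
};

// Fix: spell the parameter type exactly as the base class does.
struct MatchingPool : PoolBase {
  std::string run(int n) override { return "derived:" + std::to_string(n); }
};

// Calls through a base reference, which is where the mismatch shows up.
std::string dispatch(PoolBase& pool, int n) { return pool.run(n); }
```

With these definitions, `dispatch` on a `MismatchedPool` still reaches the base implementation, while a `MatchingPool` is dispatched as intended. Marking overriding functions with `override` is what makes the compiler report this kind of mismatch.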
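Commit e504f4e0 adds a rewrite from `add(square(x), y)` to `multiply_add(x, x, y)`. As an illustration only, the pattern match can be sketched on a hypothetical expression tree; `Node`, `make`, `rewrite`, and `render` are invented for this example and do not reflect YNNPACK's subgraph API. The sketch inspects only the root node and, because addition is commutative, also matches `add(y, square(x))`; the commit subject does not say whether YNNPACK matches the commuted form.

```cpp
#include <cassert>
#include <initializer_list>
#include <memory>
#include <string>
#include <utility>

// Hypothetical expression node (not YNNPACK's subgraph representation).
struct Node {
  std::string op;                 // "input", "square", "add", or "multiply_add"
  std::string name;               // only used by "input" nodes
  std::shared_ptr<Node> a, b, c;  // operands; unused ones stay null
};
using NodePtr = std::shared_ptr<Node>;

NodePtr make(std::string op, std::string name = "", NodePtr a = nullptr,
             NodePtr b = nullptr, NodePtr c = nullptr) {
  return std::make_shared<Node>(Node{std::move(op), std::move(name),
                                     std::move(a), std::move(b), std::move(c)});
}

// Rewrites add(square(x), y) or add(y, square(x)) at the root into
// multiply_add(x, x, y). Assumes an "add" node has both operands set;
// any other node is returned unchanged.
NodePtr rewrite(const NodePtr& n) {
  if (n->op != "add") return n;
  if (n->a->op == "square") return make("multiply_add", "", n->a->a, n->a->a, n->b);
  if (n->b->op == "square") return make("multiply_add", "", n->b->a, n->b->a, n->a);
  return n;
}

// Prints a node as op(arg, ...) so results are easy to compare.
std::string render(const NodePtr& n) {
  if (n->op == "input") return n->name;
  std::string s = n->op + "(";
  bool first = true;
  for (const NodePtr& arg : {n->a, n->b, n->c}) {
    if (!arg) continue;
    if (!first) s += ", ";
    s += render(arg);
    first = false;
  }
  return s + ")";
}
```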
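Commits 95a93bc1 and a505fe2a add `isnan`, `isinf`, and `isfinite` for YNNPACK float types. A minimal sketch for bfloat16, assuming values are handled as raw `uint16_t` bit patterns (1 sign bit, 8 exponent bits, 7 mantissa bits); the `bf16_*` names are hypothetical and this is not YNNPACK's implementation.

```cpp
#include <cassert>
#include <cstdint>

// bfloat16 layout: 1 sign bit, 8 exponent bits, 7 mantissa bits.
constexpr std::uint16_t kBf16ExpMask = 0x7F80;  // exponent field
constexpr std::uint16_t kBf16AbsMask = 0x7FFF;  // everything except the sign

// Hypothetical helpers on raw bf16 bits. The bf16_ prefix avoids clashing
// with the std::isnan/std::isinf/std::isfinite overloads from <cmath>.
constexpr bool bf16_isinf(std::uint16_t bits) {
  // Exponent all ones and mantissa zero; the sign bit is ignored.
  return (bits & kBf16AbsMask) == kBf16ExpMask;
}

constexpr bool bf16_isnan(std::uint16_t bits) {
  // Exponent all ones and mantissa nonzero.
  return (bits & kBf16AbsMask) > kBf16ExpMask;
}

constexpr bool bf16_isfinite(std::uint16_t bits) {
  // Zeros, subnormals, and normals: the exponent is not all ones.
  return (bits & kBf16ExpMask) != kBf16ExpMask;
}
```

Because bfloat16 keeps float32's exponent width, these patterns are simply the upper halves of the float32 encodings: `0x7F80` is +infinity, `0x7FC0` is a quiet NaN, and `0x3F80` is 1.0.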
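Several aizu-m commits (for example 8d8fa1fe and a9a5a662) reject reshapes whose computed sizes would overflow. The usual technique is to test each multiplication and addition before performing it. The helpers below (`checked_mul`, `checked_add`, `padded_extent`, `indirection_bytes`) are hypothetical and only sketch that idea; they are not XNNPACK's actual checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical helper: a * b, or std::nullopt if the product would wrap.
std::optional<std::size_t> checked_mul(std::size_t a, std::size_t b) {
  if (a != 0 && b > SIZE_MAX / a) return std::nullopt;
  return a * b;
}

// Hypothetical helper: a + b, or std::nullopt if the sum would wrap.
std::optional<std::size_t> checked_add(std::size_t a, std::size_t b) {
  if (b > SIZE_MAX - a) return std::nullopt;
  return a + b;
}

// Hypothetical example: padded extent pre + dim + post, checked at each step.
std::optional<std::size_t> padded_extent(std::size_t pre, std::size_t dim,
                                         std::size_t post) {
  std::optional<std::size_t> partial = checked_add(pre, dim);
  if (!partial) return std::nullopt;
  return checked_add(*partial, post);
}

// Hypothetical example: bytes for an indirection buffer of rows * cols pointers.
std::optional<std::size_t> indirection_bytes(std::size_t rows, std::size_t cols) {
  std::optional<std::size_t> count = checked_mul(rows, cols);
  if (!count) return std::nullopt;
  return checked_mul(*count, sizeof(const void*));
}
```

A caller in a reshape path could map any `std::nullopt` to an error status instead of allocating a buffer from a wrapped-around size.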
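Commit 702d9390 allows fusing loops whose required steps differ by using their least common multiple. The idea can be sketched with C++17's `std::lcm`: a fused loop that advances by `lcm(a, b)` starts every iteration at a multiple of both `a` and `b`. `fused_step` and `fused_iterations` are hypothetical helpers, not the scheduler's code, and the sketch does not model how a partial final iteration would be handled.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>

// Hypothetical helper: the step a fused loop must take so that every
// iteration start is a whole multiple of each original loop's step.
std::int64_t fused_step(std::int64_t step_a, std::int64_t step_b) {
  assert(step_a > 0 && step_b > 0);
  return std::lcm(step_a, step_b);
}

// Counts the iterations of a fused loop over [0, extent), checking that each
// iteration start is aligned to both original steps.
std::int64_t fused_iterations(std::int64_t extent, std::int64_t step_a,
                              std::int64_t step_b) {
  const std::int64_t step = fused_step(step_a, step_b);
  std::int64_t count = 0;
  for (std::int64_t i = 0; i < extent; i += step) {
    assert(i % step_a == 0 && i % step_b == 0);
    ++count;
  }
  return count;
}
```

For steps 4 and 6 the fused step is 12, so over an extent of 48 the fused loop runs 4 iterations (starting at 0, 12, 24, 36).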
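Commit 3a991096 fixes a bug where an `xnn_status` was reinterpreted as a `bool`. In XNNPACK's C API, `xnn_status_success` is the zero value and every error code is nonzero, so converting a status to `bool` inverts its meaning. The sketch uses a hypothetical local enum modeled on that convention rather than including `xnnpack.h`; `status_code`, `ok_buggy`, and `ok_fixed` are invented names.

```cpp
#include <cassert>

// Hypothetical stand-in modeled on a status enum whose success value is 0
// (not the real xnnpack.h declaration).
enum status_code {
  status_success = 0,
  status_invalid_parameter = 2,
};

// BUG: a nonzero status means failure, so this returns true on *failure*
// and false on success.
bool ok_buggy(status_code s) { return s; }

// Fix: compare against the success enumerator explicitly.
bool ok_fixed(status_code s) { return s == status_success; }
```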


Update XNNPACK in XLA #122009

copybara-service[bot] wants to merge 1 commit into master from exported_pr_929008850

copybara-service Bot commented Jun 25, 2026
