Update XNNPACK in XLA#122009
Draft
copybara-service[bot] wants to merge 1 commit into
Commit history for google/XNNPACK (76de1380 -> 37e59cdb):

- 70889f15 Vaisakh K V: Added SME1 support for qp8-f32-qc4w gemm
- ab8714dd Vaisakh K V: Merge remote-tracking branch 'google/master' into sme1/qp8-f32-qc4w-gemm
- 8649fdd1 aizu-m: validate reduction axes before nchw remap in static reduce
- 383840f8 Frank Barchard: Update XNNPACK elementwise benchmarks to use consistent N elements.
- cced44b5 Dillon Sharlet: Add fp16 and bf16 implementations of `exp`, `expm1`, `log`, `log1p`, `erf`, `tanh`
- 6062dc08 waris ): fix-concat-oob-write
- d4312603 Frank Barchard: Fix undeclared identifier XNN_SIMD_NUM_RCP_ITER_F16 in wasm relaxed simd fp16
- 2fef1fcf Alexander Shaposhnikov: Change cache key schema for bazel builds.
- a1b0202e Frank Barchard: Add 2 bit SSE GEMM microkernels
- be7bb974 Dillon Sharlet: Don't rewrite sum(a * b) => dot(a, b) if the dot would be a vector-vector multiply
- 85a6530d Misha Gutman: Added f32_qc8w operator level support for batch matrix multiply.
- 932ea64e Quentin Khan: Make hardware configuration and initialization guards (re)setable.
- 207dab62 Misha Gutman: Enabled f32_qc8w bmm on the subgraph level.
- 4bc0e6ab Quentin Khan: Remove useless test main function.
- 799de5c9 XNNPACK Team: Merge pull request #10373 from sin99xx:master
- 859e1a51 XNNPACK Team: Merge pull request #10369 from aizu-m:reduce-axis-bounds
- 6f2d84ec Quentin Khan: Fix isomorphic matcher.
- 51af7f59 Quentin Khan: Make external tensor order follow topological traversal of the graph.
- 5fa627d5 Frank Barchard: f16 vlog switch from rational-3-3 to rational-1-3
- eb93f24d Quentin Khan: Correctly zero-initialize `xnn_unary_params`.
- 98930bab Dillon Sharlet: Test reduce kernels with pi summation in both ascending and descending order
- cf112cb8 Dillon Sharlet: Minor loop fusion improvements
- 68373b6a Dillon Sharlet: Disable F32QC8W tests when using YNNPACK
- ce3b7c1b Frank Barchard: Fix XNNPACK compilation failure on Windows ARM64.
- 1f4e631e Misha Gutman: Rewrote bmm(f32, dequant(qint8)) -> f32 to bmm(f32, qint8 -> qcint8) -> f32.
- 43c4f7b3 Quentin Khan: Add fine-grain detection of unsupported fp16 ops when falling back to fp32.
- 3ed73f57 Mohammadreza Heydary: Rewrote bmm(f32, dequant(qint8)) -> f32 to bmm(f32, qint8 -> qcint8) -> f32.
- 12f3b6e9 Alexander Shaposhnikov: On Hexagon std::int32_t is defined as long, which is distinct from int. Because slinky::thread_pool uses int in its base class signature, using int32_t in the derived class causes a type mismatch.
- dc0d8821 Alexander Shaposhnikov: Fix wrong types in hexagon_hvx.h
- ce684eb2 Misha Gutman: Rewrote bmm(f32, dequant(qint8)) -> f32 to bmm(f32, qint8 -> qcint8) -> f32.
- 3e5f1c7e Byungchul Kim: Internal changes only
- 5be8df86 Alexander Shaposhnikov: Adjust define_transpose_a declaration.
- e504f4e0 Dillon Sharlet: Add rewrite for add(square(x), y) => multiply_add(x, x, y)
- 32df615f Dillon Sharlet: Move `sum_kn` kernels from `neondot` to `neon`
- f4271c2a Dillon Sharlet: Remove concatenating constructor from `vec` types
- f6486e3e Dillon Sharlet: Explicitly number enumerations in ynnpack.h for ABI stability
- bd205c6b Dillon Sharlet: Refactor precision for unary ops
- 00f16e11 Dillon Sharlet: Clean up target suffix for elementwise ops
- 95a93bc1 Dillon Sharlet: Add `isnan`, `isinf`, `isfinite` for YNNPACK float types
- d1576216 Dillon Sharlet: Refactor precision for unary ops
- f5d61330 XNNPACK Team: Merge pull request #10230 from qualcomm:sme1/qp8-f32-qc4w-gemm
- 8d8fa1fe aizu-m: reject overflowing output dimensions in constant pad reshape
- a9a5a662 aizu-m: Guard indirection buffer size against overflow in igemm reshape
- a9c19098 aizu-m: Validate expand-dims axes in resize_expand_dims_output_tensor
- 337236cb aizu-m: bound rope token count to weights tensor size in reshape
- 463c3e99 aizu-m: validate input rank in softmax and qp8 convert reshape
- a505fe2a Dillon Sharlet: Only define `isinf`, `isnan`, `isfinite` if not on MSVC
- 0e469aba Byungchul Kim: Internal changes only
- 4bd2e73b XNNPACK Team: Merge pull request #10357 from aizu-m:constant-pad-dim-overflow
- 494ddd06 Misha Gutman: Fixed underallocating memory for qint8->qcint8 conversion which only could have been seen in combination with weight_cache.
- 13c32f5e Frank Barchard: Add AVX512SKX F16-F32ACC GEMM/IGEMM microkernels to XNNPACK
- 9d225af7 XNNPACK Team: Merge pull request #10389 from aizu-m:igemm-indirection-overflow
- 843528da XNNPACK Team: Merge pull request #10424 from aizu-m:softmax-convert-input-rank
- a80e3c43 XNNPACK Team: Merge pull request #10404 from aizu-m:expand-dims-axis-validation
- 85115d25 XNNPACK Team: Merge pull request #10412 from aizu-m:rope-tokens-weights-bound
- c6f7e32e Dillon Sharlet: Relax tolerance of batch_matrix_multiply test
- 71b2a887 Dillon Sharlet: Add tracing to tsl::profiler traces in tensorflow
- f42612c6 Frank Barchard: Fix SSE2 gemm being used when it should not be AVX/AVX512
- 463600be Dillon Sharlet: Add avx512 fp16 convert kernels
- 3912146d XNNPACK Team: Add tracing to tsl::profiler traces in tensorflow
- 508ca2d6 Dillon Sharlet: Fix new tests in YNNPACK
- 1f742eb0 Volodymyr Kysenko: Remove packing output buffer scheduling.
- 1e14f367 Dillon Sharlet: Add tracing to tsl::profiler traces in tensorflow
- 87730c12 Dillon Sharlet: Fix bf16 gemm config when there are no kernels
- 11852443 Dillon Sharlet: Add fp8 types and reference convert kernels to YNNPACK
- f0e859a4 Byungchul Kim: XNNPACK runner allows external tensor to grow even when it was added without copy
- e71d1ae4 Byungchul Kim: Fix memory leak
- 199b0e5c Byungchul Kim: Internal changes only
- f774d42b Dillon Sharlet: Add `ynn_define_dynamic_quantization`
- ff27209a Frank Barchard: f16-f32acc-approxgelu for WebAssembly and native
- c6fc8008 foodlook: softmax: reject input/output datatype mismatch
- afa8f9c Gerardo Carranza: Add BatchMatmul Generator to ATS.
- cc3f6ba9 Dillon Sharlet: Add `ynn_define_gather` and remove `ynn_define_lut`
- 15e4a6e6 Dillon Sharlet: Fix attention benchmark to be more realistic and test longer sequences
- 7f2b83e6 Gerardo Carranza: Add FullyConnected Generator to ATS.
- 8084a7f9 Frank Barchard: Update ISA guards for F16-VLOG, F16-VSIN, and F16-VCOS in XNNPACK.
- 0a6f23d2 Gerardo Carranza: Add support for asymmetric quantization and weights format in FullyConnected.
- f0b3a299 Richard Townsend: [gn] Experimental CI for x64 Windows
- 7c92728e aizu-m: size both argmax pooling outputs before returning reallocation
- 4d7d0a38 Dillon Sharlet: Improve precision of compute_qd8_params
- 9c202cae Dillon Sharlet: Add `YNN_NODE_FLAG_UNIQUE_DIMS`
- 79ae5230 Dillon Sharlet: Don't return negative dimension indices from `axis_to_slinky_dim`
- 37099ac2 XNNPACK Team: Merge pull request #10471 from aizu-m:argmax-pooling-index-realloc
- c6e5f040 XNNPACK Team: Merge pull request #10467 from foodlook:fix-softmax-datatype-mismatch
- a51b41a4 Gregory Comer: Add bf16-qd8 convert operator support
- 210d8cee Gregory Comer: Add bf16-qu8 convert operator support
- b92d2986 Dillon Sharlet: Add bf16 and fp16 dequantize_dot kernels
- 78beab26 Frank Barchard: Fix build error in unary when AVX is disabled
- 7a54dd8b Frank Barchard: Support FP16 in Convolution 2D, Depthwise Convolution 2D, and Fully Connected subgraphs.
- 7d47dd87 Dillon Sharlet: Fix FP16-to-FP32 fallback rewrite for convert nodes in XNNPACK.
- c69adf79 aizu-m: free temporary f32 buffers in pf32_f16 conv create error path
- f24b7d89 aizu-m: validate index tensor shape in unpooling reshape
- 25b9c41d XNNPACK Team: Merge pull request #10022 from GregoryComer:bf16-qd8-operator
- 71708b85 Dillon Sharlet: SIMD wrapper implementation improvements
- 7f16ce87 aizu-m: validate input_id before values lookup in global pooling define
- a62374fa Volodymyr Kysenko: Use backward propagated extents to decide when loops should be fused.
- cba17d66 Dillon Sharlet: Change sme2 docker image to a version of qemu that supports fp8
- 2278f2c9 Dillon Sharlet: Attempt to fix docker QEMU build
- 9b5fee0b Frank Barchard: Improve precision of f16-f32acc vapproxgelu microkernels using NR division.
- 19655824 Dillon Sharlet: Add patch to allow enable FPMR register access in our qemu build
- e99ee83 Dillon Sharlet: Apparently ubuntu-latest-16core doesn't exist
- 5f4cf8b5 Frank Barchard: 4-bit weight packw microkernel Scalar, SSE2 and AVX2
- ad262b84 Dillon Sharlet: Add neonfp8 cast support
- 9c2b0e11 Frank Barchard: vtanh f16-f32acc kernels across multiple architectures
- e8b818d9 aizu-m: reject overflowing axis in fuse_dims and split_dim define
- 54143bc4 XNNPACK Team: Add Slice, ResizeBilinear, and DepthwiseConv2D ops for WebGpu TensorAPI backend
- b936b7ac Dillon Sharlet: Fix GitHub Actions cache configuration for XNNPACK.
- f5a3ba01 Richard Townsend: [gn] add standalone modules
- d4ea4e0b Dillon Sharlet: Use input shape instead of output shape to determine the number of channels
- 82601d27 Dillon Sharlet: Add SIMD types using GNU vector extension types
- 0f219961 Dillon Sharlet: Implement dot tests by widening sub-32-bit floats to float
- 4b7c368f Dillon Sharlet: Add ARM fp8 dot kernels
- 702d9390 Volodymyr Kysenko: Allow fusing loops with different required steps by using their least common multiple.
- 964559ae XNNPACK Team: Merge pull request #10489 from aizu-m:unpooling-index-shape-check
- ea34996a XNNPACK Team: Merge pull request #10495 from aizu-m:global-pooling-input-id-bounds
- 48b9ad77 XNNPACK Team: Merge pull request #10487 from aizu-m:pf32-f16-conv-bias-leak
- 987df4b9 Dillon Sharlet: Cleanup/simplification pass of dot subgraph and tests
- d8fe82d3 Frank Barchard: GIO weight packing ukernels for KR=1 x8/x16/x32
- b1e3c597 Richard Townsend: [gn] DEPS update for June, 2026
- a5fe7ac0 Dillon Sharlet: Catch out of bounds indices in gather
- 1eb6eb37 Dillon Sharlet: Add CMake build for YNNPACK
- e25ed4b2 Volodymyr Kysenko: Update slinky dependency to a newer commit.
- a50d9b38 Volodymyr Kysenko: Add XNN_FLAG_NO_BROADCAST to batch matrix multiply in attention benchmark.
- e53f11a4 XNNPACK Team: Merge pull request #10520 from aizu-m:fuse-split-axis-overflow
- 4bfa3bb8 Volodymyr Kysenko: Change the order of the dot loops in schedule_info.
- d0aa6a4 Dillon Sharlet: Fix required dot tiling
- 697543f6 Dillon Sharlet: Fix split-fuse in YNNPACK
- 44b9b41f aizu-m: goto error on convolution_op alloc failure in deconv create
- 3f731006 aizu-m: free partial operator on alloc failure in create paths
- d6e4de13 Minh Vu: Fix benchmark minmax params order
- fb4e3d2d aizu-m: reject input rank mismatch in transpose reshape
- a9c1b094 aizu-m: Guard transpose rank-mismatch test with XNNPACK_USE_YNNPACK
- d49f39ee XNNPACK Team: Merge pull request #10543 from fallintoplace:fix/f16-benchmark-minmax-params
- a27ef624 XNNPACK Team: Merge pull request #10542 from aizu-m:free-partial-op-on-alloc-failure
- b0e7c33f XNNPACK Team: Merge pull request #10545 from aizu-m:transpose-rank-mismatch
- 737a5564 XNNPACK Team: Merge pull request #10541 from aizu-m:deconv-create-oom-leak
- 608aa8b9 Dillon Sharlet: LUT kernel improvements
- f9e04bf2 Dillon Sharlet: Improve `ynn_define_gather` to support gathering multiple dimensions
- 23718f1f Dillon Sharlet: Fix static_transpose when running with --define xnnpack_use_ynnpack=true
- a2b924c2 Frank Barchard: Update generated scalar ukernel for qs8-qc4w-packw x8c8 gemm goi
- 3ce6c680 Frank Barchard: Enable SSSE3 GEMM microkernels for qs8_qc2w and qdu8_f32_qc2w in XNNPACK.
- 7359ebdc Frank Barchard: Add f16-vexp FP32 accumulation kernels for performance on AVX512F/F16C/NEON/Wasm
- 617815ff Alexander Shaposhnikov: Add missing f32x32 operators and reduce kernels on Hexagon.
- 108e8348 Ping Yu: Update Gemma3 example with slicing and layernorm adjustments.
- 944466f5 Dillon Sharlet: Remove some excessive axis validation in split/fuse/broadcast ops
- 08712133 Dillon Sharlet: Change `ynn_define_unary_polynomial` arguments from float to double
- ce360caa Byungchul Kim: Allow weights share among xnnpack runners.
- 7d99d465 Frank Barchard: Add AVX512SKX microkernels for BF16/F16/F32 to QS8/QU8 vcvt.
- 74e718e8 Dillon Sharlet: Don't let arithmetic folding change the type of an operation
- 671aee6b Dillon Sharlet: Add some safety checks
- 0882ef3d Volodymyr Kysenko: Add memory usage tracking to XNNPACK subgraph benchmarks.
- 3a991096 Dillon Sharlet: Fix bug reinterpreting `xnn_status` as `bool`
- bf52497e Dillon Sharlet: Add `YNN_VALUE_FLAG_NO_EXCESS_PRECISION`
- 9a7ac40a Volodymyr Kysenko: Extract source region inference into a helper function.
- 68c99f5c Dillon Sharlet: Add composite op library in YNNPACK
- 75d2c7d4 Frank Barchard: Fix ASan buffer overflow in AVX packw microkernel.
- 890ecdd4 Frank Barchard: Add NEON and AVX2 GEMM packing micro-kernels for QS8 and X8.
- 96ee1373 Dillon Sharlet: Move more ops to composites library from XNNPACK compatibility layer
- 653cbf90 Dillon Sharlet: Reduce max rank of tests for reduce
- 37e59cdb Dillon Sharlet: Add `define_dot_quantization` to the composites library

PiperOrigin-RevId: 929008850
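Several of the aizu-m fixes in this update (for example, rejecting overflowing output dimensions in constant pad reshape and guarding the IGEMM indirection buffer size against overflow) follow a common checked-arithmetic pattern: compute a size from caller-supplied dimensions, but reject the shape instead of letting `size_t` wrap around. The C sketch below illustrates that general pattern only. `checked_num_elements` and `checked_padded_dim` are hypothetical helpers, not XNNPACK functions, and the actual patches may structure their checks differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not part of XNNPACK): multiplies the dimensions of a
// shape into an element count. Returns false instead of wrapping around if
// the product does not fit in size_t.
static bool checked_num_elements(const size_t* dims, size_t rank,
                                 size_t* out_count) {
  size_t count = 1;
  for (size_t i = 0; i < rank; ++i) {
    // Test before multiplying: count * dims[i] overflows exactly when
    // count > SIZE_MAX / dims[i] (for nonzero dims[i]).
    if (dims[i] != 0 && count > SIZE_MAX / dims[i]) {
      return false;
    }
    count *= dims[i];
  }
  *out_count = count;
  return true;
}

// Hypothetical helper (not part of XNNPACK): computes a padded dimension
// pre + dim + post, rejecting it if either addition would wrap.
static bool checked_padded_dim(size_t dim, size_t pre, size_t post,
                               size_t* out_dim) {
  if (pre > SIZE_MAX - dim) {
    return false;  // dim + pre would wrap
  }
  const size_t with_pre = dim + pre;
  if (post > SIZE_MAX - with_pre) {
    return false;  // with_pre + post would wrap
  }
  *out_dim = with_pre + post;
  return true;
}
```

Unsigned arithmetic in C wraps modulo 2^N rather than trapping, so an unchecked product or sum silently yields a too-small size, which can lead to an undersized allocation and out-of-bounds writes later. Performing the comparison before each operation means the helpers never compute a wrapped value.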