WebGPU op-test framework + mul op manual merge · Pull Request #20389 · pytorch/executorch · GitHub

WebGPU op-test framework + mul op manual merge #20389

Merged: SS-JIA merged 4 commits into main from webgpu-misc-ops-manual-merge on Jun 18, 2026

ghost commented Jun 18, 2026

Manual co-merge of the stacked ghstack PRs #20339, #20357, and #20358 — all reviewed and accepted, but not picked up by auto-merge. Combining them into one PR so they can be merged directly.

This contains three commits:

  1. Op-test codegen framework (#20339): a declarative cases.py → generated .pte + golden → gtest driver pipeline for WebGPU op tests, mirroring the Vulkan op_tests setup. Each op declares its shapes/configs in cases.py; the generator exports a .pte per case along with an fp64 torch golden, and the gtest driver compares the on-GPU result against that golden on Dawn.

  2. Consolidate landed-op tests into the framework (#20357): migrates the existing add and rms_norm tests off the C++ monolith into the cases.py framework. update_cache/sdpa are kept standalone because they are stateful and don't fit the single-forward model.

  3. Add mul op with full broadcast (#20358): aten.mul.Tensor for the WebGPU delegate with full broadcasting, plus the shared TensorMeta broadcast-uniform infrastructure. mul is on the Llama critical path (SwiGLU).
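For readers unfamiliar with golden-based op tests, the comparison in item 1 can be sketched roughly as follows. This is a minimal illustration, not the driver's code: the tolerance form `|a - g| <= atol + rtol * |g|`, the default thresholds, and the helper names `to_fp32` and `mismatches` are all assumptions. It only shows the idea of computing a reference at fp64, storing it as fp32, and checking a device-like result against it.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def mismatches(actual, golden, atol=1e-5, rtol=1e-4):
    """Return the indices where `actual` is outside tolerance of `golden`.

    Assumed tolerance form: |a - g| <= atol + rtol * |g|.
    A length mismatch is reported before any value is compared.
    """
    if len(actual) != len(golden):
        raise ValueError("shape mismatch between result and golden")
    bad = []
    for i, (a, g) in enumerate(zip(actual, golden)):
        err = abs(a - g)
        # `not (err <= tol)` also flags NaN, since NaN comparisons are False.
        if not err <= atol + rtol * abs(g):
            bad.append(i)
    return bad


# Reference multiply computed in fp64, then stored as fp32 like a golden file.
x = [0.1, 1.5, -2.25, 3.0]
y = [0.2, 2.0, 4.0, -0.5]
golden = [to_fp32(a * b) for a, b in zip(x, y)]

# Stand-in for a device result: fp32 inputs, fp32-rounded product.
device = [to_fp32(to_fp32(a) * to_fp32(b)) for a, b in zip(x, y)]

print(mismatches(device, golden))
```

The fp32 rounding of inputs and outputs introduces errors far below the assumed thresholds, so the stand-in result passes, while a genuinely wrong element is reported by index.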

JCNTH added 3 commits June 18, 2026 09:07
… .pte+golden -> gtest driver)

Pull Request resolved: #20339

A manifest-driven op-test framework for the WebGPU backend, mirroring Vulkan's `op_tests/cases.py` (declarative per-op suites) but with a torch-computed golden loaded in C++, since the native test binary has no ATen. An op fits when it is stateless and expressible as one `module(inputs) -> golden` forward; stateful KV-cache ops (`sdpa`, `update_cache`) stay hand-written. Lands first as the shared foundation for the following op test-diffs — adding an op's test becomes one `cases.py` entry.

Composition:
- `test_suite.py` — schema (`WebGPUTestSuite`/`Case`/`InputSpec`) + a `register_op_test` decorator; per-case `required`/`heavy`/`golden_fn`, per-suite `golden_dtype`.
- `cases.py` — the declarative suites; registers the landed `add` + `rms_norm`; later ops append one entry each.
- `generate_op_tests.py` — per case: export via `VulkanPartitioner` to `.pte`, compute the fp64 torch golden (dual-oracle gate), serialize inputs+golden as fp32, emit `manifest.json`.
- `op_test_driver.cpp` + `driver_util.{h,cpp}` — generic gtest driver: one test per manifest entry, runs forward on-device, abs/rel + shape + reconciliation checks.
- `CMakeLists`/`ci.sh` — `webgpu_op_test` + device-free `webgpu_op_test_util_test`, wired into the Dawn(Tint)+SwiftShader CI.
ghstack-source-id: 394712836
@exported-using-ghexport

Differential Revision: [D108816389](https://our.internmc.facebook.com/intern/diff/D108816389/)
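A rough sketch of what a declarative suite and its registration could look like is given here for orientation. Only the names `WebGPUTestSuite`, `Case`, `InputSpec`, `register_op_test`, `required`, `heavy`, `golden_fn`, and `golden_dtype` come from the commit message; every field type, the decorator signature, the registry, the `"add"` key, and the manifest keys are assumptions made for illustration and may not match `test_suite.py`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class InputSpec:
    shape: Tuple[int, ...]
    dtype: str = "float32"  # assumed default


@dataclass
class Case:
    name: str
    inputs: List[InputSpec]
    required: bool = True  # per-case flags named in the commit message
    heavy: bool = False
    golden_fn: Optional[Callable] = None  # optional per-case reference override


@dataclass
class WebGPUTestSuite:
    op: str
    cases: List[Case]
    golden_dtype: str = "float64"  # per-suite golden precision


# Hypothetical registry: op name -> function that builds the suite.
_REGISTRY: Dict[str, Callable[[], WebGPUTestSuite]] = {}


def register_op_test(op_name: str):
    """Decorator that records a suite factory under `op_name`."""
    def decorator(fn):
        if op_name in _REGISTRY:
            raise ValueError(f"duplicate op test: {op_name}")
        _REGISTRY[op_name] = fn
        return fn
    return decorator


@register_op_test("add")
def add_suite() -> WebGPUTestSuite:
    return WebGPUTestSuite(
        op="add",
        cases=[
            Case("same_shape", [InputSpec((2, 8)), InputSpec((2, 8))]),
            Case("large", [InputSpec((64, 4096)), InputSpec((64, 4096))],
                 required=False, heavy=True),
        ],
    )


def build_manifest() -> List[dict]:
    """Flatten every registered suite into manifest-style entries."""
    entries = []
    for _, make_suite in sorted(_REGISTRY.items()):
        suite = make_suite()
        for case in suite.cases:
            entries.append({
                "op": suite.op,
                "case": case.name,
                "input_shapes": [list(spec.shape) for spec in case.inputs],
                "required": case.required,
                "heavy": case.heavy,
                "golden_dtype": suite.golden_dtype,
            })
    return entries


print(build_manifest())
```

Under this shape, adding a test for a new op is one more decorated function, which is the property the commit message describes.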
…-test framework

Pull Request resolved: #20357

`add` and `rms_norm` already have declarative suites in the `cases.py` framework, so their standalone tests are redundant. Remove them, leaving the framework as the single home for these stateless single-forward ops (mirroring Vulkan). `update_cache` and `sdpa` stay hand-written — stateful KV-cache replay can't be a single `module -> golden` forward.

Removed:
- `test_single_add`/`test_chained_add` + their `main()` plumbing from `test_webgpu_native.cpp`.
- the standalone `test/native/test_rms_norm.cpp` binary + its CMake target.
- the add/rms_norm export+run wiring in `test_webgpu_native_ci.sh` + `test_build_webgpu.sh`.
- Not a removal: `webgpu_native_test` is now gated on the executorch wheel being importable (previously on the add `.pte`); it still hosts the quantized_linear/SDPA/update_cache/symint sweeps.
ghstack-source-id: 394625048
@exported-using-ghexport

Differential Revision: [D108821384](https://our.internmc.facebook.com/intern/diff/D108821384/)
Pull Request resolved: #20358

Adds `aten.mul.Tensor` to the WebGPU delegate with full PyTorch broadcast, plus the shared `runtime/ops/TensorMeta.h` per-tensor uniform that broadcast ops reuse. Mul is on the Llama critical path — `F.silu` decomposes to `sigmoid` + `mul`, and SwiGLU multiplies two same-shape activations (the fast path).

Composition (single dispatch):
- `TensorMeta.h` (NEW) — 48-byte std140 `{ndim, numel, sizes[4], strides[4]}` UBO mirroring Vulkan's per-tensor `BufferMetadata`; `fill_tensor_meta_broadcast` right-aligns operand dims (rank>4 throws); `static_assert(sizeof==48)`.
- `mul/BinaryOp.cpp` — builds 3 `TensorMeta` UBOs (out/in1/in2 at bindings 3/4/5), guards fp32 + rank≤4, 1D-dispatches over `compute_1d_workgroup_count(numel)`, releases all uniforms after the bind group.
- `mul/binary_mul.wgsl` — same-shape fast path + a broadcast path (delinearize output index, clamp each input coord per-dim to size-1, relinearize on input strides).
- `WebGPUUtils.h` — adds the shared `utils::make_uniform` helper (first use).
ghstack-source-id: 394848336
@exported-using-ghexport

Differential Revision: [D108793167](https://our.internmc.facebook.com/intern/diff/D108793167/)
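The broadcast path described above (right-align dims, delinearize the output index, clamp each coordinate to size-1, relinearize on input strides) can be modeled on the CPU as below. This is a Python sketch of the indexing logic only, not the WGSL shader or `TensorMeta.h`. The field order with 8 padding bytes after `numel`, the meaning stored in `ndim`, contiguous strides, and the helpers `pack_meta`, `broadcast_index`, and `mul_broadcast` are assumptions; `fill_tensor_meta_broadcast` is named in the commit message but its real signature is not shown there.

```python
import struct

MAX_RANK = 4  # the commit message guards rank <= 4


def fill_tensor_meta_broadcast(sizes):
    """Right-align `sizes` into MAX_RANK dims, padding leading dims with 1."""
    if len(sizes) > MAX_RANK:
        raise ValueError(f"rank {len(sizes)} > {MAX_RANK} is not supported")
    padded = [1] * (MAX_RANK - len(sizes)) + list(sizes)
    strides = [1] * MAX_RANK  # contiguous strides (assumed)
    for d in range(MAX_RANK - 2, -1, -1):
        strides[d] = strides[d + 1] * padded[d + 1]
    numel = strides[0] * padded[0]
    return {"ndim": len(sizes), "numel": numel,
            "sizes": padded, "strides": strides}


def pack_meta(meta):
    """Pack into an assumed 48-byte layout: two u32, 8 pad bytes, two 4xu32."""
    return struct.pack("<2I8x4I4I", meta["ndim"], meta["numel"],
                       *meta["sizes"], *meta["strides"])


def broadcast_index(out_index, out_meta, in_meta):
    """Map a flat output index to the flat index of a broadcast input."""
    in_index, rem = 0, out_index
    for d in range(MAX_RANK):
        coord = rem // out_meta["strides"][d]      # delinearize
        rem %= out_meta["strides"][d]
        coord = min(coord, in_meta["sizes"][d] - 1)  # clamp: size-1 dims -> 0
        in_index += coord * in_meta["strides"][d]  # relinearize
    return in_index


def mul_broadcast(a, a_shape, b, b_shape):
    """Element-wise multiply of flat lists, assuming broadcastable shapes."""
    ma = fill_tensor_meta_broadcast(a_shape)
    mb = fill_tensor_meta_broadcast(b_shape)
    out_sizes = [max(x, y) for x, y in zip(ma["sizes"], mb["sizes"])]
    mo = fill_tensor_meta_broadcast(out_sizes)
    return [a[broadcast_index(i, mo, ma)] * b[broadcast_index(i, mo, mb)]
            for i in range(mo["numel"])]


a = [1, 2, 3, 4, 5, 6]  # shape (2, 3)
print(mul_broadcast(a, (2, 3), [10, 20, 30], (3,)))  # row broadcast
print(mul_broadcast(a, (2, 3), [2, 3], (2, 1)))      # column broadcast
print(len(pack_meta(fill_tensor_meta_broadcast((2, 3)))))
```

When both operands have the same shape, every clamp is a no-op and the input index equals the output index, which is why a same-shape fast path can skip this arithmetic entirely.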
ghost requested review from kirklandsign and larryliu0820 as code owners June 18, 2026 21:13
meta-cla Bot added the CLA Signed label Jun 18, 2026
ghost requested a review from SS-JIA June 18, 2026 21:13
ghost temporarily deployed to cadence June 18, 2026 21:13 with GitHub Actions (Inactive)
SS-JIA merged commit 0e65ba6 into main Jun 18, 2026
170 of 176 checks passed
SS-JIA deleted the webgpu-misc-ops-manual-merge branch June 18, 2026 21:15