[ExecuTorch][WebGPU] Dynamic-shape integration test (allocate-at-max + per-op resize) by JulianCloudNTH · Pull Request #20582 · pytorch/executorch · GitHub

[ExecuTorch][WebGPU] Dynamic-shape integration test (allocate-at-max + per-op resize) #20582

Open
JulianCloudNTH wants to merge 9 commits into gh/JulianCloudNTH/74/base from gh/JulianCloudNTH/74/head

JulianCloudNTH (Contributor) commented Jun 28, 2026

Stack from ghstack (oldest at bottom):

End-to-end validation that one graph built at the upper-bound seq-len serves every smaller live shape, matching the torch golden.

Problem: the dynamic-resize engine (allocate-at-max buffers + per-op resize hooks + output resize) had unit-level reasoning but no single oracle proving a graph built at S=MAX runs correctly at S<MAX without reallocating buffers (which would invalidate bind groups).

Solution: a native test that builds each toy model at S=MAX and runs it at several live S, asserting the output matches a torch-computed golden and that the output EValue is resized to the live shape.

  • Cases A-D: dynamic + static rms_norm (resize shrinks the dispatch; one reused graph across S proves buffers never move; static path unchanged).
  • Cases F-H: rms(rms(x)) cascade, rms(x)+x (rms->add cascade), rms(x)*x (mul).
  • Cases I-L: dynamic linear_q4gsw (GEMM at several M), sdpa_with_kv_cache (GQA prefill at several S), embedding_q4gsw (int64 ids), apply_rotary_emb (two outputs).
  • Cases M-N: dynamic sigmoid (elementwise) and select_copy(0, -1) (negative index resolved against the live leading dim each call).
  • Graph-reuse variants: every dynamic op above (rms_norm incl. a grow-first smallest→largest order, the rms(rms(x)) cascade, linear_q4gsw, embedding_q4gsw, apply_rotary_emb, sigmoid, select_copy) also runs ONE loaded graph across multiple live shapes — proving buffers never move so bind groups stay valid across every resize.
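
The invariant behind the graph-reuse cases can be sketched in plain Python. This is only a toy model of the idea, not the WebGPU backend: `MaxAllocatedTensor`, `resolve_index`, `run_sweep`, `MAX_S`, and `HIDDEN` are hypothetical names introduced for illustration. Storage is allocated once at the upper bound, a resize changes only the live shape, and a negative index is resolved against the live leading dim on each call.

```python
from math import prod

MAX_S = 8   # hypothetical upper-bound sequence length
HIDDEN = 4  # hypothetical hidden size


class MaxAllocatedTensor:
    """Toy stand-in for a tensor whose storage is sized once, at the upper bound."""

    def __init__(self, max_shape):
        self.max_shape = tuple(max_shape)
        self.live_shape = self.max_shape
        # Allocated once; resize() never replaces this list.
        self.storage = [0.0] * prod(self.max_shape)

    def resize(self, new_shape):
        """Update the live shape only; it must fit inside the max allocation."""
        new_shape = tuple(new_shape)
        if len(new_shape) != len(self.max_shape) or any(
            n > m for n, m in zip(new_shape, self.max_shape)
        ):
            raise ValueError(f"{new_shape} does not fit in {self.max_shape}")
        self.live_shape = new_shape

    @property
    def live_numel(self):
        return prod(self.live_shape)


def resolve_index(index, live_dim):
    """Resolve a possibly negative index against the live (not max) dimension."""
    return index + live_dim if index < 0 else index


def run_sweep(tensor, seq_lens):
    """Reuse one tensor across several live shapes and record what was observed."""
    storage = tensor.storage
    observed = []
    for s in seq_lens:
        tensor.resize((s, HIDDEN))
        # The backing storage must be the same object after every resize.
        assert tensor.storage is storage
        observed.append((tensor.live_shape, tensor.live_numel, resolve_index(-1, s)))
    return observed


t = MaxAllocatedTensor(max_shape=(MAX_S, HIDDEN))
results = run_sweep(t, [1, 4, MAX_S])  # grow-first order: smallest to largest
```

In the real backend the object that must not move is presumably the GPU buffer referenced by the bind groups; the Python list identity check above only mirrors that property.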

Implementation:

  • test/ops/dynamic_shape/test_dynamic_shape_export.py exports each toy model through VulkanPartitioner with a dynamic dim and writes per-S torch goldens; reuses the existing op-test helpers for quant/sdpa/embedding/rope.
  • test/native/test_dynamic_shape.cpp loads each .pte, runs each live S, and compares at the per-op tolerance (rms 1e-3, quant 5e-3, sdpa 2e-3). Reuse tests split each per-op helper into load-once + run-at-shape so a single Module serves the whole shape sweep.
  • Multi-output ops select their output by full shape, never numel.
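
A minimal sketch of the last two points, under stated assumptions: outputs are modelled as dicts with `shape` and `data` keys, the comparison is a max-absolute-difference check, and the helper names and dict keys are hypothetical. Only the tolerance values come from the description above; the actual C++ test may compare differently.

```python
from math import prod

# Tolerances quoted in the PR description; the keys are hypothetical labels.
TOLERANCE = {"rms": 1e-3, "quant": 5e-3, "sdpa": 2e-3}


def select_output_by_shape(outputs, expected_shape):
    """Return the single output whose full shape matches the expected shape."""
    expected = tuple(expected_shape)
    matches = [o for o in outputs if tuple(o["shape"]) == expected]
    if len(matches) != 1:
        raise LookupError(f"expected one output of shape {expected}, got {len(matches)}")
    return matches[0]


def max_abs_diff(actual, golden):
    """Largest element-wise absolute difference between two flat sequences."""
    if len(actual) != len(golden):
        raise ValueError("length mismatch between output and golden")
    return max((abs(a - g) for a, g in zip(actual, golden)), default=0.0)


def matches_golden(actual, golden, op_kind):
    """Assumed comparison: max absolute difference within the per-op tolerance."""
    return max_abs_diff(actual, golden) <= TOLERANCE[op_kind]


# Two outputs with the same numel (64) but different shapes: selecting by
# numel would be ambiguous, selecting by full shape is not.
outputs = [
    {"shape": (1, 4, 2, 8), "data": [0.0] * 64},
    {"shape": (1, 4, 1, 16), "data": [1.0] * 64},
]
picked = select_output_by_shape(outputs, (1, 4, 1, 16))
same_numel = prod(outputs[0]["shape"]) == prod(outputs[1]["shape"])
```

The shapes in the example are made up to show the ambiguity; they are not the shapes used by the apply_rotary_emb case.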

Constraints: numerics computed with torch (no hand-rolled reference); toy models stay within the 65535 1D-dispatch cap; SDPA case is skipped gracefully if sym_size.int/copy_ op coverage is incomplete (does not fail the suite).
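
The dispatch-cap constraint amounts to a ceiling division. The sketch below assumes one invocation per element and a workgroup size of 64; both are illustrative assumptions, as are the function names. Only the 65535 limit is taken from the description.

```python
from math import prod

MAX_WORKGROUPS_1D = 65535  # 1D-dispatch cap cited in the PR description
WORKGROUP_SIZE = 64        # assumption: not stated in the PR description


def workgroups_1d(numel, workgroup_size=WORKGROUP_SIZE):
    """Workgroups needed for a 1D dispatch with one invocation per element."""
    if numel < 0 or workgroup_size <= 0:
        raise ValueError("numel must be >= 0 and workgroup_size must be > 0")
    return (numel + workgroup_size - 1) // workgroup_size  # ceiling division


def fits_1d_dispatch(shape, workgroup_size=WORKGROUP_SIZE):
    """True if a tensor of this shape stays within the 1D dispatch cap."""
    return workgroups_1d(prod(shape), workgroup_size) <= MAX_WORKGROUPS_1D


toy_ok = fits_1d_dispatch((128, 2048))                          # 4096 workgroups
too_big = fits_1d_dispatch((MAX_WORKGROUPS_1D * WORKGROUP_SIZE + 1,))
```

Because buffers are sized at the upper bound, checking the cap at S=MAX is enough under these assumptions: every smaller live S needs no more workgroups.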

Co-authored-with: Claude Code.
@exported-using-ghexport

Differential Revision: D109906090


pytorch-bot Bot commented Jun 28, 2026

@github-actions

This PR needs a "release notes:" label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with "release notes:". This helps us keep track of your work and include it in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

meta-cla Bot added the CLA Signed label Jun 28, 2026
@JulianCloudNTH (Contributor, Author)

@claude review and check for any areas or opportunities for modularization

claude Bot commented Jun 29, 2026

SS-JIA (Contributor) left a comment

Review automatically exported from Phabricator review in Meta.

Labels: CLA Signed, meta-exported
