Manual merge of PRs #20394–#20397 (slice_copy + permute_copy) by JulianCloudNTH · Pull Request #20550 · pytorch/executorch · GitHub

Manual merge of PRs #20394–#20397 (slice_copy + permute_copy) #20550

Merged
JulianCloudNTH merged 5 commits into pytorch:main from JulianCloudNTH:webgpu-slice-permute-manual-merge on Jun 26, 2026


JulianCloudNTH (Contributor) commented Jun 26, 2026

Summary

Manual merge of four WebGPU-delegate op PRs that landed internally but could not be auto-merged to main. They are stacked ghstack PRs: when the lower PRs in the stack merged, their head branches were deleted, which orphaned the base branches of these four PRs, so the orig-PR proposer failed with a `422 base invalid` error. This PR re-lands the same four commits (content identical to the originals, flat test layout) as a clean stack on top of current main:

  • #20394: Add slice_copy op (aten.slice_copy.Tensor)
  • #20395: slice_copy op test suite (cases.py op-test framework)
  • #20396: Add permute_copy + IntList graph support (aten.permute_copy.default)
  • #20397: permute_copy op test suite (cases.py op-test framework)

Test plan

Each op ships with a cases.py op-test suite (each case is exported via VulkanPartitioner and compared against a torch golden on Dawn) plus an export-delegation smoke test; both are exercised by the WebGPU op-test CI (etvk-*). Verified internally; the content is identical to the original four PRs.

@diff-train-skip-merge

Pull Request resolved: pytorch#20394

Adds `aten.slice_copy.Tensor` to the WebGPU delegate as a gather: each output element is mapped back to its source input element along the sliced dim via `start + coord * step`.

Composition (single compute dispatch):
- `runtime/ops/slice/Slice.cpp`: reads `args = [self, dim, start, end, step, out]` via `read_scalar` (a static `Int`, or the `Null` sentinel for the default; throws on a dynamic `SymInt`); normalizes negative `dim`/`start` and clamps `start` to `[0, in_size]`; builds two `TensorMeta` UBOs plus a `SliceParams{dim, start, step}` uniform; requires fp32; dispatches `compute_1d_workgroup_count(out.numel)` workgroups with `override wg_size`; and releases all uniform buffers once the bind group is created.
- `runtime/ops/slice/slice.wgsl`: delinearizes the output index over the contiguous output strides, maps the sliced-dim coordinate back to the input (`start + coord*step`), and relinearizes over the input strides.
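The gather performed by the shader can be modeled on the CPU to make the index arithmetic concrete. The Python below is a hypothetical reference, not the shipped `slice.wgsl`: `contiguous_strides` and `slice_gather` are illustrative names, buffers are flat Python lists, the input is assumed contiguous, and `dim`/`start` are assumed to be already normalized on the host as described above. Each outer-loop iteration stands in for one GPU invocation.

```python
# Hypothetical CPU reference for the slice gather; not the actual slice.wgsl.
from math import prod


def contiguous_strides(shape):
    """Row-major (contiguous) strides, in elements, for `shape`."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return strides


def slice_gather(inp, in_shape, out_shape, dim, start, step):
    """Gather a slice of flat buffer `inp` into a new flat buffer.

    Assumes `dim` and `start` are already normalized (non-negative, with
    `start` clamped to [0, in_shape[dim]]) and that `inp` is contiguous.
    """
    in_strides = contiguous_strides(in_shape)
    out_strides = contiguous_strides(out_shape)
    out = []
    for idx in range(prod(out_shape)):  # one iteration per output element
        rem = idx
        in_offset = 0
        for d, out_stride in enumerate(out_strides):
            coord, rem = divmod(rem, out_stride)  # delinearize over output strides
            if d == dim:
                coord = start + coord * step  # map the sliced coordinate back
            in_offset += coord * in_strides[d]  # relinearize over input strides
        out.append(inp[in_offset])
    return out


# x has shape (2, 8) and holds 0..15; x[:, 0:8:2] keeps the even columns.
x = list(range(16))
print(slice_gather(x, [2, 8], [2, 4], dim=1, start=0, step=2))
```

This prints `[0, 2, 4, 6, 8, 10, 12, 14]`. Only the sliced dim is remapped; every other coordinate passes through unchanged, which is why a single 1-D dispatch over `out.numel` suffices.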
ghstack-source-id: 397026527
@exported-using-ghexport

Differential Revision: [D108793168](https://our.internmc.facebook.com/intern/diff/D108793168/)
…work)

Pull Request resolved: pytorch#20395

Registers `aten.slice_copy.Tensor` in the `cases.py` op-test framework: a `_slice_suite` of 4 configs (leading-dim slice `[:,1:5]`, last-dim slice `[...,1:3]`, step-2 `[:,0:8:2]`, negative-end `[:,1:-1]`) that `generate_op_tests` exports via `VulkanPartitioner` and compares to a torch golden on Dawn. Also adds `test/ops/slice/test_slice.py` (`SliceModule` + `CONFIGS` + export-delegation/eager smoke test) and the `aten.slice_copy.Tensor` partitioner-allowlist entry in `tester.py`.
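For reference, the four slice configs resolve to concrete `(start, end, step)` bounds under Python slicing rules, which torch tensor slicing follows for positive steps. The sketch below is illustrative only: `slice_bounds` is a hypothetical helper, and the sliced dim is assumed to have size 10 because the actual input shapes live in `cases.py`. It shows, for example, that the negative end in `[:,1:-1]` wraps to 9.

```python
def slice_bounds(size, s):
    """Resolve slice `s` on a dim of length `size` to (start, end, step, out_len).

    Relies on Python's slice.indices(), which wraps negative bounds by `size`
    and clamps them to the valid range.
    """
    start, end, step = s.indices(size)
    return start, end, step, len(range(start, end, step))


SIZE = 10  # hypothetical length of the sliced dim
EXAMPLE_SLICES = {
    "[:,1:5]": slice(1, 5),
    "[...,1:3]": slice(1, 3),
    "[:,0:8:2]": slice(0, 8, 2),
    "[:,1:-1]": slice(1, -1),
}

for name, s in EXAMPLE_SLICES.items():
    print(name, slice_bounds(SIZE, s))
```

The last element of each tuple is the output length along the sliced dim (4, 2, 4, and 8 here), i.e. the `out` size the gather has to fill.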
ghstack-source-id: 397026537
@exported-using-ghexport

Differential Revision: [D108793151](https://our.internmc.facebook.com/intern/diff/D108793151/)
…ermute_copy.default)

Pull Request resolved: pytorch#20396

Adds `aten.permute_copy.default` (a coordinate-reorder gather) to the WebGPU delegate, and the `IntList` graph value type it needs to read its `dims` argument.
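One way to picture an `IntList` value type sitting beside the existing scalar types is a value table that records each value's type plus an index into per-type storage. The Python model below is hypothetical: `ValueTable`, `add_int`, `add_int_list`, and `get_int` are illustrative names, and only `get_int_list` and the `int_lists_` storage echo names from this commit. In this sketch a type mismatch raises `TypeError`; how the C++ accessor reports errors is not described here.

```python
from enum import Enum, auto


class ValueType(Enum):
    INT = auto()
    INT_LIST = auto()


class ValueTable:
    """Hypothetical model of a graph value table with an IntList value type."""

    def __init__(self):
        self.types = []      # value id -> ValueType
        self.slots = []      # value id -> index into the per-type storage
        self.ints = []
        self.int_lists = []  # stands in for std::vector<std::vector<int64_t>>

    def add_int(self, v):
        self.types.append(ValueType.INT)
        self.slots.append(len(self.ints))
        self.ints.append(int(v))
        return len(self.types) - 1

    def add_int_list(self, items):
        self.types.append(ValueType.INT_LIST)
        self.slots.append(len(self.int_lists))
        self.int_lists.append([int(x) for x in items])
        return len(self.types) - 1

    def get_int(self, value_id):
        self._check(value_id, ValueType.INT)
        return self.ints[self.slots[value_id]]

    def get_int_list(self, value_id):
        self._check(value_id, ValueType.INT_LIST)
        return self.int_lists[self.slots[value_id]]

    def _check(self, value_id, expected):
        actual = self.types[value_id]
        if actual is not expected:
            raise TypeError(f"value {value_id} is {actual.name}, not {expected.name}")


g = ValueTable()
dim_id = g.add_int(-1)
perm_id = g.add_int_list([2, 0, 1])
print(g.get_int_list(perm_id))
```

Keeping one storage vector per value kind means adding `IntList` only touches its own storage, accessor, and deserialization case, leaving the scalar paths unchanged.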

Composition:
- `runtime/WebGPUGraph.{h,cpp}`: adds `ValueType::IntList`, backed by `std::vector<std::vector<int64_t>> int_lists_` and a `get_int_list(int)` accessor; `build()` deserializes `vkgraph::GraphTypes::IntList` via `value_as_IntList()->items()` (int64, matching the FlatBuffer `[long]`), mirroring the existing scalar value plumbing.
- `runtime/ops/permute/Permute.cpp`: reads the permutation via `get_int_list`, normalizes negative dims, validates that it is a permutation of `[0, ndim)`, builds two `TensorMeta` UBOs plus a `PermuteParams{perm: vec4<u32>}` uniform, requires fp32 and rank ≤ 4, dispatches `compute_1d_workgroup_count(out.numel)` workgroups with `override wg_size`, and releases all uniform buffers once the bind group is created.
- `runtime/ops/permute/permute.wgsl`: delinearizes the output index over the contiguous output strides and accumulates the input offset as `coord[d] * in.strides[perm[d]]` over each output dim `d` (mirrors Vulkan `permute_buffer.glsl`).
- Registers both `aten.permute_copy.default` and `aten.permute.default` to the same handler.
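The host-side checks and the shader's coordinate reorder can be sketched together on the CPU. The Python below is a hypothetical model rather than the contents of `Permute.cpp` or `permute.wgsl`: the function names are illustrative, the input is assumed contiguous, and the vec4 padding (unused lanes set to their own index) is an assumption, since the commit only says the permutation is passed as a `vec4<u32>`.

```python
from math import prod


def contiguous_strides(shape):
    """Row-major (contiguous) strides, in elements, for `shape`."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return strides


def normalize_perm(dims, ndim):
    """Wrap negative dims and check that `dims` is a permutation of [0, ndim)."""
    if ndim > 4:
        raise ValueError("only rank <= 4 is supported")  # mirrors the rank guard
    perm = [d + ndim if d < 0 else d for d in dims]
    if sorted(perm) != list(range(ndim)):
        raise ValueError(f"{list(dims)} is not a permutation of [0, {ndim})")
    return perm


def pack_perm_vec4(perm):
    """Hypothetical vec4<u32> packing: pad unused lanes with their own index."""
    return perm + list(range(len(perm), 4))


def permute_gather(inp, in_shape, perm):
    """Return (out_shape, out) for a permute of contiguous flat buffer `inp`."""
    out_shape = [in_shape[p] for p in perm]
    in_strides = contiguous_strides(in_shape)
    out_strides = contiguous_strides(out_shape)
    out = []
    for idx in range(prod(out_shape)):  # one iteration per output element
        rem = idx
        in_offset = 0
        for d, out_stride in enumerate(out_strides):
            coord, rem = divmod(rem, out_stride)  # delinearize over output strides
            in_offset += coord * in_strides[perm[d]]  # output dim d walks input dim perm[d]
        out.append(inp[in_offset])
    return out_shape, out


perm = normalize_perm([-1, 0], ndim=2)
print(pack_perm_vec4(perm))
print(permute_gather(list(range(6)), [2, 3], perm))
```

For the 2D transpose shown, this prints `[1, 0, 2, 3]` and then `([3, 2], [0, 3, 1, 4, 2, 5])`. Because the output is written contiguously and only the read side is reordered, the same loop covers every rank up to 4.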
ghstack-source-id: 397026548
@exported-using-ghexport

Differential Revision: [D108793162](https://our.internmc.facebook.com/intern/diff/D108793162/)
…mework)

Pull Request resolved: pytorch#20397

Registers `aten.permute_copy.default` in the `cases.py` op-test framework: a `_permute_suite` of 4 configs (3D rotation, 4D middle-dim transpose, 2D transpose, full 4D shuffle) that `generate_op_tests` exports via `VulkanPartitioner` and compares to a torch golden on Dawn. Also adds `test/ops/permute/test_permute.py` (`PermuteModule` + `CONFIGS` + `_op_delegated` smoke test) and the `aten.permute_copy.default` partitioner-allowlist entry in `tester.py`.
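Every permute config follows the shape rule `out.shape[d] = in.shape[perm[d]]`, and permuting by the inverse permutation restores the original shape, a property a test could check alongside the torch golden. The shapes and permutations below are hypothetical stand-ins for each config kind; the real ones are defined in `cases.py`.

```python
def permuted_shape(shape, perm):
    """Output shape of a permute: out.shape[d] = in.shape[perm[d]]."""
    return tuple(shape[p] for p in perm)


def invert_perm(perm):
    """Inverse permutation: inv[perm[i]] = i."""
    inv = [0] * len(perm)
    for i, p in enumerate(perm):
        inv[p] = i
    return tuple(inv)


# Hypothetical (shape, perm) pairs, one per config kind in the suite.
EXAMPLES = {
    "3D rotation": ((2, 3, 4), (1, 2, 0)),
    "4D middle-dim transpose": ((2, 3, 4, 5), (0, 2, 1, 3)),
    "2D transpose": ((3, 4), (1, 0)),
    "full 4D shuffle": ((2, 3, 4, 5), (3, 1, 0, 2)),
}

for name, (shape, perm) in EXAMPLES.items():
    out = permuted_shape(shape, perm)
    # Permuting by the inverse must restore the original shape.
    assert permuted_shape(out, invert_perm(perm)) == shape
    print(f"{name}: {shape} -> {out}")
```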
ghstack-source-id: 397026550
@exported-using-ghexport

Differential Revision: [D108793156](https://our.internmc.facebook.com/intern/diff/D108793156/)
pytorch-bot commented Jun 26, 2026

meta-cla bot added the CLA Signed label Jun 26, 2026

psiddh (Contributor) left a comment

Approving this to unblock the diff train

JulianCloudNTH merged commit a03f97b into pytorch:main Jun 26, 2026
181 checks passed

Labels: CLA Signed


2 participants