zonglinpeng · 2026-06-24T23:40:03Z

Summary:

When the input and output buffers are 16-byte aligned (dequant_simd_aligned), the per-tensor path runs an inline PDX SIMD loop (xb_vecMxf32/xb_vecMx32/PDX_MUL_MXF32); otherwise it falls back to the NNLib path (xa_nn_elm_dequantize_*). The result is numerically identical to the original op — the same float-domain affine (x - zero_point) * scale.
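The dispatch described above can be sketched in portable C++. This is only an illustration under stated assumptions: `is_aligned_16`, `dequantize_reference`, `dequantize_dispatch`, and the `Path` enum are hypothetical names, not the identifiers used in `op_dequantize.cpp`, and a scalar loop stands in for both the PDX SIMD loop and the NNLib call, since neither is reproduced here. It shows the 16-byte alignment gate and the float-domain affine `(x - zero_point) * scale`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when both buffers start on a 16-byte boundary.
inline bool is_aligned_16(const void* in, const void* out) {
  const auto bits = reinterpret_cast<std::uintptr_t>(in) |
                    reinterpret_cast<std::uintptr_t>(out);
  return (bits & 0xF) == 0;
}

// Scalar stand-in for the real kernels: float-domain affine dequantize.
inline void dequantize_reference(const std::int8_t* in, float* out,
                                 std::size_t n, float scale,
                                 std::int32_t zero_point) {
  const float zp = static_cast<float>(zero_point);
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = (static_cast<float>(in[i]) - zp) * scale;
  }
}

enum class Path { kFast, kFallback };

// Chooses a path from the alignment check. In this sketch both paths run the
// same scalar loop; in the PR the fast path is an inline SIMD loop and the
// fallback is the NNLib routine.
inline Path dequantize_dispatch(const std::int8_t* in, float* out,
                                std::size_t n, float scale,
                                std::int32_t zero_point) {
  const Path path = is_aligned_16(in, out) ? Path::kFast : Path::kFallback;
  dequantize_reference(in, out, n, scale, zero_point);
  return path;
}
```

Because both paths evaluate the same expression, the output does not depend on which branch is taken. That is the property the summary claims for the real kernels.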

pytorch-bot · 2026-06-24T23:40:07Z

linux-foundation-easycla · 2026-06-24T23:40:10Z

The committers listed above are authorized under a signed CLA.

✅ login: zonglinpeng / name: Zonglin Peng (3772ece)

meta-codesync · 2026-06-24T23:40:10Z

…20499)

Summary: Recreates the optimized Fusion-G3 dequantize from D108798741, but instead of shipping it as a separate devmate kernel wired in through `operator_fallback.bzl`, it places the PDX SIMD fast path directly into the existing executorch operator `dequantize_per_tensor_out` in `executorch/backends/cadence/fusion_g3/operators/op_dequantize.cpp` (per-tensor function only; `per_channel`/`tensor`/`tensor_args` variants are untouched).

When the input and output buffers are 16-byte aligned (`dequant_simd_aligned`), the per-tensor path runs an inline PDX SIMD loop (`xb_vecMxf32`/`xb_vecMx32`/`PDX_MUL_MXF32`); otherwise it falls back to the NNLib path (`xa_nn_elm_dequantize_*`). The result is numerically identical to the original op: the same float-domain affine `(x - zero_point) * scale`.

This intentionally does NOT include the mvartanian integer-subtract change (D109458111, `PDX_SUB_MX32`); it uses the float-domain asymmetric path from D108798741 as requested.

The macro fast paths (`ASYM_DEQUANTIZE_IMPL_CHANNEL`/`SYM_DEQUANTIZE_IMPL_CHANNEL`) get the `static_cast<CTYPE_OUT>((x - zp) * scale)` parenthesization required to build clean under the G3 `dev` mode's `-Werror,-Wdouble-promotion`.

For A/B measurement this also adds `op_dequantize_baseline.cpp` under the Jarvis operator test dir: a benchmark-only snapshot of the ORIGINAL executorch op (pre-SIMD, with only the `-Wdouble-promotion` fix). It defines `impl::G3::native::dequantize_per_tensor_out`, so the shared benchmark source from D109441948 is linked into two binaries, `_optimized` (against the real executorch op) and `_stock` (against the snapshot), and compared on the cycle-accurate G3 ISS. `operators_header` visibility is extended to the Jarvis test package so the snapshot can include `operators.h`.

Reviewed By: mvartani-meta

Differential Revision: D109500113
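As an illustration of the `static_cast<CTYPE_OUT>((x - zp) * scale)` form mentioned in the commit message, here is a minimal sketch. The function name `dequantize_one` and its parameter types are assumptions made for the example. The real macros and their operand types are not shown in this thread, so this shows only the shape of the expression, not the actual diff.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single-element helper. The subtraction and multiplication are
// done in float, and the cast to the output type wraps the entire product, so
// the conversion applies once to the final result.
template <typename CTYPE_OUT, typename CTYPE_IN>
inline CTYPE_OUT dequantize_one(CTYPE_IN x, std::int32_t zero_point,
                                float scale) {
  const float xf = static_cast<float>(x);
  const float zp = static_cast<float>(zero_point);
  return static_cast<CTYPE_OUT>((xf - zp) * scale);
}
```

For example, `dequantize_one<float, std::int8_t>(10, 2, 0.25f)` evaluates `(10 - 2) * 0.25` in float. With every operand kept in `float` and a single explicit cast around the product, no implicit `float` to `double` promotion appears in the expression, which is the kind of conversion `-Wdouble-promotion` reports.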

meta-cla Bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) Jun 24, 2026

meta-codesync Bot added the meta-exported label Jun 24, 2026

meta-codesync Bot temporarily deployed to cadence June 24, 2026 23:40 Inactive

mvartani-meta approved these changes Jun 25, 2026


meta-codesync Bot changed the title ~~Inline per-tensor SIMD fast path in fusion_g3 op_dequantize (recreate D108798741)~~ Inline per-tensor SIMD fast path in fusion_g3 op_dequantize (#20499) Jun 25, 2026

zonglinpeng force-pushed the export-D109500113 branch from 60754f2 to 3772ece Compare June 25, 2026 18:58

zonglinpeng added the release notes: cadence label (Changes to the Cadence backend delegate) Jun 25, 2026

zonglinpeng temporarily deployed to cadence June 25, 2026 19:01 — with GitHub Actions Inactive

aliafzal approved these changes Jun 25, 2026



Inline per-tensor SIMD fast path in fusion_g3 op_dequantize (#20499)

zonglinpeng wants to merge 1 commit into pytorch:main from zonglinpeng:export-D109500113


3 participants


pytorch-bot Bot commented Jun 24, 2026 (edited)

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20499

⏳ 2 Pending, 1 Unrelated Failure
