digantdesai · 2023-06-28T00:11:41Z

Summary: Also adds support for backend_config

Reviewed By: mcr229

Differential Revision: D47043207

Summary: Also adds support for backend_config Reviewed By: mcr229 Differential Revision: D47043207 fbshipit-source-id: 509bd4c02eb7ff5d3d47762522debd827bee7240
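One common way to pin down what a quantized LeakyReLU should compute is a reference path: dequantize, apply LeakyReLU in float, then requantize. The C++ below is a minimal sketch of that path under stated assumptions. It assumes per-tensor affine int8 quantization. `QuantParams`, `quantize`, `dequantize` and `quantized_leaky_relu` are hypothetical names. This is not the XNNPACK kernel or the code in this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical per-tensor affine int8 quantization parameters.
struct QuantParams {
  float scale;
  int32_t zero_point;
};

float dequantize(int8_t q, QuantParams p) {
  return static_cast<float>(static_cast<int32_t>(q) - p.zero_point) * p.scale;
}

int8_t quantize(float x, QuantParams p) {
  // Round to nearest (ties to even under the default rounding mode), then saturate to int8.
  const float q = std::nearbyint(x / p.scale) + static_cast<float>(p.zero_point);
  return static_cast<int8_t>(std::clamp(q, -128.0f, 127.0f));
}

// Reference semantics (assumed): dequantize -> LeakyReLU in float -> requantize.
std::vector<int8_t> quantized_leaky_relu(const std::vector<int8_t>& input,
                                         QuantParams in_p, QuantParams out_p,
                                         float negative_slope) {
  assert(in_p.scale > 0.0f && out_p.scale > 0.0f);
  std::vector<int8_t> output;
  output.reserve(input.size());
  for (int8_t q : input) {
    const float x = dequantize(q, in_p);
    const float y = x >= 0.0f ? x : x * negative_slope;  // LeakyReLU
    output.push_back(quantize(y, out_p));
  }
  return output;
}
```

An optimized kernel would normally fold the two scales and the slope into integer arithmetic. A float reference like this is what such a kernel is typically tested against.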

facebook-github-bot · 2023-06-28T00:12:38Z

facebook-github-bot · 2023-06-28T19:14:35Z

This pull request was exported from Phabricator. Differential Revision: D47043207

facebook-github-bot · 2023-06-30T05:13:04Z

Summary: Pull Request resolved: pytorch#104309 X-link: pytorch/executorch#1 Also adds support for backend_config Test Plan: `buck test fbcode//mode/dev-nosan fbcode//executorch/backends/xnnpack/test:` Reviewed By: mcr229 Differential Revision: D47043207 fbshipit-source-id: 3e2f7b614713ae5c3fba6ea3056376f15826de17

Summary: X-link: pytorch/pytorch#104309 Pull Request resolved: #1 Also adds support for backend_config Reviewed By: mcr229 Differential Revision: D47043207 fbshipit-source-id: 51abd266bba7441c28578f6c58686a3d021d9d2a

Before: Each node contains a `UniformParamsBuffer`. After: Each node contains a `std::vector<std::shared_ptr<UniformParamsBuffer>>`. In follow-up changes, we will split parameters across multiple `UniformParamsBuffer`s, since (1) some are tensor-specific (e.g. image extents) and (2) others are operator-specific (e.g. alpha for binary ops). Hence, we need **`std::vector`**. We are adding the methods for (1) in #2340. Since (1) and (2) will be owned by different objects, we need **pointers**. Since (1) is owned by `vTensor`, which is non-copyable, we can't use `unique_ptr`, so we need **`std::shared_ptr`**. Differential Revision: [D54691831](https://our.internmc.facebook.com/intern/diff/D54691831/) [ghstack-poisoned]

Before: Each node contains a `UniformParamsBuffer`. After: Each node contains a `std::vector<std::shared_ptr<UniformParamsBuffer>>`. In follow-up changes, we will split parameters across multiple `UniformParamsBuffer`s, since (1) some are tensor-specific (e.g. image extents) and (2) others are operator-specific (e.g. alpha for binary ops). Hence, we need **`std::vector`**. We are adding the methods for (1) in #2340. Since (1) and (2) will be owned by different objects, we need **pointers**. Since (1) is owned by `vTensor`, which is non-copyable, we can't use `unique_ptr`, so we need **`std::shared_ptr`**. Differential Revision: [D54691831](https://our.internmc.facebook.com/intern/diff/D54691831/) ghstack-source-id: 218195447 Pull Request resolved: #2348

Summary: bypass-github-export-checks Pull Request resolved: #2348 Before: Each node contains a `UniformParamsBuffer`. After: Each node contains a `std::vector<std::shared_ptr<UniformParamsBuffer>>`. In follow-up changes, we will split parameters across multiple `UniformParamsBuffer`s, since (1) some are tensor-specific (e.g. image extents) and (2) others are operator-specific (e.g. alpha for binary ops). Hence, we need **`std::vector`**. We are adding the methods for (1) in #2340. Since (1) and (2) will be owned by different objects, we need **pointers**. Since (1) is owned by `vTensor`, which is non-copyable, we can't use `unique_ptr`, so we need **`std::shared_ptr`**. ghstack-source-id: 218195447 exported-using-ghexport Reviewed By: SS-JIA Differential Revision: D54691831 fbshipit-source-id: 84ab9f777e247fd56234290ed7f7343b9701c73f
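The C++ below is a minimal sketch of the ownership model in the #2348 commit messages. Only the container type `std::vector<std::shared_ptr<UniformParamsBuffer>>` comes from the text. The rest are hypothetical stand-ins with no Vulkan code: `UniformParamsBuffer` just copies bytes, and `vTensor` and `ExecuteNode` are simplified. The `Extents` and `BinaryOpParams` layouts are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Hypothetical stand-in: the real buffer wraps a GPU allocation; this one
// only keeps a copy of the bytes it was constructed from.
class UniformParamsBuffer {
 public:
  template <typename Block>
  explicit UniformParamsBuffer(const Block& block) : bytes_(sizeof(Block)) {
    std::memcpy(bytes_.data(), &block, sizeof(Block));
  }
  std::size_t nbytes() const { return bytes_.size(); }

 private:
  std::vector<unsigned char> bytes_;
};

// Tensor-specific params, e.g. image extents (hypothetical layout).
struct Extents {
  int width, height, channels;
};

// Operator-specific params, e.g. alpha for binary ops (hypothetical layout).
struct BinaryOpParams {
  float alpha;
};

// Non-copyable tensor that owns its extents buffer but can hand out shared references.
class vTensor {
 public:
  explicit vTensor(Extents e)
      : extents_ubo_(std::make_shared<UniformParamsBuffer>(e)) {}
  vTensor(const vTensor&) = delete;
  vTensor& operator=(const vTensor&) = delete;

  std::shared_ptr<UniformParamsBuffer> extents_ubo() const { return extents_ubo_; }

 private:
  std::shared_ptr<UniformParamsBuffer> extents_ubo_;
};

// A node binds any number of params buffers, whoever created them.
struct ExecuteNode {
  std::vector<std::shared_ptr<UniformParamsBuffer>> params;
};
```

In this sketch, a tensor-specific buffer ends up with two owners: the tensor and every node that reads its extents. `std::shared_ptr` expresses that shared ownership. An operator-specific buffer created for a single node has only one owner.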

In #2271, we already added `IntList`, `DoubleList`, `BoolList`, and `ValueList` to the schema and the runtime's `Value` class. Their serialization was incomplete, missing two components: (1) receiving a list in `torch.fx.Node.args`, and (2) receiving a non-tensor in `torch.fx.Node`. This change completes (1). It also fixes a bug where values of type `bool` matched both `bool` and `int` (in Python, `bool` is a subclass of `int`) and hence were added twice. If our type support grows more complex, we can consider using our own types, similar to the core ExecuTorch runtime: https://github.com/pytorch/executorch/blob/689796499024fc4a133318d707f4c10db73da967/exir/emit/_emitter.py#L158-L166 Differential Revision: [D54708353](https://our.internmc.facebook.com/intern/diff/D54708353/) [ghstack-poisoned]

Summary: bypass-github-export-checks Pull Request resolved: #2404 In #2271, we already added `IntList`, `DoubleList`, `BoolList`, and `ValueList` to the schema and the runtime's `Value` class. Their serialization was incomplete, missing two components: (1) receiving a list in `torch.fx.Node.args`, and (2) receiving a non-tensor in `torch.fx.Node`. This change completes (1). It also fixes a bug where values of type `bool` matched both `bool` and `int` (in Python, `bool` is a subclass of `int`) and hence were added twice. If our type support grows more complex, we can consider using our own types, similar to the core ExecuTorch runtime: https://github.com/pytorch/executorch/blob/689796499024fc4a133318d707f4c10db73da967/exir/emit/_emitter.py#L158-L166 ghstack-source-id: 218539049 exported-using-ghexport Reviewed By: SS-JIA Differential Revision: D54708353 fbshipit-source-id: 8641647b515e201ea63db67115c01c1532ad6566

Reviewed By: itamaro Differential Revision: D51566750

Summary: Pull Request resolved: pytorch#3763 Reviewed By: itamaro Differential Revision: D51566750

Summary: Pull Request resolved: #3763 Reviewed By: itamaro, tarun292 Differential Revision: D51566750 fbshipit-source-id: 654c426d479833867e93083e9b55786abfc24a32

…) (g4-vision-cuda)

Second branch of the g4-vision decomposition (parent: g4-vision-quant). Enables vision on the CUDA backend end-to-end. The exported CUDA contract changes from main's token prefill/decode to the embeddings-based 4-method form; MLX stays text-only this branch (vision lands in g4-vision-mlx).

Scope
-----

export.py:
- _export_cuda now exports the 4-method contract: embed_text (tokens -> embeds), vision_encoder (pixel_values, pixel_position_ids -> image_embeds, pooler_mask; dynamic vision_num_groups dim), prefill (inputs_embeds), and decode (tokens). Adds get_max_vision_soft_tokens constant + --max-vision-soft-tokens CLI arg. The branch-1 CUDA fake-prefill wrapper is removed.
- _export_mlx KEEPS main's single token-input `forward` contract via the fake-prefill wrapper and still drops the vision head (MLX text-only this branch). mlx_source_transformations.py is unchanged (still main's version).

main.cpp (backend fork):
- #ifdef EXECUTORCH_BUILD_CUDA: embed_text -> (optional vision_encoder splice) -> prefill -> decode flow with on-device sampling, plus the --image_path path (stb image load + patchify + vision_encoder). The stb includes + image path are CUDA-gated.
- #else (MLX): main's token-input `forward` text-only path with host sampling (llm::logits_to_token). --image_path errors out on the MLX build this branch.
- Review finding #1 fix: the C++ image splice now reads the vision_encoder pooler_mask and inserts one <image> placeholder per VALID soft token, skipping invalid/padded rows during the splice, matching inference.py exactly (the old code copied N rows sequentially and ignored the mask).

CMakeLists.txt:
- stb_image / stb_image_resize fetch is added, gated on EXECUTORCH_BUILD_CUDA (MLX build needs no stb this branch).

R2 (C++ half), shared chat-template:
- New examples/models/gemma4/runner/chat_template.h (#pragma once, namespace executorch::examples::gemma4): the 6 special-token IDs + a templated build_vision_input_ids(tokenizer, prompt, num_vision_tokens, bos_id). Values mirror the Python examples/models/gemma4/chat_template.py.
- gemma4_31b/main.cpp includes it (its local token-ID constants + the local build_vision_input_ids copy are removed).
- gemma4/runner/gemma4_runner.{h,cpp} single-source the chat-template IDs from the header and delegate build_vision_input_ids to the shared free function (audio/eos ids stay local). Behavior-identical (values were already equal).

CI:
- .github/workflows/cuda.yml, .ci/scripts/export_model_artifact.sh, .ci/scripts/test_model_e2e.sh: point at the vision checkpoint and run the CUDA image smoke (describe docs/source/_static/img/et-logo.png, EXPECTED_OUTPUT "chip"). mlx.yml stays text-only (MLX vision is branch 3).

Verification
------------

- CPU: pytest test_vision_tower.py test_vision_quant_roundtrip.py test_pipeline.py -> 20 passed.
- CUDA (A100): pytest test_cuda_pipeline.py -> 8 passed, including both export tests now producing the 4-method contract on a tiny model, plus the chunked-prefill / int4 inference contract tests.
- 31B 4-method export from the real prequant checkpoint (/home/gasoonjia/models/gemma-4-31B-it-int4) was confirmed to export all four methods (prefill on inputs_embeds, decode on tokens, embed_text, vision_encoder P in [9,2520]) and enter CUDA lowering; the host-side lowering/serialization step was killed by the dev box's oomd (other processes held ~260/353 GB), an environment limit, not a code issue. The identical code path serializes successfully at tiny scale (test_cuda_pipeline).
- flake8 + ufmt clean (python); clang-format clean (C++); chat_template.h parses.
- The C++ runner build + on-device image e2e could not be run locally (the installed wheel ships no static libs to link the runner, and a full CUDA core build is impractical under the box's memory pressure); covered by cuda.yml CI. MLX export/runtime e2e is run by the user on macOS.
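The C++ below is a small sketch of the pooler-mask splice fix: one placeholder per valid soft token, with padded rows skipped. It rests on several assumptions. The placeholder ID 99 is made up and is not Gemma 4's real `<image>` ID. `count_valid`, `build_ids` and `splice_image_embeds` are hypothetical names, and `build_ids` is a simplified stand-in for a `build_vision_input_ids`-style helper. Embeddings use a flat row-major layout. None of this is the code in main.cpp or chat_template.h.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical placeholder ID; NOT the real Gemma 4 <image> token ID.
constexpr int64_t kImageTokenId = 99;

// Number of soft tokens the vision encoder marked valid.
std::size_t count_valid(const std::vector<bool>& pooler_mask) {
  return static_cast<std::size_t>(
      std::count(pooler_mask.begin(), pooler_mask.end(), true));
}

// Prefix, then one placeholder per VALID soft token, then suffix.
std::vector<int64_t> build_ids(const std::vector<int64_t>& prefix,
                               std::size_t num_valid,
                               const std::vector<int64_t>& suffix) {
  std::vector<int64_t> ids(prefix);
  ids.insert(ids.end(), num_valid, kImageTokenId);
  ids.insert(ids.end(), suffix.begin(), suffix.end());
  return ids;
}

// embeds: [ids.size(), dim] row-major (embed_text output).
// image_embeds: [pooler_mask.size(), dim] row-major (vision_encoder output).
// The k-th placeholder receives the k-th VALID image row; padded rows are skipped.
void splice_image_embeds(const std::vector<int64_t>& ids, std::vector<float>& embeds,
                         const std::vector<float>& image_embeds,
                         const std::vector<bool>& pooler_mask, std::size_t dim) {
  std::size_t row = 0;
  for (std::size_t pos = 0; pos < ids.size(); ++pos) {
    if (ids[pos] != kImageTokenId) continue;
    while (row < pooler_mask.size() && !pooler_mask[row]) ++row;  // skip padding
    assert(row < pooler_mask.size() && "more placeholders than valid soft tokens");
    std::copy_n(image_embeds.begin() + static_cast<std::ptrdiff_t>(row * dim), dim,
                embeds.begin() + static_cast<std::ptrdiff_t>(pos * dim));
    ++row;
  }
}
```

Copying rows sequentially without the mask, as the old code did per the description, would have put the padded row 1 into the second placeholder in a mask like `{true, false, true}`.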

Block-sparse early exit in `_sdpa_fwd_kernel_body`: skip KV blocks that are entirely masked (sliding window: HAS_MASK path with mask sum == 0; causal: start_n > max_seq_pos). The skip is exact, since a fully masked block would only rescale the running accumulators by 1 and add 0. Prefill: +46-88% at all lengths; decode: safe; SDPA share of the nsys profile: 58.1% -> 18.5%. Numerically bf16-exact vs. dense + mask (unit test).
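The scalar C++ sketch below checks why skipping fully masked blocks is exact under FlashAttention-style online softmax. It is not `_sdpa_fwd_kernel_body` and makes several simplifying assumptions: one query row, head_dim = 1, float instead of bf16, and a finite -1e30 score for masked keys. `visible`, `AttnResult` and `blocked_attention` are hypothetical names, and the query position stands in for `max_seq_pos`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Finite stand-in for -inf on masked scores, so a fully masked block never yields NaN.
constexpr float kMaskedScore = -1e30f;

// Causal + sliding-window mask: key k_pos is visible from query q_pos.
bool visible(int q_pos, int k_pos, int window) {
  return k_pos <= q_pos && k_pos > q_pos - window;
}

struct AttnResult {
  float out;
  int blocks_visited;
};

// One query row, head_dim = 1, online softmax over KV blocks of size `block`.
AttnResult blocked_attention(float q, const std::vector<float>& k,
                             const std::vector<float>& v, int q_pos, int window,
                             int block, bool early_exit) {
  const int n = static_cast<int>(k.size());
  float m = -INFINITY;  // running max
  float l = 0.0f;       // running softmax denominator
  float acc = 0.0f;     // running weighted sum of values
  int visited = 0;
  for (int start_n = 0; start_n < n; start_n += block) {
    const int end_n = std::min(start_n + block, n);
    if (early_exit) {
      if (start_n > q_pos) continue;  // causal: whole block lies after the query
      int live = 0;                    // sliding window: mask sum over the block
      for (int j = start_n; j < end_n; ++j) live += visible(q_pos, j, window) ? 1 : 0;
      if (live == 0) continue;
    }
    ++visited;
    std::vector<float> s(end_n - start_n);
    float block_max = kMaskedScore;
    for (int j = start_n; j < end_n; ++j) {
      s[j - start_n] = visible(q_pos, j, window) ? q * k[j] : kMaskedScore;
      block_max = std::max(block_max, s[j - start_n]);
    }
    const float m_new = std::max(m, block_max);
    const float alpha = std::exp(m - m_new);  // exactly 1 for a masked block after a live one
    float p_sum = 0.0f, pv_sum = 0.0f;
    for (int j = start_n; j < end_n; ++j) {
      const float p = std::exp(s[j - start_n] - m_new);  // exactly 0 for masked keys once m is live
      p_sum += p;
      pv_sum += p * v[j];
    }
    l = l * alpha + p_sum;
    acc = acc * alpha + pv_sum;
    m = m_new;
  }
  assert(l > 0.0f);
  return {acc / l, visited};
}
```

Masked blocks that come after a live block contribute `alpha == 1` and `p == 0`, so they leave `m`, `l` and `acc` bit-for-bit unchanged. Leading masked blocks, which appear with a sliding window, are wiped out by `alpha == 0` at the first live block. The dense and early-exit paths therefore return identical results while visiting fewer blocks.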

Add support for quantized LeakyReLU

e276b24

facebook-github-bot added the CLA Signed and fb-exported labels Jun 28, 2023

digantdesai closed this Jun 28, 2023

digantdesai mentioned this pull request Oct 6, 2023

[TOSA] Replace Linear lowering of using Matmul with FullyConnected #616

Closed

adonnini mentioned this pull request Nov 29, 2023

Android app fails with ETensor rank is immutable error #1306

Closed

junpi3 mentioned this pull request Mar 11, 2024

[ET-VK] Support multiple UniformParamsBuffer #2348

Closed

junpi3 mentioned this pull request Mar 13, 2024

[ET-VK] Serialize list types from function args #2404

Closed

8Keep added a commit to 8Keep/executorch that referenced this pull request May 29, 2024

Mass migrate to pybind11 2.10.4 pytorch#1

34d1c3a

8Keep added a commit to 8Keep/executorch that referenced this pull request May 29, 2024

Mass migrate to pybind11 2.10.4 pytorch#1 (pytorch#3763)

2c8a29d

BESTTOOLBOX mentioned this pull request Jul 12, 2024

Segmentation Fault when implementing llama/stories110M Android phone deployment #4237

Closed

haowhsu-quic referenced this pull request in CodeLinaro/executorch Jul 16, 2024

apply review comments #1

8cd43ed

haowhsu-quic referenced this pull request in CodeLinaro/executorch Jul 17, 2024

apply review comments #1

c80c123

sam-wother mentioned this pull request Aug 1, 2024

SDK + Inspector output time format is inconsistent with delegates #4504

Closed

claude Bot mentioned this pull request Jun 4, 2026

Add Apple-accelerated implementations for ImageProcessor (#20037) #20037

Merged

This was referenced Jun 8, 2026

[ExecuTorch][WebGPU] Add update_cache op (llama.update_cache) #20083

Merged

Qualcomm AI Engine Direct - Support Windows native build #20052

Merged

Add OSS CI to cross-compile and run the Cadence Xtensa backend (#20208) #20208

Merged

Add support for quantized LeakyReLU #1

digantdesai wants to merge 1 commit into pytorch:main from digantdesai:export-D47043207

digantdesai commented Jun 28, 2023

facebook-github-bot commented Jun 28, 2023

facebook-github-bot commented Jun 28, 2023

facebook-github-bot commented Jun 30, 2023

2 participants