Add support for quantized LeakyReLU by digantdesai · Pull Request #1 · pytorch/executorch · GitHub

Add support for quantized LeakyReLU #1

Closed
digantdesai wants to merge 1 commit into pytorch:main from digantdesai:export-D47043207

@digantdesai

Contributor

Summary: Also adds support for backend_config

Reviewed By: mcr229

Differential Revision: D47043207

fbshipit-source-id: 509bd4c02eb7ff5d3d47762522debd827bee7240
@facebook-github-bot added the CLA Signed and fb-exported labels on Jun 28, 2023
@facebook-github-bot

Contributor

This pull request was exported from Phabricator. Differential Revision: D47043207

digantdesai added a commit to digantdesai/pytorch that referenced this pull request Jun 30, 2023
Summary:
Pull Request resolved: pytorch#104309

X-link: pytorch/executorch#1

Also adds support for backend_config

Test Plan: `buck test fbcode//mode/dev-nosan fbcode//executorch/backends/xnnpack/test:`

Reviewed By: mcr229

Differential Revision: D47043207

fbshipit-source-id: 3e2f7b614713ae5c3fba6ea3056376f15826de17
facebook-github-bot pushed a commit that referenced this pull request Jun 30, 2023
Summary:
X-link: pytorch/pytorch#104309

Pull Request resolved: #1

Also adds support for backend_config

Reviewed By: mcr229

Differential Revision: D47043207

fbshipit-source-id: 51abd266bba7441c28578f6c58686a3d021d9d2a
junpi3 pushed a commit that referenced this pull request Mar 11, 2024
Before: Each node contains a `UniformParamsBuffer`.
After: Each node contains a `std::vector<std::shared_ptr<UniformParamsBuffer>>`.

In follow-up changes, we will split the parameters across multiple `UniformParamsBuffer` objects, since
1. some are tensor-specific (e.g. image extents) and
2. others are operator-specific (e.g. alpha for binary ops).

Hence, we need **`std::vector`**.

We are adding the methods for item 1 in #2340. Since items 1 and 2 will be owned by different objects, we need **pointers**. Since item 1 is owned by `vTensor`, which is non-copyable, we can't use `std::unique_ptr`, so we need **`std::shared_ptr`**.

Differential Revision: [D54691831](https://our.internmc.facebook.com/intern/diff/D54691831/)

[ghstack-poisoned]
junpi3 pushed a commit that referenced this pull request Mar 11, 2024
Before: Each node contains a `UniformParamsBuffer`.
After: Each node contains a `std::vector<std::shared_ptr<UniformParamsBuffer>>`.

In follow-up changes, we will split the parameters across multiple `UniformParamsBuffer` objects, since
1. some are tensor-specific (e.g. image extents) and
2. others are operator-specific (e.g. alpha for binary ops).

Hence, we need **`std::vector`**.

We are adding the methods for item 1 in #2340. Since items 1 and 2 will be owned by different objects, we need **pointers**. Since item 1 is owned by `vTensor`, which is non-copyable, we can't use `std::unique_ptr`, so we need **`std::shared_ptr`**.

Differential Revision: [D54691831](https://our.internmc.facebook.com/intern/diff/D54691831/)

ghstack-source-id: 218195447
Pull Request resolved: #2348
facebook-github-bot pushed a commit that referenced this pull request Mar 11, 2024
Summary:
bypass-github-export-checks

Pull Request resolved: #2348

Before: Each node contains a `UniformParamsBuffer`.
After: Each node contains a `std::vector<std::shared_ptr<UniformParamsBuffer>>`.

In follow-up changes, we will split the parameters across multiple `UniformParamsBuffer` objects, since
1. some are tensor-specific (e.g. image extents) and
2. others are operator-specific (e.g. alpha for binary ops).

Hence, we need **`std::vector`**.

We are adding the methods for item 1 in #2340. Since items 1 and 2 will be owned by different objects, we need **pointers**. Since item 1 is owned by `vTensor`, which is non-copyable, we can't use `std::unique_ptr`, so we need **`std::shared_ptr`**.
ghstack-source-id: 218195447
exported-using-ghexport

Reviewed By: SS-JIA

Differential Revision: D54691831

fbshipit-source-id: 84ab9f777e247fd56234290ed7f7343b9701c73f
junpi3 pushed a commit that referenced this pull request Mar 13, 2024
In #2271, we already added
- IntList
- DoubleList
- BoolList
- ValueList

to the schema and the runtime's Value class. Their serialization was incomplete, missing two components:
1. Receiving a list in `torch.fx.Node.args`.
2. Receiving a non-tensor in `torch.fx.Node`.

This change completes item 1.

Also, this change fixes a bug where a value of type `bool` matched both `bool` and `int` and hence was added twice.
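
The double match follows from Python itself: `bool` is a subclass of `int`, so `isinstance(True, int)` is true. The sketch below shows the pitfall and one way around it. The helper names `naive_matches` and `classify_arg` are hypothetical, the labels are borrowed from the list types named above, and the fallback to `ValueList` for mixed or empty lists is an assumption; none of this is the actual serializer code.

```python
SCALAR_LABELS = {"Bool", "Int", "Double"}


def naive_matches(value):
    """Return every scalar label whose isinstance check accepts `value`."""
    checks = [("Bool", bool), ("Int", int), ("Double", float)]
    return [label for label, py_type in checks if isinstance(value, py_type)]


def classify_arg(value):
    """Return exactly one label, testing bool before int.

    Hypothetical helper for illustration; not the ExecuTorch serializer.
    """
    if isinstance(value, bool):  # must precede the int check
        return "Bool"
    if isinstance(value, int):
        return "Int"
    if isinstance(value, float):
        return "Double"
    if isinstance(value, (list, tuple)):
        labels = {classify_arg(item) for item in value}
        if len(labels) == 1 and labels <= SCALAR_LABELS:
            return labels.pop() + "List"
        return "ValueList"  # assumption: mixed or empty lists fall back here
    raise TypeError(f"unsupported argument type: {type(value).__name__}")


print(naive_matches(True))     # ['Bool', 'Int']: the value would be added twice
print(classify_arg(True))      # Bool
print(classify_arg([1, 2, 3])) # IntList
```

Ordering the checks so that `bool` is tested first is the smallest fix; an exact `type(value) is int` comparison would also work but rejects `int` subclasses.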

If our type support grows more complex, we can consider using our own types similar to the core ExecuTorch runtime: https://github.com/pytorch/executorch/blob/689796499024fc4a133318d707f4c10db73da967/exir/emit/_emitter.py#L158-L166

Differential Revision: [D54708353](https://our.internmc.facebook.com/intern/diff/D54708353/)

[ghstack-poisoned]
facebook-github-bot pushed a commit that referenced this pull request Mar 13, 2024
Summary:
bypass-github-export-checks

Pull Request resolved: #2404

In #2271, we already added
- IntList
- DoubleList
- BoolList
- ValueList

to the schema and the runtime's Value class. Their serialization was incomplete, missing two components:
1. Receiving a list in `torch.fx.Node.args`.
2. Receiving a non-tensor in `torch.fx.Node`.

This change completes item 1.

Also, this change fixes a bug where a value of type `bool` matched both `bool` and `int` and hence was added twice.

If our type support grows more complex, we can consider using our own types similar to the core ExecuTorch runtime: https://github.com/pytorch/executorch/blob/689796499024fc4a133318d707f4c10db73da967/exir/emit/_emitter.py#L158-L166
ghstack-source-id: 218539049
exported-using-ghexport

Reviewed By: SS-JIA

Differential Revision: D54708353

fbshipit-source-id: 8641647b515e201ea63db67115c01c1532ad6566
8Keep added a commit to 8Keep/executorch that referenced this pull request May 29, 2024
Reviewed By: itamaro

Differential Revision: D51566750
8Keep added a commit to 8Keep/executorch that referenced this pull request May 29, 2024
Summary: Pull Request resolved: pytorch#3763

Reviewed By: itamaro

Differential Revision: D51566750
facebook-github-bot pushed a commit that referenced this pull request May 29, 2024
Summary: Pull Request resolved: #3763

Reviewed By: itamaro, tarun292

Differential Revision: D51566750

fbshipit-source-id: 654c426d479833867e93083e9b55786abfc24a32
haowhsu-quic referenced this pull request in CodeLinaro/executorch Jul 16, 2024
haowhsu-quic referenced this pull request in CodeLinaro/executorch Jul 17, 2024
Gasoonjia added a commit that referenced this pull request Jun 5, 2026
…) (g4-vision-cuda)

Second branch of the g4-vision decomposition (parent: g4-vision-quant). Enables
vision on the CUDA backend end-to-end. The exported CUDA contract changes from
main's token prefill/decode to the embeddings-based 4-method form; MLX stays
text-only this branch (vision lands in g4-vision-mlx).

Scope
-----
export.py:
- _export_cuda now exports the 4-method contract: embed_text (tokens -> embeds),
  vision_encoder (pixel_values, pixel_position_ids -> image_embeds, pooler_mask;
  dynamic vision_num_groups dim), prefill (inputs_embeds), and decode (tokens).
  Adds get_max_vision_soft_tokens constant + --max-vision-soft-tokens CLI arg.
  The branch-1 CUDA fake-prefill wrapper is removed.
- _export_mlx KEEPS main's single token-input `forward` contract via the
  fake-prefill wrapper and still drops the vision head (MLX text-only this
  branch). mlx_source_transformations.py is unchanged (still main's version).

main.cpp — backend fork:
- #ifdef EXECUTORCH_BUILD_CUDA: embed_text -> (optional vision_encoder splice) ->
  prefill -> decode flow with on-device sampling, plus the --image_path path
  (stb image load + patchify + vision_encoder). The stb includes + image path
  are CUDA-gated.
- #else (MLX): main's token-input `forward` text-only path with host sampling
  (llm::logits_to_token). --image_path errors out on the MLX build this branch.
- Review finding #1 fix: the C++ image splice now reads the vision_encoder
  pooler_mask and inserts one <image> placeholder per VALID soft token, skipping
  invalid/padded rows during the splice — matching inference.py exactly (the old
  code copied N rows sequentially and ignored the mask).
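
The mask-aware splice described in the review-finding fix can be sketched in Python as follows. This is an illustration only, not the code in main.cpp or inference.py: the function name `splice_image_rows`, the placeholder ID `99`, and the use of plain lists in place of tensors are all assumptions.

```python
def splice_image_rows(embeds, token_ids, image_rows, pooler_mask, image_token_id):
    """Overwrite each <image> placeholder with the next VALID vision row.

    Rows whose pooler_mask entry is false (invalid or padded) are skipped,
    so the number of placeholders must equal the number of valid rows.
    """
    valid_rows = [row for row, keep in zip(image_rows, pooler_mask) if keep]
    slots = [i for i, tok in enumerate(token_ids) if tok == image_token_id]
    if len(slots) != len(valid_rows):
        raise ValueError(
            f"{len(slots)} placeholders but {len(valid_rows)} valid vision rows"
        )
    out = list(embeds)  # copy so the caller's text embeddings stay untouched
    for slot, row in zip(slots, valid_rows):
        out[slot] = row
    return out


IMAGE_ID = 99  # hypothetical placeholder token ID
token_ids = [1, IMAGE_ID, IMAGE_ID, 2]
text_embeds = ["t0", "placeholder", "placeholder", "t3"]
image_rows = ["img_a", "padding", "img_b"]
pooler_mask = [True, False, True]

print(splice_image_rows(text_embeds, token_ids, image_rows, pooler_mask, IMAGE_ID))
# ['t0', 'img_a', 'img_b', 't3']
```

Copying the first N rows sequentially, as the old code is said to have done, would have placed the padding row in the second slot here.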

CMakeLists.txt:
- stb_image / stb_image_resize fetch is added, gated on EXECUTORCH_BUILD_CUDA
  (MLX build needs no stb this branch).

R2 (C++ half) — shared chat-template:
- New examples/models/gemma4/runner/chat_template.h (#pragma once, namespace
  executorch::examples::gemma4): the 6 special-token IDs + a templated
  build_vision_input_ids(tokenizer, prompt, num_vision_tokens, bos_id). Values
  mirror the Python examples/models/gemma4/chat_template.py.
- gemma4_31b/main.cpp includes it (its local token-ID constants + the local
  build_vision_input_ids copy are removed).
- gemma4/runner/gemma4_runner.{h,cpp} single-source the chat-template IDs from
  the header and delegate build_vision_input_ids to the shared free function
  (audio/eos ids stay local). Behavior-identical (values were already equal).

CI:
- .github/workflows/cuda.yml, .ci/scripts/export_model_artifact.sh,
  .ci/scripts/test_model_e2e.sh: point at the vision checkpoint and run the CUDA
  image smoke (describe docs/source/_static/img/et-logo.png, EXPECTED_OUTPUT
  "chip"). mlx.yml stays text-only (MLX vision is branch 3).

Verification
------------
- CPU: pytest test_vision_tower.py test_vision_quant_roundtrip.py
  test_pipeline.py -> 20 passed.
- CUDA (A100): pytest test_cuda_pipeline.py -> 8 passed, including both export
  tests now producing the 4-method contract on a tiny model, plus the
  chunked-prefill / int4 inference contract tests.
- 31B 4-method export from the real prequant checkpoint
  (/home/gasoonjia/models/gemma-4-31B-it-int4) was confirmed to export all four
  methods (prefill on inputs_embeds, decode on tokens, embed_text, vision_encoder
  P in [9,2520]) and enter CUDA lowering; the host-side lowering/serialization
  step was killed by the dev box's oomd (other processes held ~260/353 GB), an
  environment limit, not a code issue. The identical code path serializes
  successfully at tiny scale (test_cuda_pipeline).
- flake8 + ufmt clean (python); clang-format clean (C++); chat_template.h parses.
- The C++ runner build + on-device image e2e could not be run locally (the
  installed wheel ships no static libs to link the runner, and a full CUDA core
  build is impractical under the box's memory pressure); covered by cuda.yml CI.
  MLX export/runtime e2e is run by the user on macOS.
Gasoonjia pushed a commit that referenced this pull request Jun 12, 2026
Block-sparse early-exit in _sdpa_fwd_kernel_body: skip KV blocks that are
entirely masked (sliding-window via HAS_MASK sum==0, causal via start_n>max_seq_pos).
Exact (skipped blocks are x1,+0 no-ops). Prefill +46-88% all lengths; decode safe;
SDPA nsys 58.1%->18.5%. Numerically bf16-exact vs dense+mask (unit test).
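
Why skipping a fully masked KV block is exact can be seen in a small pure-Python sketch of online-softmax attention for a single query. This is not the `_sdpa_fwd_kernel_body` kernel; the function names, the list-based layout, and the block size are assumptions for illustration, and at least one key is assumed to be unmasked. A fully masked block would leave the running maximum unchanged (rescale factor 1) and add 0 to both the running sum and the accumulator, so it can be skipped outright.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_masked_attention(q, keys, values, mask):
    """Reference: softmax over the unmasked keys only, for one query vector."""
    kept = [(_dot(q, k), v) for k, v, m in zip(keys, values, mask) if m]
    top = max(score for score, _ in kept)
    weights = [math.exp(score - top) for score, _ in kept]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * v[d] for w, (_, v) in zip(weights, kept)) / total
        for d in range(dim)
    ]


def blockwise_attention(q, keys, values, mask, block=2):
    """Online-softmax attention that skips KV blocks with no unmasked key."""
    dim = len(values[0])
    running_max, running_sum = -math.inf, 0.0
    acc = [0.0] * dim
    skipped = 0
    for start in range(0, len(keys), block):
        block_mask = mask[start:start + block]
        if not any(block_mask):  # analogue of the "mask sum == 0" early exit
            skipped += 1
            continue
        block_values = values[start:start + block]
        scores = [
            _dot(q, k) if m else -math.inf
            for k, m in zip(keys[start:start + block], block_mask)
        ]
        new_max = max(running_max, max(scores))
        rescale = math.exp(running_max - new_max)  # 0.0 on the first kept block
        probs = [math.exp(s - new_max) for s in scores]  # masked keys give 0.0
        running_sum = running_sum * rescale + sum(probs)
        for d in range(dim):
            acc[d] = acc[d] * rescale + sum(
                p * v[d] for p, v in zip(probs, block_values)
            )
        running_max = new_max
    return [a / running_sum for a in acc], skipped


q = [1.0, 0.5]
keys = [[0.1, 0.2], [0.3, -0.1], [0.5, 0.4], [-0.2, 0.6], [0.7, 0.1], [0.0, 0.9]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 3.0], [0.5, 0.5], [4.0, 2.0]]
mask = [False, False, True, True, False, True]

out, skipped = blockwise_attention(q, keys, values, mask, block=2)
ref = dense_masked_attention(q, keys, values, mask)
print(skipped, all(abs(a - b) < 1e-9 for a, b in zip(out, ref)))
```

With this mask and a block size of 2, only the first block is fully masked, so one block is skipped and the result agrees with the dense reference up to floating-point rounding.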
Gasoonjia added a commit that referenced this pull request Jun 16, 2026
Block-sparse early-exit in _sdpa_fwd_kernel_body: skip KV blocks that are
entirely masked (sliding-window via HAS_MASK sum==0, causal via start_n>max_seq_pos).
Exact (skipped blocks are x1,+0 no-ops). Prefill +46-88% all lengths; decode safe;
SDPA nsys 58.1%->18.5%. Numerically bf16-exact vs dense+mask (unit test).
