Codegen RuntimeWrapper orchestration into single function by bobrenjc93 · Pull Request #181271 · pytorch/pytorch · GitHub

Codegen RuntimeWrapper orchestration into single function #181271

Draft

bobrenjc93 wants to merge 2 commits into gh/bobrenjc93/875/base from gh/bobrenjc93/875/head

Conversation

Contributor

@bobrenjc93 bobrenjc93 commented Apr 23, 2026

Stack from ghstack (oldest at bottom):

Collapse _RuntimeCompiledFnInvoker.run,
_RuntimeForwardEpilogue.capture_orig_inputs,
increment_mutation_versions, and finalize into a single codegen'd
function with all branches resolved at compile time.

The generated function inlines:

- capture_orig_inputs: dict comprehension → baked {idx: args[idx]} literal
- increment_mutation_versions: conditional + generator → baked tuple
- compiled_invoker.run: trace_joint branch + detach indices inlined
- output arity validation: baked expected count
- split mutated inputs: baked slice index
- apply mutations / replay aliases: delegate to existing codegen'd functions
- dynamic dims: baked per-output dim sets
- grad_enabled_mutation: baked boolean

Generated code for inference (0 mutations, 1 alias, 1 input):

    def _runtime_wrapper(_compiled_fn_, _first_ctx_, _on_before_call_, args):
        orig_inputs = {0: args[0]}
        with _first_ctx_():
            grad_enabled = torch.is_grad_enabled()
            try:
                if grad_enabled: torch._C._set_grad_enabled(False)
                _on_before_call_()
                all_outs = _normalize_as_list_(_compiled_fn_(args))
            finally:
                if grad_enabled: torch._C._set_grad_enabled(True)
        del args
        if len(all_outs) != 1:
            raise AssertionError(...)
        fw_outs = all_outs
        ret_outs = _replay_aliases_(orig_inputs, fw_outs)
        return ret_outs
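The general shape of this codegen pattern can be sketched outside PyTorch as well: bake the compile-time decisions (which input indices to capture, the expected output arity) into a generated source string, `exec` it once, and call the resulting flat function on the hot path. The names below (`make_runtime_wrapper`, `alias_indices`, `expected_outs`) are illustrative stand-ins, not the actual PyTorch internals:

```python
# Hypothetical sketch of the codegen approach: all branches are resolved
# while building the source string, so the generated function has no
# per-call dispatch or branching for those decisions.
def make_runtime_wrapper(alias_indices, expected_outs, replay_aliases):
    # Bake the {idx: args[idx]} capture literal from the known indices.
    capture = ", ".join(f"{i}: args[{i}]" for i in alias_indices)
    src = f"""
def _runtime_wrapper(_compiled_fn_, args):
    orig_inputs = {{{capture}}}
    all_outs = _compiled_fn_(args)
    del args
    if len(all_outs) != {expected_outs}:
        raise AssertionError("output arity mismatch")
    return _replay_aliases_(orig_inputs, all_outs)
"""
    ns = {"_replay_aliases_": replay_aliases}
    exec(src, ns)  # compile the specialized wrapper once
    return ns["_runtime_wrapper"]

# Example: 1 aliased input at index 0, 1 expected output; the replay
# helper here is a trivial stand-in that just returns the outputs.
wrapper = make_runtime_wrapper([0], 1, lambda orig, outs: outs)
print(wrapper(lambda a: [a[0] * 2], [21]))  # -> [42]
```

The `exec`-once cost is paid at compile time; every subsequent call runs straight-line code with the constants inlined, which is where the dispatch-overhead savings in the table below come from.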

RuntimeWrapper orchestration step in isolation (us/call):

| Case | Before (method dispatch) | After (codegen) | Speedup |
|---|---|---|---|
| 0 alias, 0 mut, 5 args | 0.35 us | 0.17 us | 2.1x |
| 2 alias, 0 mut, 5 args | 0.41 us | 0.17 us | 2.5x |
| 0 alias, 2 mut, 5 args | 0.74 us | 0.25 us | 3.0x |
| 3 alias, 1 mut, 10 args | 0.79 us | 0.25 us | 3.2x |
| 5 alias, 3 mut, 20 args | 0.93 us | 0.32 us | 2.9x |
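The kind of overhead being measured can be reproduced with a toy microbenchmark: a path that goes through method dispatch on helper objects versus a flat function with the same decisions pre-baked. The classes below are illustrative stand-ins, not the actual RuntimeWrapper machinery:

```python
# Toy stand-ins for the pre-codegen orchestration: each step is a method
# call on a helper object, costing attribute lookup + call dispatch.
import timeit

class Epilogue:
    def capture_orig_inputs(self, args):
        return {i: args[i] for i in (0, 1)}
    def finalize(self, orig, outs):
        return outs

class Invoker:
    def run(self, fn, args):
        return fn(args)

EPI, INV = Epilogue(), Invoker()

def dispatch_path(fn, args):
    orig = EPI.capture_orig_inputs(args)
    return EPI.finalize(orig, INV.run(fn, args))

def flat_path(fn, args):
    # Same work, but indices and branches baked in as literals.
    orig = {0: args[0], 1: args[1]}  # mirrors the capture step
    return fn(args)

fn, args = (lambda a: a), [1, 2, 3, 4, 5]
t_dispatch = timeit.timeit(lambda: dispatch_path(fn, args), number=100_000)
t_flat = timeit.timeit(lambda: flat_path(fn, args), number=100_000)
print(f"dispatch: {t_dispatch:.3f}s  flat: {t_flat:.3f}s")
```

Absolute numbers will vary by machine; the point is that the flat path avoids the per-call attribute lookups and method calls entirely, in the same way the codegen'd `_runtime_wrapper` does.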


pytorch-bot Bot commented Apr 23, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/181271

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 15 New Failures

As of commit cdcd85d with merge base b627bfb (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

bobrenjc93 added a commit that referenced this pull request Apr 23, 2026
ghstack-source-id: e255dd9
Pull Request resolved: #181271
