iframe-proxy

wzhao18 · 2026-07-03T18:14:21Z

No description provided.

github-actions · 2026-07-03T18:14:31Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

感谢你的贡献！对于 vLLM 与 SGLang，请确保你的 recipe 与官方 vLLM recipes 和/或 SGLang cookbook 保持一致

如果不一致，请先创建一个 PR，之后我们才能将你的单节点 PR 合并到 master 分支。让我们确保文档保持一流水准，使整个 ML 社区都能从你的辛勤工作中受益！谢谢

PR 作者有责任确保合并后所有 GitHub Action 任务完全通过。 很多时候失败只是偶发抖动（flake），重新运行失败的任务即可解决。如果选择重新运行失败的任务，PR 作者有责任确保其最终通过。参见 GitHub 关于重新运行失败任务的文档：https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

一般而言，PR 作者应先向相应公司的 CODEOWNERS 请求审阅并获得 PR 批准，然后再请求核心维护者审阅。

如需更多帮助，PR 作者可通过 Slack 联系核心维护者。

claude · 2026-07-03T18:25:15Z

 # launch_b300-nv.sh); the EAGLE3 draft is downloaded to the writable models dir.
 minimaxm3-fp4-b300-vllm-mtp:
-  image: vllm/vllm-openai:vllm-minimax-m3-perf-x86_64-13.0.1-8b00f41
+  image: vllm/vllm-openai:nightly-93d8f834dd8acf33eb0e2a75b2711b628cb6e226


🔴 Missing perf-changelog.yaml entry — per AGENTS.md line 126, an image-tag bump combined with env-var/config-param changes requires an appended perf-changelog.yaml entry to trigger the full-sweep re-benchmark. Without it, the sweep won't re-select minimaxm3-fp4-b300-vllm-mtp on merge and the new image / spec-token map never actually get measured. Marking as this may just be pending the [WIP] → ready transition, but flagging so it isn't missed at merge time.

Extended reasoning...

What the bug is

AGENTS.md:126 ("Updating Docker images" section) explicitly states:

Update the image tag in the relevant configs/*-master.yaml and/or benchmarks/*.sh, update any related env vars / config params, and append a perf-changelog.yaml entry (required — triggers benchmarks).

This PR does all three of the qualifying trigger conditions but touches only 2 files (configs/nvidia-master.yaml and benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_b300_mtp.sh) — no perf-changelog.yaml entry is appended.

How it manifests / step-by-step proof

On merge, the full-sweep driver reads perf-changelog.yaml to decide which configs need re-benchmarking (per AGENTS.md:54).

The last perf-changelog.yaml entry covering this recipe key would be for minimaxm3-fp4-b300-vllm-mtp — nothing new is appended in this diff.

Sweep sees no new changelog entry ⇒ does not re-select this config ⇒ no benchmark is produced for the new image (nightly-93d8f834dd8acf33eb0e2a75b2711b628cb6e226), the new VLLM_FLASHINFER_ALLREDUCE_BACKEND=trtllm env var, the raised --gpu-memory-utilization 0.95, the restructured search space, or the per-op-point NUM_SPEC_TOKENS_MAP.

The whole purpose of the PR — measuring minimaxm3-fp4-b300-vllm-mtp with the new EAGLE3-tuned config — never runs.

Why existing code doesn't cover it

There is no wildcard changelog entry that catches this recipe key; the closest existing entry is for minimaxm3-fp4-b300-vllm (the non-mtp sibling from #1990) which does not match minimaxm3-fp4-b300-vllm-mtp.

Precedent from sibling PRs

Every recent sibling recipe-update PR that bumped an image and/or reshuffled the search space appended a perf-changelog.yaml entry: #1990 (non-mtp b300), #1978 (b200 variant), #1962, #1966, and #2001. This PR is the only outlier so far.

How to fix

Append a perf-changelog.yaml entry naming minimaxm3-fp4-b300-vllm-mtp with a one-liner describing the changes (new nightly image, TRT-LLM allreduce backend, gpu-mem-util 0.90 → 0.95, per-op-point spec-token map, refined search space). Once that lands the full-sweep will re-select this config on merge and the intended benchmarks will run.

claude · 2026-07-03T18:25:17Z

+# Key: ISL:TP:EP_SIZE:DP_ATTENTION:CONC  (exactly the env vars the launcher sets;
+# for dp-attn configs TP holds the data-parallel size, as in PARALLEL_ARGS below).


🟡 The new comment at line 72 says as in PARALLEL_ARGS below, but the PARALLEL_ARGS if/elif/else block that sets --data-parallel-size=$TP is at lines 57-63, immediately above this comment. Trivial s/below/above/ (or drop the directional word).

Extended reasoning...

What the bug is. In the newly-added block introducing NUM_SPEC_TOKENS_MAP, the comment ends with:

... for dp-attn configs TP holds the data-parallel size, as in PARALLEL_ARGS below).

The referent — the PARALLEL_ARGS if/elif/else block that actually demonstrates the TP-as-data-parallel-size relationship — is not below this comment; it is above it.

Proof by walkthrough. In benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_b300_mtp.sh after this PR:

Lines 57–63 contain the if [ "${DP_ATTENTION}" = "true" ]; then PARALLEL_ARGS="--tensor-parallel-size=1 --data-parallel-size=$TP --enable-expert-parallel" ... block. Line 58 in particular is what the comment is pointing at (it uses $TP as the data-parallel size when DP_ATTENTION=true).

Lines 65–71 are the new explanatory comment.

Line 72 is the offending line: ... as in PARALLEL_ARGS below).

Line 73 begins declare -A NUM_SPEC_TOKENS_MAP=(.

The only occurrence of the token PARALLEL_ARGS after the comment is the bare $PARALLEL_ARGS expansion in the vllm serve invocation (~line 105), which does not itself illustrate the TP-as-data-parallel-size definition.

So the definition being referenced sits ~9 lines above the comment, and the only PARALLEL_ARGS mention below is a bare variable expansion that carries no such demonstration. below is directionally wrong.

Impact. Zero runtime impact — this is a pure documentation error in a freshly-written comment. It only matters because a future reader following the reference (as in PARALLEL_ARGS below) will scan downward and hit $PARALLEL_ARGS in the vllm serve line, which does not explain why TP names the DP size — momentary confusion, then they scroll up. Worth flagging while the author is right there in this region, but not merge-blocking.

Suggested fix. Change below to above, or drop the directional word entirely (as in PARALLEL_ARGS). Speculative cause: the comment was likely drafted expecting the map declaration to sit above the PARALLEL_ARGS block, then the block ordering was swapped during authoring but the wording wasn't updated.

claude · 2026-07-03T18:25:19Z

 # launch_b300-nv.sh); the EAGLE3 draft is downloaded to the writable models dir.
 minimaxm3-fp4-b300-vllm-mtp:
-  image: vllm/vllm-openai:vllm-minimax-m3-perf-x86_64-13.0.1-8b00f41
+  image: vllm/vllm-openai:nightly-93d8f834dd8acf33eb0e2a75b2711b628cb6e226


🟡 The rationale comments above and inside this recipe are now stale after this PR: (a) the yaml header at lines 12625-12632 still claims the recipe uses "3 speculative tokens" and that NVFP4 support is "baked into the perf container image", and (b) the shell-script header at benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_b300_mtp.sh lines 3-7 repeats the "perf container image" phrasing. Both are stale after this diff swapped the perf-branded image for a generic nightly and replaced the constant NUM_SPEC_TOKENS=3 with a per-operating-point NUM_SPEC_TOKENS_MAP; update both to reference the new image (e.g. note PR #46380 is now merged upstream and included in commit 93d8f83) and the varying token count.

Extended reasoning...

This PR makes two changes whose rationale comments were left behind, and they show up in two adjacent locations.

1. The yaml header at configs/nvidia-master.yaml lines 12625-12632 (immediately above the modified minimaxm3-fp4-b300-vllm-mtp: key at 12633) says the recipe pairs the target with "the Inferact/MiniMax-M3-EAGLE3 draft head (3 speculative tokens)" and that MiniMax-M3 modelopt NVFP4 support is "baked into the perf container image". After this PR both claims are stale:

The image on line 12634 is bumped from vllm-minimax-m3-perf-x86_64-13.0.1-8b00f41 to nightly-93d8f834dd8acf33eb0e2a75b2711b628cb6e226. There is no longer a "perf container image" — this recipe is on a generic upstream nightly.

The recipe shell replaces the constant NUM_SPEC_TOKENS=3 with a NUM_SPEC_TOKENS_MAP keyed on ISL:TP:EP_SIZE:DP_ATTENTION:CONC that yields 1, 2, or 4 tokens for many operating points (3 is only the default fallback).

2. The shell-script header at benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_b300_mtp.sh lines 3-7 (outside the diff window but still present in the file after this PR) also says MiniMax-M3 modelopt NVFP4 support (vllm-project/vllm PR #46380) is baked into the perf container image, so no runtime patch is needed. This rationale was written to justify why the FP4 recipe skips the inline MSA-contiguity patch that its FP8 sibling uses; the whole justification hinges on "the perf container image" still existing here, which is no longer true.

Impact. Neither comment affects runtime behavior — the recipe still runs. But two failure modes for future readers are worth avoiding:

Someone hunts for "the perf container image" (e.g. to reproduce or rebase) and finds nothing, because it's been replaced with a generic upstream nightly.

Someone assumes the FP4 recipe still relies on a custom-baked NVFP4 patch (and therefore treats it as fragile / non-portable), when in fact PR #46380 is now available in the upstream nightly and no baking is required.

Fix. Update the yaml header to (a) drop or rephrase "baked into the perf container image" — e.g. "vllm PR #46380 is now merged upstream and included in nightly commit 93d8f83" — and (b) replace "(3 speculative tokens)" with a note that the count varies per operating point (1..4) via NUM_SPEC_TOKENS_MAP in the recipe shell. Apply the same image-rationale update to the .sh header at lines 3-7.

Step-by-step proof.

Read configs/nvidia-master.yaml lines 12625-12634. The comment block above the key still says "3 speculative tokens" and "baked into the perf container image".

Look at line 12634 in the diff — the image tag has changed from vllm-minimax-m3-perf-... to nightly-93d8f83.... No "perf" branding remains.

Read benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_b300_mtp.sh in this PR. Lines 3-7 (comment header) are unchanged and still say "baked into the perf container image".

Look at lines ~65-96 of the same file after the diff: NUM_SPEC_TOKENS=3 is gone; the value now comes from NUM_SPEC_TOKENS_MAP with 3 as a default fallback and many keys set to 1, 2, or 4.

Conclude both rationale comments describe a state that no longer exists after this PR.

Note on the duplicate objection. One refutation argued this collapses into a single .sh-header nit. Kept as a merged finding because the yaml header carries a distinct stale claim ("3 speculative tokens") that the .sh header does not, and it lives in a different physical comment block; combining them makes it likelier the author touches both while updating rationale rather than only the .sh.

github-actions · 2026-07-03T19:43:59Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28677170095
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=28677170095

github-actions · 2026-07-03T20:05:32Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28677170095
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=28677170095

github-actions · 2026-07-03T20:30:57Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28677170095
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=28677170095

github-actions · 2026-07-03T22:30:55Z

update

a54d9ba

wzhao18 requested a review from a team July 3, 2026 18:14

wzhao18 requested a review from Ankur-singh as a code owner July 3, 2026 18:14

github-project-automation Bot added this to InferenceMAX Board Jul 3, 2026

wzhao18 requested review from jgangani and kedarpotdar-nv as code owners July 3, 2026 18:14

wzhao18 added the full-sweep-enabled label Jul 3, 2026

wzhao18 and others added 2 commits July 3, 2026 11:16

update

305ea25

Update perf-changelog.yaml

bab8412

claude Bot reviewed Jul 3, 2026

View reviewed changes

fixup

938a134

Update nvidia-master.yaml

47bd6ca

Sunbelt Computer Software

PL/B Language Development and Support

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[WIP] Update Minimax M3 FP4 B300 Eagle#2006

[WIP] Update Minimax M3 FP4 B300 Eagle#2006
wzhao18 wants to merge 5 commits into
mainfrom
wzhao/m3-fp4-b300-eagle

wzhao18 commented Jul 3, 2026

Uh oh!

github-actions Bot commented Jul 3, 2026

Uh oh!

claude Bot Jul 3, 2026

Uh oh!

claude Bot Jul 3, 2026

Uh oh!

claude Bot Jul 3, 2026

Uh oh!

github-actions Bot commented Jul 3, 2026

Uh oh!

github-actions Bot commented Jul 3, 2026

Uh oh!

github-actions Bot commented Jul 3, 2026

Uh oh!

github-actions Bot commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		# Key: ISL:TP:EP_SIZE:DP_ATTENTION:CONC (exactly the env vars the launcher sets;
		# for dp-attn configs TP holds the data-parallel size, as in PARALLEL_ARGS below).

Sunbelt Computer Software

PL/B Language Development and Support

Uh oh!

Conversation

wzhao18 commented Jul 3, 2026

Uh oh!

github-actions Bot commented Jul 3, 2026

Uh oh!

claude Bot Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Jul 3, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jul 3, 2026

Uh oh!

github-actions Bot commented Jul 3, 2026

Uh oh!

github-actions Bot commented Jul 3, 2026

Uh oh!

github-actions Bot commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant