[NV] perf: update MiniMax-M3 FP4 B300 vLLM#1990
Conversation
| - config-keys: | ||
| - minimaxm3-fp4-b300-vllm | ||
| description: | ||
| - "Update Minimax M3 b300 vllm image tag" | ||
| - "Update search space to cover more configs" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/tree/codex/minimax-m3-b300-fp4-vllm-update |
There was a problem hiding this comment.
🟡 The new perf-changelog.yaml entry has two issues: (1) pr-link uses a branch URL (tree/codex/minimax-m3-b300-fp4-vllm-update) instead of the canonical pull/1990 form used by every other entry in the file — this will 404 once the branch is deleted after merge; (2) the description says "Update search space to cover more configs", but the diff actually narrows the sweep (isl 1024: 7→4 entries, isl 8192: 6→4 entries, all TP8/EP8 and TP4/EP4 lanes dropped). Consider phrasing similar to the neighboring dsr1-fp4-b200-sglang entry (e.g. "Refocus/narrow the search space and drop TP8/EP8 and TP4/EP4 sweeps").
Extended reasoning...
Issue 1 — branch URL in pr-link:\n\nThe new entry at perf-changelog.yaml line 4435 sets:\n\nyaml\npr-link: https://github.com/SemiAnalysisAI/InferenceX/tree/codex/minimax-m3-b300-fp4-vllm-update\n\n\nOf the ~554 pr-link values in this file, this is the only one that points to a branch tree/... URL — all others use the canonical https://github.com/SemiAnalysisAI/InferenceX/pull/<num> form (see the four most recent entries at lines 4402–4428 for examples). Since branches are typically deleted after merge, this link will 404 shortly after this PR lands. The correct value is https://github.com/SemiAnalysisAI/InferenceX/pull/1990.\n\nIssue 2 — description contradicts the diff:\n\nThe entry claims:\n\nyaml\n- "Update search space to cover more configs"\n\n\nBut the diff clearly narrows the sweep. Step-by-step count from the diff on .github/configs/nvidia-master.yaml:\n\nisl 1024, osl 1024:\n- Before (7 entries): {tp:8, c1-64}, {tp:8/ep:8, c1-512}, {tp:4, c1-64}, {tp:4/ep:4, c64-512}, {tp:4/ep:4/dp-attn, c128-512}, {tp:2/ep:2, c16-128}, {tp:8/ep:8/dp-attn, c256-1024}\n- After (4 entries): {tp:8, c1-2}, {tp:4, c2-2}, {tp:2, c4-256}, {tp:2/ep:2/dp-attn, c512-1024}\n\nisl 8192, osl 1024:\n- Before (6 entries): {tp:8, c1-64}, {tp:8/ep:8, c1-512}, {tp:4, c1-128}, {tp:4/ep:4, c64-256}, {tp:4/ep:4/dp-attn, c64-128}, {tp:8/ep:8/dp-attn, c128-256}\n- After (4 entries): {tp:8, c1-2}, {tp:4, c2-2}, {tp:2, c4-256}, {tp:2/ep:2/dp-attn, c512-1024}\n\nEvery TP8/EP8 and TP4/EP4 lane is dropped, and the TP8/TP4 lanes are drastically clipped (tp:8 from c1-64 down to c1-2; tp:4 from c1-64 down to a single c2 point). While tp:2 and tp:2/ep:2/dp-attn are genuinely new, the aggregate coverage is significantly smaller (roughly 38 concurrency points → ~12 at isl 1024). Calling this "cover more configs" is the opposite of what the change actually does.\n\nImpact:\n\nBoth issues are purely documentation quality — no runtime effect, and no CI schema check enforces the URL form. But the perf-changelog is a user-facing history document that people read to understand why a config changed; a stale link plus a misleading description defeats that purpose.\n\nFix:\n\nyaml\n- config-keys:\n - minimaxm3-fp4-b300-vllm\n description:\n - "Update Minimax M3 b300 vllm image tag"\n - "Refocus/narrow the search space and drop TP8/EP8 and TP4/EP4 sweeps."\n pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1990\n\n\nThe neighboring dsr1-fp4-b200-sglang entry at lines 4422–4428 uses similar honest phrasing when reducing coverage ("Drop the TP8/EP8 single-concurrency points.").
| @@ -12809,7 +12809,7 @@ minimaxm3-fp8-b300-vllm: | |||
| # weights are pre-staged read-only at /scratch/models/MiniMax-M3-NVFP4 (added to | |||
| # the STAGED_MODELS allow-list in launch_b300-nv.sh). | |||
| minimaxm3-fp4-b300-vllm: | |||
| image: vllm/vllm-openai:vllm-minimax-m3-perf-x86_64-13.0.1-8b00f41 | |||
| image: vllm/vllm-openai:nightly-93d8f834dd8acf33eb0e2a75b2711b628cb6e226 | |||
There was a problem hiding this comment.
🟡 The block comment just above this line still says the NVFP4 support 'is baked into the perf container image', but the PR swaps the image from the bespoke vllm-minimax-m3-perf-* tag to a mainline nightly-* tag. Update the comment to say the support has landed in vLLM main and is picked up from nightly so future readers don't chase a nonexistent perf image. Note: the same wording appears at lines 12839-12840 for the EAGLE3 variant (whose image is NOT changed in this PR), so that comment remains accurate and should be left as-is.
Extended reasoning...
What the bug is
At .github/configs/nvidia-master.yaml lines 12805-12810, immediately above the minimaxm3-fp4-b300-vllm entry, there is a block comment that states:
MiniMax-M3 modelopt NVFP4 support (vllm-project/vllm PR #46380) is baked into the perf container image, so no runtime patch is needed.
That wording was true when the config used the bespoke perf-tagged image vllm/vllm-openai:vllm-minimax-m3-perf-x86_64-13.0.1-8b00f41. This PR replaces that tag with vllm/vllm-openai:nightly-93d8f834dd8acf33eb0e2a75b2711b628cb6e226 — a mainline nightly image, not a perf-variant image. After the swap, the phrase "perf container image" no longer describes what is actually being pulled.
Why the invariant still holds, but the comment is misleading
The underlying claim — "no runtime patch is needed" — is still true, because vllm-project/vllm PR #46380 has presumably landed in vLLM main and the nightly image picks it up. So there is no runtime effect and nothing breaks at benchmark time. The issue is purely that the comment cites the wrong reason: a future reader inspecting this entry will look for a "perf container image" that no longer exists in the config and be confused.
Step-by-step proof
- Before this PR, line 12812 reads
image: vllm/vllm-openai:vllm-minimax-m3-perf-x86_64-13.0.1-8b00f41. The tag literally containsminimax-m3-perf, matching "perf container image". - This PR changes line 12812 to
image: vllm/vllm-openai:nightly-93d8f834dd8acf33eb0e2a75b2711b628cb6e226. The tag isnightly-<sha>; there is no "perf" in the tag name. - Lines 12805-12810 (comment above the entry) still literally say
baked into the perf container image, unchanged by this PR. - Therefore the comment and the code disagree about which image class is in use.
Fix
Update the comment to reflect that MiniMax-M3 NVFP4 support has landed upstream and is now picked up from the mainline nightly image, e.g.:
MiniMax-M3 modelopt NVFP4 support (vllm-project/vllm PR #46380) has landed in vLLM main and is picked up from the nightly image below, so no runtime patch is needed.
Scope note
The same wording appears at lines 12839-12840 above the EAGLE3 variant (minimaxm3-fp4-eagle3-b300-vllm), which does not change its image in this PR. That comment remains accurate and should be left as-is; only the copy above minimaxm3-fp4-b300-vllm (lines 12805-12810) needs updating.
Severity
This is a documentation-only inconsistency with no runtime effect on the benchmark. Marking as nit.
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28612763227 |
|
/reuse-sweep-run |
|
As a PR reviewer and CODEOWNER, I have reviewed this and have:
Additional detail section:
Signed: |

No description provided.