[WIP] [do not merge] Add MiniMax-M3 FP4 B200 Dynamo-vLLM disagg config #1982
jasonlizhengjian wants to merge 3 commits into
Claude finished @jasonlizhengjian's task in 3m 55s
Review: MiniMax-M3 FP4 B200 Dynamo-vLLM disagg config
LGTM — no blocking issues found. This is a clean, well-structured port. Details verified:
Minor (non-blocking, FYI only): the
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28599128708
Claude finished @jasonlizhengjian's task in 1m 47s
Re-review: MiniMax-M3 FP4 B200 Dynamo-vLLM disagg config
LGTM — no blocking issues found. Since the previous review, the only new change is commit
Minor (non-blocking, still open from prior review): the
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28599246606
model:
  path: "nvidia/MiniMax-M3-NVFP4"
  container: "vllm/vllm-openai:vllm-minimax-m3-perf-x86_64-13.0.1-8b00f41"
  precision: "fp4"
🔴 The launcher's model-path alias key and the recipe's model.path don't match for this new b200-dgxc + minimaxm3-fp4 pairing: runners/launch_b200-dgxc.sh:74 exports SRT_SLURM_MODEL_PREFIX="minimax-m3-nvfp4" but the new recipe at line 4 uses path: "nvidia/MiniMax-M3-NVFP4", so srtctl's model_paths lookup misses. Every other analogous case in the tree (b300-nv minimaxm3-fp4, b200 minimaxm2.5-fp4/fp8, b200 dsv4-fp4) has the launcher prefix exactly equal to the recipe path. Fix by changing one side to match the other: either set SRT_SLURM_MODEL_PREFIX="nvidia/MiniMax-M3-NVFP4" for the minimaxm3-fp4 branch on b200-dgxc (mirroring runners/launch_b300-nv.sh:52), or change the new recipe's model.path to "minimax-m3-nvfp4".
Extended reasoning...
The mismatch
runners/launch_b200-dgxc.sh:71-74 (pre-existing from PR #1932) sets up the minimaxm3-fp4 model resolution:
    elif [[ $MODEL_PREFIX == "minimaxm3" && $PRECISION == "fp4" ]]; then
        # NVFP4 checkpoint, pre-staged on the b200-dgxc scratch tree.
        export MODEL_PATH="/scratch/fsw/models/MiniMax-M3-NVFP4"
        export SRT_SLURM_MODEL_PREFIX="minimax-m3-nvfp4"
The launcher then writes srtslurm.yaml (around line 157):
    model_paths:
      "minimax-m3-nvfp4": "/scratch/fsw/models/MiniMax-M3-NVFP4"
But the new recipe at benchmarks/multi_node/srt-slurm-recipes/vllm/minimax-m3/b200-fp4/8k1k/2p1d-dep2-dep8-8k1k.yaml:4 uses:
    model:
      path: "nvidia/MiniMax-M3-NVFP4"
srtctl looks up "nvidia/MiniMax-M3-NVFP4" in model_paths. The only registered alias is "minimax-m3-nvfp4", so the lookup misses.
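The miss described above can be illustrated with a minimal Python sketch. It assumes the alias lookup behaves like an exact-match dictionary lookup on the model_paths block; srtctl's real resolution code is not shown in this PR, and build_model_paths and alias_registered are hypothetical helper names used only for illustration.

```python
# Hypothetical model of the alias lookup. Assumption: model_paths is matched
# by exact key, as the review comment describes. This is not srtctl's code.

def build_model_paths(prefix: str, model_path: str) -> dict:
    """Mirror the model_paths block the launcher writes into srtslurm.yaml."""
    return {prefix: model_path}


def alias_registered(model_paths: dict, recipe_path: str) -> bool:
    """Return True when the recipe's model.path is a registered alias key."""
    return recipe_path in model_paths


MODEL_PATH = "/scratch/fsw/models/MiniMax-M3-NVFP4"
RECIPE_PATH = "nvidia/MiniMax-M3-NVFP4"

# Current launcher value versus the value proposed in the review.
current = build_model_paths("minimax-m3-nvfp4", MODEL_PATH)
fixed = build_model_paths("nvidia/MiniMax-M3-NVFP4", MODEL_PATH)

print(alias_registered(current, RECIPE_PATH))  # False
print(alias_registered(fixed, RECIPE_PATH))    # True
```

With the current prefix the recipe path is not a key in the mapping; with the proposed prefix it resolves to the pre-staged scratch directory.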
Why this PR is the trigger
The pre-existing lines 71-74 were previously only exercised through the single-node code path (the else branch at the bottom of the launcher), which never touches SRT_SLURM_MODEL_PREFIX — it mounts $MODEL_PATH directly via --container-mounts and sets MODEL=$MODEL_PATH. So the mismatch was benign.
This PR adds the new elif at launch_b200-dgxc.sh:116-121, which for the first time routes minimaxm3-fp4 through the srtctl / srt-slurm code path, the path that actually consumes SRT_SLURM_MODEL_PREFIX as an alias key in srtslurm.yaml. So the pre-existing but previously latent mismatch becomes load-bearing in this PR.
Cross-check with every other b200/b300 case
| Launcher case | SRT_SLURM_MODEL_PREFIX | Recipe model.path | Match? |
|---|---|---|---|
| b300-nv minimaxm3-fp4 (launch_b300-nv.sh:52) | nvidia/MiniMax-M3-NVFP4 | nvidia/MiniMax-M3-NVFP4 | ✅ |
| b300-nv minimaxm3-fp8 (launch_b300-nv.sh:55) | MiniMaxAI/MiniMax-M3-MXFP8 | MiniMaxAI/MiniMax-M3-MXFP8 | ✅ |
| b200-dgxc minimaxm2.5-fp4 | minimax-m2.5-nvfp4 | minimax-m2.5-nvfp4 | ✅ |
| b200-dgxc minimaxm2.5-fp8 | minimax-m2.5-fp8 | minimax-m2.5-fp8 | ✅ |
| b200-dgxc dsv4-fp4 | deepseek-v4-pro | deepseek-v4-pro | ✅ |
| b200-dgxc minimaxm3-fp4 (this PR) | minimax-m3-nvfp4 | nvidia/MiniMax-M3-NVFP4 | ❌ |
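The cross-check in the table could be automated along these lines. This is only a sketch: the pairings are copied by hand from the table above, and Pairing and find_mismatches are hypothetical names, not part of the repository. A real check would need to read the launcher scripts and recipe files directly.

```python
from typing import List, NamedTuple


class Pairing(NamedTuple):
    case: str
    launcher_prefix: str  # value exported as SRT_SLURM_MODEL_PREFIX
    recipe_path: str      # value of model.path in the recipe


# Rows transcribed from the table in the review comment.
PAIRINGS = [
    Pairing("b300-nv minimaxm3-fp4", "nvidia/MiniMax-M3-NVFP4", "nvidia/MiniMax-M3-NVFP4"),
    Pairing("b300-nv minimaxm3-fp8", "MiniMaxAI/MiniMax-M3-MXFP8", "MiniMaxAI/MiniMax-M3-MXFP8"),
    Pairing("b200-dgxc minimaxm2.5-fp4", "minimax-m2.5-nvfp4", "minimax-m2.5-nvfp4"),
    Pairing("b200-dgxc minimaxm2.5-fp8", "minimax-m2.5-fp8", "minimax-m2.5-fp8"),
    Pairing("b200-dgxc dsv4-fp4", "deepseek-v4-pro", "deepseek-v4-pro"),
    Pairing("b200-dgxc minimaxm3-fp4", "minimax-m3-nvfp4", "nvidia/MiniMax-M3-NVFP4"),
]


def find_mismatches(pairings: List[Pairing]) -> List[Pairing]:
    """Return pairings whose launcher prefix differs from the recipe path."""
    return [p for p in pairings if p.launcher_prefix != p.recipe_path]


for p in find_mismatches(PAIRINGS):
    print(f"{p.case}: launcher exports {p.launcher_prefix!r}, recipe uses {p.recipe_path!r}")
```

The comparison is a plain string equality, which matches the exact-key behavior the review assumes for model_paths.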
Every other pairing in the tree matches exactly; the new b200-dgxc minimaxm3-fp4 case is the sole outlier. The b300 minimaxm3-fp4 case in particular is instructive because the new b200 recipe is a direct port of the b300 4p2d-dep2-dep8-8k1k recipe (per the PR description), so it inherits nvidia/MiniMax-M3-NVFP4 — which matches on b300 but not on b200.
Step-by-step proof of the failure
- CI dispatches minimaxm3-fp4-b200-dynamo-vllm on the b200-multinode runner.
- launch_b200-dgxc.sh runs with IS_MULTINODE=true, MODEL_PREFIX=minimaxm3, PRECISION=fp4, FRAMEWORK=dynamo-vllm.
- Line 74 exports SRT_SLURM_MODEL_PREFIX="minimax-m3-nvfp4" and MODEL_PATH="/scratch/fsw/models/MiniMax-M3-NVFP4".
- The new elif at lines 116-121 fires, clones srt-slurm, and copies the recipe into recipes/vllm/minimax-m3/b200-fp4/.
- The cat > srtslurm.yaml <<EOF block writes model_paths: { "minimax-m3-nvfp4": "/scratch/fsw/models/MiniMax-M3-NVFP4" }.
- srtctl apply -f $CONFIG_FILE is invoked; srtctl parses the recipe and reads model.path: "nvidia/MiniMax-M3-NVFP4".
- srtctl checks model_paths for the key "nvidia/MiniMax-M3-NVFP4", which is not present.
- Outcome A: srtctl errors on unknown alias and the job fails immediately at srtctl apply. Outcome B: srtctl treats the unmatched value as a HuggingFace hub identifier and attempts to download nvidia/MiniMax-M3-NVFP4 from the hub on every job invocation, negating the pre-staging that the comment on launch_b200-dgxc.sh:72 explicitly relies on.
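The two outcomes in the last step can be sketched as a resolver with a strict mode and a fallback mode. Which of the two srtctl actually implements is not confirmed anywhere in this thread; resolve, UnknownAliasError, and the strict flag are hypothetical and exist only to make the two branches concrete.

```python
class UnknownAliasError(KeyError):
    """Raised in the hypothetical strict mode when an alias is not registered."""


def resolve(recipe_path: str, model_paths: dict, strict: bool):
    """Return (source, location) for a recipe's model.path.

    Assumption: exact-key lookup first; on a miss, either fail (Outcome A)
    or treat the value as a hub identifier (Outcome B).
    """
    if recipe_path in model_paths:
        return ("local", model_paths[recipe_path])
    if strict:
        raise UnknownAliasError(recipe_path)  # Outcome A: fail at apply time
    return ("hub", recipe_path)               # Outcome B: download from the hub


model_paths = {"minimax-m3-nvfp4": "/scratch/fsw/models/MiniMax-M3-NVFP4"}
recipe_path = "nvidia/MiniMax-M3-NVFP4"

try:
    resolve(recipe_path, model_paths, strict=True)
except UnknownAliasError as exc:
    print(f"Outcome A: unknown alias {exc}")

print("Outcome B:", resolve(recipe_path, model_paths, strict=False))
```

In neither branch does the job use the pre-staged checkpoint under /scratch, which is the point the step list makes.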
Either outcome makes the full-sweep check fail — either the job errors out at model resolution, or the HF pull blows the container FS / times out the runner. The PR is labeled full-sweep-fail-fast-no-canary, so this will surface as a fail-fast failure.
Fix
One-liner in either direction. The b300 side of the tree is the reference implementation, so the least surprising change is to line 74 of runners/launch_b200-dgxc.sh:
      elif [[ $MODEL_PREFIX == "minimaxm3" && $PRECISION == "fp4" ]]; then
          # NVFP4 checkpoint, pre-staged on the b200-dgxc scratch tree.
          export MODEL_PATH="/scratch/fsw/models/MiniMax-M3-NVFP4"
    -     export SRT_SLURM_MODEL_PREFIX="minimax-m3-nvfp4"
    +     export SRT_SLURM_MODEL_PREFIX="nvidia/MiniMax-M3-NVFP4"
This also matches runners/launch_b300-nv.sh:52 verbatim, keeping the two clusters consistent for the same model+precision.

Summary
- b200-multinode runner
- max-cudagraph-capture-size and max-num-batched-tokens from the prefill configuration
Configuration
Validation
- generate_sweep_configs.py slice for MiniMax-M3 FP4 Dynamo-vLLM on b200-multinode
- python -m pytest utils/matrix_logic/ -v (163 passed)
- bash -n runners/launch_b200-dgxc-slurm.sh
- git diff --check