[AMD] DeepSeek-V4 FP4 MI355X vLLM MTP: bump image to latest nightly #1981
Fangzhou-Ai wants to merge 5 commits into
Update dsv4-fp4-mi355x-vllm-mtp from vllm/vllm-openai-rocm:v0.22.0 to the latest nightly (nightly-09663abde0f50944a8d5ea30120666024b503faa). Note two-stage attention kernels and AITER MLA in the changelog.
 # build, which already contains the MTP commit.
 dsv4-fp4-mi355x-vllm-mtp:
-  image: vllm/vllm-openai-rocm:v0.22.0
+  image: vllm/vllm-openai-rocm:nightly-09663abde0f50944a8d5ea30120666024b503faa
🟡 The block comment immediately above this entry (lines 1978-1980) still reads "Image reuses the base entry's v0.22.0 ROCm build, which already contains the MTP commit." With this bump, the MTP variant is now on a nightly while the base entry dsv4-fp4-mi355x-vllm stays on v0.22.0, so that rationale is stale. Consider replacing those two sentences with a note about the intentional divergence and the new rationale (two-stage attention kernels + AITER MLA) already documented in the PR description and perf-changelog.
Extended reasoning...
What's stale. The trailing sentences of the block comment at .github/configs/amd-master.yaml:1978-1980 claim:

> Image reuses the base entry's v0.22.0 ROCm build, which already contains the MTP commit.

That rationale explained why the two entries could share an image tag. It no longer holds.

Step-by-step proof of the divergence.

1. Base entry dsv4-fp4-mi355x-vllm at line 1955 still pins image: vllm/vllm-openai-rocm:v0.22.0 (unchanged by this PR).
2. This PR changes the MTP variant at line 1982 from vllm/vllm-openai-rocm:v0.22.0 to vllm/vllm-openai-rocm:nightly-09663abde0f50944a8d5ea30120666024b503faa.
3. Therefore the two image strings now differ, and "reuses the base entry's v0.22.0 ROCm build" is factually wrong.

Why the existing wording will mislead. A future reader landing on this recipe will read the block comment, see "reuses the base entry's v0.22.0 ROCm build," and assume the two entries track the same image; for example, when doing a future bump they might touch only one entry and expect the other to follow. The PR description already spells out the real reason for the bump (nightly enables two-stage attention kernels / split-KV decode and the AITER MLA backend for the DSv4 MLA path), and the perf-changelog entry restates it. That rationale belongs in the inline comment now that the images have diverged.

Impact. Documentation-only: no functional change, sweep behavior is unaffected. Filing as nit since it's worth fixing while the change is fresh (the author has the context right now) but does not need to block merge.

Suggested fix. Replace the trailing two sentences of the comment (roughly lines 1978-1980) with something like:

> Previously reused the base entry's v0.22.0 image; bumped to a nightly to pick up two-stage attention kernels (split-KV decode) and the AITER MLA backend for the DSv4 MLA path. Base entry stays pinned to v0.22.0 intentionally.
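The divergence argued in the review comment can also be checked mechanically. The following is a minimal sketch, not part of this PR: the `CONFIG` dict layout and the `images_diverge` helper are hypothetical, and only the two entry names and image strings are taken from the PR itself.

```python
# Hypothetical, trimmed-down view of the two entries. Only the entry names
# and image strings come from this PR; the dict layout is an assumption.
CONFIG = {
    "dsv4-fp4-mi355x-vllm": {
        "image": "vllm/vllm-openai-rocm:v0.22.0",
    },
    "dsv4-fp4-mi355x-vllm-mtp": {
        "image": "vllm/vllm-openai-rocm:nightly-09663abde0f50944a8d5ea30120666024b503faa",
    },
}


def images_diverge(config: dict, base: str, variant: str) -> bool:
    """Return True when the variant entry pins a different image than the base entry."""
    return config[base]["image"] != config[variant]["image"]


if __name__ == "__main__":
    diverged = images_diverge(
        CONFIG, "dsv4-fp4-mi355x-vllm", "dsv4-fp4-mi355x-vllm-mtp"
    )
    print("images diverge:", diverged)
```

A check along these lines could flag a comment that still claims the two entries share an image, though whether such a check is worth adding is a judgment call for the maintainers.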
See the unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28591721383

Summary
Companion to #1980 for the MTP variant. Bumps the DeepSeek-V4-Pro FP4 MI355X single-node vLLM MTP recipe (dsv4-fp4-mi355x-vllm-mtp) image to the latest vllm/vllm-openai-rocm nightly.

- From: vllm/vllm-openai-rocm:v0.22.0
- To: vllm/vllm-openai-rocm:nightly-09663abde0f50944a8d5ea30120666024b503faa (latest nightly, 2026-07-02)

The nightly enables two-stage attention kernels (split-KV decode) and employs the AITER MLA attention backend for the DeepSeek-V4 MLA path. The MTP search space (TP8, conc 4-512, 1k1k + 8k1k, spec-decoding: mtp) is unchanged; the new nightly still contains the ROCm DeepSeek-V4 MTP commit (vllm-project/vllm#43385).

AI assistance (Claude) was used to prepare this change.
Made with Cursor
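
The image bump described in the summary moves the recipe from a release tag to a nightly tag. As an illustration only, the sketch below labels an image reference as one or the other. The `classify_image` helper is hypothetical, and the tag patterns are an assumption inferred solely from the two tags that appear in this PR, not from any documented naming convention.

```python
import re

# Assumed tag convention, inferred only from the two tags in this PR:
# releases look like "v0.22.0", nightlies like "nightly-<40 hex chars>".
_NIGHTLY = re.compile(r"^nightly-([0-9a-f]{40})$")
_RELEASE = re.compile(r"^v\d+\.\d+\.\d+$")


def classify_image(image: str) -> tuple[str, str]:
    """Split 'repo:tag' and label the tag as 'release', 'nightly', or 'other'."""
    repo, sep, tag = image.rpartition(":")
    if not sep:
        raise ValueError(f"no tag in image reference: {image!r}")
    match = _NIGHTLY.match(tag)
    if match:
        # For nightlies, return the commit hash embedded in the tag.
        return "nightly", match.group(1)
    if _RELEASE.match(tag):
        return "release", tag
    return "other", tag


if __name__ == "__main__":
    for ref in (
        "vllm/vllm-openai-rocm:v0.22.0",
        "vllm/vllm-openai-rocm:nightly-09663abde0f50944a8d5ea30120666024b503faa",
    ):
        print(ref, "->", classify_image(ref))
```

The sketch does not handle references that include a registry port but no tag; it is meant only to make the release-versus-nightly distinction concrete.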