[AMD] DeepSeek-V4 FP4 MI355X vLLM MTP: bump image to latest nightly by Fangzhou-Ai · Pull Request #1981 · SemiAnalysisAI/InferenceX

Open
Fangzhou-Ai wants to merge 5 commits into main from amd/dsv4-fp4-mi355x-vllm-mtp-image-bump

@Fangzhou-Ai (Collaborator)

Summary

Companion to #1980 for the MTP variant. Bumps the DeepSeek-V4-Pro FP4 MI355X single-node vLLM MTP recipe (dsv4-fp4-mi355x-vllm-mtp) image to the latest vllm/vllm-openai-rocm nightly.

  • From: vllm/vllm-openai-rocm:v0.22.0
  • To: vllm/vllm-openai-rocm:nightly-09663abde0f50944a8d5ea30120666024b503faa (latest nightly as of 2026-07-02)

The nightly enables two-stage attention kernels (split-KV decode) and uses the AITER MLA attention backend for the DeepSeek-V4 MLA path. The MTP search space (TP8, concurrency 4-512, 1k1k and 8k1k ISL/OSL, spec-decoding: mtp) is unchanged, and the new nightly still contains the ROCm DeepSeek-V4 MTP commit (vllm-project/vllm#43385).
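For orientation, the unchanged MTP search space can be pictured roughly as the YAML fragment below. This is a hypothetical sketch, not the real entry: apart from the recipe name, the image tag, and spec-decoding: mtp, the key names (seq-len-configs, isl, osl, search-space, tp, conc-start, conc-end) are assumptions about how .github/configs/amd-master.yaml lays out a sweep. It also assumes the same TP and concurrency range applies to both sequence-length configs, and other fields of the real entry are omitted.

```yaml
# Hypothetical layout: key names other than spec-decoding are assumed,
# not copied from the real dsv4-fp4-mi355x-vllm-mtp entry.
dsv4-fp4-mi355x-vllm-mtp:
  image: vllm/vllm-openai-rocm:nightly-09663abde0f50944a8d5ea30120666024b503faa
  seq-len-configs:
  - isl: 1024          # 1k1k (assumed 1024 input / 1024 output tokens)
    osl: 1024
    search-space:
    - { tp: 8, conc-start: 4, conc-end: 512, spec-decoding: mtp }
  - isl: 8192          # 8k1k (assumed 8192 input / 1024 output tokens)
    osl: 1024
    search-space:
    - { tp: 8, conc-start: 4, conc-end: 512, spec-decoding: mtp }
```

Only the image line changes in this PR; everything under seq-len-configs is meant to stay as it was.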

AI assistance (Claude) was used to prepare this change.

Made with Cursor

Update dsv4-fp4-mi355x-vllm-mtp from vllm/vllm-openai-rocm:v0.22.0 to the latest
nightly (nightly-09663abde0f50944a8d5ea30120666024b503faa). Note two-stage
attention kernels and AITER MLA in the changelog.

github-actions Bot commented Jul 2, 2026


Diff hunk (.github/configs/amd-master.yaml):

 # build, which already contains the MTP commit.
 dsv4-fp4-mi355x-vllm-mtp:
-  image: vllm/vllm-openai-rocm:v0.22.0
+  image: vllm/vllm-openai-rocm:nightly-09663abde0f50944a8d5ea30120666024b503faa

🟡 The block comment immediately above this entry (lines 1978-1980) still reads "Image reuses the base entry's v0.22.0 ROCm build, which already contains the MTP commit." With this bump, the MTP variant is now on a nightly while the base entry dsv4-fp4-mi355x-vllm stays on v0.22.0, so that rationale is stale. Consider replacing those two sentences with a note about the intentional divergence and the new rationale (two-stage attention kernels + AITER MLA) already documented in the PR description and perf-changelog.

Extended reasoning

What's stale. The trailing sentence of the block comment at .github/configs/amd-master.yaml:1978-1980 claims:

> Image reuses the base entry's v0.22.0 ROCm build, which already contains the MTP commit.

That rationale explained why the two entries could share an image tag. It no longer holds.

Step-by-step proof of the divergence.

1. The base entry dsv4-fp4-mi355x-vllm at line 1955 still pins image: vllm/vllm-openai-rocm:v0.22.0 (unchanged by this PR).
2. This PR changes the MTP variant at line 1982 from vllm/vllm-openai-rocm:v0.22.0 to vllm/vllm-openai-rocm:nightly-09663abde0f50944a8d5ea30120666024b503faa.
3. The two image strings therefore differ, and "reuses the base entry's v0.22.0 ROCm build" is factually wrong.

Why the existing wording will mislead. A future reader who lands on this recipe will read the block comment, see "reuses the base entry's v0.22.0 ROCm build," and assume the two entries track the same image. During a future bump, for example, they might touch only one entry and expect the other to follow. The PR description already spells out the real reason for the bump (the nightly enables two-stage attention kernels / split-KV decode and the AITER MLA backend for the DSv4 MLA path), and the perf-changelog entry restates it. Now that the images have diverged, that rationale belongs in the inline comment.

Impact. Documentation only: there is no functional change, and sweep behavior is unaffected. Filed as a nit because it is worth fixing while the change is fresh (the author has the context right now), but it does not need to block merge.

Suggested fix. Replace the trailing sentences of the comment (roughly lines 1978-1980) with something like:

> Previously reused the base entry's v0.22.0 image; bumped to a nightly to pick up two-stage attention kernels (split-KV decode) and the AITER MLA backend for the DSv4 MLA path. Base entry stays pinned to v0.22.0 intentionally.
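Applied to the config, the suggested wording would sit directly above the MTP entry, roughly as sketched below. Only the replacement sentences and the two lines visible in the diff hunk are shown. The earlier lines of the block comment are not visible in this thread, so they are represented by a placeholder comment, and how the new text merges with them is left to the author.

```yaml
# (preceding lines of the block comment omitted; not shown in this thread)
# Previously reused the base entry's v0.22.0 image; bumped to a nightly to pick
# up two-stage attention kernels (split-KV decode) and the AITER MLA backend for
# the DSv4 MLA path. Base entry stays pinned to v0.22.0 intentionally.
dsv4-fp4-mi355x-vllm-mtp:
  image: vllm/vllm-openai-rocm:nightly-09663abde0f50944a8d5ea30120666024b503faa
```

Stating the divergence explicitly in the comment makes it clear that a future bump to the base entry dsv4-fp4-mi355x-vllm will not carry over to this entry, and vice versa.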

