
[codex] add MiniMax M3 FP4 MI355X vLLM MTP benchmark #1939

Merged
functionstackx merged 2 commits into main from codex/minimax-m3-fp4-mi355x-vllm-mtp
Jun 26, 2026

@functionstackx

Collaborator

What changed

  • add minimaxm3-fp4-mi355x-vllm-mtp to the AMD master config
  • pair amd/MiniMax-M3-MXFP4 with Inferact/MiniMax-M3-EAGLE3 using three speculative tokens
  • add the MI355X vLLM MTP launcher with text-only serving, TRITON_ATTN, default KV-cache dtype, automatic MoE backend selection, and VLLM_USE_BREAKABLE_CUDAGRAPH=0
  • pass --use-chat-template for realistic EAGLE3 acceptance
  • mirror the existing MI355X MXFP8 MTP TP/EP/DP-attention sweep at 1k1k and 8k1k
  • append the required performance changelog trigger
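
As a rough illustration of the target/draft pairing listed above, the sketch below assembles a JSON payload in the shape accepted by vLLM's --speculative-config flag. The helper name build_spec_config and the exact key set are assumptions for illustration, not the contents of the merged launcher, and the command is only printed rather than executed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not copied from minimaxm3_fp4_mi355x_vllm_mtp.sh.
MODEL="amd/MiniMax-M3-MXFP4"
DRAFT_MODEL="Inferact/MiniMax-M3-EAGLE3"
NUM_SPEC_TOKENS=3

# Build the JSON string for --speculative-config (key names are assumed).
build_spec_config() {
  printf '{"method":"eagle3","model":"%s","num_speculative_tokens":%d}' \
    "$1" "$2"
}

SPEC_CONFIG="$(build_spec_config "$DRAFT_MODEL" "$NUM_SPEC_TOKENS")"

# Print the command instead of launching a server.
printf '%s\n' "vllm serve $MODEL --speculative-config '$SPEC_CONFIG'"
```

Keeping the draft model and token count in variables lets the same launcher be swept over different speculative settings without editing the JSON by hand.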

Why

The MXFP4 target recipe is now on main. This adds the corresponding EAGLE3 speculative-decoding coverage on MI355X, using the same pinned ROCm nightly and an MTP search space comparable to the MXFP8 one.

Validation

  • bash -n benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_mi355x_vllm_mtp.sh
  • generated 53 filtered MTP sweep entries successfully
  • python -m pytest utils/matrix_logic/ -q (180 passed)

@github-actions

Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.



functionstackx force-pushed the codex/minimax-m3-fp4-mi355x-vllm-mtp branch from aad5226 to a9d0e82 (June 25, 2026 23:07)
functionstackx marked this pull request as ready for review (June 25, 2026 23:08)
functionstackx requested a review from a team (June 25, 2026 23:08)
@claude

claude Bot commented Jun 25, 2026

Contributor

Claude finished @functionstackx's task in 1m 39s


Review: PR #1939 — MiniMax-M3 FP4 MI355X vLLM MTP benchmark

  • Read changed files
  • Compare against existing MXFP8 MTP + base FP4 reference scripts
  • Validate config keys, perf-changelog, EP pattern, MTP requirements
  • Post review

LGTM — no blocking issues found.

The new script is a clean derivative of minimaxm3_fp4_mi355x_vllm.sh with the correct MTP additions (EAGLE3 --speculative-config, draft-model download, NUM_SPEC_TOKENS=3, --use-chat-template), and the spec-config/parallel patterns match the proven minimaxm3_fp8_mi355x_mtp.sh reference.

Validation against the repo's review checklist:

  • MTP --use-chat-template present in the benchmark client (minimaxm3_fp4_mi355x_vllm_mtp.sh:90)
  • Expert parallelism is conditional on EP_SIZE/DP_ATTENTION, not hardcoded (:44-53)
  • Server launch args on separate lines (vllm serve ...)
  • perf-changelog.yaml updated and the new entry is appended to the end of the file
  • Image publicly accessible: vllm/vllm-openai-rocm:nightly-3f5a1e1… (Docker Hub), and identical to the on-main base minimaxm3-fp4-mi355x-vllm entry
  • Config key naming maps cleanly to the script filename (minimaxm3-fp4-mi355x-vllm-mtp → minimaxm3_fp4_mi355x_vllm_mtp.sh); model-prefix: minimaxm3 is consistent with the existing base entry
  • ✅ Unconditional hf download "$DRAFT_MODEL" is safe here since MODEL is always a bare HF id
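
The "conditional, not hardcoded" expert-parallelism item can be pictured with a minimal sketch like the one below. The function name build_parallel_args and the mapping from EP_SIZE/DP_ATTENTION to vLLM flags are assumptions for illustration; the actual lines 44-53 of the script may differ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the flag mapping below is assumed, not taken from the PR.
build_parallel_args() {
  local tp="$1" ep_size="$2" dp_attention="$3"
  local args=()
  if [[ "$dp_attention" == "true" ]]; then
    # Assumed: DP attention replaces tensor parallelism with data parallelism.
    args+=(--tensor-parallel-size 1 --data-parallel-size "$tp")
  else
    args+=(--tensor-parallel-size "$tp")
  fi
  if (( ep_size > 1 )); then
    # Only enable expert parallelism when the sweep entry asks for it.
    args+=(--enable-expert-parallel)
  fi
  printf '%s\n' "${args[*]}"
}

build_parallel_args 8 1 false
build_parallel_args 8 8 true
```

Deriving the flags from the sweep variables keeps one launcher usable across every TP/EP/DP-attention combination in the config.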

Note: the FP4 nightly is claimed to already include AMD's SupportsEagle3 support, so the new script correctly omits the in-place model.py patch that the FP8 MI355X MTP variant carries. A green CI run will confirm whether that assumption holds on the runner.

@claude claude Bot left a comment

Contributor


LGTM — straightforward MiniMax-M3 MXFP4 MTP recipe addition mirroring the MXFP8 sister entry.


Overview

This PR adds the EAGLE3 speculative-decoding (MTP) variant of the existing minimaxm3-fp4-mi355x-vllm recipe. Three changes: a new config entry in amd-master.yaml (mirrors the MXFP8 MTP search space, trimmed at extreme concurrency), a new launcher script minimaxm3_fp4_mi355x_vllm_mtp.sh (text-only serving, TRITON_ATTN, EAGLE3 with 3 spec tokens, --use-chat-template), and the standard perf-changelog entry.

Security risks

None. This is a benchmark config + launcher script — no auth, crypto, network-exposed code paths, or production logic. The launcher runs inside the benchmark container.

Level of scrutiny

Low. Pure additive benchmark recipe following an established pattern across the existing MTP family (minimaxm3_fp8_mi{300,325,355}x_mtp.sh). PR was validated with bash -n, matrix-logic pytest, and the sweep entry generation. No critical code paths touched.

Other factors

The only finding is a nit about the DRAFT_MODEL fetch being unconditional rather than guarded by the same if [[ "$MODEL" != /* ]] block as its three sister MTP scripts. The PR author has the inline comment to act on — it's a minor consistency cleanup that doesn't impact the current CI path (where MODEL is passed as a bare HF id). Shadow-approving since this is exactly the kind of mechanical recipe addition that follows established patterns.

Comment on lines +29 to +30
if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi
hf download "$DRAFT_MODEL"

Contributor


🟡 The draft-model fetch on line 30 is unconditional, but the three sister MTP recipes (minimaxm3_fp8_mi300x_mtp.sh:40-44, minimaxm3_fp8_mi325x_mtp.sh:52-56, minimaxm3_fp8_mi355x_mtp.sh:49-53) all wrap both hf download "$MODEL" and hf download "$DRAFT_MODEL" inside the same if [[ "$MODEL" != /* ]]; then ... fi guard, with an explicit comment that local-path MODEL implies an offline-pre-staged cache. The new script breaks that invariant — a runner with MODEL set to a local path still hits HuggingFace to pull the EAGLE3 draft, which fails on offline-staged runners. Fix: move hf download "$DRAFT_MODEL" inside the existing if block to match the family pattern.


What the bug is

The new minimaxm3_fp4_mi355x_vllm_mtp.sh recipe handles the target and draft model downloads inconsistently:

# Lines 29-30
if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi
hf download "$DRAFT_MODEL"

$MODEL is gated on the "not a local path" check, but $DRAFT_MODEL is fetched unconditionally — every invocation reaches out to HuggingFace for Inferact/MiniMax-M3-EAGLE3.

How sister scripts handle this

All three existing sister MTP recipes wrap both downloads under the same guard, with an explicit comment documenting the invariant. From minimaxm3_fp8_mi355x_mtp.sh:47-53:

# MODEL stays a bare HF id on the mi355x single-node runner (weights are
# pre-staged in the mounted NFS HF cache, so this is a fast cache hit). The
# EAGLE3 draft is not staged; fetch it into the same cache.
if [[ "$MODEL" != /* ]]; then
  hf download "$MODEL"
  hf download "$DRAFT_MODEL"
fi

minimaxm3_fp8_mi300x_mtp.sh:40-44 and minimaxm3_fp8_mi325x_mtp.sh:52-56 use the identical pattern with the same comment. The convention is clear: a local-path MODEL is the offline-pre-staged signal — when it's set, the runner has no business hitting HF for anything.

Why this matters

On a runner with MODEL=/some/local/path and no HF network/auth (the exact scenario the local-path mode is designed for), the unconditional hf download "$DRAFT_MODEL" on line 30 will fail. The other MTP recipes in the family correctly skip the draft fetch in that case under the assumption that the offline staging includes the draft.

Step-by-step proof

  1. Operator pre-stages both amd/MiniMax-M3-MXFP4 and Inferact/MiniMax-M3-EAGLE3 to a local path /staged/models/... for an air-gapped runner.
  2. They set MODEL=/staged/models/amd/MiniMax-M3-MXFP4 and run the recipe.
  3. Line 29: "$MODEL" != /* is false (MODEL does start with /), so the target download is skipped — correct.
  4. Line 30: hf download "$DRAFT_MODEL" runs unconditionally and attempts to reach huggingface.co for Inferact/MiniMax-M3-EAGLE3.
  5. On an offline runner: command fails with a network error. On a runner without HF auth for that repo: command fails with 401.
  6. Compare with running minimaxm3_fp8_mi355x_mtp.sh in the same setup — its if block is fully skipped because MODEL is a local path, no network call is made, and serving proceeds.
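
The steps above can be reproduced offline with a small sketch that swaps the real hf CLI for a stub that only records its calls. The function names fetch_unguarded and fetch_guarded and the CALLS array are hypothetical helpers introduced for this illustration; the two guard shapes are the ones quoted in this comment.

```shell
#!/usr/bin/env bash
# Stub standing in for the real `hf` CLI; it records each call and never
# touches the network.
CALLS=()
hf() { CALLS+=("$*"); }

# Shape used by the new script: the draft fetch sits outside the guard.
fetch_unguarded() {
  local MODEL="$1" DRAFT_MODEL="$2"
  if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi
  hf download "$DRAFT_MODEL"
}

# Shape used by the sister MTP scripts: both fetches sit inside the guard.
fetch_guarded() {
  local MODEL="$1" DRAFT_MODEL="$2"
  if [[ "$MODEL" != /* ]]; then
    hf download "$MODEL"
    hf download "$DRAFT_MODEL"
  fi
}

CALLS=()
fetch_unguarded /staged/models/amd/MiniMax-M3-MXFP4 Inferact/MiniMax-M3-EAGLE3
UNGUARDED_LOCAL=${#CALLS[@]}

CALLS=()
fetch_guarded /staged/models/amd/MiniMax-M3-MXFP4 Inferact/MiniMax-M3-EAGLE3
GUARDED_LOCAL=${#CALLS[@]}

echo "local-path MODEL: unguarded=$UNGUARDED_LOCAL hf call(s), guarded=$GUARDED_LOCAL"
```

With a local-path MODEL the unguarded shape still issues one hf call (the draft fetch), while the guarded shape issues none. With a bare HF id both shapes issue the same two calls.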

Practical impact

Limited in current usage — the AMD master config passes amd/MiniMax-M3-MXFP4 as a bare HF id, so the != /* branch is taken and both downloads run together. The divergence only manifests when someone runs in offline-staged mode, which isn't the current CI path. Still worth fixing for family consistency since the pattern (and its comment) is established across all sister MTP scripts.

Fix

Move hf download "$DRAFT_MODEL" inside the existing if block — matching the existing 3-script pattern:

if [[ "$MODEL" != /* ]]; then
  hf download "$MODEL"
  hf download "$DRAFT_MODEL"
fi


@functionstackx

Collaborator Author

/reuse-sweep-run

functionstackx force-pushed the codex/minimax-m3-fp4-mi355x-vllm-mtp branch from 590ba00 to 05ffa9a (June 26, 2026 00:58)
functionstackx merged commit ee4d8aa into main (Jun 26, 2026)
36 of 46 checks passed
functionstackx deleted the codex/minimax-m3-fp4-mi355x-vllm-mtp branch (June 26, 2026 00:59)
@functionstackx

Collaborator Author

Following https://recipes.vllm.ai/MiniMaxAI/MiniMax-M3?features=tool_calling%2Creasoning%2Cspec_decoding&hardware=mi355x&variant=mxfp4 as of 5:59pm PT.

Using the latest upstream Docker container currently available: vllm/vllm-openai-rocm:nightly-3f5a1e1733200760169ff31ebe60a271072b199e
Verified that the evals at https://github.com/SemiAnalysisAI/InferenceX/actions/runs/28206198366/job/83557430139 match https://huggingface.co/amd/MiniMax-M3-MXFP4

