[codex] add MiniMax M3 FP4 MI355X vLLM MTP benchmark by functionstackx · Pull Request #1939 · SemiAnalysisAI/InferenceX

functionstackx · 2026-06-25T23:06:53Z

What changed

add minimaxm3-fp4-mi355x-vllm-mtp to the AMD master config
pair amd/MiniMax-M3-MXFP4 with Inferact/MiniMax-M3-EAGLE3 using three speculative tokens
add the MI355X vLLM MTP launcher with text-only serving, TRITON_ATTN, default KV-cache dtype, automatic MoE backend selection, and VLLM_USE_BREAKABLE_CUDAGRAPH=0
pass --use-chat-template for realistic EAGLE3 acceptance
mirror the existing MI355X MXFP8 MTP TP/EP/DP-attention sweep at 1k1k and 8k1k
append the required performance changelog trigger

Why

The MXFP4 target recipe is now on main. This adds the corresponding EAGLE3 speculative-decoding coverage on MI355X using the same pinned ROCm nightly and comparable MTP search space as MXFP8.

Validation

bash -n benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_mi355x_vllm_mtp.sh
generated 53 filtered MTP sweep entries successfully
python -m pytest utils/matrix_logic/ -q (180 passed)
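
As a side note on the first validation step: `bash -n` parses a script and reports syntax errors without executing any of its commands. The following is a minimal sketch of that behavior, using two throwaway scripts in a temporary directory. The file names, the `check_syntax` helper, and the result variables are hypothetical and not part of the repository.

```shell
#!/usr/bin/env bash
# Hypothetical helper: parse a script with `bash -n`, suppressing the error text.
check_syntax() { bash -n "$1" 2>/dev/null; }

tmpdir=$(mktemp -d)

# A well-formed script that would create a marker file if it were executed.
printf 'if true; then touch "%s/ran"; fi\n' "$tmpdir" > "$tmpdir/good.sh"
# A malformed script: the closing `fi` is missing.
printf 'if true; then echo ok\n' > "$tmpdir/bad.sh"

check_syntax "$tmpdir/good.sh"; GOOD_STATUS=$?
check_syntax "$tmpdir/bad.sh";  BAD_STATUS=$?

# The marker must not exist, because `bash -n` never runs the script body.
if [ -e "$tmpdir/ran" ]; then EXECUTED=yes; else EXECUTED=no; fi

rm -rf "$tmpdir"
echo "good=$GOOD_STATUS bad=$BAD_STATUS executed=$EXECUTED"
```

A clean `bash -n` run therefore only shows that the launcher parses. It says nothing about whether the server starts, which is what the sweep and CI jobs cover.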

github-actions · 2026-06-25T23:07:00Z

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.


claude · 2026-06-25T23:09:02Z

Claude finished @functionstackx's task in 1m 39s

Review: PR #1939 — MiniMax-M3 FP4 MI355X vLLM MTP benchmark

Read changed files
Compare against existing MXFP8 MTP + base FP4 reference scripts
Validate config keys, perf-changelog, EP pattern, MTP requirements
Post review

LGTM — no blocking issues found.

The new script is a clean derivative of minimaxm3_fp4_mi355x_vllm.sh with the correct MTP additions (EAGLE3 --speculative-config, draft-model download, NUM_SPEC_TOKENS=3, --use-chat-template), and the spec-config/parallel patterns match the proven minimaxm3_fp8_mi355x_mtp.sh reference.
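
For readers unfamiliar with the EAGLE3 additions mentioned above, here is a rough sketch of how the JSON payload for vLLM's `--speculative-config` flag could be assembled from the draft model and the speculative-token count. The launcher itself is not reproduced in this thread, so the variable names and the exact JSON layout below are assumptions, not the script's actual contents. The sketch only prints the argument and does not start a server.

```shell
#!/usr/bin/env bash
# Assumed inputs, taken from the values named in the PR description.
DRAFT_MODEL="Inferact/MiniMax-M3-EAGLE3"
NUM_SPEC_TOKENS=3

# Build the JSON payload (assumed layout: method, draft model, token count).
SPEC_CONFIG=$(printf '{"method": "eagle3", "model": "%s", "num_speculative_tokens": %d}' \
  "$DRAFT_MODEL" "$NUM_SPEC_TOKENS")

# Print the argument pair instead of launching `vllm serve`.
echo "--speculative-config '$SPEC_CONFIG'"
```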

Validation against the repo's review checklist:

✅ MTP --use-chat-template present in the benchmark client (minimaxm3_fp4_mi355x_vllm_mtp.sh:90)
✅ Expert parallelism is conditional on EP_SIZE/DP_ATTENTION, not hardcoded (:44-53)
✅ Server launch args on separate lines (vllm serve ...)
✅ perf-changelog.yaml updated and the new entry is appended to the end of the file
✅ Image publicly accessible — vllm/vllm-openai-rocm:nightly-3f5a1e1… (Docker Hub), and identical to the on-main base minimaxm3-fp4-mi355x-vllm entry
✅ Config key naming maps cleanly to the script filename (minimaxm3-fp4-mi355x-vllm-mtp → minimaxm3_fp4_mi355x_vllm_mtp.sh); model-prefix: minimaxm3 is consistent with the existing base entry
✅ Unconditional hf download "$DRAFT_MODEL" is safe here since MODEL is always a bare HF id
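
To illustrate the expert-parallelism item in the checklist above, the sketch below shows one way a launcher could derive its parallelism flags from sweep variables rather than hardcoding them. The function name, its arguments, and the mapping from DP-attention to `--data-parallel-size` are assumptions made for illustration. The actual logic at lines 44-53 of the script is not shown in this thread and may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build parallelism flags from TP size, EP size, and a DP-attention toggle.
build_parallel_args() {
  local tp="$1" ep_size="$2" dp_attention="$3"
  local args="--tensor-parallel-size $tp"
  if [ "$dp_attention" = "true" ]; then
    # Assumed mapping: attention is data-parallel across the GPUs instead of tensor-parallel.
    args="--tensor-parallel-size 1 --data-parallel-size $tp"
  fi
  if [ "$ep_size" -gt 1 ]; then
    # Only enable expert parallelism when the sweep entry asks for it.
    args="$args --enable-expert-parallel"
  fi
  printf '%s\n' "$args"
}

build_parallel_args 8 1 false
build_parallel_args 8 8 false
build_parallel_args 8 8 true
```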

Note: the FP4 nightly is claimed to already include AMD's SupportsEagle3 support, so the new script correctly omits the in-place model.py patch that the FP8 MI355X MTP variant carries. A green CI run will confirm whether that assumption holds on the runner.

claude

LGTM — straightforward MiniMax-M3 MXFP4 MTP recipe addition mirroring the MXFP8 sister entry.

Extended reasoning...

Overview

This PR adds the EAGLE3 speculative-decoding (MTP) variant of the existing minimaxm3-fp4-mi355x-vllm recipe. Three changes: a new config entry in amd-master.yaml (mirrors the MXFP8 MTP search space, trimmed at extreme concurrency), a new launcher script minimaxm3_fp4_mi355x_vllm_mtp.sh (text-only serving, TRITON_ATTN, EAGLE3 with 3 spec tokens, --use-chat-template), and the standard perf-changelog entry.

Security risks

None. This is a benchmark config + launcher script — no auth, crypto, network-exposed code paths, or production logic. The launcher runs inside the benchmark container.

Level of scrutiny

Low. Pure additive benchmark recipe following an established pattern across the existing MTP family (minimaxm3_fp8_mi{300,325,355}x_mtp.sh). PR was validated with bash -n, matrix-logic pytest, and the sweep entry generation. No critical code paths touched.

Other factors

The only finding is a nit about the DRAFT_MODEL fetch being unconditional rather than guarded by the same if [[ "$MODEL" != /* ]] block as its three sister MTP scripts. The PR author has the inline comment to act on — it's a minor consistency cleanup that doesn't impact the current CI path (where MODEL is passed as a bare HF id). Shadow-approving since this is exactly the kind of mechanical recipe addition that follows established patterns.

claude · 2026-06-25T23:21:11Z

+if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi
+hf download "$DRAFT_MODEL"


🟡 The draft-model fetch on line 30 is unconditional, but the three sister MTP recipes (minimaxm3_fp8_mi300x_mtp.sh:40-44, minimaxm3_fp8_mi325x_mtp.sh:52-56, minimaxm3_fp8_mi355x_mtp.sh:49-53) all wrap both hf download "$MODEL" and hf download "$DRAFT_MODEL" inside the same if [[ "$MODEL" != /* ]]; then ... fi guard, with an explicit comment that local-path MODEL implies an offline-pre-staged cache. The new script breaks that invariant — a runner with MODEL set to a local path still hits HuggingFace to pull the EAGLE3 draft, which fails on offline-staged runners. Fix: move hf download "$DRAFT_MODEL" inside the existing if block to match the family pattern.

Extended reasoning...

What the bug is

The new minimaxm3_fp4_mi355x_vllm_mtp.sh recipe handles the target and draft model downloads inconsistently:

# Lines 29-30
if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi
hf download "$DRAFT_MODEL"

$MODEL is gated on the "not a local path" check, but $DRAFT_MODEL is fetched unconditionally — every invocation reaches out to HuggingFace for Inferact/MiniMax-M3-EAGLE3.

How sister scripts handle this

All three existing sister MTP recipes wrap both downloads under the same guard, with an explicit comment documenting the invariant. From minimaxm3_fp8_mi355x_mtp.sh:47-53:

# MODEL stays a bare HF id on the mi355x single-node runner (weights are
# pre-staged in the mounted NFS HF cache, so this is a fast cache hit). The
# EAGLE3 draft is not staged; fetch it into the same cache.
if [[ "$MODEL" != /* ]]; then
  hf download "$MODEL"
  hf download "$DRAFT_MODEL"
fi

minimaxm3_fp8_mi300x_mtp.sh:40-44 and minimaxm3_fp8_mi325x_mtp.sh:52-56 use the identical pattern with the same comment. The convention is clear: a local-path MODEL is the offline-pre-staged signal — when it's set, the runner has no business hitting HF for anything.

Why this matters

On a runner with MODEL=/some/local/path and no HF network/auth (the exact scenario the local-path mode is designed for), the unconditional hf download "$DRAFT_MODEL" on line 30 will fail. The other MTP recipes in the family correctly skip the draft fetch in that case under the assumption that the offline staging includes the draft.

Step-by-step proof

Operator pre-stages both amd/MiniMax-M3-MXFP4 and Inferact/MiniMax-M3-EAGLE3 to a local path /staged/models/... for an air-gapped runner.

They set MODEL=/staged/models/amd/MiniMax-M3-MXFP4 and run the recipe.

Line 29: "$MODEL" != /* is false (MODEL does start with /), so the target download is skipped — correct.

Line 30: hf download "$DRAFT_MODEL" runs unconditionally and attempts to reach huggingface.co for Inferact/MiniMax-M3-EAGLE3.

On an offline runner: command fails with a network error. On a runner without HF auth for that repo: command fails with 401.

Compare with running minimaxm3_fp8_mi355x_mtp.sh in the same setup — its if block is fully skipped because MODEL is a local path, no network call is made, and serving proceeds.
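
The scenario above can be reproduced without any network access. The sketch below replaces `hf` with a shell function that only counts its invocations, then runs both download layouts against a local-path MODEL. The stub, the `HF_CALLS` counter, and the two function names are hypothetical and exist only for this illustration.

```shell
#!/usr/bin/env bash
# Stub for the real `hf` CLI: count calls instead of touching the network.
HF_CALLS=0
hf() { HF_CALLS=$((HF_CALLS + 1)); }

# Layout in the new script: only the target download is guarded.
download_unguarded() {
  if [[ "$MODEL" != /* ]]; then hf download "$MODEL"; fi
  hf download "$DRAFT_MODEL"
}

# Layout in the sister MTP scripts: both downloads share the guard.
download_guarded() {
  if [[ "$MODEL" != /* ]]; then
    hf download "$MODEL"
    hf download "$DRAFT_MODEL"
  fi
}

# Offline-staged scenario: MODEL is an absolute local path.
MODEL=/staged/models/amd/MiniMax-M3-MXFP4
DRAFT_MODEL=Inferact/MiniMax-M3-EAGLE3

HF_CALLS=0; download_unguarded; UNGUARDED_CALLS=$HF_CALLS
HF_CALLS=0; download_guarded;   GUARDED_CALLS=$HF_CALLS

echo "unguarded: $UNGUARDED_CALLS hf call(s), guarded: $GUARDED_CALLS hf call(s)"
```

With a local-path MODEL the unguarded layout still issues one `hf download` (the draft), while the guarded layout issues none. With a bare HF id both layouts issue two calls, which is why the difference does not show up on the current CI path.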

Practical impact

Limited in current usage — the AMD master config passes amd/MiniMax-M3-MXFP4 as a bare HF id, so the != /* branch is taken and both downloads run together. The divergence only manifests when someone runs in offline-staged mode, which isn't the current CI path. Still worth fixing for family consistency since the pattern (and its comment) is established across all sister MTP scripts.

Fix

Move hf download "$DRAFT_MODEL" inside the existing if block — matching the existing 3-script pattern:

if [[ "$MODEL" != /* ]]; then
  hf download "$MODEL"
  hf download "$DRAFT_MODEL"
fi

github-actions · 2026-06-26T00:35:50Z

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28206198366
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=28206198366

functionstackx · 2026-06-26T00:55:35Z

/reuse-sweep-run

# Conflicts: # perf-changelog.yaml

functionstackx · 2026-06-26T01:01:10Z

following https://recipes.vllm.ai/MiniMaxAI/MiniMax-M3?features=tool_calling%2Creasoning%2Cspec_decoding&hardware=mi355x&variant=mxfp4 as of 5:59pm PT

using the latest upstream Docker container currently available: vllm/vllm-openai-rocm:nightly-3f5a1e1733200760169ff31ebe60a271072b199e
verified evals https://github.com/SemiAnalysisAI/InferenceX/actions/runs/28206198366/job/83557430139 matches https://huggingface.co/amd/MiniMax-M3-MXFP4

github-project-automation Bot added this to InferenceMAX Board Jun 25, 2026

feat: add MiniMax M3 FP4 MI355X vLLM MTP benchmark

a9d0e82

functionstackx force-pushed the codex/minimax-m3-fp4-mi355x-vllm-mtp branch from aad5226 to a9d0e82 Compare June 25, 2026 23:07

functionstackx added the full-sweep-fail-fast label Jun 25, 2026

functionstackx marked this pull request as ready for review June 25, 2026 23:08

functionstackx requested a review from a team June 25, 2026 23:08

functionstackx requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners June 25, 2026 23:08

claude Bot reviewed Jun 25, 2026


Merge remote-tracking branch 'origin/main' into pr-1939-reuse-41338

05ffa9a

# Conflicts: # perf-changelog.yaml

functionstackx added a commit that referenced this pull request Jun 26, 2026

chore: refresh PR #1939 for sweep reuse [skip-sweep]

590ba00

functionstackx force-pushed the codex/minimax-m3-fp4-mi355x-vllm-mtp branch from 590ba00 to 05ffa9a Compare June 26, 2026 00:58

functionstackx merged commit ee4d8aa into main Jun 26, 2026
36 of 46 checks passed

functionstackx deleted the codex/minimax-m3-fp4-mi355x-vllm-mtp branch June 26, 2026 00:59

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 26, 2026


functionstackx merged 2 commits into main from codex/minimax-m3-fp4-mi355x-vllm-mtp
