[AMD] Add MiniMax-M3-MXFP4 MI355X single-node vLLM recipe by andyluo7 · Pull Request #1936 · SemiAnalysisAI/InferenceX

andyluo7 · 2026-06-25T19:28:04Z

Single-node vLLM benchmark for amd/MiniMax-M3-MXFP4 on MI355X (gfx950), served via the AITER MoE backend. Complements the FP4 disagg recipe (#1914) and the FP4 ATOM recipe (minimaxm3-fp4-mi355x-atom) with a plain single-node vLLM path.

Recipe

benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_mi355x.sh: launcher. Mirrors the MXFP8 single-node script (minimaxm3_fp8_mi355x.sh) — block-size 128 (MSA), TRITON_ATTN, --language-model-only, --no-enable-prefix-caching, minimax_m3 tool/reasoning parsers — with the FP4 adjustments: --moe-backend aiter + VLLM_ROCM_USE_AITER{,_MOE}=1, and no --kv-cache-dtype fp8 (this checkpoint ships no calibrated KV scales).
amd-master.yaml: minimaxm3-fp4-mi355x-vllm config, image rocm/vllm-dev:vllm-0.23.1-rocm723-mi35x-mori-0625 (same proven MXFP4 image as [AMD] Add MiniMax-M3-MXFP4 MI355X vLLM disagg recipe #1914).

Search space

Pure TP, 1k/1k and 8k/1k, conc 1–256:

TP8 — low-concurrency / low-latency regime, conc 1–32.
TP4 — full sweep, conc 1–256.

Validation

Validated single-node on 8x MI355X (gfx950), TP=4, with this exact image (vLLM 0.23.1): vllm serve amd/MiniMax-M3-MXFP4 … reaches Application startup complete, engine reports quantization=quark / moe_backend=aiter, and chat completions return coherent output with the minimax_m3 reasoning parser splitting reasoning from content. I also confirmed that --kv-cache-dtype fp8 does not crash on vLLM (unlike ATOM) but falls back to an uncalibrated KV scale of 1.0 with an accuracy warning — hence the default KV dtype here.

Upstream recipe alignment: vllm-project/recipes#579 adds the matching mxfp4 variant to the official MiniMax-M3 recipe.

Test plan

generate_sweep_configs full-sweep --single-node … --framework vllm expands to the expected 30-job matrix (TP8 1–32 + TP4 1–256 at 1k1k/8k1k)
CI single-node MI355X sweep passes

Single-node vLLM benchmark for amd/MiniMax-M3-MXFP4 on MI355X (gfx950), served via the AITER MoE backend. Mirrors the MXFP8 single-node recipe (block-size 128 for MSA, TRITON_ATTN, minimax_m3 parsers) with the FP4 adjustments validated on hardware: --moe-backend aiter + AITER env, and no fp8 KV cache (the checkpoint ships no calibrated KV scales). Search space: pure TP, conc 1-256. TP8 covers the low-concurrency / low-latency regime (conc 1-32); TP4 carries the full sweep (conc 1-256). Both 1k/1k and 8k/1k. Upstream recipe: https://recipes.vllm.ai/MiniMaxAI/MiniMax-M3 (mxfp4 variant).

functionstackx · 2026-06-25T19:31:19Z

@andyluo7 i have this PR open with upstream vllm #1935

claude

LGTM — straightforward FP4 sibling of the existing MXFP8 MI355X single-node recipe.

Extended reasoning...

Overview

This PR adds a new vLLM single-node benchmark recipe for amd/MiniMax-M3-MXFP4 on MI355X. The changes are confined to three files: a new config entry in amd-master.yaml, a new launcher script in benchmarks/single_node/fixed_seq_len/, and a corresponding entry in perf-changelog.yaml. The launcher is a near-clone of the sibling MXFP8 script (minimaxm3_fp8_mi355x.sh) with the three advertised FP4 deltas: --moe-backend aiter, VLLM_ROCM_USE_AITER{,_MOE}=1 env vars, and removal of --kv-cache-dtype fp8 (the checkpoint has no calibrated KV scales).

Security risks

None. This is benchmark plumbing — YAML config, a launcher shell script that invokes vllm serve with a fixed flag set, and a changelog entry. No auth, crypto, network exposure, or user-facing code paths.

Level of scrutiny

Low. Recipe additions like this follow a well-established template across many sibling files in this directory (the minimaxm3_fp*_*.sh family), and the diff matches that template byte-for-byte except for the deliberate FP4 differences. The author validated the configuration locally on 8x MI355X TP=4 with this exact image. CI sweep validation is still pending per the test plan checkbox, but that gates merge separately from this review.

Other factors

The image (rocm/vllm-dev:vllm-0.23.1-rocm723-mi35x-mori-0625) is the same one already in use by the FP4 disagg recipe (#1914), so no new image surface is being introduced. The search space (TP8 low-conc + TP4 full sweep) and concurrency ranges are consistent with neighboring FP4 recipes.

andyluo7 · 2026-06-25T20:00:01Z

Closing in favor of #1935 (functionstackx), which uses the public upstream nightly image with merged Quark MXFP4 support (vllm-project/vllm#45794) — better for upstream recipe alignment. I've left my hardware-validation findings (notably the missing VLLM_USE_BREAKABLE_CUDAGRAPH=0 and the explicit AITER MoE backend) as a comment on #1935 to fold in. Upstream recipe variant: vllm-project/recipes#579.

andyluo7 requested a review from a team June 25, 2026 19:28

andyluo7 requested review from 1am9trash, billishyahao, chunfangamd, seungrokj and yctseng0211 as code owners June 25, 2026 19:28

github-project-automation Bot added this to InferenceMAX Board Jun 25, 2026

perf-changelog: add minimaxm3-fp4-mi355x-vllm entry (#1936)

5f33404

claude Bot reviewed Jun 25, 2026

View reviewed changes

andyluo7 mentioned this pull request Jun 25, 2026

[codex] add MiniMax M3 FP4 MI355X vLLM benchmark #1935

Merged

andyluo7 closed this Jun 25, 2026

andyluo7 deleted the feat/minimaxm3-fp4-mi355x-vllm-singlenode branch June 25, 2026 20:00

github-project-automation Bot moved this to Done in InferenceMAX Board Jun 25, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[AMD] Add MiniMax-M3-MXFP4 MI355X single-node vLLM recipe#1936

[AMD] Add MiniMax-M3-MXFP4 MI355X single-node vLLM recipe#1936
andyluo7 wants to merge 2 commits into
mainfrom
feat/minimaxm3-fp4-mi355x-vllm-singlenode

andyluo7 commented Jun 25, 2026

Uh oh!

functionstackx commented Jun 25, 2026

Uh oh!

claude Bot left a comment

Uh oh!

andyluo7 commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

andyluo7 commented Jun 25, 2026

Recipe

Search space

Validation

Test plan

Uh oh!

functionstackx commented Jun 25, 2026

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Overview

Security risks

Level of scrutiny

Other factors

Uh oh!

andyluo7 commented Jun 25, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants