[AMD] Add MiniMax-M3-MXFP4 MI355X single-node vLLM recipe #1936
andyluo7 wants to merge 2 commits
Single-node vLLM benchmark for amd/MiniMax-M3-MXFP4 on MI355X (gfx950), served via the AITER MoE backend. It mirrors the MXFP8 single-node recipe (block-size 128 for MSA, TRITON_ATTN, minimax_m3 parsers) with the FP4 adjustments validated on hardware: --moe-backend aiter plus the AITER env vars, and no fp8 KV cache (the checkpoint ships no calibrated KV scales). Search space: pure TP, conc 1-256. TP8 covers the low-concurrency / low-latency regime (conc 1-32); TP4 carries the full sweep (conc 1-256). Both run at 1k/1k and 8k/1k. Upstream recipe: https://recipes.vllm.ai/MiniMaxAI/MiniMax-M3 (mxfp4 variant).
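The FP4 adjustments relative to the MXFP8 launcher can be sketched as a small Python helper that assembles the environment and argument list. This is an illustrative sketch, not the launcher script itself: `build_launch` and `SHARED_FLAGS` are hypothetical names, the spellings `--block-size`, `--tool-call-parser`, `--reasoning-parser`, and `--tensor-parallel-size` are assumed from the vLLM CLI, and the TRITON_ATTN selection is left out because the description does not say how it is passed.

```python
# Flags described as common to the MXFP8 and MXFP4 single-node launchers.
# Flag spellings other than those quoted in the PR text are assumptions.
SHARED_FLAGS = [
    "--block-size", "128",            # block-size 128 for MSA
    "--language-model-only",
    "--no-enable-prefix-caching",
    "--tool-call-parser", "minimax_m3",
    "--reasoning-parser", "minimax_m3",
]


def build_launch(model, tp, fp4):
    """Return (env, argv) for a hypothetical `vllm serve` invocation."""
    env = {}
    argv = ["vllm", "serve", model,
            "--tensor-parallel-size", str(tp), *SHARED_FLAGS]
    if fp4:
        # FP4 path: AITER MoE backend plus its env vars, default KV dtype.
        env["VLLM_ROCM_USE_AITER"] = "1"
        env["VLLM_ROCM_USE_AITER_MOE"] = "1"
        argv += ["--moe-backend", "aiter"]
    else:
        # FP8 sibling keeps the fp8 KV cache.
        argv += ["--kv-cache-dtype", "fp8"]
    return env, argv


env, argv = build_launch("amd/MiniMax-M3-MXFP4", tp=4, fp4=True)
print(" ".join(f"{k}={v}" for k, v in env.items()), " ".join(argv))
```

Keeping the shared flags in one list makes the FP4 changes visible as a single branch, which matches how the recipe is described: a copy of the MXFP8 launcher with a small, deliberate set of differences.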
LGTM — straightforward FP4 sibling of the existing MXFP8 MI355X single-node recipe.
Overview
This PR adds a new vLLM single-node benchmark recipe for amd/MiniMax-M3-MXFP4 on MI355X. The changes are confined to three files: a new config entry in amd-master.yaml, a new launcher script in benchmarks/single_node/fixed_seq_len/, and a corresponding entry in perf-changelog.yaml. The launcher is a near-clone of the sibling MXFP8 script (minimaxm3_fp8_mi355x.sh) with the three advertised FP4 deltas: --moe-backend aiter, VLLM_ROCM_USE_AITER{,_MOE}=1 env vars, and removal of --kv-cache-dtype fp8 (the checkpoint has no calibrated KV scales).
Security risks
None. This is benchmark plumbing — YAML config, a launcher shell script that invokes vllm serve with a fixed flag set, and a changelog entry. No auth, crypto, network exposure, or user-facing code paths.
Level of scrutiny
Low. Recipe additions like this follow a well-established template across many sibling files in this directory (the minimaxm3_fp*_*.sh family), and the diff matches that template byte-for-byte except for the deliberate FP4 differences. The author validated the configuration locally on 8x MI355X TP=4 with this exact image. CI sweep validation is still pending per the test plan checkbox, but that gates merge separately from this review.
Other factors
The image (rocm/vllm-dev:vllm-0.23.1-rocm723-mi35x-mori-0625) is the same one already in use by the FP4 disagg recipe (#1914), so no new image surface is being introduced. The search space (TP8 low-conc + TP4 full sweep) and concurrency ranges are consistent with neighboring FP4 recipes.
Closing in favor of #1935 (functionstackx), which uses the public upstream nightly image with merged Quark MXFP4 support (vllm-project/vllm#45794) — better for upstream recipe alignment. I've left my hardware-validation findings (notably the missing |
Single-node vLLM benchmark for amd/MiniMax-M3-MXFP4 on MI355X (gfx950), served via the AITER MoE backend. Complements the FP4 disagg recipe (#1914) and the FP4 ATOM recipe (minimaxm3-fp4-mi355x-atom) with a plain single-node vLLM path.
Recipe
benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_mi355x.sh: launcher. Mirrors the MXFP8 single-node script (minimaxm3_fp8_mi355x.sh): block-size 128 (MSA), TRITON_ATTN, --language-model-only, --no-enable-prefix-caching, and the minimax_m3 tool/reasoning parsers, with the FP4 adjustments --moe-backend aiter + VLLM_ROCM_USE_AITER{,_MOE}=1 and no --kv-cache-dtype fp8 (this checkpoint ships no calibrated KV scales).
amd-master.yaml: minimaxm3-fp4-mi355x-vllm config, image rocm/vllm-dev:vllm-0.23.1-rocm723-mi35x-mori-0625 (same proven MXFP4 image as #1914, "[AMD] Add MiniMax-M3-MXFP4 MI355X vLLM disagg recipe").
Search space
Pure TP, 1k/1k and 8k/1k, conc 1–256:
TP8: conc 1–32 (low-concurrency / low-latency regime)
TP4: conc 1–256 (full sweep)
Validation
Validated single-node on 8x MI355X (gfx950), TP=4, with this exact image (vLLM 0.23.1):
vllm serve amd/MiniMax-M3-MXFP4 … reaches Application startup complete, engine reports quantization=quark / moe_backend=aiter, and chat completions return coherent output with the minimax_m3 reasoning parser splitting reasoning from content. I also confirmed that --kv-cache-dtype fp8 does not crash on vLLM (unlike ATOM) but falls back to an uncalibrated KV scale of 1.0 with an accuracy warning, hence the default KV dtype here.
Upstream recipe alignment: vllm-project/recipes#579 adds the matching mxfp4 variant to the official MiniMax-M3 recipe.
Test plan
generate_sweep_configs full-sweep --single-node … --framework vllm expands to the expected 30-job matrix (TP8 1–32 + TP4 1–256 at 1k1k/8k1k)
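The 30-job count can be checked by hand with a short Python sketch. It assumes the concurrency values step in powers of two and that 1k and 8k mean 1024 and 8192 tokens; neither is stated in the description, although powers of two do reproduce the quoted total. The names `powers_of_two` and `expand_sweep` are hypothetical and are not part of generate_sweep_configs.

```python
from itertools import product


def powers_of_two(lo, hi):
    """Concurrency levels lo, 2*lo, 4*lo, ... up to and including hi."""
    out, c = [], lo
    while c <= hi:
        out.append(c)
        c *= 2
    return out


# (input length, output length) pairs; 1k=1024 and 8k=8192 are assumptions.
SEQ_LENS = [(1024, 1024), (8192, 1024)]

# Tensor-parallel size -> concurrency range, per the search space description.
TP_CONC = {
    8: powers_of_two(1, 32),    # low-concurrency / low-latency regime
    4: powers_of_two(1, 256),   # full sweep
}


def expand_sweep():
    """Expand the search space into one dict per benchmark job."""
    jobs = []
    for (isl, osl), (tp, concs) in product(SEQ_LENS, TP_CONC.items()):
        for conc in concs:
            jobs.append({"tp": tp, "isl": isl, "osl": osl, "conc": conc})
    return jobs


jobs = expand_sweep()
print(len(jobs))  # 30
```

Under these assumptions TP8 contributes 6 concurrency levels and TP4 contributes 9, so each sequence-length pair yields 15 jobs and the two pairs together yield 30.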