[codex] add MiniMax M3 FP4 MI355X vLLM benchmark#1935

Merged
functionstackx merged 2 commits into main from codex/minimax-m3-fp4-mi355x-vllm
Jun 25, 2026
Conversation

@functionstackx

@functionstackx functionstackx commented Jun 25, 2026

Collaborator

What changed

  • add minimaxm3-fp4-mi355x-vllm to the AMD master config using amd/MiniMax-M3-MXFP4
  • pin the latest immutable ROCm vLLM nightly, nightly-3f5a1e1733200760169ff31ebe60a271072b199e
  • add the MI355X vLLM launcher using the text-only language-model path, TRITON_ATTN, and MiniMax-M3 tool/reasoning parsers while leaving MoE backend selection to vLLM
  • mirror the existing MXFP8 MI355X TP/EP/DP-attention sweep for direct precision comparisons
  • append the required performance changelog trigger

Why

AMD's MiniMax M3 MXFP4 checkpoint now has upstream vLLM support through vllm-project/vllm#45794. This adds benchmark coverage for that path on MI355X while keeping the FP8 and FP4 sweep shapes comparable.

This PR does not enable AITER because it is still WIP and, as of June 25, 2026 12:42pm PT, it is not available in upstream docker containers (vllm-project/vllm#46419).

Validation

  • bash -n benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_mi355x_vllm.sh
  • generated the filtered FP4 MI355X vLLM sweep successfully
  • python -m pytest utils/matrix_logic/ -v (180 passed)

@github-actions

Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.




@claude

claude Bot commented Jun 25, 2026

Contributor

Claude finished @functionstackx's task in 1m 29s


PR Review: MiniMax M3 FP4 MI355X vLLM benchmark

  • Read changed files (config, script, changelog)
  • Validate master config + perf-changelog rules
  • Validate benchmark script (EP, formatting, parsers)
  • Post review

LGTM — no blocking issues found.

I compared this against the already-merged minimaxm3-fp8-mi355x-vllm recipe it claims to mirror. Findings:

  • Master config + perf-changelog: Both updated together. New changelog entry is correctly appended to the end of perf-changelog.yaml and references PR #1935. ✅
  • Image: vllm/vllm-openai-rocm:nightly-3f5a1e1… is a public Docker Hub reference (immutable nightly digest). ✅
  • Expert parallelism: --enable-expert-parallel is conditional on EP_SIZE > 1 (and the DP-attention branch), not hardcoded — matches the required pattern. ✅
  • Script formatting: vllm serve args are on separate lines. ✅
  • Search space: Exactly mirrors the FP8 MI355X TP/EP/DP-attention sweep at 1k/1k and 8k/1k, as stated — enables a clean precision comparison. ✅
  • Intentional FP4 deltas vs FP8 (all consistent and documented in the changelog): drops --kv-cache-dtype fp8 (retains default), drops VLLM_USE_BREAKABLE_CUDAGRAPH=0, and adds --trust-remote-code to vllm serve. All reasonable for the MXFP4 path. ✅
  • model-prefix: minimaxm3 matches the established convention for all other MiniMax-M3 entries in this config. ✅

No bugs, logic errors, or resource issues found. The script faithfully reuses the working FP8 recipe shape.
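
The conditional expert-parallelism pattern the review describes can be sketched roughly as below. This is a minimal illustration, not the script from this PR: `ep_flag` is a hypothetical helper, `EP_SIZE` is assumed to be supplied by the sweep launcher, and the DP-attention branch mentioned above is left out.

```shell
# Hypothetical helper (not from the PR): print the expert-parallel flag
# only when the requested expert-parallel size is greater than 1.
ep_flag() (
  ep_size="${1:-1}"
  if [ "$ep_size" -gt 1 ]; then
    printf '%s\n' "--enable-expert-parallel"
  fi
)

# Example: EP_SIZE is assumed to come from the sweep configuration.
echo "extra serve flags: $(ep_flag "${EP_SIZE:-1}")"
```

Keeping the flag behind a size check means the same launcher can serve both pure-TP and TP+EP sweep shapes without a hardcoded flag.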

Comment on lines +32 to +34
SERVER_LOG=/workspace/server.log
export VLLM_ENGINE_READY_TIMEOUT_S=3600

Contributor


🔴 The new minimaxm3_fp4_mi355x_vllm.sh is missing export VLLM_USE_BREAKABLE_CUDAGRAPH=0 after the VLLM_ENGINE_READY_TIMEOUT_S line, which every other MiniMax-M3 vLLM recipe in the repo sets (including the MXFP4 multi-node disagg entry at models_vllm.yaml:44 for the SAME amd/MiniMax-M3-MXFP4 model). Without it, the M3 decode path silently falls back to eager mode via the breakable-cudagraph fallback, invalidating the "direct precision comparison" with the MXFP8 baseline (which DOES run with CUDA graphs) that the PR description names as the motivation. Fix: add export VLLM_USE_BREAKABLE_CUDAGRAPH=0 at line 34, matching minimaxm3_fp8_mi355x.sh:33.

Extended reasoning...

The bug

The new benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_mi355x_vllm.sh script (lines 32-34) sets SERVER_LOG and VLLM_ENGINE_READY_TIMEOUT_S=3600, but does NOT export VLLM_USE_BREAKABLE_CUDAGRAPH=0. Every other MiniMax-M3 vLLM recipe in this repo sets this env var:

| File | Line |
| --- | --- |
| minimaxm3_fp8_mi300x.sh | 35 |
| minimaxm3_fp8_mi300x_mtp.sh | 52 |
| minimaxm3_fp8_mi325x.sh | 33 |
| minimaxm3_fp8_mi325x_mtp.sh | 64 |
| minimaxm3_fp8_mi355x.sh | 33 (the direct sibling this PR claims to mirror) |
| minimaxm3_fp8_mi355x_mtp.sh | 63 |
| benchmarks/multi_node/amd_utils/models_vllm.yaml | 44 (MiniMax-M3-MXFP4 disagg) |

The inline comment in those scripts identifies it as a MiniMax-M3 model-specific (not precision-specific) workaround: "VLLM_USE_BREAKABLE_CUDAGRAPH=0 avoids the M3-decode breakable-cudagraph path that previously forced eager execution."

Why this is not specific to MXFP8

The disagg config at benchmarks/multi_node/amd_utils/models_vllm.yaml:44 uses the exact same model (amd/MiniMax-M3-MXFP4) and explicitly sets VLLM_USE_BREAKABLE_CUDAGRAPH=0 in its env string. So the requirement is tied to the MiniMax-M3 model + ROCm decode path, not to the weight quantization. PRs #1750/#1754/#1755/#1756 (recorded in perf-changelog.yaml) landed this fix "per AMD guidance" across every MiniMax-M3 single-node recipe at the time; this new MXFP4 single-node recipe breaks the established pattern without justification.

Concrete trigger walkthrough

  1. Sweep launcher runs bash minimaxm3_fp4_mi355x_vllm.sh with one of the TP/EP shapes from amd-master.yaml.
  2. Script exports only VLLM_ENGINE_READY_TIMEOUT_S=3600; VLLM_USE_BREAKABLE_CUDAGRAPH is unset (default: enabled).
  3. vllm serve is invoked without --enforce-eager, so vLLM normally captures CUDA graphs for decode.
  4. On MiniMax-M3, the decode path hits the "breakable cudagraph" fallback (the issue the env var was added to suppress, per the inline comments in all sister scripts).
  5. Decode silently runs eager-mode while the FP8 MI355X baseline runs with CUDA graphs enabled (since its script DOES export the var).
  6. The PR description explicitly states the motivation is to "mirror the existing MXFP8 MI355X TP/EP/DP-attention sweep for direct precision comparisons" — but the comparison is no longer apples-to-apples: FP4 measures eager-mode decode while FP8 measures graph-captured decode.
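
A check like the following could catch this class of omission before a sweep runs. It is only a sketch under stated assumptions: `exports_breakable_cudagraph_off` and `report_missing` are hypothetical helpers that do not exist in the repo, and the check only looks for a literal `export` line in the script text.

```shell
# Hypothetical check (not part of the repo): succeeds only if the given
# script contains an `export VLLM_USE_BREAKABLE_CUDAGRAPH=0` line.
exports_breakable_cudagraph_off() {
  grep -Eq '^[[:space:]]*export[[:space:]]+VLLM_USE_BREAKABLE_CUDAGRAPH=0([[:space:]]|$)' "$1"
}

# Print one line for every script argument that lacks the export.
report_missing() {
  for script in "$@"; do
    exports_breakable_cudagraph_off "$script" || echo "missing export: $script"
  done
}
```

For example, `report_missing benchmarks/single_node/fixed_seq_len/minimaxm3_*.sh` would list the recipes that do not set the variable. A text match cannot see variables injected from elsewhere (for instance an env string in a YAML config), so this is a lint, not proof of runtime behavior.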

Impact

This silently invalidates the benchmark's stated purpose. The numbers will look worse than they should because eager-mode decode throughput is substantially below graph-captured decode on MoE models. Anyone comparing these results to the MXFP8 baseline will draw incorrect conclusions about MXFP4's quality/perf trade-off. This is a normal-severity bug because the measurement validity is the explicit goal of this PR.

Fix

One-line addition at benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_mi355x_vllm.sh:34:

SERVER_LOG=/workspace/server.log
export VLLM_ENGINE_READY_TIMEOUT_S=3600
export VLLM_USE_BREAKABLE_CUDAGRAPH=0  # avoids the M3-decode breakable-cudagraph path that previously forced eager execution

This exactly mirrors the layout in minimaxm3_fp8_mi355x.sh:31-33, which the PR description claims to mirror.

@andyluo7

Collaborator

I had a duplicate of this in #1936 (same config key minimaxm3-fp4-mi355x-vllm) — closing mine in favor of this one since the public upstream nightly image (Quark MXFP4 via vllm-project/vllm#45794) is the better, recipe-alignment-friendly choice. Two findings from validating the FP4 single-node path on 8x MI355X (gfx950) that are worth folding in here:

  1. VLLM_USE_BREAKABLE_CUDAGRAPH=0 is missing. The merged MI355X CUDA-graph work (#1745 "Add minimax M3 MXFP8 MI355X vLLM EAGLE3 (related PR for upstreaming patch https://github.com/vllm-project/vllm/pull/45546)" and #1754 "[AMD] perf: enable MiniMax M3 CUDA graphs on MI355X") and the MXFP8 single-node script set this env on AMD specifically to keep CUDA graphs on for MiniMax-M3; without it the decode path can fall back to eager on gfx950 (perf loss). Recommend adding it to the script (or extra_env). If the sweep logs show eager decode, this is the cause.

  2. MoE backend is left to vLLM's default. I validated explicit --moe-backend aiter + VLLM_ROCM_USE_AITER=1 VLLM_ROCM_USE_AITER_MOE=1 (this is also what the FP4 disagg path in #1914, "[AMD] Add MiniMax-M3-MXFP4 MI355X vLLM disagg recipe", uses). Worth confirming the nightly defaults to AITER for MXFP4 MoE on gfx950; otherwise throughput will differ from the disagg numbers.

Other things I confirmed that corroborate this PR's choices: block-size 128 + TRITON_ATTN MSA work; the minimax_m3 reasoning parser splits reasoning/content correctly; and keeping the default KV-cache dtype is correct: --kv-cache-dtype fp8 does not crash on vLLM (unlike ATOM), it just falls back to an uncalibrated scale of 1.0 with an accuracy warning.

FYI the matching upstream recipe variant is up at vllm-project/recipes#579.

One open question for you: search space. This mirrors the MXFP8 grid (EP + conc→1024); my closed PR used pure TP4+TP8 capped at conc 256 per the original ask. Either is fine — flagging in case you want to trim.

@andyluo7

Collaborator

Heads up — the pinned image is likely a blocker for MXFP4, separate from the flags above. Per the AMD MXFP4 enablement owners:

A plain ROCm nightly won't bring up M3 MXFP4: it needs aiter 0.1.16.post2 (vllm-project/vllm#46692, merged) and the MoE enablement vllm-project/vllm#46419, which isn't merged yet. Until #46419 lands, build from qli88's branch qiang_minimax_mxfp4_aiter on the post2 image; switch to a plain nightly once #46419 merges.

This PR pins vllm/vllm-openai-rocm:nightly-3f5a1e1733200760169ff31ebe60a271072b199e (a plain nightly), so the running sweep will probably fail or fall back on MXFP4 MoE. Two options:

  • swap to a qli88-qiang_minimax_mxfp4_aiter / post2 build, or
  • reuse rocm/vllm-dev:vllm-0.23.1-rocm723-mi35x-mori-0625 — I verified that image does serve amd/MiniMax-M3-MXFP4 (engine reports quantization=quark + moe_backend=aiter, loads minimax_m3_fp4_tuned_fmoe.csv, coherent output). It already carries the MXFP4 aiter-MoE path.

Required serve flags (from #46419, matches what I validated):

VLLM_ROCM_USE_AITER=1 VLLM_ROCM_USE_AITER_MOE=1 VLLM_USE_BREAKABLE_CUDAGRAPH=0 \
vllm serve <model> --block-size 128 -tp 4 \
  --attention-backend TRITON_ATTN \
  --tool-call-parser minimax_m3 --enable-auto-tool-choice \
  --reasoning-parser minimax_m3 --moe-backend aiter

Accuracy target to validate against: gsm8k 5-shot 0.940 flexible / 0.941 strict.
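
One way to gate a run on this target is a simple tolerance comparison, sketched below. The helper name `within_tolerance`, the tolerance of 0.005, and the measured value in the usage line are all assumptions for illustration; the thread does not state an acceptable margin.

```shell
# Hypothetical gate (not from the PR): exit status 0 when the measured
# score is within `tol` of the target, non-zero otherwise.
within_tolerance() {
  awk -v got="$1" -v want="$2" -v tol="$3" \
    'BEGIN { d = got - want; if (d < 0) d = -d; exit !(d <= tol) }'
}

# Usage: the measured value is a placeholder; 0.941 is the strict-match
# target quoted above, and 0.005 is an assumed tolerance.
if within_tolerance 0.939 0.941 0.005; then
  echo "gsm8k strict score within tolerance"
else
  echo "gsm8k strict score outside tolerance"
fi
```

The comparison runs in awk because POSIX shell arithmetic is integer-only.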

@functionstackx

functionstackx commented Jun 25, 2026

Collaborator Author

> MoE backend is left to vLLM's default. I validated explicit --moe-backend aiter + VLLM_ROCM_USE_AITER=1 VLLM_ROCM_USE_AITER_MOE=1

hi @andyluo7

We do not enable AITER because it is still WIP and, as of June 25, 2026 12:42pm PT, it is not available in upstream docker containers (vllm-project/vllm#46419).

I am happy to update this PR once it is merged and accessible in an https://hub.docker.com/r/vllm/ docker image.

> swap to a qli88-qiang_minimax_mxfp4_aiter / post2 build, or

While I am glad that there is a development build it works on, it is not accessible in an upstream https://hub.docker.com/r/vllm/ docker image. Feel free to create an update PR once it is accessible in an upstream docker image.

@ChuanLi1101 ChuanLi1101 left a comment

Contributor


Thanks @functionstackx. +1 to the blocker @claude / @andyluo7 already flagged — VLLM_USE_BREAKABLE_CUDAGRAPH=0 is required (model-specific, set by every other M3 recipe incl. the MXFP4 disagg at models_vllm.yaml:44). Please add at line 34 and I'm good to approve. Two things to add on top of their reviews:

1. Make the "AITER off" status explicit (builds on @andyluo7's MoE-backend point). Leaving MoE to vLLM's default is the right interim call: the MXFP4 AITER path isn't in the upstream docker yet (vllm-project/vllm#46419, gated behind the aiter bump #46692). But that means these are non-AITER MXFP4 baseline numbers and won't match the AITER FP4 disagg results in #1914. Asks: (a) call this out in the changelog entry so it's not read as the optimized path; (b) plan an AITER-enabled follow-up once #46419/#46474/#46692 land. (If the nightly already carries AITER and you want it now, set it explicitly per Andy's note rather than relying on the default.)

2. The aiter 0.1.16.post2 mla_reduce_v1 regression does NOT affect this PR. That break only hits MLA models (DSR1/Kimi). M3 is not MLA (TRITON_ATTN here), so it never touches the MLA decode/reduce path — irrelevant to this benchmark now and after we flip AITER on. Flagging so nobody blocks this on the war-room regression.

Once VLLM_USE_BREAKABLE_CUDAGRAPH=0 is in, LGTM.

@functionstackx

Collaborator Author

> FYI the matching upstream recipe variant is up at vllm-project/recipes#579.

Hi Andy, I followed your recipe as of 1pm ET June 25, 2026 and it doesn't work.

@functionstackx

Collaborator Author

> Thanks @functionstackx. +1 to the blocker @claude / @andyluo7 already flagged: VLLM_USE_BREAKABLE_CUDAGRAPH=0 is required (model-specific, set by every other M3 recipe incl. the MXFP4 disagg at models_vllm.yaml:44). Please add at line 34 and I'm good to approve. Two things to add on top of their reviews:
>
> 1. Make the "AITER off" status explicit (builds on @andyluo7's MoE-backend point). Leaving MoE to vLLM's default is the right interim call: the MXFP4 AITER path isn't in the upstream docker yet (vllm-project/vllm#46419, gated behind the aiter bump #46692). But that means these are non-AITER MXFP4 baseline numbers and won't match the AITER FP4 disagg results in #1914. Asks: (a) call this out in the changelog entry so it's not read as the optimized path; (b) plan an AITER-enabled follow-up once #46419/#46474/#46692 land. (If the nightly already carries AITER and you want it now, set it explicitly per Andy's note rather than relying on the default.)
>
> 2. The aiter 0.1.16.post2 mla_reduce_v1 regression does NOT affect this PR. That break only hits MLA models (DSR1/Kimi). M3 is not MLA (TRITON_ATTN here), so it never touches the MLA decode/reduce path; irrelevant to this benchmark now and after we flip AITER on. Flagging so nobody blocks this on the war-room regression.

Thanks @ChuanLi1101 for noting that AITER MoE is not available in any upstream https://hub.docker.com/r/vllm/ docker image, so leaving it disabled accurately reflects upstream performance as of Jun 25, 1:30pm PT. Happy to accept follow-up PRs to track the performance once it is available in an upstream vLLM docker image.

> Once VLLM_USE_BREAKABLE_CUDAGRAPH=0 is in, LGTM.

Thanks @ChuanLi1101, I have implemented your single suggestion of VLLM_USE_BREAKABLE_CUDAGRAPH=0.

@Klaud-Cold

Klaud-Cold commented Jun 25, 2026

Collaborator

Claude finished @functionstackx's task in 52s


Verified — VLLM_USE_BREAKABLE_CUDAGRAPH=0 is in ✅

  • Read trigger comment
  • Confirm VLLM_USE_BREAKABLE_CUDAGRAPH=0 is in the script
  • Check changelog entry
  • Post summary

Confirmed on the latest commit (87cf65e9):

1. Script: benchmarks/single_node/fixed_seq_len/minimaxm3_fp4_mi355x_vllm.sh:34 now exports the env var in the right spot, exactly mirroring minimaxm3_fp8_mi355x.sh:33:

SERVER_LOG=/workspace/server.log
export VLLM_ENGINE_READY_TIMEOUT_S=3600
export VLLM_USE_BREAKABLE_CUDAGRAPH=0

This resolves my earlier 🔴 blocker — the M3 decode path will now run with CUDA graphs instead of silently falling back to eager, so the FP4↔FP8 precision comparison stays apples-to-apples. ✅

2. Changelog: perf-changelog.yaml already records the env var and the "let vLLM select the MoE backend / default KV-cache dtype" choices for minimaxm3-fp4-mi355x-vllm. ✅

On @ChuanLi1101's two follow-up points:

  • (2) MLA regression — agreed, irrelevant. M3 runs TRITON_ATTN (not MLA), so the aiter 0.1.16.post2 mla_reduce_v1 break can't touch this path. Not a blocker.
  • (1) "AITER-off" labeling: the changelog already states MoE backend is left to vLLM's default, which is the accurate reflection of upstream-docker behavior as of now (the MXFP4 AITER MoE path in vllm-project/vllm#46419, "[ROCm]Enable AITER MoE backend for MiniMax-M3-MXFP4", isn't in any hub.docker.com/r/vllm/ image yet). If you want it to be unmistakable that these are non-AITER baseline numbers vs. the AITER FP4 disagg in #1914 ("[AMD] Add MiniMax-M3-MXFP4 MI355X vLLM disagg recipe"), a one-line note in the changelog entry would make it explicit; this is optional, not blocking. Happy to push that small wording tweak if you'd like, otherwise this is good to go.

The intentional FP4 deltas (default KV-cache dtype, --trust-remote-code) remain consistent and documented. No blocking issues remain.
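
To double-check that one recipe script mirrors another's environment setup, the exported variable names can be diffed, as in the sketch below. `exported_vars` and `missing_exports` are hypothetical helpers, not tools from this repo, and they only recognize simple `export NAME=value` lines.

```shell
# Hypothetical helper: list the names of variables a script exports
# via plain `export NAME=value` lines, sorted and de-duplicated.
exported_vars() {
  sed -n 's/^[[:space:]]*export[[:space:]]\{1,\}\([A-Za-z_][A-Za-z0-9_]*\)=.*/\1/p' "$1" | sort -u
}

# Hypothetical helper: print names exported by the baseline script ($1)
# that the candidate script ($2) does not export.
missing_exports() (
  base_list="$(mktemp)"
  cand_list="$(mktemp)"
  exported_vars "$1" > "$base_list"
  exported_vars "$2" > "$cand_list"
  comm -23 "$base_list" "$cand_list"
  rm -f "$base_list" "$cand_list"
)
```

Running `missing_exports minimaxm3_fp8_mi355x.sh minimaxm3_fp4_mi355x_vllm.sh` from the recipe directory would print nothing if the FP4 script exports every variable the FP8 baseline does. Intentional deltas would still need a human to confirm them.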
• branch codex/minimax-m3-fp4-mi355x-vllm

@andyluo7

Collaborator

@functionstackx thanks for re-testing — two things:

1. Your 1pm ET test predated my fix. I pushed recipes#579 92fe4c6 at ~4:40pm ET, after your run. The version you hit had no variant-level image pin (it fell back to the generic vllm/vllm-openai-rocm:minimax-m3) and still forced --moe-backend aiter → that's the crash. The updated variant now pins rocm/vllm-dev:vllm-0.23.1-rocm723-mi35x-mori-0625, which I verified serves amd/MiniMax-M3-MXFP4 on AITER (engine reports quantization=quark + moe_backend=aiter, gsm8k 5-shot 0.939 strict / 0.939 flex). If you re-pull #579 the AITER path should come up on that image.

2. On official hub.docker.com/r/vllm/ images, use the emulation backend — the recipe documents this. I fully agree AITER MXFP4 MoE isn't in any official upstream vllm/ docker yet (vllm-project/vllm#46419 unmerged), so the AITER path requires the dev image until it lands. But the recipe also has a --moe-backend emulation path: it's the non-AITER reference backend (no AITER kernels, pure vLLM) that AMD used to measure accuracy, and it runs on current upstream images — just slower. So "following the recipe" on an official image does work via emulation; only the perf path needs the dev image. If your run used --moe-backend aiter on an official nightly, that's the mismatch.

This keeps the two PRs consistent: your #1935 = non-AITER baseline on official docker (correct, already merged), and #579 documents both the emulation path (official images, today) and the AITER perf path (dev image now, official nightly once #46419 ships). Happy to do an AITER-enabled InferenceX follow-up once it's in an official image.

@functionstackx

Collaborator Author

> 2. On official hub.docker.com/r/vllm/ images, use the emulation backend

Your AI wants me to use MXFP4 emulation even though emulation is slower, and even though non-emulation MXFP4 works on upstream now and passes evals!? Can you please upgrade your AI to codex 5.5 or opus 4.8 xhigh?

https://github.com/SemiAnalysisAI/InferenceX/actions/runs/28195297568/job/83520505469

@functionstackx

Collaborator Author

> 1. Your 1pm ET test predated my fix. I pushed recipes#579 92fe4c6 at ~4:40pm ET, after your run.

Your recipe was/is wrong, as addressed in vllm-project/recipes#579 (review). I am glad you were able to address my suggestions about where the bugs were.

@functionstackx

Collaborator Author
  1. Followed the recipe as of June 25, 2026 3:56pm PT: https://recipes.vllm.ai/MiniMaxAI/MiniMax-M3?hardware=mi355x&variant=mxfp4
  2. Used the latest upstream docker container currently available: vllm/vllm-openai-rocm:nightly-3f5a1e1733200760169ff31ebe60a271072b199e
  3. Verified that the evals at https://github.com/SemiAnalysisAI/InferenceX/actions/runs/28195297568/job/83520505469 match https://huggingface.co/amd/MiniMax-M3-MXFP4

@functionstackx

Collaborator Author

/reuse-sweep-run

functionstackx force-pushed the codex/minimax-m3-fp4-mi355x-vllm branch from 13d5a23 to 87cf65e on June 25, 2026 23:00
functionstackx merged commit 43bdd5c into main on Jun 25, 2026
110 checks passed
functionstackx deleted the codex/minimax-m3-fp4-mi355x-vllm branch on June 25, 2026 23:01