Add MiniMax M2.5 MXFP4 benchmark for MI355x vLLM v0.17.1 (TP=2,4)#827

Open
functionstackx wants to merge 5 commits into main from claude/issue-826-20260301-0409

Conversation

@functionstackx
Contributor

@functionstackx functionstackx commented Mar 1, 2026

Add MiniMax M2.5 MXFP4 benchmark config for MI355x with vLLM v0.17.1, now that AMD's MXFP4 checkpoint is out: https://huggingface.co/amd/MiniMax-M2.5-MXFP4

  • Model: amd/MiniMax-M2.5-MXFP4
  • Image: vllm/vllm-openai-rocm:v0.17.1
  • TP=2 and TP=4 (matching MiniMax M2.5 FP8 pattern)
  • VLLM_ROCM_USE_AITER=1, with AITER MoE fallback for TP>=4
  • Seq lengths: 1k1k, 1k8k, 8k1k (conc 4-64)
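The sweep described in the list above can be sketched as a small enumeration. This is only an illustration, not the repository's actual config format: the helper names `parse_seq_len` and `search_space` are hypothetical, the concurrency steps are assumed to be powers of two between 4 and 64, and "k" is assumed to mean 1024 tokens.

```python
from itertools import product

# Values taken from the PR description.
TP_SIZES = [2, 4]
SEQ_LENS = ["1k1k", "1k8k", "8k1k"]
# Assumption: concurrency is swept in powers of two from 4 to 64.
CONCURRENCIES = [4, 8, 16, 32, 64]


def parse_seq_len(label: str) -> tuple[int, int]:
    """Split a label such as '1k8k' into (input_tokens, output_tokens).

    Assumption: 'k' stands for 1024 tokens.
    """
    parts = label.lower().split("k")[:2]  # '1k8k' -> ['1', '8']
    isl, osl = (int(p) * 1024 for p in parts)
    return isl, osl


def search_space():
    """Yield one benchmark point per (TP, sequence length, concurrency)."""
    for tp, label, conc in product(TP_SIZES, SEQ_LENS, CONCURRENCIES):
        isl, osl = parse_seq_len(label)
        yield {"tp": tp, "isl": isl, "osl": osl, "conc": conc}


if __name__ == "__main__":
    for point in search_space():
        print(point)
```

Under these assumptions the sweep has 2 x 3 x 5 = 30 points; dropping TP=4 from `TP_SIZES` halves it.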

Closes #826

Generated with Claude Code

@functionstackx functionstackx requested a review from a team March 1, 2026 04:14
@functionstackx functionstackx changed the title [AMD] Add MiniMax M2.1 MXFP4 benchmark for MI355x vLLM (TP=2,4) [WIP] Add MiniMax M2.1 MXFP4 benchmark for MI355x vLLM (TP=2,4) Mar 1, 2026
@functionstackx
Contributor Author

@claude can you comment out TP=4 and run only TP=2 for now, since TP=4 is bugged (vllm-project/vllm#35637)?

@Klaud-Cold
Collaborator

Klaud-Cold commented Mar 1, 2026

Claude finished @functionstackx's task in 53s


Commented out TP=4 for MiniMax M2.1 MXFP4

Due to vllm-project/vllm#35637, TP=4 has been disabled. Only TP=2 will run.

  • Read current benchmark files
  • Comment out TP=4 in .github/configs/amd-master.yaml (all 3 seq-len configs)
  • Update perf-changelog.yaml with PR link and description
  • Commit and push changes

Changes made:

@functionstackx
Copy link
Contributor Author

@chunfangamd I am seeing good MXFP4 performance compared to FP8. Just waiting for AMD to release an MXFP4 version of MiniMax M2.5 now; currently AMD only has an MXFP4 version of MiniMax M2.1.

image

@functionstackx functionstackx marked this pull request as draft March 1, 2026 23:23
@chunfangamd chunfangamd marked this pull request as ready for review March 4, 2026 09:09
Collaborator

@chunfangamd chunfangamd left a comment

Looks good to me

@chunfangamd chunfangamd changed the title [WIP] Add MiniMax M2.1 MXFP4 benchmark for MI355x vLLM (TP=2,4) Add MiniMax M2.1 MXFP4 benchmark for MI355x vLLM (TP=2,4) Mar 4, 2026
@chunfangamd chunfangamd enabled auto-merge (squash) March 4, 2026 09:11
@functionstackx functionstackx changed the title Add MiniMax M2.1 MXFP4 benchmark for MI355x vLLM (TP=2,4) [Do Not Merge] [WIP till AMD releases MXFP4 of MiniMax M2.5] Add MiniMax M2.1 MXFP4 benchmark for MI355x vLLM (TP=2,4) Mar 4, 2026
@functionstackx functionstackx changed the title [Do Not Merge] [WIP till AMD releases MXFP4 of MiniMax M2.5] Add MiniMax M2.1 MXFP4 benchmark for MI355x vLLM (TP=2,4) Add MiniMax M2.5 MXFP4 benchmark for MI355x vLLM v0.17.1 (TP=2,4) Mar 20, 2026
github-actions bot and others added 4 commits March 19, 2026 21:46
Add MiniMax M2.1 MXFP4 benchmark config for MI355x with vLLM v0.16.0.
- Model: amd/MiniMax-M2.1-MXFP4
- TP=2 and TP=4 (matching MiniMax M2.5 FP8 pattern)
- Only VLLM_ROCM_USE_AITER=1 env var (per Andy Luo recipe)
- Seq lengths: 1k1k, 1k8k, 8k1k (conc 4-64)

Closes #826

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
TP=4 is bugged for this model per vllm-project/vllm#35637.
Comment out TP=4 search-space entries, keeping only TP=2.

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>
Author: hongxiayang

- Keep AITER for attention but disable it specifically for MoE, so
the fused MoE falls back to Triton kernels that can handle N=384
when TP=4 and N=192 when TP=8.
- Install the amd-quark library to fix the crash when TP=4 with
VLLM_ROCM_USE_AITER_MOE=0.
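
The environment selection described in this commit can be sketched as follows. This is a rough illustration rather than the actual benchmark script: the function names `rocm_env` and `per_rank_n` are hypothetical, the TP>=4 threshold follows the PR description, and the total dimension of 1536 is only inferred from the two figures quoted above (384 x 4 = 192 x 8 = 1536), on the assumption that N is the per-rank shard of a single MoE dimension.

```python
def rocm_env(tp: int) -> dict[str, str]:
    """Return the env vars for a given tensor-parallel size.

    AITER stays enabled overall; for TP >= 4 the AITER MoE path is
    turned off so the fused MoE falls back to the Triton kernels.
    """
    env = {"VLLM_ROCM_USE_AITER": "1"}
    if tp >= 4:
        env["VLLM_ROCM_USE_AITER_MOE"] = "0"
    return env


# Assumption: inferred from N=384 at TP=4 and N=192 at TP=8.
ASSUMED_TOTAL_N = 1536


def per_rank_n(tp: int) -> int:
    """Per-rank N when the assumed total is split evenly across TP ranks."""
    if tp <= 0 or ASSUMED_TOTAL_N % tp != 0:
        raise ValueError(f"TP={tp} does not evenly divide {ASSUMED_TOTAL_N}")
    return ASSUMED_TOTAL_N // tp


if __name__ == "__main__":
    for tp in (2, 4, 8):
        print(tp, per_rank_n(tp), rocm_env(tp))
```

With these assumptions, TP=2 keeps the AITER MoE path, while TP=4 and TP=8 add `VLLM_ROCM_USE_AITER_MOE=0`.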
@functionstackx functionstackx force-pushed the claude/issue-826-20260301-0409 branch 2 times, most recently from bd10495 to e849d65 Compare March 20, 2026 01:50
@functionstackx functionstackx force-pushed the claude/issue-826-20260301-0409 branch from e849d65 to 86cc700 Compare March 20, 2026 01:57
- Model: amd/MiniMax-M2.1-MXFP4 → amd/MiniMax-M2.5-MXFP4
- Image: vllm/vllm-openai-rocm v0.16.0 → v0.17.1
- Rename config key and script from m2.1 to m2.5
- Update perf-changelog entry

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@functionstackx functionstackx force-pushed the claude/issue-826-20260301-0409 branch from 7dd6063 to 5bc40e6 Compare March 20, 2026 02:22
Development

Successfully merging this pull request may close these issues.

mi355 fp4 minimax vllm single node