Skip to content

[AMD/ROCM] atom minimaxm2.5 fp4 on mi355x#1240

Merged
cquil11 merged 10 commits intomainfrom
srok/atom_minimaxm2.5_fp4
Apr 30, 2026
Merged

[AMD/ROCM] atom minimaxm2.5 fp4 on mi355x#1240
cquil11 merged 10 commits intomainfrom
srok/atom_minimaxm2.5_fp4

Conversation

@cquil11
Copy link
Copy Markdown
Collaborator

@cquil11 cquil11 commented Apr 30, 2026

hi,

WIP.
internally tested. shipping soon.

cc. @ChangLiu0709 @andyluo7 @chunfangamd @ajith-sirra-amd

Regards,
Seungrok


Note: re-merging from the same branch — original PR (#1042) was reverted because the perf changelog entries it added were incorrect. This PR will only show a diff after the revert (#1239) is merged.

Remove trailing whitespace on the line after the minimaxm2.5
pr-link and on the dsv4-fp4-gb200-dynamo-vllm config-keys line.
The original PR's diff bled into the dsv4 entry instead of being
a clean standalone insertion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread perf-changelog.yaml Outdated
Comment on lines +1965 to +1972
- config-keys:
- minimaxm2.5-fp4-mi355x-atom
description:
- "Add MiniMax-M2.5 MXFP4 MI355X Atom benchmark (rocm/atom:rocm7.2.2_ubuntu24.04_py3.12_pytorch_release_2.10.0_atom0.1.2.post)"
- "Single-node sweep: TP1–TP8, 1k/1k and 8k/1k ISL/OSL"
pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1042

- config-keys:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 The new minimaxm2.5-fp4-mi355x-atom perf-changelog entry has three problems: (1) CI-breaking — it's inserted mid-file (line 1965) instead of appended, so utils/process_changelog.py:get_added_lines() feeds malformed YAML (4-space indented sequence item followed by 2-space mapping) to yaml.safe_load(), raising ParserError at line 118 (this is the same root cause that triggered the original revert per the PR description). (2) pr-link references the reverted #1042 instead of this re-merge PR. (3) Trailing whitespace on lines 1971 and 1972. Fix: move the entry to the end of the file (after line 1994), update pr-link to the re-merge PR, and strip trailing whitespace.

Extended reasoning...

What the bug is\n\nThe diff hunk for perf-changelog.yaml adds a new entry for minimaxm2.5-fp4-mi355x-atom at line 1965, before the existing PR #1163 / #1218 dsv4-fp4-gb200-dynamo-vllm entries. AGENTS.md:161 states explicitly: "The file is read in chronological order: oldest at the top, newest at the bottom. New entries MUST be appended to the END of the file — never insert in the middle or prepend."\n\nThis is not just a style violation. utils/process_changelog.py:get_added_lines() runs git diff and collects only the + lines, then feeds the joined string directly to yaml.safe_load() at line 118. The .github/workflows/run-sweep.yml workflow invokes this script in CI, so a parse failure blocks the PR.\n\n## Why mid-file insertion breaks the parser\n\nThe unchanged context line at row 1965 — - config-keys: — was originally the header for the dsv4 entry (PR #1163). After the diff, git treats it as context attached to the new minimax entry, while a freshly added - config-keys: is inserted after the minimax entry to head the dsv4 block.\n\nThe + lines therefore start with:\n\n\n - minimaxm2.5-fp4-mi355x-atom # 4-space indent (continuation of unseen parent at col 2)\n description: # 2-space indent (block mapping)\n - "Add MiniMax-M2.5 MXFP4 ..."\n - "Single-node sweep: ..."\n pr-link: https://github.com/.../pull/1042\n \n- config-keys: \n\n\nYAML cannot parse this as a single document: the first line is a 4-space-indented sequence item with no parent, and the next line drops to a 2-space mapping key. yaml.safe_load() raises:\n\n\nyaml.parser.ParserError: expected '<document start>', but found '<block mapping start>'\n in "<unicode string>", line 2, column 3:\n description:\n ^\n\n\n## Step-by-step proof\n\n1. git diff main HEAD -- perf-changelog.yaml produces the hunk shown in the PR diff.\n2. get_added_lines() strips the leading + from each added line and joins them with \n.\n3. The resulting added_yaml begins with - minimaxm2.5-fp4-mi355x-atom\n description:\n....\n4. yaml.safe_load(added_yaml) raises ParserError at column 3 of description: (verified by multiple independent reproductions).\n5. process_changelog.py line 118 propagates the exception; the CI step exits non-zero.\n\nWhen the entry is instead appended after line 1994, the + lines begin on a fresh top-level - config-keys: (column 0), producing a self-contained valid YAML document and yaml.safe_load() succeeds.\n\n## pr-link points to the reverted PR\n\npr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1042 references the PR that was reverted by #1239 because its changelog entries were incorrect — as the description for the current re-merge PR itself acknowledges. The convention across perf-changelog.yaml is that each entry links to the PR that actually lands the entry in main; the adjacent dsv4 entries (#1163, #1218) follow this convention. Pointing at #1042 (a) breaks the convention, (b) is the only /pull/1042 reference in the file, and (c) re-introduces the very metadata defect that triggered the original revert. This should reference the actual landing PR.\n\nAddressing the duplicate-of-bug_003 refutation: the refuting verifier asserted the convention "plausibly" points to the originating PR. That hypothesis is contradicted by the file itself — every existing entry links to its merging PR, including entries (e.g. #1163, #1218 immediately below) that originated from the same reverted #1042 work but were correctly relinked when re-merged. The PR description for this re-merge also explicitly identifies incorrect changelog metadata as the reason #1042 was reverted, so re-introducing the same metadata defeats the stated purpose of the re-merge.\n\n## Trailing whitespace\n\ncat -A on the file shows line 1971 = $ (two trailing spaces on the blank line between entries) and line 1972 = - config-keys: $ (two trailing spaces after the colon). The rest of perf-changelog.yaml does not have such trailing whitespace, so the PR introduces it. Cosmetic, but trivially fixable in the same change.\n\n## How to fix\n\n1. Move the new minimaxm2.5-fp4-mi355x-atom block from lines 1965–1971 to the end of the file (after line 1994).\n2. Update pr-link to reference the actual landing PR rather than the reverted #1042.\n3. Strip the trailing whitespace on the (now-relocated) blank line and - config-keys: line.\n\nAfter the move, the diff naturally begins on a fresh top-level - config-keys: line, the YAML parses, and CI passes.

@cquil11 cquil11 merged commit eaea1d1 into main Apr 30, 2026
11 of 70 checks passed
@cquil11 cquil11 deleted the srok/atom_minimaxm2.5_fp4 branch April 30, 2026 20:33
@chunfangamd chunfangamd changed the title [WIP] [AMD/ROCM] atom minimaxm2.5 fp4 on mi355x [AMD/ROCM] atom minimaxm2.5 fp4 on mi355x May 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Development

Successfully merging this pull request may close these issues.

2 participants