[TRITON] Add Attention support to the bench_models benchmarking script #2274
lucas-santos-amd wants to merge 5 commits into main from
Conversation
(force-pushed from 766780b to f94f542)
@cagrikymk Please take a look at the unified attention benchmark when you can. I converted it to pure Python and added bandwidth (BW) and TFLOPS calculations for the reduction kernel.
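The BW/TFLOPS bookkeeping mentioned above can be sketched as follows. This is a hypothetical helper, not the actual `bench_models.py` code; the function name and the example numbers are illustrative only.

```python
def kernel_metrics(flops: float, bytes_moved: float, time_ms: float):
    """Derive throughput (TFLOPS) and bandwidth (GB/s) from a kernel's
    floating-point work, memory traffic, and measured runtime.
    Illustrative sketch only, not the bench_models.py implementation."""
    time_s = time_ms * 1e-3
    tflops = flops / time_s / 1e12          # achieved FP throughput
    gb_per_s = bytes_moved / time_s / 1e9   # achieved memory bandwidth
    return tflops, gb_per_s

# Illustrative numbers: 2 GFLOP of work and 4 GB of traffic in 5 ms
print(kernel_metrics(2e9, 4e9, 5.0))  # approximately (0.4, 800.0)
```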
Resolved review threads (outdated) on:
- op_tests/op_benchmarks/triton/model_benchmarking_tool/bench_models.py
- op_tests/op_benchmarks/triton/model_benchmarking_tool/model_shapes.json
brunomazzottiamd left a comment:
@lucas-santos-amd, I have some suggestions, but no blockers. It would be nice to have a thumbs up from Cagri regarding op_tests/op_benchmarks/triton/bench_unified_attention.py.
There is a related PR: #2190 (cc @juuso-oskari). It also contains some benchmarking-related updates.
What's the best course of action, in your opinion? Wait for Juuso's PR to get merged and then integrate his benchmark into the uber-benchmark?
We would probably prefer that approach if possible. We have some other features implemented on top of our branch (#2300) while waiting for #2190 to be merged.
We will do it like this, then.
(force-pushed from 71304b3 to 4bff17c)
Support for unified attention has been removed from this branch. It will be added back in a later PR once #2190 is merged. |
FYI: @cagrikymk + @juuso-oskari + @Chi-Chu319 |
The merge-base changed after approval.
(force-pushed from 4bff17c to 058ab47)
brunomazzottiamd left a comment:
LGTM! Nothing has changed since my last review, approving again due to rebase issues.
@lucas-santos-amd, keep an eye on #2317. Check if your
(force-pushed from 058ab47 to cc0da14)
Motivation
Technical Details
New help text of bench_models.py:
Description of output CSV file:
*The MLA benchmark only reports Time (ms).
**RoPE reports only total floating-point operations, not throughput (TFLOPS).
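For clarity on how a raw FLOP count such as the RoPE row's relates to a throughput column, TFLOPS would be derived as below. This is an illustrative formula with made-up numbers, not the script's actual code.

```python
def flops_to_tflops(total_flops: float, time_ms: float) -> float:
    """Convert a total FLOP count and a runtime in milliseconds to TFLOPS.
    Illustrative sketch only, not the bench_models.py implementation."""
    return total_flops / (time_ms * 1e-3) / 1e12

# A hypothetical row reporting 1.2e9 total FLOPs over 0.8 ms:
print(flops_to_tflops(1.2e9, 0.8))  # approximately 1.5 TFLOPS
```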
Test Plan
Test Result
Submission Checklist