[TRITON] Add Attention support to the bench_models benchmarking script #2274
lucas-santos-amd wants to merge 5 commits into main from
Conversation
(force-pushed from 766780b to f94f542)
@cagrikymk Please take a look at the unified attention benchmark when you can. I converted it to pure Python and added bandwidth (BW) and TFLOPS calculations for the reduction kernel.
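The BW/TFLOPS bookkeeping mentioned above can be sketched as follows. This is a hypothetical helper, not the actual `bench_models.py` code; the function name and the example numbers are illustrative only.

```python
def kernel_metrics(flops: float, bytes_moved: float, time_ms: float):
    """Derive throughput (TFLOPS) and bandwidth (GB/s) from a kernel's
    floating-point work, memory traffic, and measured runtime.
    Illustrative sketch only, not the bench_models.py implementation."""
    time_s = time_ms * 1e-3
    tflops = flops / time_s / 1e12          # achieved FP throughput
    gb_per_s = bytes_moved / time_s / 1e9   # achieved memory bandwidth
    return tflops, gb_per_s

# Illustrative numbers: 2 GFLOP of work and 4 GB of traffic in 5 ms
print(kernel_metrics(2e9, 4e9, 5.0))  # approximately (0.4, 800.0)
```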
Resolved review threads (outdated) on:
- op_tests/op_benchmarks/triton/model_benchmarking_tool/bench_models.py
- op_tests/op_benchmarks/triton/model_benchmarking_tool/model_shapes.json
brunomazzottiamd left a comment:
@lucas-santos-amd, I have some suggestions, but no blockers. It would be nice to have a thumbs up from Cagri regarding op_tests/op_benchmarks/triton/bench_unified_attention.py.
There is a related PR: #2190 (cc @juuso-oskari). It also contains some benchmarking-related updates.
What's the best course of action, in your opinion? Wait for Juuso's PR to get merged and then integrate his benchmark into the uber-benchmark?
We would probably prefer that approach if possible. We have some other features implemented on top of our branch (#2300) while waiting for #2190 to be merged.
We will do it like this, then.
(force-pushed from 71304b3 to 4bff17c)
Support for unified attention has been removed from this branch. It will be added back in a later PR once #2190 is merged. |
FYI: @cagrikymk + @juuso-oskari + @Chi-Chu319 |
The merge-base changed after approval.
(force-pushed from 4bff17c to 058ab47)
brunomazzottiamd left a comment:
LGTM! Nothing has changed since my last review, approving again due to rebase issues.
@lucas-santos-amd, keep an eye on #2317. Check if your
(force-pushed from 058ab47 to cc0da14)
Motivation
Technical Details
New help text of bench_models.py:
Description of output CSV file:
*The MLA benchmark only reports Time (ms).
**RoPE reports only total floating-point operations, not throughput (TFLOPS).
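For clarity on how a raw FLOP count such as the RoPE row's relates to a throughput column, TFLOPS would be derived as below. This is an illustrative formula with made-up numbers, not the script's actual code.

```python
def flops_to_tflops(total_flops: float, time_ms: float) -> float:
    """Convert a total FLOP count and a runtime in milliseconds to TFLOPS.
    Illustrative sketch only, not the bench_models.py implementation."""
    return total_flops / (time_ms * 1e-3) / 1e12

# A hypothetical row reporting 1.2e9 total FLOPs over 0.8 ms:
print(flops_to_tflops(1.2e9, 0.8))  # approximately 1.5 TFLOPS
```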
Test Plan
Test Result
Submission Checklist