[ROCm] Enable aiter group quant FP8 for RDNA4 gpus #78
big-yellow-duck wants to merge 8 commits into main
Signed-off-by: big-yellow-duck <jeffaw99@hotmail.com>
We need to guard all of the other ops with an additional condition.
vllm/_aiter_ops.py (outdated)
import vllm.envs as envs
from vllm.platforms import current_platform
from vllm.platforms.rocm import on_gfx12x, on_mi3xx
We cannot do this as a module-level import:
from vllm.platforms.rocm import on_gfx12x, on_mi3xx
It will break our contract that _aiter_ops.py can always be imported, even on non-ROCm platforms.
Purpose
This PR combines two AITER kernel patches for gfx12xx support:
VLLM_ROCM_USE_AITER_RMSNORM=1
W8A8BlockFp8LinearOp full_run_aiter pipeline, following the AITER enablement in "[ROCm] Enable Aiter ck_gemm_a8w8_blockscale for RDNA4 gpus. Qwen3.5-27B-FP8 tp=2, Qwen3-0.6B-FP8 tp=1" (#77)
Note: The AITER RMSNorm patch is only on AITER's side.
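For readers unfamiliar with group quantization, here is a rough pure-Python sketch of the idea, not the AITER kernel. It assumes a group size of 128 and the OCP e4m3 maximum of 448; both constants and the function name are illustrative, and the final rounding onto the FP8 grid is omitted.

```python
FP8_E4M3_MAX = 448.0  # assumption: OCP e4m3 finite maximum
GROUP_SIZE = 128      # assumption: typical block-scale group width


def group_quant(row, group_size=GROUP_SIZE, fp8_max=FP8_E4M3_MAX):
    """Scale one activation row per contiguous group of `group_size` values.

    Returns (scaled_values, scales). Each group gets its own scale so that
    its largest magnitude maps to fp8_max. Values are clamped to the FP8
    range but NOT rounded to representable FP8 numbers (illustration only).
    """
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    scaled, scales = [], []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        amax = max(abs(v) for v in group)
        # Guard against all-zero groups to avoid dividing by zero.
        scale = max(amax, 1e-10) / fp8_max
        scales.append(scale)
        scaled.extend(max(-fp8_max, min(fp8_max, v / scale)) for v in group)
    return scaled, scales
```

Dequantization multiplies each scaled value by its group's scale, which is what a block-scaled GEMM consumes alongside the FP8 data.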
Test Plan
Benchmark Qwen3.5-27B-FP8 with default vLLM and vLLM with AITER enabled on 2x Radeon PRO 9700
Default (baseline)
AITER with CK gemm_a8w8_blockscale + GroupQuant FP8
AITER with CK gemm_a8w8_blockscale + RMSNorm + GroupQuant FP8
Test Results
Aiter ck_gemm_a8w8_blockscale + GroupQuant fp8
TTFT (ms)
TPOT (ms)
E2E Latency (ms)
Aiter ck_gemm_a8w8_blockscale + GroupQuant fp8 + RMSNorm
TTFT (ms)
TPOT (ms)
E2E Latency (ms)
Accuracy checks
GSM8K Accuracy
Aiter ck_gemm_a8w8_blockscale + GroupQuant fp8
Aiter ck_gemm_a8w8_blockscale + GroupQuant fp8 + RMSNorm
None of the accuracy differences are statistically significant.
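A claim like this can be checked by comparing each accuracy gap against the combined standard error of the two runs. The sketch below is one common way to do that, assuming each run reports an accuracy and a standard error and that the runs are independent; the function name and the 1.96 threshold (two-sided, 5% level) are illustrative choices, not taken from this PR.

```python
import math


def difference_is_significant(acc_a, stderr_a, acc_b, stderr_b, z_crit=1.96):
    """Two-sided z-test on the gap between two independent accuracy estimates."""
    combined = math.sqrt(stderr_a ** 2 + stderr_b ** 2)
    if combined == 0.0:
        # No reported uncertainty: any nonzero gap counts as a difference.
        return acc_a != acc_b
    return abs(acc_a - acc_b) / combined > z_crit
```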