Dsv4 sparse indexer by Oseltamivir · Pull Request #2998 · ROCm/aiter

Oseltamivir · 2026-05-01T05:02:07Z

Motivation

DSv4 uses a sparse attention path where each query gathers a small top-k set of compressed KV entries, plus an Indexer path that scores compressed KV entries to produce those top-k indices.

The current ATOM DSv4 integration has correctness-first Torch fallbacks for both paths. Those fallbacks materialize large intermediate tensors and are too slow for serving, especially at concurrency > 1. This PR adds AITER Triton kernels for the DSv4 sparse MQA attention-sink path and the DSv4 Indexer scorer/top-k path so that ATOM can avoid the Torch fallbacks.

Technical Details

This PR adds:

sparse_mqa_sink: DSv4 sparse MQA forward with attention-sink denominator semantics.
dsv4_indexer_topk: DSv4 Indexer scorer and causal top-k selection without materializing the Torch fallback’s [tokens, heads, committed_kv] score tensor.
A dense causal fast path for the Indexer when actual_topk == n_committed, which is common for short-context DSv4 serving.
Tests for sparse MQA sink correctness and DSv4 Indexer correctness.
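
The dense causal fast path listed above can be illustrated with a small pure-Python sketch. When actual_topk equals n_committed, top-k would select every visible candidate anyway, so the indices can be written directly without scoring. This is not the kernel from this PR: the compress_ratio visibility rule, the -1 padding value, and the offset parameter are assumptions made only for illustration.

```python
def dense_causal_indices(positions, n_committed, compress_ratio, offset=0):
    """Index rows for the case actual_topk == n_committed (no scoring needed).

    Assumption: compressed entry k summarizes raw tokens
    [k * compress_ratio, (k + 1) * compress_ratio) and becomes visible to a
    query at position p only once that whole span is at or before p.
    Slots that are not yet visible are filled with -1 (illustrative choice).
    """
    rows = []
    for p in positions:
        n_visible = min(n_committed, (p + 1) // compress_ratio)
        row = [offset + k for k in range(n_visible)]
        row += [-1] * (n_committed - n_visible)
        rows.append(row)
    return rows


rows = dense_causal_indices(positions=[3, 7, 11], n_committed=3, compress_ratio=4)
for row in rows:
    print(row)
```

Each row has a fixed width of n_committed, which keeps the output shape identical to what a scored top-k with the same k would return.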

The sparse attention kernel supports DSv4’s MQA layout:

q: [num_tokens, num_heads, head_dim]
kv: [num_blocks, block_size, head_dim]
topk_indices: [num_tokens, topk]
attn_sink: [num_heads]
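
As a rough reference for what "attention-sink denominator semantics" could mean with this layout, the sketch below uses nested Python lists in place of tensors. It assumes that kv acts as both key and value, that topk_indices are flat indices into num_blocks * block_size, that -1 marks a padded slot, and that the per-head sink logit contributes to the softmax denominator only. The function name and the scale argument are hypothetical and are not taken from the PR.

```python
import math


def sparse_mqa_sink_ref(q, kv, topk_indices, attn_sink, scale):
    """Reference forward pass; nested lists stand in for tensors.

    q:            [num_tokens][num_heads][head_dim]
    kv:           [num_blocks][block_size][head_dim], shared by all heads (MQA)
    topk_indices: [num_tokens][topk], flat indices (assumed), -1 = padding
    attn_sink:    [num_heads], one sink logit per head
    """
    flat_kv = [row for block in kv for row in block]
    out = []
    for t, q_t in enumerate(q):
        rows = [flat_kv[i] for i in topk_indices[t] if i >= 0]
        out_t = []
        for h, q_th in enumerate(q_t):
            logits = [scale * sum(a * b for a, b in zip(q_th, r)) for r in rows]
            m = max(logits + [attn_sink[h]])  # subtract the max for stability
            weights = [math.exp(x - m) for x in logits]
            # The sink enlarges the denominator but has no value vector.
            denom = sum(weights) + math.exp(attn_sink[h] - m)
            out_t.append([
                sum(w * r[d] for w, r in zip(weights, rows)) / denom
                for d in range(len(q_th))
            ])
        out.append(out_t)
    return out


q = [[[1.0, 0.0]]]                 # one token, one head, head_dim = 2
kv = [[[1.0, 0.0], [0.0, 1.0]]]    # one block holding two entries
out = sparse_mqa_sink_ref(q, kv, [[0, 1]], attn_sink=[0.0], scale=1.0)
print(out[0][0])
```

Because the two gathered rows are one-hot, the output components equal the attention probabilities, and they sum to less than 1: the remaining mass is absorbed by the sink.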

The Indexer kernel computes:

score[t, k] = sum_h relu(q[t, h] @ kv[k]) * weights[t, h]

It then applies the DSv4 causal compressed-token mask and returns offset top-k indices for the downstream sparse attention gather.
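
The scoring formula and the masked top-k can be written as a slow pure-Python reference. The score line follows the formula above. The causal rule based on compress_ratio, the tie-breaking order, the -1 padding, and the offset parameter are assumptions for illustration, and the function name is hypothetical.

```python
def indexer_topk_ref(q, kv, weights, positions, topk, compress_ratio, offset=0):
    """score[t][k] = sum_h relu(q[t][h] . kv[k]) * weights[t][h], then top-k.

    q: [num_tokens][num_heads][dim], kv: [n_committed][dim],
    weights: [num_tokens][num_heads], positions: [num_tokens].
    """
    result = []
    for t, q_t in enumerate(q):
        # Assumed causal mask: entry k is visible once its span is complete.
        n_visible = min(len(kv), (positions[t] + 1) // compress_ratio)
        scored = []
        for k in range(n_visible):
            s = 0.0
            for h, q_th in enumerate(q_t):
                dot = sum(a * b for a, b in zip(q_th, kv[k]))
                s += max(dot, 0.0) * weights[t][h]  # relu, then head weight
            scored.append((s, k))
        # Highest score first; ties go to the lower index (assumed).
        scored.sort(key=lambda sk: (-sk[0], sk[1]))
        row = [offset + k for _, k in scored[:topk]]
        result.append(row + [-1] * (topk - len(row)))
    return result


token = [[1.0, 0.0], [0.0, 1.0]]               # two heads, dim = 2
kv = [[1.0, 0.0], [0.0, 3.0], [-1.0, -1.0]]    # three committed entries
result = indexer_topk_ref(
    q=[token, token], kv=kv, weights=[[1.0, 0.5], [1.0, 0.5]],
    positions=[3, 11], topk=2, compress_ratio=4,
)
print(result)
```

In this example the token at position 3 sees only entry 0, while the token at position 11 sees all three entries and ranks entry 1 (score 1.5) above entry 0 (score 1.0). Entry 2 scores 0 because relu clips its negative dot products.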

Relevant downstream integration target: ROCm/ATOM DeepSeek-V4 PR650.

Test Plan

Run Python compile checks for the new modules and tests.
Run unit tests for:
- op_tests/test_sparse_mqa_sink.py
- op_tests/test_dsv4_indexer.py
Validate downstream ATOM integration by replacing DSv4 Torch sparse-attention and Indexer fallback paths with these AITER ops.

Test Result

Local syntax/import validation passed with:

python3 -m py_compile \
  aiter/ops/triton/attention/dsv4_indexer.py \
  aiter/ops/triton/attention/sparse_mqa_sink.py \
  aiter/ops/triton/_triton_kernels/attention/dsv4_indexer.py \
  aiter/ops/triton/_triton_kernels/attention/sparse_mqa_sink.py \
  op_tests/test_dsv4_indexer.py \
  op_tests/test_sparse_mqa_sink.py

The branch is clean against current ROCm/aiter:main and contains only the DSv4 sparse/indexer kernel additions plus tests.

This PR was tested and is in use at SemiAnalysisAI/InferenceX #1229; runs: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25193385172

op_tests results: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25221896798

Submission Checklist

Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

github-actions · 2026-05-01T05:02:42Z

🏷️ CI Guide

Runs automatically on every PR:

✅ Pre-checks (submodule verification, code formatting)
✅ Aiter op tests (gfx942 + gfx950)
✅ Triton tests on MI35X (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

Label	Tests
`ci:triton-300x`	Run an additional Triton test job on MI300X in PRs; main branch always runs both MI35X and MI300X
`ci:sglang`	SGLang integration tests
`ci:atom`	ATOM benchmark (DeepSeek-R1 + GPT-OSS)
`ci:vllm`	vLLM benchmark
`ci:all`	All of the above

Add labels via the sidebar or gh pr edit 2998 --add-label <label>

Copilot

Pull request overview

Adds Triton implementations for DeepSeek-V4 (DSv4) sparse attention and Indexer top-k selection to replace slow Torch fallbacks in ATOM/serving paths.

Changes:

Introduce sparse_mqa_sink Triton op implementing DSv4 sparse MQA forward with attention-sink denominator semantics.
Introduce dsv4_indexer_topk Triton op implementing DSv4 Indexer scoring + causal top-k, including a dense causal fast path.
Add unit tests for both new ops and register the modules in Triton backward-compat import map.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

File	Description
op_tests/test_sparse_mqa_sink.py	Adds correctness test comparing `sparse_mqa_sink` vs a Torch reference.
op_tests/test_dsv4_indexer.py	Adds tests for Indexer dense-causal fast path and scored top-k vs Torch reference.
aiter/ops/triton/attention/sparse_mqa_sink.py	Python wrapper for launching the sparse MQA sink Triton kernel.
aiter/ops/triton/attention/dsv4_indexer.py	Python wrapper for Indexer scoring + top-k, including dense fast path.
aiter/ops/triton/_triton_kernels/attention/sparse_mqa_sink.py	Triton kernel for sparse MQA sink with per-token top-k gather and sink denominator.
aiter/ops/triton/_triton_kernels/attention/dsv4_indexer.py	Triton kernels for dense causal indices, scoring, and finalizing offset indices.
aiter/ops/triton/__init__.py	Registers `dsv4_indexer` and `sparse_mqa_sink` for backward-compatible imports.


Oseltamivir added 3 commits April 30, 2026 21:11

Add DSv4 sparse attention and indexer Triton ops

4d99805

Tile DSv4 sparse MQA output dimension

c2eafc5

Retile DSv4 sparse MQA sink

13b6b39

Oseltamivir requested review from a team and Copilot May 1, 2026 05:02

Copilot started reviewing on behalf of Oseltamivir May 1, 2026 05:03

rm tests

2616197

Copilot AI reviewed May 1, 2026

Comment thread aiter/ops/triton/attention/sparse_mqa_sink.py

Comment thread aiter/ops/triton/_triton_kernels/attention/sparse_mqa_sink.py Outdated

valarLip requested a review from vgokhale May 1, 2026 05:48

Oseltamivir and others added 6 commits May 1, 2026 08:21

Revert "rm tests"

7c5fd02

This reverts commit 2616197.

Address sparse MQA review comments

19d07bd

Avoid sync in sparse MQA input checks

914786b

Merge branch 'main' into dsv4-sparse-indexer-pr

aeb8946

Add batched DSv4 indexer coverage

0923d27

Merge branch 'main' into dsv4-sparse-indexer-pr

aa0c5b6

functionstackx mentioned this pull request May 1, 2026

LETS GO AMD!!! SemiAnalysisAI/InferenceX#1229

Closed

This was referenced May 2, 2026

Clean up DSv4 ATOM AITER PR2998 overlay SemiAnalysisAI/InferenceX#1260

Open

Use AITER DSv4 Indexer top-k kernel ROCm/ATOM#684

Draft

Oseltamivir and others added 4 commits May 3, 2026 10:38

Format DSv4 sparse indexer files

220bd4d

Merge branch 'main' into dsv4-sparse-indexer-pr

085989c

fix: make topk per row width configurable

883ddb7

Merge remote-tracking branch 'upstream/main' into dsv4-sparse-indexer-pr

969863a

# Conflicts: # aiter/ops/topk.py # csrc/include/rocm_ops.hpp # csrc/include/topk_per_row.h # csrc/kernels/topk_per_row_kernels.cu # op_tests/test_topk_per_row.py

Dsv4 sparse indexer #2998
Oseltamivir wants to merge 14 commits into ROCm:main from Oseltamivir:dsv4-sparse-indexer-pr

2 participants
