Use AITER DSv4 Indexer top-k kernel by Oseltamivir · Pull Request #684 · ROCm/ATOM

Oseltamivir · 2026-05-02T21:41:14Z

Motivation

Wire the DeepSeek-V4 CSA Indexer path in ATOM's PR #650 branch to the DSv4 Indexer scorer/top-k implementation from ROCm/aiter#2998 instead of materializing fp8_mqa_logits and running PyTorch topk in ATOM.

This is intended to make ATOM exercise the AITER PR #2998 Indexer implementation in downstream InferenceX DSv4 runs.

Technical Details

Imports aiter.ops.triton.attention.dsv4_indexer.dsv4_indexer_topk.
Applies the reference Indexer FP4 simulation to Q before scoring: RoPE, Hadamard rotate, FP4 round-trip back to BF16.
Gathers ATOM's paged Indexer KV cache, converts it into the same rotated FP4-dequantized basis, and builds the dense [num_seqs, max_committed, head_dim] view expected by AITER PR #2998.
Calls dsv4_indexer_topk(..., offset=0, seq_ids=..., kv_lens=...) so AITER handles batched per-sequence KV lengths and DSv4 causal compressed-entry visibility.
Adds ATOM's per-token sparse-attention offset afterward to preserve the existing [window || compressed] sparse-attention layout.
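
As a rough illustration of the steps above, the following pure-Python sketch mirrors the data flow on toy lists: a normalized Hadamard rotation, an FP4 (E2M1) round trip, a length-masked top-k per sequence, and the final offset add. It is not the AITER kernel or ATOM code. The helper names, the single-head dot-product score, the one-scale-per-vector FP4 quantization, and the -1 padding value are all assumptions made for illustration. RoPE and the causal compressed-entry visibility rule are omitted.

```python
import heapq
import math

# Magnitudes representable in FP4 E2M1. Assumption: a single shared scale per
# vector; the real reference may use a different block-scaling scheme.
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def hadamard_rotate(vec):
    """Normalized fast Walsh-Hadamard transform; len(vec) must be a power of two."""
    out = list(vec)
    n = len(out)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(n)
    return [x * norm for x in out]


def fp4_round_trip(vec):
    """Quantize to the E2M1 grid with one shared scale, then dequantize."""
    amax = max(abs(x) for x in vec)
    if amax == 0.0:
        return [0.0] * len(vec)
    scale = amax / FP4_GRID[-1]

    def snap(x):
        mag = min(FP4_GRID, key=lambda g: abs(abs(x) / scale - g))
        return math.copysign(mag * scale, x)

    return [snap(x) for x in vec]


def reference_topk(q_rows, kv_dense, seq_ids, kv_lens, k):
    """Per query row, score the first kv_lens[seq] entries of its sequence and
    return the k best indices, padded with -1 when fewer than k are visible."""
    result = []
    for q, sid in zip(q_rows, seq_ids):
        visible = kv_dense[sid][: kv_lens[sid]]
        scores = [sum(a * b for a, b in zip(q, kv)) for kv in visible]
        best = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
        result.append(best + [-1] * (k - len(best)))
    return result


def add_offset(indices, offsets):
    """Shift valid indices by a per-token offset; padding entries stay -1."""
    return [[i + off if i >= 0 else -1 for i in row]
            for row, off in zip(indices, offsets)]
```

In this toy version, `kv_dense[sid]` plays the role of one row of the dense `[num_seqs, max_committed, head_dim]` view, and `add_offset` stands in for the step that moves compressed-entry indices into the `[window || compressed]` index space.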

Note: this PR intentionally wires the Indexer half of ROCm/aiter#2998 first. The AITER sparse_mqa_sink ABI is paged KV + uniform top-k, while PR #650 currently uses ATOM's ragged packed [window || compressed] sparse-attention layout. Routing sparse attention through AITER should be a follow-up ABI/layout change rather than a per-layer repack.
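
To make the layout mismatch concrete, here is a minimal sketch of what a repack from a ragged packed index list to a uniform-width table would involve. The function name, the `row_starts` prefix-offset encoding, and the -1 pad value are hypothetical; neither ATOM's actual packed format nor the AITER sparse_mqa_sink ABI is reproduced here.

```python
def ragged_to_uniform(flat, row_starts, width, pad=-1):
    """Expand a ragged packed index list into fixed-width rows.

    flat       -- all per-token index lists concatenated back to back
    row_starts -- prefix offsets; row i is flat[row_starts[i]:row_starts[i + 1]]
    width      -- uniform number of slots per token in the output
    """
    rows = []
    for start, end in zip(row_starts, row_starts[1:]):
        row = list(flat[start:end])
        if len(row) > width:
            raise ValueError(f"row of length {len(row)} exceeds width {width}")
        rows.append(row + [pad] * (width - len(row)))
    return rows
```

Doing this copy in every layer would add a pass over all selected indices per layer, which is the cost the note above suggests avoiding by changing the layout once instead.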

Test Plan

Local syntax check: python3 -m py_compile atom/models/deepseek_v4.py atom/model_ops/attentions/deepseek_v4_attn.py
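
For anyone who wants to script the same check, a small sketch using the standard-library `py_compile` module is shown below. The `syntax_check` helper is a hypothetical name, not something in the ATOM repository.

```python
import py_compile


def syntax_check(paths):
    """Return {path: error message} for every file that fails to byte-compile."""
    failures = {}
    for path in paths:
        try:
            # doraise=True turns syntax errors into PyCompileError exceptions.
            py_compile.compile(path, doraise=True)
        except py_compile.PyCompileError as exc:
            failures[path] = exc.msg
    return failures
```

An empty result means every file compiled. Like the command-line form, this only catches syntax errors and says nothing about GPU correctness.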
A downstream InferenceX DSv4 ATOM job will overlay this branch together with ROCm/aiter#2998 (Dsv4 sparse indexer).

Test Result

Local syntax check passes. GPU correctness/performance validation is pending in InferenceX.

Oseltamivir added 5 commits May 2, 2026 14:40

Use AITER DSv4 indexer topk

47858cc

Trim structured answer completions

9e09677

Merge remote-tracking branch 'origin/feat/deepseek-v4-pr1-skeleton' i…

2fe45ab

…nto dsv4-aiter-pr2998-indexer # Conflicts: # atom/model_ops/attentions/deepseek_v4_attn.py # atom/models/deepseek_v4.py

fix: avoid dsv4 metadata repeat sync

4634e6a

debug: add dsv4 eval component probes

6a1b7a5

Oseltamivir marked this pull request as draft May 4, 2026 07:28

Oseltamivir added 3 commits May 4, 2026 10:13

Merge remote-tracking branch 'upstream/feat/deepseek-v4-pr1-skeleton'…

94a90fb

… into dsv4-aiter-pr2998-indexer # Conflicts: # atom/model_ops/attentions/deepseek_v4_attn.py

Cap dummy warmup tokens via env

ac1e4ea

Revert "Cap dummy warmup tokens via env"

deb141f

This reverts commit ac1e4ea.

Oseltamivir wants to merge 8 commits into ROCm:feat/deepseek-v4-pr1-skeleton from Oseltamivir:dsv4-aiter-pr2998-indexer
