
Use AITER DSv4 Indexer top-k kernel #684

Draft
Oseltamivir wants to merge 8 commits into ROCm:feat/deepseek-v4-pr1-skeleton from Oseltamivir:dsv4-aiter-pr2998-indexer


@Oseltamivir

Motivation

Wire the DeepSeek-V4 CSA Indexer path in ATOM's PR #650 branch to the DSv4 Indexer scorer/top-k implementation from ROCm/aiter#2998 instead of materializing fp8_mqa_logits and running PyTorch topk in ATOM.

This is intended to make ATOM exercise the AITER PR #2998 Indexer implementation in downstream InferenceX DSv4 runs.

Technical Details

  • Imports aiter.ops.triton.attention.dsv4_indexer.dsv4_indexer_topk.
  • Applies the reference Indexer FP4 simulation to Q before scoring: RoPE, Hadamard rotation, and an FP4 round-trip back to BF16.
  • Gathers ATOM's paged Indexer KV cache, converts it into the same rotated FP4-dequantized basis, and builds the dense [num_seqs, max_committed, head_dim] view expected by AITER PR #2998.
  • Calls dsv4_indexer_topk(..., offset=0, seq_ids=..., kv_lens=...) so AITER handles batched per-sequence KV lengths and DSv4 causal compressed-entry visibility.
  • Adds ATOM's per-token sparse-attention offset afterward to preserve the existing [window || compressed] sparse-attention layout.
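
The "rotated FP4-dequantized basis" that Q and the gathered KV are both mapped into can be illustrated with a small pure-Python sketch: an orthonormal fast Walsh-Hadamard transform followed by an FP4 (E2M1) quantize/dequantize round-trip. This is not the DSv4 reference code or AITER's kernel. The block size of 32, the float absmax/6 scale, and nearest-value rounding are assumptions chosen for illustration, and RoPE is omitted.

```python
import math

# Non-negative magnitudes representable by FP4 E2M1; a sign bit covers negatives.
FP4_E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def hadamard_rotate(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    y = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b  # butterfly step
        h *= 2
    norm = 1.0 / math.sqrt(n)  # makes the transform orthonormal (and self-inverse)
    return [v * norm for v in y]


def fp4_round_trip(x, block_size=32):
    """Quantize to FP4 E2M1 with one absmax scale per block, then dequantize.

    Assumed scheme: scale = absmax / 6 (6 is the largest E2M1 magnitude).
    Simplified: MXFP4 proper uses power-of-two (E8M0) block scales, and ties here
    resolve to the smaller magnitude rather than round-half-to-even.
    """
    out = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        absmax = max(abs(v) for v in block)
        scale = absmax / 6.0 if absmax > 0 else 1.0
        for v in block:
            mag = abs(v) / scale
            q = min(FP4_E2M1_GRID, key=lambda g: abs(g - mag))  # nearest FP4 value
            out.append(math.copysign(q * scale, v))  # dequantize back to float
    return out


def simulate_indexer_fp4(x, block_size=32):
    """Rotate into the Hadamard basis, then apply the simulated FP4 round-trip."""
    return fp4_round_trip(hadamard_rotate(x), block_size)
```

Rotating before quantizing spreads outlier energy across the vector, which is the usual motivation for pairing a Hadamard transform with low-bit formats; applying the same transform to both Q and KV keeps their dot products consistent because the rotation is orthonormal.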

Note: this PR intentionally wires the Indexer half of ROCm/aiter#2998 first. The AITER sparse_mqa_sink ABI is paged KV + uniform top-k, while PR #650 currently uses ATOM's ragged packed [window || compressed] sparse-attention layout. Routing sparse attention through AITER should be a follow-up ABI/layout change rather than a per-layer repack.
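
The gather, per-sequence top-k, and offset steps listed under Technical Details can be sketched in pure Python on toy data. Everything here is hypothetical: the kv_pages/block_table cache layout stands in for ATOM's paged Indexer cache, a plain dot product stands in for the real DSv4 Indexer scorer, causal compressed-entry visibility is not modeled, and none of these functions mirror the dsv4_indexer_topk signature. The sketch only shows how kv_lens masks padding in the dense [num_seqs, max_committed, head_dim] view and how an offset shifts compressed-local indices into a [window || compressed] layout.

```python
def gather_paged_kv(kv_pages, block_table, kv_lens, page_size, head_dim):
    """Densify a paged cache into a zero-padded [num_seqs][max_committed][head_dim] list.

    Hypothetical layout: kv_pages[p][slot] is one head_dim vector and
    block_table[s] lists the pages of sequence s in order.
    """
    max_committed = max(kv_lens, default=0)
    dense = []
    for seq, n in enumerate(kv_lens):
        rows = []
        for pos in range(max_committed):
            if pos < n:
                page = block_table[seq][pos // page_size]
                rows.append(list(kv_pages[page][pos % page_size]))
            else:
                rows.append([0.0] * head_dim)  # padding; excluded via kv_lens below
        dense.append(rows)
    return dense


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def indexer_topk(q, dense_kv, seq_id, kv_len, k):
    """Score q against the first kv_len entries of one sequence and keep the top k.

    Returned indices are local to the compressed segment; -1 pads when kv_len < k.
    """
    scores = [dot(q, dense_kv[seq_id][i]) for i in range(kv_len)]
    order = sorted(range(kv_len), key=lambda i: scores[i], reverse=True)[:k]
    return order + [-1] * (k - len(order))


def to_sparse_layout(topk_idx, offset):
    """Shift compressed-local indices past the window segment; keep -1 padding."""
    return [i + offset if i >= 0 else -1 for i in topk_idx]
```

With a shared k, short sequences end up padded with -1, which is one way to see why a uniform-top-k ABI and a ragged packed layout do not line up without a repack.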

Test Plan

  • Local syntax check: python3 -m py_compile atom/models/deepseek_v4.py atom/model_ops/attentions/deepseek_v4_attn.py
  • The downstream InferenceX DSv4 ATOM job will overlay this branch plus ROCm/aiter#2998 (Dsv4 sparse indexer).

Test Result

Local syntax check passes. GPU correctness/performance validation is pending in InferenceX.

Oseltamivir marked this pull request as draft on May 4, 2026 07:28.
… into dsv4-aiter-pr2998-indexer

# Conflicts:
#	atom/model_ops/attentions/deepseek_v4_attn.py