cuda: gate MMQ cp.async to tensor-split MoE

71ee5aa
Closed

Optimize CUDA matmul for empty shards and MMQ dispatch #22170

labeler: succeeded Apr 20, 2026 in 16s