
Add MoE ASM ctypes migration on top of #2255 (#2341)

Open
zufayu wants to merge 1 commit into refactor_bind_kl from refactor_bind_ctypes_all

Conversation


zufayu commented Mar 19, 2026

Summary

  • Migrate 8 MoE ASM kernel functions from pybind11 (torch::Tensor&) to C ABI (AiterTensor* + hipStream_t) called via ctypes
  • Functions migrated: fmoe, fmoe_int8_g1u0, fmoe_g1u1, fmoe_g1u1_tkw1, fmoe_int8_g1u0_a16, fmoe_g1u1_a16, fmoe_fp8_blockscale_g1u1, moe_stage1_g1u1
  • Split ctypes sources into separate module_moe_fmoe_asm build module to avoid torch dependency in the .so
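The summary above can be sketched from the Python side. This is a minimal illustration of what a ctypes binding for the new C ABI might look like; the actual `AiterTensor` field layout and the exact `fmoe` argument list are assumptions here, not the real aiter definitions:

```python
import ctypes

# Hypothetical layout of AiterTensor; the real struct lives in aiter's C
# headers and its field names/order may differ (assumption for illustration).
class AiterTensor(ctypes.Structure):
    _fields_ = [
        ("data_ptr", ctypes.c_void_p),    # device pointer
        ("dtype",    ctypes.c_int),       # an AITER_DTYPE_* enum value
        ("ndim",     ctypes.c_int),
        ("sizes",    ctypes.c_int64 * 8),
        ("strides",  ctypes.c_int64 * 8),
    ]

def bind_fmoe(lib):
    """Declare the C-ABI signature of a migrated entry point.

    Assumes an export roughly like:
        extern "C" void fmoe(AiterTensor* out, AiterTensor* input, ...,
                             hipStream_t stream);
    The argument list here is a placeholder; hipStream_t is an opaque
    pointer on the Python side, so c_void_p is enough.
    """
    lib.fmoe.restype = None
    lib.fmoe.argtypes = [
        ctypes.POINTER(AiterTensor),  # output
        ctypes.POINTER(AiterTensor),  # input
        ctypes.c_void_p,              # hipStream_t
    ]
    return lib.fmoe

# The struct can be exercised without loading any .so:
t = AiterTensor(dtype=1, ndim=2)
t.sizes[0], t.sizes[1] = 16, 128
print(t.sizes[1])  # 128
```

Because the struct carries only a raw device pointer plus shape metadata, the built `.so` never needs torch/ATen symbols, which is what lets the ctypes module drop the torch dependency.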

Changes

  • csrc/py_itfs_cu/asm_fmoe.cu: Remove torch/ATen includes; use AiterTensor*, AITER_DTYPE_*, AITER_CHECK; convert template <typename T, typename T_O> to <int I_elemSize, int O_elemSize>; add extern "C" exports
  • csrc/py_itfs_cu/asm_moe_2stage.cu: Same conversion for moe_stage1_g1u1
  • csrc/include/moe_op.h: Remove fmoe pybind declarations (now extern "C" in .cu files)
  • csrc/include/rocm_ops.hpp: Remove fmoe entries from MOE_OP_PYBIND macro
  • aiter/ops/moe_op.py: Use ffi_type="ctypes" with new module_moe_fmoe_asm for migrated functions
  • aiter/jit/optCompilerConfig.json: Split ctypes sources into module_moe_fmoe_asm; keep pybind sources in module_moe_asm
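The `<typename T, typename T_O>` to `<int I_elemSize, int O_elemSize>` conversion means the C++ side dispatches on byte widths instead of torch types. A Python analogue of that mapping, using made-up `AITER_DTYPE_*` codes (the real enum values are in aiter's headers and may differ), looks like:

```python
# Hypothetical AITER_DTYPE_* codes -- the actual values come from aiter's
# C headers; these are assumptions for illustration only.
AITER_DTYPE_FP16 = 0
AITER_DTYPE_BF16 = 1
AITER_DTYPE_FP8  = 2
AITER_DTYPE_INT8 = 3

ELEM_SIZE = {
    AITER_DTYPE_FP16: 2,
    AITER_DTYPE_BF16: 2,
    AITER_DTYPE_FP8:  1,
    AITER_DTYPE_INT8: 1,
}

def dispatch_key(in_dtype: int, out_dtype: int) -> tuple:
    """Mirror the <int I_elemSize, int O_elemSize> template parameters:
    after the migration the kernel templates see only byte widths,
    never torch dtypes."""
    return (ELEM_SIZE[in_dtype], ELEM_SIZE[out_dtype])

print(dispatch_key(AITER_DTYPE_INT8, AITER_DTYPE_BF16))  # (1, 2)
```

Keying templates on element size rather than type is what removes the last torch/ATen type dependency from the `.cu` sources.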

Test plan

  • Verify module_moe_fmoe_asm builds without torch dependency
  • Run python op_tests/test_moe.py -t fmoe to validate fmoe kernel
  • Run python op_tests/test_moe.py -t fmoe_int8 to validate int8 path
  • Run python op_tests/test_moe.py -t fmoe_g1u1 to validate g1u1 path
  • Verify remaining pybind ops (topk_softmax, moe_align_block_size, moe_sum) still work
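The first test-plan item ("builds without torch dependency") can be checked mechanically by inspecting the built shared object's dynamic dependencies. A small sketch that parses `ldd` output (library names below are illustrative; the real build output path depends on aiter's JIT cache):

```python
def links_against_torch(ldd_output: str) -> bool:
    """Return True if any dynamic dependency looks like a torch library.

    ldd_output is the text produced by `ldd <path-to-.so>` on Linux;
    libtorch/libc10 are the usual torch runtime library names checked here.
    """
    return any(
        "libtorch" in line or "libc10" in line
        for line in ldd_output.splitlines()
    )

# Illustrative ldd output for a torch-free module (paths are made up):
sample = (
    "\tlibamdhip64.so.6 => /opt/rocm/lib/libamdhip64.so.6\n"
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6\n"
)
print(links_against_torch(sample))  # False
```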

🤖 Generated with Claude Code

Convert fmoe, fmoe_int8_g1u0, fmoe_g1u1, fmoe_g1u1_tkw1,
fmoe_int8_g1u0_a16, fmoe_g1u1_a16, fmoe_fp8_blockscale_g1u1,
and moe_stage1_g1u1 from torch::Tensor& (pybind11) to
AiterTensor* + hipStream_t (C ABI called via ctypes).

- asm_fmoe.cu: Remove torch/ATen includes, use AiterTensor*,
  AITER_DTYPE_*, AITER_CHECK; template <int I_elemSize, int O_elemSize>
- asm_moe_2stage.cu: Same conversion for moe_stage1_g1u1
- moe_op.h: Remove fmoe pybind declarations (now extern "C")
- rocm_ops.hpp: Remove fmoe entries from MOE_OP_PYBIND macro
- moe_op.py: Use ffi_type="ctypes" with new module_moe_fmoe_asm
- optCompilerConfig.json: Split ctypes sources into module_moe_fmoe_asm

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions

🏷️ CI Guide

Runs automatically on every PR:

  • ✅ Pre-checks (submodule verification, code formatting)
  • ✅ Aiter op tests (gfx942 + gfx950)
  • ✅ Triton tests (only when aiter/ops/triton/** or related paths are changed)

Extended tests (opt-in via labels):

Label        Tests
ci:sglang    SGLang integration tests
ci:atom      ATOM benchmark (DeepSeek-R1 + GPT-OSS)
ci:vllm      vLLM benchmark
ci:all       All of the above

Add labels via the sidebar or gh pr edit 2341 --add-label <label>
