[Feat] Support dataclass & automatic triton_kernel wrapping for triton_op registration in magi_register_custom_op #33

Open
themistbeforedawn wants to merge 2 commits into SandAI-org:main from themistbeforedawn:feat/magi-register-op-with-nested-triton-kernels

@themistbeforedawn
Collaborator

🗂️ PR Category

  • ✨ New Feature
  • 🚀 Optimization (performance, memory, etc.)
  • 💥 Breaking Change
  • 🐛 Bug Fix
  • 🛠️ Development / Refactoring
  • 📚 Documentation
  • 🧹 Chore (Dependencies, CI/CD, Configuration, etc.)
  • 🧪 Testing

📝 Description

What's new

@magi_register_custom_op now auto-detects Triton kernels in the decorated function (and its helpers) and routes the registration through torch.library.triton_op instead of custom_op, so Inductor can trace through the op and fuse around the kernel rather than treating it as an opaque barrier. In the common case the user surface doesn't change:

@magi_register_custom_op()                       # auto -> triton_op
def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    add_kernel[(triton.cdiv(n, 128),)](x, y, out, n, BLOCK_SIZE=128)
    return out

Internally a single AST pass walks fn + its helpers (BFS, depth-capped) and collects every Triton kernel reference and every torch.ops.<ns>.<op> call. Based on what it finds, the decorator picks the appropriate registration path — triton_op (with bare kernels shadowed via wrap_triton), custom_op, or skipping registration entirely so Inductor can inline fn and fuse across the nested ops. See the decision matrix on _decide_registration_path for the full rules.
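The scanning idea can be illustrated with a small standalone sketch. This is not the PR's scanner: `scan_source` is a hypothetical name, it only inspects one piece of source text (no helper recursion or depth cap), and it assumes that any `name[...](...)` call is a kernel launch, which a real implementation would have to verify by resolving the name to an actual Triton kernel object.

```python
import ast
import textwrap
from typing import Optional


def _dotted_parts(node: ast.AST) -> Optional[list[str]]:
    """Return ['a', 'b', 'c'] for the expression a.b.c, else None."""
    parts: list[str] = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return parts[::-1]
    return None


def scan_source(source: str) -> tuple[list[str], list[str]]:
    """Collect (kernel launch names, torch.ops.<ns>.<op> calls) from source text.

    Hypothetical helper for illustration only.
    """
    tree = ast.parse(textwrap.dedent(source))
    kernels: list[str] = []
    nested_ops: list[str] = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Launch syntax `kernel[grid](...)`: a Call whose callee is a Subscript
        # on a plain name. Assumption: every such call is a kernel launch.
        if isinstance(func, ast.Subscript) and isinstance(func.value, ast.Name):
            if func.value.id not in kernels:
                kernels.append(func.value.id)
            continue
        # Nested op call: an attribute chain starting with torch.ops.<ns>.<op>.
        parts = _dotted_parts(func)
        if parts and len(parts) >= 4 and parts[:2] == ["torch", "ops"]:
            name = ".".join(parts[:4])
            if name not in nested_ops:
                nested_ops.append(name)
    return kernels, nested_ops


SAMPLE = '''
def add(x, y):
    out = torch.empty_like(x)
    n = x.numel()
    add_kernel[(triton.cdiv(n, 128),)](x, y, out, n, BLOCK_SIZE=128)
    return torch.ops.mylib.scale(out, 2.0)
'''

kernels, nested_ops = scan_source(SAMPLE)
```

Here `torch.ops.mylib.scale` is a made-up op used only to exercise the second branch. The sketch deliberately ignores launches whose callee is an attribute, such as `self.kernel[grid](...)`, which mirrors the kind of case a purely static scan cannot resolve on its own.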

Three new keyword args cover the cases AST can't reach or where the user wants to override:

  • extra_triton_kernels=[...] — escape hatch for kernels the AST scanner can't statically resolve (e.g. self.kernel[grid](...) — subscript-on-attribute, factory-built kernels, runtime imports). Listed kernels are treated as bare and force triton_op mode.
  • force_register_mode="triton_op" | "custom_op" — pin the path explicitly; useful when you want a deliberate fusion barrier around a Triton body, or when introspection misses your kernel.
  • max_introspect_depth=5 — how deep the helper-recursion goes (doesn't bound flat scanning of fn itself).

Common mistakes now raise clear errors at decoration time instead of surfacing later as opaque schema-fingerprint failures: duplicate or mis-namespaced op names, @triton.heuristics as the outermost decorator (which wrap_triton doesn't accept; the message tells you the right stacking order), and obvious misuses such as mixing a bare kernel with a nested custom_op in the same body. If triton_op registration itself fails, the decorator falls back to custom_op + register_fake and emits a warning.
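One way to picture how the scan results and the overrides might combine is the sketch below. It is a guess at the shape of the logic, not the actual `_decide_registration_path` rules, which live in that function's decision matrix. The function name, its parameters, the precedence of `force_register_mode` over the mixing check, and the error type are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def decide_registration_path(
    bare_kernels: Sequence[str],
    nested_ops: Sequence[str],
    extra_triton_kernels: Sequence[object] = (),
    force_register_mode: Optional[str] = None,
) -> str:
    """Return "triton_op", "custom_op", or "none" (hypothetical rules)."""
    # Assumption: an explicit override wins over everything else.
    if force_register_mode is not None:
        if force_register_mode not in ("triton_op", "custom_op"):
            raise ValueError(f"unknown force_register_mode: {force_register_mode!r}")
        return force_register_mode

    # Kernels listed by the user count as bare kernels and force triton_op mode.
    has_kernels = bool(bare_kernels) or bool(extra_triton_kernels)

    # Mixing a bare kernel with a nested custom op is reported up front.
    if has_kernels and nested_ops:
        raise ValueError(
            "function body mixes a bare Triton kernel with a nested custom op"
        )
    if has_kernels:
        return "triton_op"
    if nested_ops:
        # Skip registration so the compiler can inline fn and fuse across the ops.
        return "none"
    return "custom_op"


def raises_on_mix() -> bool:
    """Check that the mixed case is rejected rather than silently registered."""
    try:
        decide_registration_path(["add_kernel"], ["torch.ops.mylib.scale"])
    except ValueError:
        return True
    return False
```

Under these assumed rules, a body with only a bare kernel maps to "triton_op", a body with only nested ops maps to "none", and a body with neither maps to "custom_op".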

Architecture

The existing 4-slot pipeline is unchanged; this PR only changes what fills slot 2 (torch.library.triton_op instead of custom_op when appropriate) and adds a mode="none" short-circuit that returns fn directly.

Tests

50 new tests in tests/api_tests/test_register_triton_op.py cover the full kernel-discovery surface (closures, helpers, multi-level nesting, factory kernels, self.kernel on nn.Module, cross-module launchers, third-party .fn wrappers, staticmethod / classmethod, runtime imports, dedup, wrap_triton idempotence), all three registration paths + force_register_mode overrides, @triton.autotune (incl. multiple per op) and @triton.heuristics rejection, autograd + dataclass interactions through the Triton path, and an Inductor-AOT-graph check confirming the kernel is actually visible post-compile.

@themistbeforedawn themistbeforedawn changed the title Feat/magi register op with nested triton kernels [Feat] Support dataclass & automatic triton_kernel wrapping for triton_op registration in magi_register_custom_op May 23, 2026