
Refactor kernel #237

Draft
kevssim wants to merge 28 commits into modelscope:main from kevssim:refactor/kernel-mapping-api

kevssim (Collaborator) commented Jun 29, 2026

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Refactors twinkle.kernel from a registry-based API to a minimal, mapping-driven API. The public surface is reduced to kernelize, hub, and npu_builtin.

  • kernelize(model, mappings) applies class/attribute replacements to a live model. Matching is by exact type (type(m) is target_cls), so subclasses of a target class are not replaced.
  • hub(*entries) declares kernels that are resolved lazily from the optional kernels package.
  • npu_builtin() returns the standard NPU bundle (RMSNorm, rotary embedding, SwiGLU, SDPA, MoE, FLA); GMM is not included and must be opted into manually.
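To make the exact-type matching rule and the lazy resolution concrete, here is a minimal, dependency-free sketch of a mapping-driven kernelize. Everything in it is hypothetical: KernelMapping, iter_modules, the toy Node tree, and the zero-argument loader standing in for a hub entry are illustrative names, not twinkle's actual API. A real implementation would walk torch.nn.Module.modules() instead of a toy children list.

```python
import importlib
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List, Optional, Union


@dataclass
class KernelMapping:
    """Hypothetical mapping entry (not twinkle's real type)."""
    target_cls: type
    # A replacement class, or a zero-argument loader returning one. Loaders run
    # only on the first match, loosely mimicking lazily resolved hub entries.
    replacement: Union[type, Callable[[], type]]
    # Extra instance attributes to set on each replaced module.
    attrs: Dict[str, Any] = field(default_factory=dict)
    _resolved: Optional[type] = field(default=None, init=False, repr=False)
    _failed: bool = field(default=False, init=False, repr=False)

    def resolve(self) -> Optional[type]:
        """Return the replacement class, or None if its optional package is missing."""
        if self._resolved is None and not self._failed:
            if isinstance(self.replacement, type):
                self._resolved = self.replacement
            else:
                try:
                    self._resolved = self.replacement()
                except ImportError:
                    self._failed = True  # remember the failure; don't retry the import
        return self._resolved


def iter_modules(root: Any) -> Iterator[Any]:
    """Depth-first walk of a toy module tree (stand-in for nn.Module.modules())."""
    yield root
    for child in getattr(root, "children", []):
        yield from iter_modules(child)


def kernelize(model: Any, mappings: List[KernelMapping]) -> int:
    """Swap the class of every module whose exact type matches; return the count."""
    replaced = 0
    for module in list(iter_modules(model)):  # snapshot before mutating classes
        for mapping in mappings:
            # Exact match (type(m) is target_cls), so subclasses are left alone.
            if type(module) is not mapping.target_cls:
                continue
            new_cls = mapping.resolve()
            if new_cls is None:
                continue  # kernel unavailable on this host; keep the original module
            module.__class__ = new_cls
            for name, value in mapping.attrs.items():
                setattr(module, name, value)
            replaced += 1
            break  # first matching mapping wins
    return replaced


# Toy usage: a tiny module tree standing in for a real model.
class Node:
    def __init__(self, *children: Any) -> None:
        self.children = list(children)


class RMSNorm(Node):
    def forward(self) -> str:
        return "eager"


class FusedRMSNorm(RMSNorm):
    def forward(self) -> str:
        return "fused"


class CustomNorm(RMSNorm):  # subclass: not matched by the exact-type rule
    pass


class Attention(Node):
    pass


def load_fast_attention() -> type:
    # Hypothetical optional package; raises ModuleNotFoundError when absent.
    return importlib.import_module("hypothetical_kernels_pkg").FastAttention


model = Node(RMSNorm(), CustomNorm(), Attention())
count = kernelize(model, [
    KernelMapping(RMSNorm, FusedRMSNorm, {"fused": True}),
    KernelMapping(Attention, load_fast_attention),
])
```

Here only the plain RMSNorm is swapped: CustomNorm survives because of the exact-type check (an isinstance check would have replaced it too), and Attention is kept because its lazily loaded kernel package is not installed.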

Deletes the legacy registry.py, function.py, layer.py, base.py, and monkey_patch_npu.py, and adds an npu_impls/ package. Migrates cookbook/transformers/{fsdp2,sp_fsdp_dense,ep_fsdp2_lora_qwen3_5_moe}.py to the new API and rewrites the zh/en Kernel docs.

kevssim added 24 commits June 29, 2026 15:15
…load

- builtin.py: _install_sdpa() now runs only when torch_npu is importable,
  preventing the NPU SDPA impl (which inverts boolean masks) from
  contaminating the global ALL_ATTENTION_FUNCTIONS['sdpa'] registry on
  CUDA/CPU hosts.
- builtin.py: drop the dead _SdpaPatchSentinel and its add/pop scaffolding.
- fla.py: flip is_flash_linear_attention_available only after the MindSpeed
  kernel imports successfully; previously, an NPU host without MindSpeed was
  left with FLA flagged as available but no kernel installed, causing a
  Qwen3.5 runtime failure.
This reverts commit 126efc3.
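Both builtin.py/fla.py fixes above follow one rule: confirm the backend is really present before mutating shared state. The sketch below is illustrative only; install_sdpa, install_fla, the plain dict standing in for ALL_ATTENTION_FUNCTIONS, the flags dict, and the module names are hypothetical, not the code in builtin.py or fla.py.

```python
import importlib
import importlib.util
from typing import Any, Dict


def install_sdpa(registry: Dict[str, Any], npu_impl: Any, backend: str = "torch_npu") -> bool:
    """Register the NPU SDPA only when the NPU backend package can be found.

    `registry` stands in for a global attention-function table; on hosts
    without the backend it is left untouched, so CUDA/CPU runs keep the default.
    """
    if importlib.util.find_spec(backend) is None:
        return False
    registry["sdpa"] = npu_impl
    return True


def install_fla(flags: Dict[str, bool], kernel_module: str) -> bool:
    """Flip the availability flag only after the kernel module actually imports."""
    try:
        importlib.import_module(kernel_module)
    except ImportError:
        return False  # flag stays False, so callers never expect a missing kernel
    flags["fla_available"] = True
    return True


registry = {"sdpa": "default_sdpa"}
flags = {"fla_available": False}
install_sdpa(registry, "npu_sdpa", backend="hypothetical_missing_backend")
install_fla(flags, "hypothetical_missing_fla_kernel")
# registry and flags are unchanged: neither backend is installed
```

Note the difference between the two guards: find_spec only checks that a top-level module can be located without importing it, while the FLA path performs the real import, so a kernel that is present but fails to load also leaves the flag off.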

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request refactors the Twinkle kernel module into a mapping-driven kernel replacement API that exposes kernelize, hub, and npu_builtin, and removes the legacy registration and patch helpers. It also moves the NPU-specific optimizations into src/twinkle/kernel/npu_impls/ and updates the documentation and tests. One critical issue was identified in src/twinkle/kernel/core.py: the helper function _infer_device is missing, which causes an ImportError in the test suite.

Comment thread src/twinkle/kernel/core.py
kevssim changed the title from "refactor kernel" to "Refactor kernel" on Jun 30, 2026

Labels

None yet

Projects

None yet

1 participant