Merged
Force-pushed from 636bc40 to e5a8c5b
Force-pushed from da74b83 to 73ebda1
Force-pushed from 8317872 to 35a60da
Force-pushed from 7c8d16f to 1d3325e
Force-pushed from bbadd6e to 2f77108
jinminxi104 reviewed on Feb 6, 2026
Force-pushed from 996688e to 700db7d
Contributor
Pull request overview
This PR adds expert parallelism (EP) support for Ascend NPU devices by integrating with the dlinfer library. The changes enable distributed MoE (Mixture of Experts) computation across multiple Ascend devices with optimized communication strategies based on the hardware generation (A2/A3).
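The hardware-dependent communication choice mentioned above can be sketched as a simple dispatch. This is an illustrative stand-in only: the enum members and the A2/A3 mapping below are assumptions, not the actual `DlinferMoECommType` values from dlinfer.

```python
from enum import Enum


class MoECommType(Enum):
    """Illustrative stand-in for dlinfer's DlinferMoECommType.

    Member names and the A2/A3 mapping are assumptions for this
    sketch, not the real dlinfer definitions.
    """
    ALL_GATHER = 'all_gather'  # coarser collective, assumed for A2
    ALL_TO_ALL = 'all_to_all'  # token-level dispatch, assumed for A3


def select_comm_type(soc_generation: str) -> MoECommType:
    """Pick a MoE communication strategy from the Ascend generation."""
    if soc_generation == 'A3':
        return MoECommType.ALL_TO_ALL
    return MoECommType.ALL_GATHER
```

The point of the pattern is that the backend decides the collective once, at metadata-creation time, rather than branching inside every MoE kernel call.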
Changes:
- Added EP support to dlinfer backend for Ascend devices with MoE metadata tracking and communication type selection
- Updated PyTorch/torch-npu version constraints to support newer versions (up to 2.10.0/2.25.0)
- Refactored kernel imports to use torch.Tensor directly instead of dlinfer type annotations for better compatibility
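Version gating of this kind is usually enforced with a small runtime check. Below is a stdlib-only sketch using the 2.10.0 torch ceiling from the bullet above; `parse_version` and its handling of `+` local-version suffixes are illustrative, not the check the PR actually ships.

```python
def parse_version(version: str) -> tuple:
    """Parse 'X.Y.Z' (ignoring any '+local' suffix) into an int tuple."""
    release = version.split('+')[0]
    return tuple(int(part) for part in release.split('.') if part.isdigit())


def torch_in_supported_range(torch_version: str) -> bool:
    """Check the relaxed constraint from this PR: torch <= 2.10.0."""
    return parse_version(torch_version) <= (2, 10, 0)
```

Real requirement files express the same bound declaratively (e.g. `torch<=2.10.0`); a runtime guard like this is only needed where behavior differs across installed versions, as with the flash_mla availability check.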
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| requirements/runtime_ascend.txt | Updated torch, torch-npu, and torchvision version constraints to support newer releases |
| docker/Dockerfile_ascend_a3 | Updated base CANN image and PyTorch versions to match runtime requirements |
| lmdeploy/pytorch/kernels/dlinfer/pagedattention.py | Changed imports to use typing.Optional/Sequence and torch.Tensor instead of dlinfer annotations |
| lmdeploy/pytorch/kernels/dlinfer/flash_attention.py | Changed import to use torch.Tensor instead of dlinfer type annotation |
| lmdeploy/pytorch/kernels/dlinfer/moe_gating_topk_softmax.py | Added moe_metadata parameter and DlinferMoeMetadata type import for EP support |
| lmdeploy/pytorch/kernels/dlinfer/fused_moe.py | Added moe_metadata parameter and MoE type imports for EP functionality |
| `lmdeploy/pytorch/kernels/dlinfer/__init__.py` | Exported DlinferMoECommType and DlinferMoeMetadata for use in backend |
| lmdeploy/pytorch/backends/dlinfer/moe.py | Extended MoE implementation with EP support including expert partitioning and metadata handling |
| lmdeploy/pytorch/backends/dlinfer/ascend/op_backend.py | Added comprehensive EP infrastructure including DistMeta, communication type selection, and MoE metadata creation |
| lmdeploy/pytorch/configurations/utils.py | Improved flash_mla availability check to handle torch_npu version differences |
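The expert-partitioning change noted for `moe.py` (splitting the expert set across EP ranks) can be illustrated with a minimal helper. `partition_experts` is a hypothetical name, and the real backend may not use contiguous ranges; this only shows the even-split idea.

```python
def partition_experts(num_experts: int, ep_size: int, ep_rank: int):
    """Return the [start, end) range of experts owned by one EP rank.

    Distributes num_experts as evenly as possible: the first
    (num_experts % ep_size) ranks each own one extra expert.
    """
    base, rem = divmod(num_experts, ep_size)
    start = ep_rank * base + min(ep_rank, rem)
    end = start + base + (1 if ep_rank < rem else 0)
    return start, end
```

Under this scheme every expert is owned by exactly one rank, so the router can map a global expert id to its EP rank without any communication.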
jinminxi104 approved these changes on Feb 9, 2026
Collaborator: waiting for the CI result on the dlinfer side.

Collaborator: CI passed.
Related PR: DeepLink-org/dlinfer#237