
[ascend]suppot ep #3696

Merged
lvhan028 merged 27 commits into InternLM:main from DeepLink-org:support_ds_eager
Feb 11, 2026

Conversation

@yao-fengchen (Collaborator) commented Jul 1, 2025

related PR DeepLink-org/dlinfer#237

  • add Ascend EP
  • Qwen 235B EP

@jinminxi104 jinminxi104 marked this pull request as draft July 2, 2025 02:42
@yao-fengchen yao-fengchen changed the title from "[ascend]suppot deepseek eager_mode" to "[ascend]suppot ep" Dec 16, 2025
@yao-fengchen yao-fengchen force-pushed the support_ds_eager branch 3 times, most recently from 636bc40 to e5a8c5b Compare December 23, 2025 09:16
@jinminxi104 jinminxi104 marked this pull request as ready for review February 6, 2026 09:33
Copilot AI review requested due to automatic review settings February 6, 2026 09:33
Copilot AI (Contributor) left a comment

Pull request overview

This PR adds expert parallelism (EP) support for Ascend NPU devices by integrating with the dlinfer library. The changes enable distributed MoE (Mixture of Experts) computation across multiple Ascend devices with optimized communication strategies based on the hardware generation (A2/A3).

Changes:

  • Added EP support to dlinfer backend for Ascend devices with MoE metadata tracking and communication type selection
  • Updated PyTorch/torch-npu version constraints to support newer versions (up to 2.10.0/2.25.0)
  • Refactored kernel imports to use torch.Tensor directly instead of dlinfer type annotations for better compatibility
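To make the expert-parallelism idea concrete, here is a minimal, hypothetical sketch of how a MoE layer's experts might be partitioned across EP ranks. The names (`partition_experts`, `ep_size`, `ep_rank`) are illustrative only and are not lmdeploy's or dlinfer's actual API.

```python
# Hypothetical sketch of expert partitioning under expert parallelism (EP):
# each rank owns a contiguous slice of the experts, so weights and compute
# for those experts live only on that device. Not the PR's actual code.

def partition_experts(num_experts: int, ep_size: int, ep_rank: int) -> range:
    """Return the expert ids owned by `ep_rank` out of `ep_size` ranks."""
    if num_experts % ep_size != 0:
        raise ValueError("num_experts must be divisible by ep_size")
    per_rank = num_experts // ep_size
    start = ep_rank * per_rank
    return range(start, start + per_rank)

# Example: a 128-expert MoE layer spread over 8 Ascend devices;
# rank 0 holds experts 0-15, rank 7 holds experts 112-127.
print(list(partition_experts(128, 8, 0))[:4])
```

Tokens routed to an expert a rank does not own must then be exchanged between devices, which is where the communication strategy discussed below comes in.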

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.

Show a summary per file

| File | Description |
| --- | --- |
| requirements/runtime_ascend.txt | Updated torch, torch-npu, and torchvision version constraints to support newer releases |
| docker/Dockerfile_ascend_a3 | Updated base CANN image and PyTorch versions to match runtime requirements |
| lmdeploy/pytorch/kernels/dlinfer/pagedattention.py | Changed imports to use typing.Optional/Sequence and torch.Tensor instead of dlinfer annotations |
| lmdeploy/pytorch/kernels/dlinfer/flash_attention.py | Changed import to use torch.Tensor instead of a dlinfer type annotation |
| lmdeploy/pytorch/kernels/dlinfer/moe_gating_topk_softmax.py | Added a moe_metadata parameter and the DlinferMoeMetadata type import for EP support |
| lmdeploy/pytorch/kernels/dlinfer/fused_moe.py | Added a moe_metadata parameter and MoE type imports for EP functionality |
| lmdeploy/pytorch/kernels/dlinfer/__init__.py | Exported DlinferMoECommType and DlinferMoeMetadata for use in the backend |
| lmdeploy/pytorch/backends/dlinfer/moe.py | Extended the MoE implementation with EP support, including expert partitioning and metadata handling |
| lmdeploy/pytorch/backends/dlinfer/ascend/op_backend.py | Added EP infrastructure, including DistMeta, communication type selection, and MoE metadata creation |
| lmdeploy/pytorch/configurations/utils.py | Improved the flash_mla availability check to handle torch_npu version differences |
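The op_backend.py entry mentions selecting a communication type based on the Ascend hardware generation (A2/A3). The sketch below shows one plausible shape for such a selection; the enum values and the string-matching rule are assumptions for illustration and do not reproduce the PR's actual DlinferMoECommType logic.

```python
# Hedged sketch: picking a MoE token-exchange strategy by device generation.
# Enum members and the soc_name heuristic are assumptions, not the PR's code.
from enum import Enum


class MoECommType(Enum):
    ALL_GATHER = "all_gather"   # assumed default for older (A2-class) devices
    ALL_TO_ALL = "all_to_all"   # assumed choice for A3-class interconnects


def select_comm_type(soc_name: str) -> MoECommType:
    """Choose a dispatch/combine strategy from a reported SoC name string."""
    if "A3" in soc_name.upper():
        return MoECommType.ALL_TO_ALL
    return MoECommType.ALL_GATHER


print(select_comm_type("Ascend910_A3").value)
```

The general design point stands regardless of the exact rule: all-to-all exchanges only the tokens each rank needs, while all-gather replicates activations everywhere, so the better choice depends on the interconnect bandwidth of the device generation.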


@grimoire (Collaborator) left a comment

LGTM

@jinminxi104 jinminxi104 self-requested a review February 9, 2026 06:31
@jinminxi104 (Collaborator) commented:
waiting for ci result on dlinfer-side

@jinminxi104 (Collaborator) commented:
ci passed

@lvhan028 lvhan028 merged commit 09071c7 into InternLM:main Feb 11, 2026
15 checks passed
@lvhan028 lvhan028 added the enhancement New feature or request label Feb 11, 2026

5 participants