fix: skip self NVLink connection check #582 by pazyork · Pull Request #609 · deepseek-ai/DeepEP

pazyork · 2026-04-24T06:52:11Z

Fixes #582

Problem
When multiple ranks share the same physical GPU (common in local single-GPU debugging, container GPU sharing, and MIG/vGPU scenarios, and possible even on multi-GPU servers), the NVLink initialization check fails with a false assertion:
AssertionError: No NVLink connection between GPU 0 and GPU 0

Root cause
The original code skips only self and mirrored pairs by rank index (i >= j). It does not handle the case where different rank indices map to the same physical GPU ID, so the check ends up querying NVLink status between a GPU and itself.
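To make the root cause concrete, the sketch below lists the rank pairs that survive an index-only filter. The helper name pairs_checked and the sample inputs are illustrative assumptions, not code from the repository.

```python
from itertools import product


def pairs_checked(physical_device_indices):
    """List (rank_i, rank_j, gpu_i, gpu_j) for each pair an index-only filter lets through."""
    n = len(physical_device_indices)
    return [
        (i, j, physical_device_indices[i], physical_device_indices[j])
        for i, j in product(range(n), repeat=2)
        if not i >= j  # the only skip rule compares rank indices, not GPU IDs
    ]


# Two ranks that both sit on physical GPU 0: the pair (rank 0, rank 1)
# passes the filter even though both sides are GPU 0.
print(pairs_checked([0, 0]))  # [(0, 1, 0, 0)]
```

With this input the single surviving pair compares GPU 0 against GPU 0, which matches the assertion message quoted above.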

Fix
Extend the skip condition in the check loop with "or physical_device_indices[i] == physical_device_indices[j]", so that the NVLink check is skipped for pairs that resolve to the same physical GPU.
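A minimal sketch of the loop with the extended skip condition follows. It is not the repository's implementation: the function name check_nvlink_pairs and the injected has_nvlink predicate are assumptions that stand in for the real NVML query so the example runs without GPUs.

```python
from typing import Callable, Sequence


def check_nvlink_pairs(
    physical_device_indices: Sequence[int],
    has_nvlink: Callable[[int, int], bool],
) -> int:
    """Assert NVLink connectivity per distinct GPU pair; return how many pairs were queried."""
    checked = 0
    for i, gpu_i in enumerate(physical_device_indices):
        for j, gpu_j in enumerate(physical_device_indices):
            # Skip self/mirrored rank pairs, and rank pairs on the same physical GPU.
            if i >= j or gpu_i == gpu_j:
                continue
            assert has_nvlink(gpu_i, gpu_j), \
                f'No NVLink connection between GPU {gpu_i} and GPU {gpu_j}'
            checked += 1
    return checked


def linked(a: int, b: int) -> bool:
    # Hypothetical topology: distinct GPUs are linked, a GPU is never "linked" to itself.
    return a != b


print(check_nvlink_pairs([0, 0], linked))     # 0
print(check_nvlink_pairs([0, 1, 1], linked))  # 2
```

Comparing physical IDs rather than rank indices keeps the check meaningful for genuinely distinct GPUs while ignoring ranks that are co-located on one device.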

Test
Added a lightweight unit test in tests/test_utils.py that mocks pynvml/distributed, so no real multi-GPU hardware is required. Verified:

Duplicate GPU scenario no longer raises assertion
Normal multi-GPU scenario works unchanged
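One way such a hardware-free test could look is sketched here. This is not the test added in the PR: check_nvlink is a hypothetical stand-in for the function under test, and the fake module only imitates the pynvml calls the stand-in uses, with placeholder status values.

```python
import sys
from unittest import mock


def check_nvlink(physical_device_indices):
    """Hypothetical stand-in for the function under test; imports pynvml lazily."""
    import pynvml
    pynvml.nvmlInit()
    handles = [pynvml.nvmlDeviceGetHandleByIndex(i) for i in physical_device_indices]
    for i, handle in enumerate(handles):
        for j, peer in enumerate(handles):
            if i >= j or physical_device_indices[i] == physical_device_indices[j]:
                continue
            status = pynvml.nvmlDeviceGetP2PStatus(
                handle, peer, pynvml.NVML_P2P_CAPS_INDEX_NVLINK)
            assert status == pynvml.NVML_P2P_STATUS_OK, (
                f'No NVLink connection between GPU {physical_device_indices[i]} '
                f'and GPU {physical_device_indices[j]}')


def make_fake_pynvml(linked=True):
    """Build a fake pynvml module; the status values are placeholders, not real NVML constants."""
    fake = mock.MagicMock(name='pynvml')
    fake.NVML_P2P_STATUS_OK = 'ok'
    fake.nvmlDeviceGetHandleByIndex.side_effect = lambda idx: idx  # handle == index
    fake.nvmlDeviceGetP2PStatus.side_effect = (
        lambda a, b, cap: 'ok' if linked and a != b else 'not-ok')
    return fake


def run_with_fake(indices, linked=True):
    """Run the check against the fake module and return the number of P2P queries made."""
    fake = make_fake_pynvml(linked)
    with mock.patch.dict(sys.modules, {'pynvml': fake}):
        check_nvlink(indices)
    return fake.nvmlDeviceGetP2PStatus.call_count


print(run_with_fake([0, 0]))  # 0: duplicate GPU pair is skipped, no assertion
print(run_with_fake([0, 1]))  # 1: distinct GPUs are still checked
```

Patching sys.modules means the lazy import inside the function resolves to the fake, so the sketch runs even where pynvml is not installed.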

Impact
No breaking changes; only the NVLink initialization check is affected. Existing deployments in which each rank has its own physical GPU behave exactly as before.

Thanks for reviewing! It's a tiny fix, feel free to let me know if you have any suggestions and I'll adjust promptly. @jershi425 @sphish

Fixes deepseek-ai#582: avoid false assertion when multiple ranks share the same physical GPU (common in local single-GPU debugging, container GPU sharing, MIG scenarios). Add unit test for this case.

fix: skip self NVLink check for duplicate physical GPU IDs

4fdd8a2

pazyork wants to merge 1 commit into deepseek-ai:main from pazyork:main

2 participants