Replace custom ibgda usage with nvshmem native apis #574
Open
seth-howell wants to merge 4 commits into deepseek-ai:main
Signed-off-by: Seth Howell <sethh@nvidia.com>
Add support for unordered networks. A fence is required before the put_signal to send the tokens when the transport is unordered. This is a no-op for IB-based transports. Signed-off-by: Seth Howell <sethh@nvidia.com>
This prevents small performance losses from following proxy paths in the IB cases. TODO: When the EFA transport supports ordering between put-signal operations, remove the fence operation entirely. Signed-off-by: Seth Howell <sethh@nvidia.com>
Signed-off-by: Seth Howell <sethh@nvidia.com>
This patch enables native NVSHMEM API usage in DeepEP.
Eliminating the custom ibgda_device.cu file from DeepEP and switching to the new public nvshmem_qp APIs extends support to multiple transport backends including IBRC, IBDEVX, and libfabric.
This patch does not degrade performance on IBGDA. In some cases (2N, 4N) the kernels exhibit better performance than IBGDA.
It preserves IBGDA as the default transport unless a specific transport is supplied by the user.
Unordered transports like libfabric + EFA require a fence before the put_signal operations to stay functionally correct. The performance of libfabric + EFA is currently well below that of IB because the libfabric transport is still missing some functionality.
These improvements will reach users for free in future releases of NVSHMEM, with no further changes needed in DeepEP.
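For reviewers less familiar with the ordering issue, here is a minimal sketch of the fence-before-put_signal pattern described above. It is not code from this patch: it uses the standard NVSHMEM device calls (`nvshmem_putmem_nbi`, `nvshmem_fence`, `nvshmem_putmem_signal_nbi`) rather than the nvshmem_qp APIs, and the helper name, its parameters, and the `transport_is_unordered` flag are hypothetical.

```cuda
#include <cstddef>
#include <cstdint>
#include <nvshmem.h>
#include <nvshmemx.h>

// Hypothetical helper (not from the patch). Sends `num_tokens` fixed-size
// tokens to `dst_pe`, then raises a remote signal so the receiver knows how
// many tokens have landed. `remote_buf` and `remote_signal` must point into
// NVSHMEM symmetric memory; intended to be called by a single thread.
__device__ void send_tokens_then_signal(char *remote_buf,
                                        const char *local_buf,
                                        size_t token_bytes,
                                        int num_tokens,
                                        uint64_t *remote_signal,
                                        int dst_pe,
                                        bool transport_is_unordered) {
    if (num_tokens <= 0) return;

    // Non-blocking payload puts for every token except the last one.
    for (int i = 0; i < num_tokens - 1; ++i) {
        const size_t off = static_cast<size_t>(i) * token_bytes;
        nvshmem_putmem_nbi(remote_buf + off, local_buf + off, token_bytes, dst_pe);
    }

    // On an unordered transport the earlier puts could be delivered after the
    // signal below. The fence orders them ahead of it. The flag is an
    // assumption that lets ordered (IB) transports skip the call.
    if (transport_is_unordered) {
        nvshmem_fence();
    }

    // The last token travels with the signal; put-with-signal updates the
    // signal word only after this put's own payload has been delivered.
    const size_t last = static_cast<size_t>(num_tokens - 1) * token_bytes;
    nvshmem_putmem_signal_nbi(remote_buf + last, local_buf + last, token_bytes,
                              remote_signal, static_cast<uint64_t>(num_tokens),
                              NVSHMEM_SIGNAL_ADD, dst_pe);
}
```

On the receiving side, a matching wait such as `nvshmem_signal_wait_until(remote_signal, NVSHMEM_CMP_GE, expected)` would only return once the signalled tokens are visible. Without the fence, that guarantee covers only the payload of the put_signal itself on an unordered transport.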