Support Blackwell GPUs (torch 2.7 + DGL 2.4 cu124) #29
Open
jackytamkc wants to merge 1 commit into ZJUFanLab:main from
- _train.py: replace `graph.adjacency_matrix().to_dense().shape[0]` with `graph.num_nodes()` at the two pos_weight sites. Same value, no N*N allocation, and avoids the torch-pinned libdgl_sparse_pytorch load that fails on torch 2.7 with the DGL 2.4 cu124 wheel.
- README: add Blackwell / sm_120 install instructions alongside the existing cu113 path.

Backward-compatible: num_nodes() works on all supported DGL versions; README changes are additive.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
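To put the "no N*N allocation" point from the commit message in numbers, here is a rough sketch of what materialising a dense adjacency matrix costs. The helper name `dense_adjacency_bytes` and the 4-byte (float32) element size are assumptions made for illustration; neither is taken from scNiche or DGL.

```python
def dense_adjacency_bytes(num_nodes: int, itemsize: int = 4) -> int:
    """Bytes needed to hold an N x N dense matrix.

    itemsize=4 assumes float32 elements (an assumption, not a DGL guarantee).
    """
    return num_nodes * num_nodes * itemsize


# Memory grows quadratically with the node count, while num_nodes()
# only has to return a single integer.
for n in (1_000, 10_000, 100_000):
    cost = dense_adjacency_bytes(n)
    print(f"{n} nodes: {cost / 1e9:.3f} GB")
```

Under these assumptions a graph with 100,000 nodes would need about 40 GB just to read off one dimension of the matrix, which is the allocation the change avoids.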
Thank you for your comments. We will consider testing it and adding it to our next version.
Summary
Replace `graph.adjacency_matrix().to_dense().shape[0]` with `graph.num_nodes()` at the two `pos_weight` sites in `scniche/trainer/_train.py`. Same value, no N×N dense allocation, and avoids the torch-pinned `libdgl_sparse_pytorch_<X>.so` load that fails on torch 2.7 with the DGL 2.4 cu124 wheel.
Background
Blackwell GPUs (compute capability sm_120) require PyTorch built against CUDA 12.8. No DGL wheel is currently published for cu128, but the DGL 2.4 cu124 wheel's CUDA kernels are ABI-compatible with torch 2.7 and JIT-forward to sm_120. The only blocker for scNiche on this stack is `DGLGraph.adjacency_matrix()`, which tries to dlopen a torch-version-pinned sparse library (`libdgl_sparse_pytorch_2.4.so`) and crashes against torch 2.7. Both call sites in scNiche only use the result to read `N`, so swapping to `num_nodes()` is functionally equivalent and removes the dependency on the sparse library.
Backward compatibility
`DGLGraph.num_nodes()` has been part of the public DGL API since 0.5, so existing users on DGL 1.1.0+cu113 are unaffected (same integer, faster).
Test plan
- `Runner.fit` and `Runner_batch.fit` still produce the same `pos_weight` and embeddings as before.
- `Runner.fit` and `Runner_batch.fit` train end-to-end without the `libdgl_sparse_pytorch` load error.
- `pip check` warning about `dgl requires torch<=2.4.0` is expected on the Blackwell stack and documented in the README.

🤖 Generated with Claude Code
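The equivalence behind the first test-plan item can be illustrated without DGL or a GPU. The sketch below is a toy stand-in, not the scNiche code: the graph is a plain edge list, `dense_adjacency` and `pos_weight_from_n` are hypothetical helpers, and the `pos_weight` formula (negatives divided by positives, as commonly used for graph autoencoders) is an assumption about how `N` is consumed rather than something confirmed by the PR.

```python
def dense_adjacency(num_nodes, edges):
    """Build an N x N dense adjacency matrix as nested lists (toy stand-in
    for graph.adjacency_matrix().to_dense())."""
    adj = [[0.0] * num_nodes for _ in range(num_nodes)]
    for src, dst in edges:
        adj[src][dst] = 1.0
    return adj


def pos_weight_from_n(n, num_edges):
    """Assumed GAE-style weight: (number of non-edges) / (number of edges)."""
    return (n * n - num_edges) / num_edges


# Toy directed graph: 5 nodes, 6 edges.
num_nodes = 5
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (3, 4), (4, 3)]

# Old path: materialise the dense matrix only to read its first dimension.
n_old = len(dense_adjacency(num_nodes, edges))
# New path: ask for the node count directly (what graph.num_nodes() returns).
n_new = num_nodes

pw_old = pos_weight_from_n(n_old, len(edges))
pw_new = pos_weight_from_n(n_new, len(edges))
print(n_old == n_new, pw_old == pw_new)
```

Because both paths yield the same integer `N`, any quantity derived only from `N` and the edge count is unchanged; a real regression test would compare the values produced by `Runner.fit` before and after the change on the same data.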