Add support for GDRCOPY #1339
Pull Request Overview
This PR adds GDRCOPY support to the NVIDIA Kubernetes device plugin, following the same pattern as the existing GDS and MOFED support. GDRCopy is a low-latency GPU memory copy library built on GPUDirect RDMA that lets the CPU map and access GPU memory directly.
- Adds GDRCOPY configuration option with environment variable support
- Integrates GDRCOPY into the CDI (Container Device Interface) specification generation
- Updates dependencies to support the new GDRCOPY functionality in nvidia-container-toolkit
Reviewed Changes
Copilot reviewed 12 out of 138 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| internal/plugin/server.go | Sets NVIDIA_GDRCOPY environment variable when GDRCOPY is enabled |
| internal/cdi/options.go | Adds WithGdrcopyEnabled option for CDI handler configuration |
| internal/cdi/imex.go | Updates interface signatures for compatibility with nvidia-container-toolkit |
| internal/cdi/cdi.go | Integrates GDRCOPY into CDI spec generation alongside GDS/MOFED |
| internal/cdi/api.go | Removes local cdiSpecGenerator interface, uses upstream interface |
| go.mod | Updates nvidia-container-toolkit and CDI dependencies to support GDRCOPY |
| deployments/helm/nvidia-device-plugin/values.yaml | Adds gdrcopyEnabled helm configuration option |
| deployments/helm/nvidia-device-plugin/templates/daemonset-device-plugin.yml | Adds GDRCOPY_ENABLED environment variable to daemonset |
| cmd/nvidia-device-plugin/plugin-manager.go | Passes GDRCOPY configuration to CDI handler |
| cmd/nvidia-device-plugin/main.go | Adds gdrcopy-enabled CLI flag |
| api/config/v1/flags_test.go | Updates test expectations for new GDRCOPY flag |
| api/config/v1/flags.go | Adds GDRCopyEnabled field to configuration structure |
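The `WithGdrcopyEnabled` option listed for `internal/cdi/options.go` suggests the usual Go functional-option pattern. The following is a minimal sketch of that pattern, not the PR's actual code: the `cdiHandler` struct, the `Option` type, `newHandler`, `WithGdsEnabled`, `WithMofedEnabled`, and `additionalModes` are hypothetical names introduced for illustration, and only `WithGdrcopyEnabled` is taken from the table above.

```go
package main

import "fmt"

// cdiHandler is a hypothetical stand-in for the CDI handler in internal/cdi.
type cdiHandler struct {
	gdsEnabled     bool
	mofedEnabled   bool
	gdrcopyEnabled bool
}

// Option configures a cdiHandler (assumed shape of the option type).
type Option func(*cdiHandler)

// WithGdsEnabled and WithMofedEnabled are assumed counterparts for GDS and MOFED.
func WithGdsEnabled(enabled bool) Option {
	return func(h *cdiHandler) { h.gdsEnabled = enabled }
}

func WithMofedEnabled(enabled bool) Option {
	return func(h *cdiHandler) { h.mofedEnabled = enabled }
}

// WithGdrcopyEnabled uses the name from the summary table; the body is an assumption.
func WithGdrcopyEnabled(enabled bool) Option {
	return func(h *cdiHandler) { h.gdrcopyEnabled = enabled }
}

// newHandler applies the options in order to a zero-valued handler.
func newHandler(opts ...Option) *cdiHandler {
	h := &cdiHandler{}
	for _, opt := range opts {
		opt(h)
	}
	return h
}

// additionalModes lists the optional device classes that spec generation
// would include in this sketch, in a fixed order.
func (h *cdiHandler) additionalModes() []string {
	var modes []string
	if h.gdsEnabled {
		modes = append(modes, "gds")
	}
	if h.mofedEnabled {
		modes = append(modes, "mofed")
	}
	if h.gdrcopyEnabled {
		modes = append(modes, "gdrcopy")
	}
	return modes
}

func main() {
	h := newHandler(WithGdrcopyEnabled(true))
	fmt.Println(h.additionalModes()) // prints [gdrcopy]
}
```

Under these assumptions, `cmd/nvidia-device-plugin/plugin-manager.go` would pass the configured value through an option of this kind when it constructs the handler.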
cdesiniotis
left a comment
Some minor comments. LGTM otherwise.
….0-rc.5 Signed-off-by: Evan Lezar <elezar@nvidia.com>
cdesiniotis
left a comment
Thanks @elezar LGTM.
This change adds support for GDRCOPY through the same mechanism as for GDS and MOFED. This requires new functionality in the nvidia-container-toolkit to generate CDI specs for gdrcopy devices when using CDI mode.
This depends on NVIDIA/nvidia-container-toolkit#1230
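As a rough illustration of "the same mechanism as for GDS and MOFED", the sketch below shows how a per-container environment map might be built from the feature flags. It is a sketch under stated assumptions rather than the code in `internal/plugin/server.go`: the `flags` struct, the `containerEnvs` helper, the pointer-typed fields, the `NVIDIA_GDS` and `NVIDIA_MOFED` variable names, and the `"enabled"` value are all assumptions. Only the `NVIDIA_GDRCOPY` variable name and the `GDRCopyEnabled` field name come from this PR's review summary.

```go
package main

import "fmt"

// flags is a hypothetical subset of the plugin's configuration flags.
// Pointer fields are an assumption, used here to distinguish unset from false.
type flags struct {
	GDSEnabled     *bool
	MOFEDEnabled   *bool
	GDRCopyEnabled *bool
}

// isSet reports whether an optional boolean flag is present and true.
func isSet(b *bool) bool {
	return b != nil && *b
}

// containerEnvs returns the environment variables to inject into a container.
// The "enabled" value and the GDS/MOFED variable names are assumptions.
func containerEnvs(f flags) map[string]string {
	envs := make(map[string]string)
	if isSet(f.GDSEnabled) {
		envs["NVIDIA_GDS"] = "enabled"
	}
	if isSet(f.MOFEDEnabled) {
		envs["NVIDIA_MOFED"] = "enabled"
	}
	if isSet(f.GDRCopyEnabled) {
		envs["NVIDIA_GDRCOPY"] = "enabled"
	}
	return envs
}

func main() {
	on := true
	envs := containerEnvs(flags{GDRCopyEnabled: &on})
	fmt.Println(envs["NVIDIA_GDRCOPY"])
}
```

Treating all three features uniformly keeps the new flag a one-line addition, which matches the PR's stated goal of reusing the existing GDS and MOFED path.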