Skip to content

Add support for GDRCOPY#1339

Merged
cdesiniotis merged 4 commits into
NVIDIA:mainfrom
elezar:bump-nvidia-container-toolkit-v1.18.0-rc.2
Oct 6, 2025
Merged

Add support for GDRCOPY#1339
cdesiniotis merged 4 commits into
NVIDIA:mainfrom
elezar:bump-nvidia-container-toolkit-v1.18.0-rc.2

Conversation

@elezar
Copy link
Copy Markdown
Member

@elezar elezar commented Aug 1, 2025

This change adds support for GDRCOPY through the same mechansim as for GDS and MOFED. This requires new functionality in the nvidia-container-toolkit to generated CDI specs for gdrcopy devices when using CDI mode.

This depends on NVIDIA/nvidia-container-toolkit#1230

@elezar elezar added this to the v0.18.0 milestone Aug 1, 2025
@elezar elezar force-pushed the bump-nvidia-container-toolkit-v1.18.0-rc.2 branch 2 times, most recently from 8136d62 to 217aa20 Compare August 8, 2025 11:39
@elezar elezar changed the title Bump nvidia container toolkit v1.18.0 rc.2 Add support for GDRCOPY Aug 8, 2025
@elezar elezar marked this pull request as draft August 8, 2025 11:42
@elezar elezar force-pushed the bump-nvidia-container-toolkit-v1.18.0-rc.2 branch 2 times, most recently from b348981 to 79606de Compare August 13, 2025 11:23
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds support for GDRCOPY functionality to the NVIDIA Kubernetes device plugin, following the same pattern as existing GDS and MOFED support. GDRCOPY enables GPU Direct Remote Direct Memory Access (RDMA) Copy functionality for high-performance GPU-to-GPU communication.

  • Adds GDRCOPY configuration option with environment variable support
  • Integrates GDRCOPY into the CDI (Container Device Interface) specification generation
  • Updates dependencies to support the new GDRCOPY functionality in nvidia-container-toolkit

Reviewed Changes

Copilot reviewed 12 out of 138 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
internal/plugin/server.go Sets NVIDIA_GDRCOPY environment variable when GDRCOPY is enabled
internal/cdi/options.go Adds WithGdrcopyEnabled option for CDI handler configuration
internal/cdi/imex.go Updates interface signatures for compatibility with nvidia-container-toolkit
internal/cdi/cdi.go Integrates GDRCOPY into CDI spec generation alongside GDS/MOFED
internal/cdi/api.go Removes local cdiSpecGenerator interface, uses upstream interface
go.mod Updates nvidia-container-toolkit and CDI dependencies to support GDRCOPY
deployments/helm/nvidia-device-plugin/values.yaml Adds gdrcopyEnabled helm configuration option
deployments/helm/nvidia-device-plugin/templates/daemonset-device-plugin.yml Adds GDRCOPY_ENABLED environment variable to daemonset
cmd/nvidia-device-plugin/plugin-manager.go Passes GDRCOPY configuration to CDI handler
cmd/nvidia-device-plugin/main.go Adds gdrcopy-enabled CLI flag
api/config/v1/flags_test.go Updates test expectations for new GDRCOPY flag
api/config/v1/flags.go Adds GDRCopyEnabled field to configuration structure

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
You can also share your feedback on Copilot code review for a chance to win a $100 gift card. Take the survey.

Comment thread internal/cdi/options.go Outdated
Comment thread internal/cdi/options.go Outdated
cdesiniotis
cdesiniotis previously approved these changes Aug 14, 2025
Copy link
Copy Markdown
Contributor

@cdesiniotis cdesiniotis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments. LGTM otherwise.

Comment thread cmd/nvidia-device-plugin/main.go Outdated
Comment thread internal/cdi/options.go Outdated
@elezar elezar force-pushed the bump-nvidia-container-toolkit-v1.18.0-rc.2 branch 3 times, most recently from af51db9 to 640469e Compare September 25, 2025 20:26
@elezar elezar marked this pull request as ready for review September 26, 2025 20:00
@elezar elezar requested a review from cdesiniotis September 26, 2025 20:00
@elezar elezar force-pushed the bump-nvidia-container-toolkit-v1.18.0-rc.2 branch 2 times, most recently from d798941 to 74646b8 Compare September 29, 2025 08:30
@elezar elezar dismissed cdesiniotis’s stale review October 6, 2025 10:12

Significant changes.

elezar added 4 commits October 6, 2025 12:13
….0-rc.5

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
@elezar elezar force-pushed the bump-nvidia-container-toolkit-v1.18.0-rc.2 branch from 74646b8 to 7bb7845 Compare October 6, 2025 10:13
Copy link
Copy Markdown
Contributor

@cdesiniotis cdesiniotis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @elezar LGTM.

@cdesiniotis cdesiniotis merged commit dd6edf8 into NVIDIA:main Oct 6, 2025
9 checks passed
@elezar elezar deleted the bump-nvidia-container-toolkit-v1.18.0-rc.2 branch November 21, 2025 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants