Merged
1 change: 1 addition & 0 deletions .github/workflows/image.yaml
@@ -61,6 +61,7 @@ jobs:
env:
IMAGE_NAME: ghcr.io/${LOWERCASE_REPO_OWNER}/k8s-device-plugin
VERSION: ${COMMIT_SHORT_SHA}
CVE_UPDATES: "libarchive"
run: |
echo "${VERSION}"
make -f deployments/container/Makefile build
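The new `CVE_UPDATES` variable is forwarded by `deployments/container/Makefile` to `docker build` as a `--build-arg`. A minimal sketch of how the pieces compose — the `IMAGE_NAME` owner and `VERSION` SHA here are illustrative placeholders, not values from the workflow:

```shell
# Sketch: how the CI env above feeds the container build.
IMAGE_NAME="ghcr.io/example/k8s-device-plugin"   # placeholder owner
VERSION="abc1234"                                # placeholder short SHA
CVE_UPDATES="libarchive"

# The Makefile forwards CVE_UPDATES to docker build as a --build-arg
# (see the deployments/container/Makefile hunk in this PR):
build_cmd="docker build --build-arg CVE_UPDATES=${CVE_UPDATES} -t ${IMAGE_NAME}:${VERSION} ."
echo "${build_cmd}"
```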
11 changes: 11 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,16 @@
## Changelog

### Version v0.17.4
- Bump github.com/NVIDIA/go-nvlib from 0.7.2 to 0.7.4
- Bump golang version to 1.23.12
- Ensure that directory volumes have Directory type
- Ignore errors getting device memory using NVML

### Version v0.17.3
- Bump nvidia-container-toolkit to 1.17.8
- Bump github.com/NVIDIA/go-nvml from 0.12.4-1 to 0.12.9-0
- Bump golang version to 1.23.11

### Version v0.17.2
- Update nvidia.com/gpu.product label to include blackwell architectures
- Update documentation to indicate that nvidia.com/gpu.memory label is in MiB instead of MB
46 changes: 23 additions & 23 deletions README.md
@@ -147,7 +147,7 @@ Once you have configured the options above on all the GPU nodes in your
cluster, you can enable GPU support by deploying the following Daemonset:

```shell
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.17.3/deployments/static/nvidia-device-plugin.yml
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.17.4/deployments/static/nvidia-device-plugin.yml
```

**Note:** This is a simple static daemonset meant to demonstrate the basic
@@ -639,12 +639,12 @@ helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
```

Then verify that the latest release (`v0.17.3`) of the plugin is available:
Then verify that the latest release (`v0.17.4`) of the plugin is available:

```shell
$ helm search repo nvdp --devel
NAME CHART VERSION APP VERSION DESCRIPTION
nvdp/nvidia-device-plugin 0.17.3 0.17.3 A Helm chart for ...
nvdp/nvidia-device-plugin 0.17.4 0.17.4 A Helm chart for ...
```

Once this repo is updated, you can begin installing packages from it to deploy
@@ -656,7 +656,7 @@ The most basic installation command without any options is then:
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
--namespace nvidia-device-plugin \
--create-namespace \
--version 0.17.3
--version 0.17.4
```

**Note:** You only need to pass the `--devel` flag to `helm search repo`
@@ -665,7 +665,7 @@ version (e.g. `<version>-rc.1`). Full releases will be listed without this.

### Configuring the device plugin's `helm` chart

The `helm` chart for the latest release of the plugin (`v0.17.3`) includes
The `helm` chart for the latest release of the plugin (`v0.17.4`) includes
a number of customizable values.

Prior to `v0.12.0` the most commonly used values were those that had direct
@@ -675,7 +675,7 @@ case of the original values is then to override an option from the `ConfigMap`
if desired. Both methods are discussed in more detail below.

The full set of values that can be set is found here:
[here](https://github.com/NVIDIA/k8s-device-plugin/blob/v0.17.3/deployments/helm/nvidia-device-plugin/values.yaml).
[here](https://github.com/NVIDIA/k8s-device-plugin/blob/v0.17.4/deployments/helm/nvidia-device-plugin/values.yaml).

#### Passing configuration to the plugin via a `ConfigMap`

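The example config file referenced in this section (`/tmp/dp-example-config0.yaml`) is elided from this diff view. A minimal sketch of the plugin's `v1` config format — field values here are illustrative, not taken from the PR:

```yaml
version: v1
flags:
  migStrategy: "none"    # or "single" / "mixed"
  failOnInitError: true
```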
@@ -718,7 +718,7 @@ And deploy the device plugin via helm (pointing it at this config file and givin

```shell
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
--version=0.17.3 \
--version=0.17.4 \
--namespace nvidia-device-plugin \
--create-namespace \
--set-file config.map.config=/tmp/dp-example-config0.yaml
@@ -743,7 +743,7 @@ kubectl create cm -n nvidia-device-plugin nvidia-plugin-configs \

```shell
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
--version=0.17.3 \
--version=0.17.4 \
--namespace nvidia-device-plugin \
--create-namespace \
--set config.name=nvidia-plugin-configs
@@ -773,7 +773,7 @@ And redeploy the device plugin via helm (pointing it at both configs with a spec

```shell
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
--version=0.17.3 \
--version=0.17.4 \
--namespace nvidia-device-plugin \
--create-namespace \
--set config.default=config0 \
@@ -795,7 +795,7 @@ kubectl create cm -n nvidia-device-plugin nvidia-plugin-configs \

```shell
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
--version=0.17.3 \
--version=0.17.4 \
--namespace nvidia-device-plugin \
--create-namespace \
--set config.default=config0 \
@@ -881,7 +881,7 @@ runtimeClassName:
```

Please take a look in the
[`values.yaml`](https://github.com/NVIDIA/k8s-device-plugin/blob/v0.17.3/deployments/helm/nvidia-device-plugin/values.yaml)
[`values.yaml`](https://github.com/NVIDIA/k8s-device-plugin/blob/v0.17.4/deployments/helm/nvidia-device-plugin/values.yaml)
file to see the full set of overridable parameters for the device plugin.

Examples of setting these options include:
@@ -891,7 +891,7 @@ Enabling compatibility with the `CPUManager` and running with a request for

```shell
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
--version=0.17.3 \
--version=0.17.4 \
--namespace nvidia-device-plugin \
--create-namespace \
--set compatWithCPUManager=true \
@@ -903,7 +903,7 @@ Enabling compatibility with the `CPUManager` and the `mixed` `migStrategy`.

```shell
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
--version=0.17.3 \
--version=0.17.4 \
--namespace nvidia-device-plugin \
--create-namespace \
--set compatWithCPUManager=true \
@@ -922,7 +922,7 @@ To enable it, simply set `gfd.enabled=true` during helm install.

```shell
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
--version=0.17.3 \
--version=0.17.4 \
--namespace nvidia-device-plugin \
--create-namespace \
--set gfd.enabled=true
@@ -980,13 +980,13 @@ helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
```

Then verify that the latest release (`v0.17.3`) of the plugin is available
Then verify that the latest release (`v0.17.4`) of the plugin is available
(Note that this includes the GFD chart):

```shell
helm search repo nvdp --devel
NAME CHART VERSION APP VERSION DESCRIPTION
nvdp/nvidia-device-plugin 0.17.3 0.17.3 A Helm chart for ...
nvdp/nvidia-device-plugin 0.17.4 0.17.4 A Helm chart for ...
```

Once this repo is updated, you can begin installing packages from it to deploy
@@ -996,7 +996,7 @@ The most basic installation command without any options is then:

```shell
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
--version 0.17.3 \
--version 0.17.4 \
--namespace gpu-feature-discovery \
--create-namespace \
--set devicePlugin.enabled=false
@@ -1007,7 +1007,7 @@ the default namespace.

```shell
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
--version=0.17.3 \
--version=0.17.4 \
--set allowDefaultNamespace=true \
--set nfd.enabled=false \
--set migStrategy=mixed \
@@ -1031,14 +1031,14 @@ Using the default values for the flags:
helm upgrade -i nvdp \
--namespace nvidia-device-plugin \
--create-namespace \
https://nvidia.github.io/k8s-device-plugin/stable/nvidia-device-plugin-0.17.3.tgz
https://nvidia.github.io/k8s-device-plugin/stable/nvidia-device-plugin-0.17.4.tgz
```

## Building and Running Locally

The next sections focus on building the device plugin locally and running it.
They are intended purely for development and testing, and are not required by most users.
It assumes you are pinning to the latest release tag (i.e. `v0.17.3`), but can
It assumes you are pinning to the latest release tag (i.e. `v0.17.4`), but can
easily be modified to work with any available tag or branch.

### With Docker
@@ -1048,8 +1048,8 @@ easily be modified to work with any available tag or branch.
Option 1, pull the prebuilt image from [Docker Hub](https://hub.docker.com/r/nvidia/k8s-device-plugin):

```shell
docker pull nvcr.io/nvidia/k8s-device-plugin:v0.17.3
docker tag nvcr.io/nvidia/k8s-device-plugin:v0.17.3 nvcr.io/nvidia/k8s-device-plugin:devel
docker pull nvcr.io/nvidia/k8s-device-plugin:v0.17.4
docker tag nvcr.io/nvidia/k8s-device-plugin:v0.17.4 nvcr.io/nvidia/k8s-device-plugin:devel
```

Option 2, build without cloning the repository:
@@ -1058,7 +1058,7 @@ Option 2, build without cloning the repository:
docker build \
-t nvcr.io/nvidia/k8s-device-plugin:devel \
-f deployments/container/Dockerfile.ubuntu \
https://github.com/NVIDIA/k8s-device-plugin.git#v0.17.3
https://github.com/NVIDIA/k8s-device-plugin.git#v0.17.4
```

Option 3, if you want to modify the code:
7 changes: 7 additions & 0 deletions deployments/container/Dockerfile
@@ -53,6 +53,13 @@ RUN rpm -qa | sort -u > /tmp/package-list.minimal
# We define the following image as a base image and remove unneeded packages.
FROM nvcr.io/nvidia/cuda:13.0.0-base-ubi9 AS base

# Upgrade packages here that are required to resolve CVEs
ARG CVE_UPDATES
RUN if [ -n "${CVE_UPDATES}" ]; then \
dnf update -y ${CVE_UPDATES} && \
rm -rf /var/cache/yum/*; \
fi

WORKDIR /cleanup

COPY --from=minimal /tmp/package-names.minimal package-names.minimal
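The `if [ -n "${CVE_UPDATES}" ]` guard above makes the layer a no-op when no CVE fixes are pending, so routine builds are unaffected. The control flow can be exercised in isolation as a sketch — echoing the command instead of invoking `dnf`:

```shell
# Mirrors the Dockerfile RUN guard: act only when CVE_UPDATES is non-empty.
run_cve_updates() {
    CVE_UPDATES="$1"
    if [ -n "${CVE_UPDATES}" ]; then
        # In the real Dockerfile this runs dnf and clears the package cache.
        echo "dnf update -y ${CVE_UPDATES}"
    else
        echo "skipping: no CVE updates requested"
    fi
}

run_cve_updates "libarchive"
run_cve_updates ""
```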
1 change: 1 addition & 0 deletions deployments/container/Makefile
@@ -82,6 +82,7 @@ $(IMAGE_TARGETS): image-%:
--build-arg GOLANG_VERSION="$(GOLANG_VERSION)" \
--build-arg VERSION="$(VERSION)" \
--build-arg GIT_COMMIT="$(GIT_COMMIT)" \
--build-arg CVE_UPDATES="$(CVE_UPDATES)" \
$(if $(LABEL_IMAGE_SOURCE),--label "org.opencontainers.image.source=$(LABEL_IMAGE_SOURCE)",) \
-f $(DOCKERFILE) \
$(CURDIR)
4 changes: 2 additions & 2 deletions deployments/helm/nvidia-device-plugin/Chart.yaml
@@ -2,8 +2,8 @@ apiVersion: v2
name: nvidia-device-plugin
type: application
description: A Helm chart for the nvidia-device-plugin on Kubernetes
version: "0.17.3"
appVersion: "0.17.3"
version: "0.17.4"
appVersion: "0.17.4"
kubeVersion: ">= 1.10.0-0"
home: https://github.com/NVIDIA/k8s-device-plugin

@@ -4,7 +4,7 @@ metadata:
name: gpu-feature-discovery
labels:
app.kubernetes.io/name: gpu-feature-discovery
app.kubernetes.io/version: 0.17.3
app.kubernetes.io/version: 0.17.4
app.kubernetes.io/part-of: nvidia-gpu
spec:
selector:
@@ -15,11 +15,11 @@ spec:
metadata:
labels:
app.kubernetes.io/name: gpu-feature-discovery
app.kubernetes.io/version: 0.17.3
app.kubernetes.io/version: 0.17.4
app.kubernetes.io/part-of: nvidia-gpu
spec:
containers:
- image: nvcr.io/nvidia/k8s-device-plugin:v0.17.3
- image: nvcr.io/nvidia/k8s-device-plugin:v0.17.4
name: gpu-feature-discovery
command: ["/usr/bin/gpu-feature-discovery"]
volumeMounts:
@@ -4,7 +4,7 @@ metadata:
name: gpu-feature-discovery
labels:
app.kubernetes.io/name: gpu-feature-discovery
app.kubernetes.io/version: 0.17.3
app.kubernetes.io/version: 0.17.4
app.kubernetes.io/part-of: nvidia-gpu
spec:
selector:
@@ -15,11 +15,11 @@ spec:
metadata:
labels:
app.kubernetes.io/name: gpu-feature-discovery
app.kubernetes.io/version: 0.17.3
app.kubernetes.io/version: 0.17.4
app.kubernetes.io/part-of: nvidia-gpu
spec:
containers:
- image: nvcr.io/nvidia/k8s-device-plugin:v0.17.3
- image: nvcr.io/nvidia/k8s-device-plugin:v0.17.4
name: gpu-feature-discovery
command: ["/usr/bin/gpu-feature-discovery"]
volumeMounts:
6 changes: 3 additions & 3 deletions deployments/static/gpu-feature-discovery-daemonset.yaml
@@ -4,7 +4,7 @@ metadata:
name: gpu-feature-discovery
labels:
app.kubernetes.io/name: gpu-feature-discovery
app.kubernetes.io/version: 0.17.3
app.kubernetes.io/version: 0.17.4
app.kubernetes.io/part-of: nvidia-gpu
spec:
selector:
@@ -15,11 +15,11 @@ spec:
metadata:
labels:
app.kubernetes.io/name: gpu-feature-discovery
app.kubernetes.io/version: 0.17.3
app.kubernetes.io/version: 0.17.4
app.kubernetes.io/part-of: nvidia-gpu
spec:
containers:
- image: nvcr.io/nvidia/k8s-device-plugin:v0.17.3
- image: nvcr.io/nvidia/k8s-device-plugin:v0.17.4
name: gpu-feature-discovery
command: ["/usr/bin/gpu-feature-discovery"]
volumeMounts:
6 changes: 3 additions & 3 deletions deployments/static/gpu-feature-discovery-job.yaml.template
@@ -4,19 +4,19 @@ metadata:
name: gpu-feature-discovery
labels:
app.kubernetes.io/name: gpu-feature-discovery
app.kubernetes.io/version: 0.17.3
app.kubernetes.io/version: 0.17.4
app.kubernetes.io/part-of: nvidia-gpu
spec:
template:
metadata:
labels:
app.kubernetes.io/name: gpu-feature-discovery
app.kubernetes.io/version: 0.17.3
app.kubernetes.io/version: 0.17.4
app.kubernetes.io/part-of: nvidia-gpu
spec:
nodeName: NODE_NAME
containers:
- image: nvcr.io/nvidia/k8s-device-plugin:v0.17.3
- image: nvcr.io/nvidia/k8s-device-plugin:v0.17.4
name: gpu-feature-discovery
command: ["/usr/bin/gpu-feature-discovery"]
args:
@@ -38,7 +38,7 @@ spec:
# See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
priorityClassName: "system-node-critical"
containers:
- image: nvcr.io/nvidia/k8s-device-plugin:v0.17.3
- image: nvcr.io/nvidia/k8s-device-plugin:v0.17.4
name: nvidia-device-plugin-ctr
env:
- name: FAIL_ON_INIT_ERROR
@@ -124,7 +124,7 @@ spec:
- env:
- name: PASS_DEVICE_SPECS
value: "true"
image: nvcr.io/nvidia/k8s-device-plugin:v0.16.1
image: nvcr.io/nvidia/k8s-device-plugin:v0.17.4
name: nvidia-device-plugin-ctr
securityContext:
privileged: true
2 changes: 1 addition & 1 deletion deployments/static/nvidia-device-plugin.yml
@@ -38,7 +38,7 @@ spec:
# See https://kubernetes.io/docs/tasks/administer-cluster/guaranteed-scheduling-critical-addon-pods/
priorityClassName: "system-node-critical"
containers:
- image: nvcr.io/nvidia/k8s-device-plugin:v0.17.3
- image: nvcr.io/nvidia/k8s-device-plugin:v0.17.4
name: nvidia-device-plugin-ctr
env:
- name: FAIL_ON_INIT_ERROR
2 changes: 1 addition & 1 deletion versions.mk
@@ -17,7 +17,7 @@ MODULE := github.com/NVIDIA/$(DRIVER_NAME)

REGISTRY ?= nvcr.io/nvidia

VERSION ?= v0.17.3
VERSION ?= v0.17.4

# vVERSION represents the version with a guaranteed v-prefix
vVERSION := v$(VERSION:v%=%)