Description
Describe the bug
In the 1.19.0 release, a new After=multi-user.target ordering dependency was added to nvidia-cdi-refresh.service. This makes it impossible to start, as part of multi-user.target, any unit that is ordered after nvidia-cdi-refresh.service, because doing so creates an ordering cycle in systemd. I expect this is a fairly common use case: systemd starting containers that require the toolkit on boot.
It was added as part of a "fix" for #1611. In that issue, it is notable that there doesn't seem to be any explanation as to why this dependency resolves the problem. The reporter's hastily self-tested workaround was just taken at face value and added to the unit file.
In general, the ordering dependencies shipped in a package should be the minimum needed to make the service work. A user can always add a drop-in that adds ordering constraints if their system requires them, but a drop-in cannot remove ordering constraints shipped in the main unit file: systemd does not allow After= to be reset to an empty list, so the only way to drop one is to override the entire unit.
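As a rough sketch of the additive case, a host that really does need the extra ordering could stage a drop-in like the one below. The drop-in file name and the UNIT_ROOT variable are assumptions made for illustration; UNIT_ROOT defaults to a scratch directory so the script can be dry-run without root, and would be /etc/systemd/system on a real host.

```shell
#!/bin/sh
# Sketch: stage a drop-in that ADDS a site-specific ordering constraint.
# UNIT_ROOT is a hypothetical knob; it defaults to a scratch directory.
UNIT_ROOT="${UNIT_ROOT:-$(mktemp -d)}"
DROPIN_DIR="$UNIT_ROOT/nvidia-cdi-refresh.service.d"

mkdir -p "$DROPIN_DIR"

# 10-local-ordering.conf is a hypothetical file name.
cat > "$DROPIN_DIR/10-local-ordering.conf" <<'EOF'
[Unit]
# Merged with whatever After= the packaged unit already declares.
After=multi-user.target
EOF

# Show what was written.
cat "$DROPIN_DIR/10-local-ordering.conf"
```

On a real host this would be followed by systemctl daemon-reload. The reverse operation has no drop-in equivalent, which is why a broad After= in the packaged unit is hard for users to undo.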
To Reproduce
Update to the 1.19.0 version
► rpm -q nvidia-container-toolkit{,-base}
nvidia-container-toolkit-1.19.0-1.x86_64
nvidia-container-toolkit-base-1.19.0-1.x86_64
Create and enable a service that has an After=nvidia-cdi-refresh.service dependency, and whose Install section specifies WantedBy=multi-user.target. For example:
► sudo systemctl cat podman-compose.service
# /etc/systemd/system/podman-compose.service
[Unit]
Description=podman-compose in /home/bryce/Software/podman-compose
Requires=nvidia-cdi-refresh.service
After=nvidia-cdi-refresh.service
[Service]
Type=simple
ExecStartPre=-/usr/bin/podman compose -f /home/bryce/Software/podman-compose/compose.yaml up --no-start
ExecStartPre=/usr/bin/podman compose -f /home/bryce/Software/podman-compose/compose.yaml start
ExecStart=/usr/bin/podman compose -f /home/bryce/Software/podman-compose/compose.yaml wait
ExecStop=/usr/bin/podman compose -f /home/bryce/Software/podman-compose/compose.yaml down -t 30
[Install]
WantedBy=multi-user.target
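On boot, systemd flags an ordering cycle and deletes one of the start jobs in an effort to break it. The cycle arises because multi-user.target is implicitly ordered after every unit it wants (here podman-compose.service), podman-compose.service is ordered after nvidia-cdi-refresh.service, and nvidia-cdi-refresh.service is now ordered after multi-user.target.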
Mar 15 20:06:02 magichead systemd[1]: multi-user.target: Found ordering cycle on podman-compose.service/start
Mar 15 20:06:02 magichead systemd[1]: multi-user.target: Found dependency on nvidia-cdi-refresh.service/start
Mar 15 20:06:02 magichead systemd[1]: multi-user.target: Found dependency on multi-user.target/start
Mar 15 20:06:02 magichead systemd[1]: multi-user.target: Job podman-compose.service/start deleted to break ordering cycle starting with multi-user.target/start
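The cycle in the log above can be reproduced outside systemd as a small model. This is only an illustration, not how systemd itself resolves jobs: it feeds the three ordering edges to tsort and assumes GNU coreutils behaviour, where tsort exits non-zero if its input contains a loop. The has_cycle helper and the variable names are made up for this sketch.

```shell
#!/bin/sh
# Each line "X Y" means "X must be started before Y" (that is, Y has After=X).

# Edges that exist regardless of the toolkit version:
#  - podman-compose.service declares After=nvidia-cdi-refresh.service
#  - multi-user.target is implicitly ordered after units it wants
base_edges='nvidia-cdi-refresh.service podman-compose.service
podman-compose.service multi-user.target'

# Edge introduced by After=multi-user.target in the 1.19.0 unit file.
new_edge='multi-user.target nvidia-cdi-refresh.service'

has_cycle() {
    # Assumes GNU tsort: a loop in the input yields a non-zero exit status.
    if printf '%s\n' "$1" | tsort >/dev/null 2>&1; then
        echo no
    else
        echo yes
    fi
}

echo "without After=multi-user.target: cycle=$(has_cycle "$base_edges")"
echo "with After=multi-user.target:    cycle=$(has_cycle "$base_edges
$new_edge")"
```

On an affected host, the real ordering can be inspected with systemctl show -p After nvidia-cdi-refresh.service.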
Expected behavior
The shipped unit file should not include dependencies unless there is a real and verifiable reason for them. For example, if this unit requires a file on a particular file system, then adding RequiresMountsFor= is appropriate. If it requires the NVIDIA kernel module to be loaded before it starts, then perhaps an ExecStartPre= that calls modprobe would be appropriate. Adding After=multi-user.target, however, is too broad: even if it happens to cover both of the example requirements above, it also forces the service to wait for everything else in multi-user.target, which the service does not actually need.
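To make the two examples above concrete, here is a hedged sketch of what narrower requirements could look like as a unit fragment. Nothing in it is taken from the real nvidia-cdi-refresh.service: the mount path is a placeholder, the modprobe location is an assumption that varies by distribution, and the file name and UNIT_ROOT variable are hypothetical. The script only stages the fragment in a scratch directory.

```shell
#!/bin/sh
# Sketch: narrow, verifiable requirements instead of After=multi-user.target.
# UNIT_ROOT is a hypothetical knob; it defaults to a scratch directory.
UNIT_ROOT="${UNIT_ROOT:-$(mktemp -d)}"
FRAGMENT="$UNIT_ROOT/nvidia-cdi-refresh.service.d/20-narrow-deps.conf"

mkdir -p "$(dirname "$FRAGMENT")"

cat > "$FRAGMENT" <<'EOF'
[Unit]
# Placeholder path: replace with a directory the service really needs.
RequiresMountsFor=/path/to/required/dir

[Service]
# Assumed modprobe location; "nvidia" is the kernel module name.
ExecStartPre=/usr/sbin/modprobe nvidia
EOF

# Show what was written.
cat "$FRAGMENT"
```

Either directive orders the service only against the thing it needs, so units in multi-user.target can still be ordered after it without forming a cycle.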
Environment (please provide the following information):
- nvidia-container-toolkit version: 1.19.0
- NVIDIA Driver Version: 580.142
- Host OS: OpenSUSE Leap 15.6
- Kernel Version: 6.4.0
- Container Runtime Version: runc 1.3.4
- CPU Architecture: x86_64
- GPU Model(s): GTX 1070
- Output of nvidia-smi:
► sudo nvidia-smi
Sun Mar 15 20:26:08 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.142 Driver Version: 580.142 CUDA Version: 13.0 |
+-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 NVIDIA GeForce GTX 1070 On | 00000000:26:00.0 Off | N/A |
| 0% 30C P8 6W / 151W | 50MiB / 8192MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| 0 N/A N/A 1769 G /usr/bin/X 38MiB |
+-----------------------------------------------------------------------------------------+
- Container logs: N/A as my containers do not even attempt to start on boot as a result of this bug.