
[Bug]: nvidia-cdi-refresh.service excessively broad ordering constraint #1735

@brycehalling

Describe the bug
In the 1.19.0 release, After=multi-user.target was added to nvidia-cdi-refresh.service. This ordering dependency makes it impossible to start any unit that is part of multi-user.target and is ordered after nvidia-cdi-refresh.service, because it creates an ordering cycle in systemd: multi-user.target waits for that unit, the unit waits for nvidia-cdi-refresh.service, and nvidia-cdi-refresh.service waits for multi-user.target. I expect this is a fairly common use case, where systemd is used to start containers that require the toolkit on boot.

It was added as part of a "fix" for #1611. Notably, that issue does not seem to contain any explanation of why this dependency resolves the problem. The reporter's hastily self-tested workaround was taken at face value and added to the unit file.

In general, the ordering dependencies shipped in a package should be the minimum needed to make the service work. A user can always add a drop-in for the unit that adds ordering constraints if their system requires them, but users cannot create a drop-in that removes ordering constraints shipped in the main unit file; the only way to remove them is to override the entire unit file.
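As a rough sketch of the drop-in route: a user whose system really does need the #1611 workaround could add the ordering locally instead of having it shipped to everyone. The file name 10-local-ordering.conf and the UNIT_DIR variable are illustrative assumptions, not anything the package provides; on a real system the directory would be /etc/systemd/system and a `systemctl daemon-reload` would follow.

```shell
# UNIT_DIR is a hypothetical override so this sketch can run without root;
# on a real system the drop-in would live under /etc/systemd/system.
unit_dir="${UNIT_DIR:-$(mktemp -d)}"
dropin_dir="$unit_dir/nvidia-cdi-refresh.service.d"
dropin_file="$dropin_dir/10-local-ordering.conf"   # file name is an assumption
mkdir -p "$dropin_dir"

# A drop-in can append ordering dependencies to the packaged unit.
cat > "$dropin_file" <<'EOF'
[Unit]
After=multi-user.target
EOF

# Show what was written.
cat "$dropin_file"
```

The reverse direction does not work: a drop-in cannot drop an After= entry that the packaged unit already declares, which is why the broad ordering belongs in local configuration rather than in the shipped file.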

To Reproduce
Update to version 1.19.0:

► rpm -q nvidia-container-toolkit{,-base}
nvidia-container-toolkit-1.19.0-1.x86_64
nvidia-container-toolkit-base-1.19.0-1.x86_64

Create and enable a service that has an After=nvidia-cdi-refresh.service dependency, and whose Install section specifies WantedBy=multi-user.target. For example:

► sudo systemctl cat podman-compose.service 
# /etc/systemd/system/podman-compose.service
[Unit]
Description=podman-compose in /home/bryce/Software/podman-compose
Requires=nvidia-cdi-refresh.service
After=nvidia-cdi-refresh.service

[Service]
Type=simple
ExecStartPre=-/usr/bin/podman compose -f /home/bryce/Software/podman-compose/compose.yaml up --no-start 
ExecStartPre=/usr/bin/podman compose -f /home/bryce/Software/podman-compose/compose.yaml start
ExecStart=/usr/bin/podman compose -f /home/bryce/Software/podman-compose/compose.yaml wait
ExecStop=/usr/bin/podman compose -f /home/bryce/Software/podman-compose/compose.yaml down -t 30

[Install]
WantedBy=multi-user.target

On boot, systemd detects an ordering cycle and deletes the start job of one of the units to break it, so that unit is never started:

Mar 15 20:06:02 magichead systemd[1]: multi-user.target: Found ordering cycle on podman-compose.service/start
Mar 15 20:06:02 magichead systemd[1]: multi-user.target: Found dependency on nvidia-cdi-refresh.service/start
Mar 15 20:06:02 magichead systemd[1]: multi-user.target: Found dependency on multi-user.target/start
Mar 15 20:06:02 magichead systemd[1]: multi-user.target: Job podman-compose.service/start deleted to break ordering cycle starting with multi-user.target/start
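The cycle in the log above can be reproduced outside systemd as a plain graph problem. This is only an illustration, assuming coreutils `tsort` is available; it does not query systemd. The edge list is my reading of the three units involved: the multi-user.target edge comes from the implicit After= that a target gains on the units it wants, and the last edge is the After=multi-user.target shipped in 1.19.0. The `has_cycle` helper is a made-up name.

```shell
# Each line "A B" means "A must be started before B".
edges_1_19_0='nvidia-cdi-refresh.service podman-compose.service
podman-compose.service multi-user.target
multi-user.target nvidia-cdi-refresh.service'

# The same graph without the After=multi-user.target edge.
edges_without='nvidia-cdi-refresh.service podman-compose.service
podman-compose.service multi-user.target'

# Hypothetical helper: tsort exits non-zero when its input contains a loop.
has_cycle() {
    if printf '%s\n' "$1" | tsort >/dev/null 2>&1; then
        echo no
    else
        echo yes
    fi
}

echo "with After=multi-user.target:    cycle=$(has_cycle "$edges_1_19_0")"
echo "without After=multi-user.target: cycle=$(has_cycle "$edges_without")"
```

On a live system, `systemctl show -p After nvidia-cdi-refresh.service` lists the real ordering dependencies that systemd has loaded for the unit.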

Expected behavior
The shipped unit file should not include dependencies unless there is a real and verifiable reason for them. For example, if this unit requires a file on a particular file system, then adding RequiresMountsFor= is appropriate. If it requires the NVIDIA kernel module to be loaded before it starts, then perhaps an ExecStartPre= that calls modprobe would be appropriate. After=multi-user.target, however, is too broad a dependency: even if it happens to cover both of the example requirements above, it also forces the service to wait for many other units that it does not actually need.
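To make the two examples above concrete, here is a minimal sketch of what narrowly scoped directives might look like. Everything specific in it is an assumption for illustration: the /var/run/cdi path, the `nvidia` module name, the /usr/sbin/modprobe location, the file name, and the UNIT_DIR variable. It is not a tested replacement for the shipped unit, and I have not verified which of these (if either) the service actually needs.

```shell
# UNIT_DIR is a hypothetical override so this sketch can run without root.
narrow_root="${UNIT_DIR:-$(mktemp -d)}"
narrow_dir="$narrow_root/nvidia-cdi-refresh.service.d"
narrow_file="$narrow_dir/20-narrow-deps.conf"   # file name is an assumption
mkdir -p "$narrow_dir"

cat > "$narrow_file" <<'EOF'
[Unit]
# Assumed path: wait only for the mount that holds the generated CDI spec.
RequiresMountsFor=/var/run/cdi

[Service]
# Assumed module name and binary path: load the kernel module first.
ExecStartPre=/usr/sbin/modprobe nvidia
EOF

# Show what was written.
cat "$narrow_file"
```

Either directive orders the service after one specific thing, so units in multi-user.target can still be ordered after nvidia-cdi-refresh.service without forming a cycle.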

Environment (please provide the following information):

  • nvidia-container-toolkit version: 1.19.0
  • NVIDIA Driver Version: 580.142
  • Host OS: OpenSUSE Leap 15.6
  • Kernel Version: 6.4.0
  • Container Runtime Version: runc 1.3.4
  • CPU Architecture: x86_64
  • GPU Model(s): GTX 1070
  • Output of nvidia-smi
► sudo nvidia-smi
Sun Mar 15 20:26:08 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.142                Driver Version: 580.142        CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1070        On  |   00000000:26:00.0 Off |                  N/A |
|  0%   30C    P8              6W /  151W |      50MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A            1769      G   /usr/bin/X                               38MiB |
+-----------------------------------------------------------------------------------------+
  • Container logs: N/A as my containers do not even attempt to start on boot as a result of this bug.


Labels: bug, needs-triage
