nvidia-docker: unbreak the runc symlink#280087
Conversation
jmbaur
left a comment
There was a problem hiding this comment.
LGTM. In a future change, I would like to start using Nvidia's tooling for CDI (https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html) that AFAIK would allow us to ditch using nvidia-container-runtime stuff altogether and just generate a single yaml file on bootup that describes the hardware, then the container runtime knows how to read that natively. Podman supports it well and it will be in docker v25 which should be released soon. I've gotten it working for jetson devices, but have yet to try it out on x86_64.
|
@jmbaur that sounds great, do ping me in the PR! |
|
FYI @jmbaur @SomeoneSerge I'm also trying to get this to work. Adjacent to #280184 I sent #278969 which still works with the |
(cherry picked from commit 1e1eb8b)
Description of changes
"${nvidia-docker}/bin/nvidia-container-runtime"was broken referring to a non-existent file; we should probably delete and deprecatenvidia-dockerNote that nvidia-docker had been deprecated upstream for a long time, we should actually remove it. Our libnvidia-docker and nvidia-container-toolkit are terribly outdated too. I started preparing the updates in #279235 but I felt uncomfortable moving forward with them because it turned out that
docker run --gpus(anddokcer run --runtime nvidia) is broken even master (at least on my host), both with and without the updates.All that stuff is out of scope for this PR, this only includes the obvious fixes and a small refactoring that helps with debugging:
nix-update nvidiaCtkPackages.nvidia-container-toolkit-docker, etcI can look into updates later, but first I'd like to learn why
--gpus allbroke in the first place.I'm not exposing
nvidia-container-toolkit-*as top-level attributes and do not recurse intonvidiaCtkPackagesbecause there's no need to yet (keeping the interface smaller)I also don't see any reason to keep using the symlinkJoins, but this doesn't matter because they likely go away together with nvidia-docker?
Seeing how this PR is meant to be small, I intend to merge it as soon as Ofborg allows, unless anybody objects.
CC @jmbaur @GTrunSec @cpcloud
Also CC @kirillrdy @siraben @aaronjheng based on git-blame.
Things done
nix.conf? (See Nix manual)sandbox = relaxedsandbox = truenix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage./result/bin/)Add a 👍 reaction to pull requests you find important.