Conversation
Adds a GPU variant of the demo-bundled image so users with NVIDIA GPUs can run `flyte start demo --image ghcr.io/flyteorg/flyte-demo:gpu-latest` and submit tasks with `Resources(gpu=1)`.

- `Dockerfile.gpu` stages NVIDIA Container Toolkit v1.19.x binaries and their shared libs into the `rancher/k3s` final image. Libs are copied into `/usr/lib/<triple>/` because the `nvidia-ctk` OCI hook runs without inheriting `LD_LIBRARY_PATH`. A statically linked `/sbin/ldconfig` is also staged (rancher/k3s ships none) because the toolkit's update-ldcache hook bind-mounts it into workload pods.
- `containerd-config.toml.tmpl` sets nvidia as the default containerd runtime. Pods requesting `nvidia.com/gpu` get GPUs without needing `runtimeClassName` in their spec; non-GPU pods are unaffected (`nvidia-container-runtime` is a passthrough when no GPU is requested).
- `nvidia-device-plugin.yaml` installs a RuntimeClass and the NVIDIA k8s-device-plugin DaemonSet so `nvidia.com/gpu` is advertised on the node. Auto-applied by k3s at startup.
- `Makefile` gains a `build-gpu` target producing `flyte-demo:gpu-latest`.
- CI gains a build-and-push step publishing `gpu-latest`, `gpu-nightly`, and `gpu-<sha>` tags to both `flyte-demo` and `flyte-sandbox-v2`.

The GPU plumbing was verified end-to-end with a layered test image on an A10G (torch 2.11.0+cu130 reported `cuda_available=True`). The full multi-stage `Dockerfile.gpu` has not been built locally; the CI run here is the first end-to-end test of the production Dockerfile and may need fixup iterations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
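The default-runtime switch described above might look roughly like this in `containerd-config.toml.tmpl` — a hedged sketch, assuming the standard containerd CRI plugin layout and the usual `nvidia-container-runtime` binary path, not copied from the PR:

```toml
# Sketch: make nvidia the default containerd runtime (illustrative, not the PR's exact template).
[plugins."io.containerd.grpc.v1.cri".containerd]
  default_runtime_name = "nvidia"

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia]
  runtime_type = "io.containerd.runc.v2"
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.nvidia.options]
    BinaryName = "/usr/bin/nvidia-container-runtime"
```

Because `nvidia-container-runtime` is a passthrough when a pod requests no GPU, making it the default is safe for non-GPU workloads.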
Removes the duplicated builder/bootstrap/pg-cache stages and final-stage setup by making `Dockerfile.gpu` a thin layer on top of `flyte-demo:latest` (parameterized via `ARG BASE_IMAGE`). CI now builds the CPU image first and passes its sha tag in as `BASE_IMAGE` to the GPU build.

- `Dockerfile.gpu` shrinks from ~165 to ~75 lines; it inherits flyte-binary, embedded postgres, staging manifests, and the k3d entrypoint from the base image unchanged.
- The Makefile `build-gpu` target now depends on `build` (not the full prereq chain) and passes `BASE_IMAGE=flyte-demo:latest`.
- CI gates the GPU build on push/workflow_dispatch, since PR builds don't push the CPU image to ghcr.io (nothing to pull for `BASE_IMAGE`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
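The thin-layer pattern can be sketched as follows. Stage names and copy paths here are assumptions for illustration; only the `ARG BASE_IMAGE` parameterization is from the commit message:

```dockerfile
# Sketch of a thin-layer Dockerfile.gpu (stage names and paths illustrative).
ARG BASE_IMAGE=flyte-demo:latest

# Collect the NVIDIA toolkit bits in a builder stage...
FROM debian:bookworm-slim AS nvidia-bits
# ...install nvidia-container-toolkit here and stage its binaries/libs...

# ...then layer them onto the unchanged CPU demo image.
FROM ${BASE_IMAGE}
COPY --from=nvidia-bits /staged/usr/bin/ /usr/bin/
COPY --from=nvidia-bits /staged/usr/lib/ /usr/lib/
COPY containerd-config.toml.tmpl /var/lib/rancher/k3s/agent/etc/containerd/config.toml.tmpl
COPY nvidia-device-plugin.yaml /var/lib/rancher/k3s/server/manifests/
```

Everything the base image already provides (flyte-binary, embedded postgres, entrypoint) is inherited untouched, which is what lets the file shrink from ~165 to ~75 lines.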
Drops the `if:` gate and conditions `push:` on the same expression the CPU build uses, so both steps always build and only push on v2-branch pushes or workflow_dispatch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
On `pull_request` events the CPU build step runs with `push=false`, so the GPU build's `FROM ghcr.io/.../flyte-demo:sha-<sha>` fails to resolve (image not found in the registry). Fix: produce an OCI archive of the CPU image locally and pass it to the GPU build as a named build context (`build-contexts: base=oci-layout://...`) with `BASE_IMAGE=base`. The registry push happens in a separate step that only runs on push / `workflow_dispatch`, so PR builds no longer need ghcr credentials for the GPU step.
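Wired into `docker/build-push-action`, the two-step arrangement might look like this. Step names, context paths, and the `/tmp` location are assumptions; only the `build-contexts: base=oci-layout://` mechanism and `BASE_IMAGE=base` are from the commit message:

```yaml
# Sketch: build the CPU image to a local OCI layout, then feed it to the GPU
# build as a named context so PR builds never need to pull from the registry.
- name: Build CPU image to OCI archive
  uses: docker/build-push-action@v5
  with:
    context: docker/sandbox-bundled
    outputs: type=oci,dest=/tmp/cpu-image,tar=false

- name: Build GPU image from local base
  uses: docker/build-push-action@v5
  with:
    context: docker/sandbox-bundled
    file: docker/sandbox-bundled/Dockerfile.gpu
    build-contexts: |
      base=oci-layout:///tmp/cpu-image
    build-args: |
      BASE_IMAGE=base
```

With `BASE_IMAGE=base`, the Dockerfile's `FROM ${BASE_IMAGE}` resolves to the named build context instead of a registry reference.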
Add GHA cache (`type=gha`) to the three `docker/build-push-action` steps in `build-and-push-demo-bundled-image`. The CPU archive and CPU push steps share the `demo-cpu` scope so the push reuses layers from the archive build; the GPU build gets its own `demo-gpu` scope.
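The scoped cache lines would look roughly like this on each step (a sketch; `mode=max` is an assumption, only `type=gha` and the scope names are from the commit message):

```yaml
# Sketch: shared cache scope for the two CPU steps; the GPU step would
# use scope=demo-gpu in the same way.
    cache-from: type=gha,scope=demo-cpu
    cache-to: type=gha,mode=max,scope=demo-cpu
```

Scopes keep the CPU and GPU layer caches from evicting each other while still letting the CPU push step hit the layers the archive build just produced.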
The `oci-layout://` build-context source requires Dockerfile frontend 1.5+. CI was failing with 'unsupported context source oci-layout for base'. Signed-off-by: Kevin Su <pingsutw@apache.org>
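The usual way to opt into a newer frontend is a syntax directive on the first line of the Dockerfile — a sketch, assuming the PR pins 1.5 rather than a later version:

```dockerfile
# syntax=docker/dockerfile:1.5
# The directive must be the very first line; it tells BuildKit to fetch a
# frontend that understands oci-layout:// named build contexts.
ARG BASE_IMAGE=flyte-demo:latest
FROM ${BASE_IMAGE}
```

Without it, BuildKit falls back to the builder's bundled frontend, producing the 'unsupported context source' error above.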
cosmicBboy pushed a commit to flyteorg/flyte-sdk that referenced this pull request on Apr 21, 2026
## Summary

Adds `--gpu` to `flyte start demo`. When set, the underlying `docker run` is invoked with `--gpus all`, giving the demo container access to host NVIDIA GPUs.

```bash
flyte start demo --gpu --image ghcr.io/flyteorg/flyte-demo:gpu-latest
```

Default off — existing non-GPU users are unaffected.

## Why

The GPU-capable demo image being added in [flyteorg/flyte#7243](flyteorg/flyte#7243) configures k3s, containerd, and the NVIDIA device plugin inside the image, but none of that matters if the host GPUs aren't passed through to the container. `--gpus all` on the `docker run` is the piece the CLI owns.

## Change

- `_start.py` — new `--gpu` click option on the `demo` command.
- `_demo.py` — threads `gpu: bool` through `launch_demo → _launch_demo_rich/plain → _run_step → _run_container`, which appends `--gpus all` when true.

## Test plan

<img width="1879" height="769" alt="Screenshot 2026-04-21 at 12 38 07 AM" src="https://github.com/user-attachments/assets/44a92867-f1dd-4c89-b96d-3c5af63389f4" />

- [x] Installed locally with `uv tool install --from . flyte`; `flyte start demo --help` lists the new flag.
- [x] `flyte start demo --gpu --image flyte-demo:gpu-local` on an A10G host launched the container with `--gpus all`; a Flyte task with `Resources(gpu=1)` ran `torch.cuda.is_available() == True`.
- [x] No regression expected on existing users (flag defaults to off).

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
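The threading described under "Change" can be sketched as a small helper. Function and parameter names below are illustrative, not flyte-sdk's real `_demo.py` internals; only the "append `--gpus all` when true" behavior is from the PR:

```python
# Hypothetical sketch of how `gpu: bool` threads into the docker run argv.
# Names are illustrative, not the actual flyte-sdk API.
from typing import List


def build_docker_run_args(image: str, gpu: bool = False) -> List[str]:
    """Assemble a `docker run` command; append `--gpus all` only when gpu=True."""
    args = ["docker", "run", "--rm"]
    if gpu:
        args += ["--gpus", "all"]  # pass all host NVIDIA GPUs through
    args.append(image)
    return args
```

Keeping the flag a plain boolean that defaults to `False` is what makes the change a no-op for existing non-GPU users.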
cosmicBboy approved these changes on Apr 21, 2026
## Summary

Adds a GPU variant of the demo-bundled image so users on an NVIDIA-enabled host can run:

```bash
flyte start demo --image ghcr.io/flyteorg/flyte-demo:gpu-latest
```

and submit Flyte tasks with `Resources(gpu=1)` — no PodTemplate, no `runtimeClassName` needed — and get a GPU.

## What's in the image

- `Dockerfile.gpu` — stages NVIDIA Container Toolkit v1.19.x (`nvidia-ctk`, `nvidia-container-runtime`, `libnvidia-container`) into the `rancher/k3s` final image. Two subtle prereqs the OCI hook needs:
  - shared libs in `/usr/lib/<arch-triple>/` — the `nvidia-ctk` OCI hook runs without inheriting `LD_LIBRARY_PATH`;
  - `/sbin/ldconfig` from `debian:bookworm-slim` — rancher/k3s ships none, and the toolkit's update-ldcache hook bind-mounts it into workload pods.
- `containerd-config.toml.tmpl` — sets `default_runtime_name = "nvidia"`. GPU pods get GPUs automatically; non-GPU pods are unaffected (`nvidia-container-runtime` is a passthrough).
- `nvidia-device-plugin.yaml` — `RuntimeClass nvidia` + `DaemonSet` (`nvcr.io/nvidia/k8s-device-plugin:v0.17.0`), auto-applied by k3s on startup.
- `Makefile` — new `build-gpu` target.
- CI — publishes `gpu-latest`, `gpu-nightly`, and `gpu-<sha>` tags to both `flyte-demo` and `flyte-sandbox-v2`.

## Companion change

A matching `--gpu` flag on `flyte start demo` (adds `--gpus all` to `docker run`) will land in flyteorg/flyte-sdk. The image is usable without it via a manual `docker run --gpus all …`.

## Test plan

- A layered test image (on top of `flyte-demo:nightly`) was verified on an A10G: k3s auto-registers the nvidia runtime, the node advertises `nvidia.com/gpu: 1`, and a Flyte task with `Resources(gpu=1)` ran torch 2.11.0+cu130 reporting `cuda_available=True`, `device_name="NVIDIA A10G"`.
- The full multi-stage `Dockerfile.gpu` has NOT been built locally — this CI run is the first build. Expect potential fixup iterations.

## Follow-ups

- `Resources(gpu="A10G:1")`-style requests require GPU Feature Discovery to label the node.
- `migStrategy`
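The RuntimeClass plus device-plugin manifest described under "What's in the image" would look roughly like the sketch below. Field values mirror a standard NVIDIA k8s-device-plugin deployment, not the PR's exact file; only the `nvidia` RuntimeClass name and the `nvcr.io/nvidia/k8s-device-plugin:v0.17.0` image are from the description:

```yaml
# Sketch of nvidia-device-plugin.yaml (illustrative; dropped into
# k3s's server/manifests dir so it is auto-applied at startup).
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: nvidia
handler: nvidia
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: nvidia-device-plugin-daemonset
  namespace: kube-system
spec:
  selector:
    matchLabels:
      name: nvidia-device-plugin-ds
  template:
    metadata:
      labels:
        name: nvidia-device-plugin-ds
    spec:
      runtimeClassName: nvidia
      containers:
        - name: nvidia-device-plugin-ctr
          image: nvcr.io/nvidia/k8s-device-plugin:v0.17.0
          volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
      volumes:
        - name: device-plugin
          hostPath:
            path: /var/lib/kubelet/device-plugins
```

Once the DaemonSet pod registers with the kubelet, the node starts advertising `nvidia.com/gpu`, which is what lets a bare `Resources(gpu=1)` schedule.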