Add --gpu flag to flyte start demo #989

Merged: cosmicBboy merged 5 commits into main from feat/demo-gpu-flag on Apr 21, 2026

@pingsutw (Member) commented Apr 21, 2026

Summary

Adds --gpu to flyte start demo. When set, the underlying docker run is invoked with --gpus all, giving the demo container access to host NVIDIA GPUs.

flyte start demo --gpu --image ghcr.io/flyteorg/flyte-demo:gpu-latest

Default off — existing non-GPU users are unaffected.

Why

The GPU-capable demo image being added in flyteorg/flyte#7243 configures k3s, containerd, and the NVIDIA device plugin inside the image, but none of that matters unless the host GPUs are passed through to the container. Passing --gpus all to docker run is the piece the CLI owns.

Change

  • _start.py — new --gpu click option on the demo command.
  • _demo.py — threads gpu: bool through launch_demo → _launch_demo_rich/plain → _run_step → _run_container, which appends --gpus all when true.
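A minimal sketch of that plumbing, using simplified stand-ins for the functions named above. The signatures, the `rich` switch, and the injectable `runner` are hypothetical and not the actual code in `_demo.py`. The flag only turns into a docker option at the bottom of the chain, and because `--gpus` is a `docker run` option it has to come before the image name.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical stand-ins for the call chain named above; the real
# functions in src/flyte/cli/_demo.py take more arguments and do more work.
Runner = Callable[[Sequence[str]], None]


def _default_runner(argv: Sequence[str]) -> None:
    subprocess.run(list(argv), check=True)


def _run_container(image: str, *, gpu: bool, runner: Runner) -> None:
    args: List[str] = ["docker", "run"]
    if gpu:
        # `--gpus` is a `docker run` option, so it must precede the image.
        args += ["--gpus", "all"]
    args.append(image)
    runner(args)


def _run_step(image: str, *, gpu: bool, runner: Runner) -> None:
    # A real step would also report progress; this one only forwards the flag.
    _run_container(image, gpu=gpu, runner=runner)


def _launch_demo_rich(image: str, *, gpu: bool, runner: Runner) -> None:
    _run_step(image, gpu=gpu, runner=runner)


def _launch_demo_plain(image: str, *, gpu: bool, runner: Runner) -> None:
    _run_step(image, gpu=gpu, runner=runner)


def launch_demo(image: str, *, gpu: bool = False, rich: bool = True,
                runner: Runner = _default_runner) -> None:
    # The entry point is the only layer where `gpu` has a default.
    launcher = _launch_demo_rich if rich else _launch_demo_plain
    launcher(image, gpu=gpu, runner=runner)
```

Passing `runner=print` turns this into a dry run: `launch_demo("ghcr.io/flyteorg/flyte-demo:gpu-latest", gpu=True, runner=print)` prints the argv instead of starting a container. Making `gpu` keyword-only and required below `launch_demo` means a layer that forgets to forward it fails with a `TypeError` instead of silently dropping the flag.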

Test plan

  • Installed locally with uv tool install --from . flyte; flyte start demo --help lists the new flag.
  • flyte start demo --gpu --image flyte-demo:gpu-local on an A10G host launched the container with --gpus all, and a Flyte task requesting Resources(gpu=1) saw torch.cuda.is_available() == True.
  • No regression expected for existing users (the flag defaults to off).

When set, passes `--gpus all` to the underlying `docker run` so the demo
cluster can use host NVIDIA GPUs. Requires an NVIDIA-enabled host and a
GPU-capable demo image (e.g. flyte-demo:gpu-latest, being added in
flyteorg/flyte). Default off; existing non-GPU users are unaffected.

Usage:
    flyte start demo --gpu --image ghcr.io/flyteorg/flyte-demo:gpu-latest

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/flyte/cli/_demo.py
"--volume",
f"{volume_name}:/var/lib/flyte/storage",
]
if gpu:
A contributor commented:

I think we should add some unit tests for these. Also, how about an alias devbox?

The author (Member) replied:

Re "alias devbox": will change it in a separate PR.

The author (Member) replied:

Added tests.

On Linux bind mounts, the in-container kubeconfig lands root-owned on
the host and kubectl exits non-zero, which surfaces as CalledProcessError
rather than PermissionError, so the existing chown-retry branch never
fired. macOS avoided this because Docker Desktop remaps ownership.
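The retry described in this commit can be sketched as follows, assuming a hypothetical helper that receives the merge step and the ownership fix (for example a chown of the kubeconfig) as callables; the actual `_merge_kubeconfig` may be structured differently. The key point is that `subprocess.CalledProcessError` derives from `SubprocessError`, not from `PermissionError`, so an `except PermissionError` clause alone never sees the kubectl failure.

```python
import subprocess
from typing import Callable


def merge_kubeconfig_with_retry(merge: Callable[[], None],
                                fix_ownership: Callable[[], None]) -> None:
    """Hypothetical sketch: fix ownership once, then retry the merge."""
    try:
        merge()
    except (PermissionError, subprocess.CalledProcessError):
        # On Linux bind mounts a root-owned kubeconfig makes kubectl exit
        # non-zero (CalledProcessError) instead of raising PermissionError,
        # so both exceptions must trigger the ownership fix.
        fix_ownership()
        # Retry exactly once; a second failure propagates to the caller.
        merge()
```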

- _run_container: asserts --gpus all is appended only when gpu=True
- _merge_kubeconfig: asserts chown retry fires on both PermissionError
  and CalledProcessError (the Linux bind-mount case this PR fixes), and
  that the second failure propagates
- demo CLI: asserts --gpu is plumbed through to launch_demo
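One way to write the first of these checks is to patch `subprocess.run` and inspect the argv it receives. The sketch below uses a hypothetical stand-in `_run_container` so it runs on its own; the PR's real tests target the function in `_demo.py` and may patch a different call.

```python
import subprocess
import unittest
from unittest import mock


def _run_container(image: str, gpu: bool = False) -> None:
    # Hypothetical stand-in for the function under test.
    args = ["docker", "run"]
    if gpu:
        args += ["--gpus", "all"]
    args.append(image)
    subprocess.run(args, check=True)


class RunContainerGpuTest(unittest.TestCase):
    @mock.patch("subprocess.run")
    def test_gpus_all_only_when_requested(self, run: mock.MagicMock) -> None:
        # gpu=True: "--gpus" must be present and followed by "all".
        _run_container("img", gpu=True)
        argv = run.call_args.args[0]
        self.assertEqual(argv[argv.index("--gpus") + 1], "all")

        # gpu=False: the option must not appear at all.
        _run_container("img", gpu=False)
        self.assertNotIn("--gpus", run.call_args.args[0])
```

Patching `subprocess.run` keeps the test from needing Docker or a GPU host. pytest collects `unittest.TestCase` subclasses as-is, so the same class works under either runner.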

If the user passes --gpu without --image, use
ghcr.io/flyteorg/flyte-demo:gpu-latest rather than the CPU default.
An explicit --image is still respected.
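The image-selection rule in this commit amounts to a three-way precedence. Below is a minimal sketch with a hypothetical helper name; the CPU default image is not named in this PR, so it is passed in as a parameter rather than guessed.

```python
from typing import Optional

# GPU default image named in this PR.
GPU_DEMO_IMAGE = "ghcr.io/flyteorg/flyte-demo:gpu-latest"


def resolve_demo_image(image: Optional[str], gpu: bool, cpu_default: str) -> str:
    """Hypothetical helper: explicit --image wins, then the GPU or CPU default."""
    if image is not None:
        # An explicit --image is always respected, even with --gpu.
        return image
    return GPU_DEMO_IMAGE if gpu else cpu_default
```

Checking `image is not None` rather than truthiness keeps "flag not passed" distinct from any other value the CLI might hand through.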
@cosmicBboy merged commit 14d27e8 into main on Apr 21, 2026
34 checks passed
3 participants