[FEATURE] Epic: first-class AMD accelerator tier (Strix Halo / RDNA, Vulkan + ROCm) #696

@Defilan

Feature Description

Make AMD a first-class accelerator tier in LLMKube, alongside NVIDIA CUDA (in-cluster pods) and Apple Metal (off-cluster agent). The immediate driver is a homelab AMD Strix Halo node (Ryzen AI Max+ 395, RDNA 3.5 iGPU "Radeon 8060S" / gfx1151, 128GB LPDDR5X unified memory), but the goal is a general, supported path for AMD GPUs/APUs.

This is an umbrella epic. The work decomposes into a runtime image, node onboarding, a validated example, observability, and a follow-on ROCm tier.

Problem Statement

As a fleet operator running heterogeneous on-prem hardware, I want to add an AMD node and have LLMKube schedule, serve, observe, and route to it with the same CRDs I use for NVIDIA and Metal, so that AMD is a real tier and not a manual bare-metal workaround.

Today the operator hardcodes NVIDIA assumptions on the pod path and ships only CUDA + Metal runtimes. A user can run llama.cpp on an AMD APU on bare metal but has no first-class LLMKube path. AMD APUs are a compelling sovereign-inference tier: a single Strix Halo box exposes ~90GB of unified memory to the iGPU, enough for large models, at low power.
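As a rough illustration of why ~90GB of GPU-visible memory matters, here is a back-of-the-envelope sizing sketch. It assumes weight size is simply parameters times bits per weight; real quantized files carry extra metadata and mixed precisions, so actual sizes differ. The function names and the KV-cache allowance are hypothetical, not part of LLMKube.

```python
# Back-of-the-envelope model sizing against a unified-memory budget.
# Assumption: weight bytes ~= parameter count * bits per weight / 8.
UNIFIED_BUDGET_GB = 90.0  # the "~90GB" figure quoted in this issue


def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB for a model with the given precision."""
    return params_billion * bits_per_weight / 8.0


def fits(params_billion: float, bits_per_weight: float,
         kv_cache_gb: float = 0.0,
         budget_gb: float = UNIFIED_BUDGET_GB) -> bool:
    """True if weights plus a caller-supplied KV-cache allowance fit the budget."""
    return weights_gb(params_billion, bits_per_weight) + kv_cache_gb <= budget_gb
```

Under these assumptions a 70B-parameter model at 4 bits per weight needs about 35 GB for weights and fits comfortably, while the same model at 16 bits (about 140 GB) does not.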

Proposed Solution

Ship AMD support as a Vulkan-first tier (Vulkan/RADV is faster and far more stable than ROCm on gfx1151 today), with ROCm as a follow-on per-model opt-in. All of it runs in-cluster through the same Model / InferenceService path as CUDA; the only new node-level primitive is a render-device plugin.
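To make the "same pod path, different node-level primitive" idea concrete, here is a minimal sketch of selecting a Kubernetes extended resource per tier. `nvidia.com/gpu` is the resource name used by the NVIDIA device plugin; `example.com/render-device` is a placeholder for whatever name the render-device plugin ends up advertising, and the function and mapping are hypothetical rather than existing LLMKube code.

```python
# Hypothetical tier -> Kubernetes extended-resource mapping.
# Metal is absent on purpose: it runs off-cluster and requests no pod resource.
RESOURCE_BY_TIER = {
    "cuda": "nvidia.com/gpu",
    "vulkan": "example.com/render-device",  # placeholder name (assumption)
    "rocm": "example.com/render-device",    # same render node as Vulkan
}


def container_resources(tier: str, count: int = 1) -> dict:
    """Build the `resources` stanza a pod container would carry for a tier."""
    try:
        name = RESOURCE_BY_TIER[tier]
    except KeyError:
        raise ValueError(f"no in-cluster resource for tier {tier!r}") from None
    # Extended resources are requested in whole units under `limits`.
    return {"limits": {name: str(count)}}
```

Keeping the resource name as data rather than a hardcoded constant is the point of this sketch: the Vulkan and ROCm tiers can then share one device resource while CUDA keeps its own.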

Decomposition (each its own issue):

Alternatives Considered

  • ROCm-first: rejected for v1. On gfx1151 today ROCm is Preview-tier and crash-prone (hipGraphInstantiate crashes on dense models, plus a bug where the KV cache spills to host memory), needs a pinned kernel, and produces a 6-12GB image. Kept as a follow-on tier.
  • Off-cluster agent (Metal pattern): unnecessary for Linux+Vulkan, which containerizes cleanly. In-cluster reuses the existing CUDA machinery.

Additional Context

  • Related issues: #395, "feat(crd): make GPU resource name configurable to support AMD/Vulkan/Intel scheduling" (scheduling resource name)
  • Backends compared on gfx1151: Vulkan/RADV vs ROCm/HIP. Vulkan leads on generation throughput by 25-32% in a 128-run benchmark and is more stable across dense + MoE models; both share the same /dev/dri/renderD128 Kubernetes exposure.
  • Similar features: the existing Apple Metal tier is the precedent for "a non-NVIDIA accelerator as a first-class citizen."
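Since both backends rely on the same DRM render node, a node-onboarding check could start by listing the render nodes that exist on the host. The sketch below is an assumption about how such a probe might look, not LLMKube code; the function name and the `require_char_device` switch (useful for testing against a fake directory) are hypothetical.

```python
import stat
from pathlib import Path


def find_render_nodes(dri_dir: str = "/dev/dri",
                      require_char_device: bool = True) -> list[str]:
    """Return sorted paths of DRM render nodes (renderD*) under dri_dir."""
    root = Path(dri_dir)
    if not root.is_dir():
        return []  # no DRM directory: the node exposes no render devices
    nodes = []
    for path in sorted(root.glob("renderD*")):
        # Real render nodes are character devices; skip anything else.
        if require_char_device and not stat.S_ISCHR(path.stat().st_mode):
            continue
        nodes.append(str(path))
    return nodes
```

On a host with a single GPU this would typically report `/dev/dri/renderD128`, the path named above; an empty list suggests the kernel driver is not loaded or the directory is not mounted into the container.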

Priority

  • High - Would significantly improve my workflow

Willingness to Contribute

  • Yes, I can submit a PR
