[FEATURE] Epic: first-class AMD accelerator tier (Strix Halo / RDNA, Vulkan + ROCm) #696

## Feature Description

Make AMD a first-class accelerator tier in LLMKube, alongside NVIDIA CUDA (in-cluster pods) and Apple Metal (off-cluster agent). The immediate driver is a homelab AMD Strix Halo node (Ryzen AI Max+ 395, RDNA 3.5 iGPU "Radeon 8060S" / `gfx1151`, 128GB LPDDR5X unified memory), but the goal is a general, supported path for AMD GPUs/APUs.

This is an umbrella epic. The work decomposes into a runtime image, node onboarding, a validated example, observability, and a follow-on ROCm tier.

## Problem Statement

> As a fleet operator running heterogeneous on-prem hardware, I want to add an AMD node and have LLMKube schedule, serve, observe, and route to it with the same CRDs I use for NVIDIA and Metal, so that AMD is a real tier and not a manual bare-metal workaround.

Today the operator hardcodes NVIDIA assumptions on the pod path and ships only CUDA + Metal runtimes. A user can run llama.cpp on an AMD APU on bare metal but has no first-class LLMKube path. AMD APUs are a compelling sovereign-inference tier: a single Strix Halo box exposes ~90GB of unified memory to the iGPU, enough for large models, at low power.

## Proposed Solution

Ship AMD support as a **Vulkan-first** tier (Vulkan/RADV is faster and far more stable than ROCm on `gfx1151` today), with ROCm as a follow-on per-model opt-in. All of it runs in-cluster through the same `Model` / `InferenceService` path as CUDA; the only new node-level primitive is a render-device plugin.

Decomposition (each its own issue):

- [x] #395 **Scheduling foundation** — vendor-driven GPU resource name + `/dev/dri` generic-device-plugin escape hatch. DONE via #709 (community contribution by @joryirving), incl. the `model_amd_vulkan_igpu.yaml` sample.
  - [ ] #710 follow-up: `checkAcceleratorAvailability` ignores the new `gpu.resourceName` override, so `AcceleratorReady` status is inaccurate for escape-hatch users (status-only, not a scheduling gate).
- [x] #697 **AMD Vulkan llama.cpp runtime image** — a Mesa/RADV `llama-server` image in the build matrix (the biggest gap).
- [x] #698 **Strix Halo / AMD K8s node enablement runbook** — kernel, amdgpu, GTT kernel args, generic-device-plugin DaemonSet.
- [ ] #699 **Validated AMD example + benchmark** — an end-to-end InferenceService on the AMD node with GPU offload, plus a benchmark entry.
- [ ] #700 **AMD GPU observability** — node GPU temp/power exporter + llama.cpp `/metrics`, Grafana panel (the DCGM-for-AMD analog).
- [ ] #701 **(follow-on) ROCm 7.2 HIP runtime tier** — per-model opt-in for the models where ROCm wins.
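To make the #395 / #710 items concrete, here is a minimal Go sketch of how a vendor-driven resource name with an override could be resolved, and why an availability check that ignores the override misreports escape-hatch nodes. It is not the operator's actual code: `resolveGPUResourceName`, `acceleratorAvailable`, the vendor table, and the `example.com/dri-render` resource name are all hypothetical. Only `nvidia.com/gpu` and `amd.com/gpu` are real names, as advertised by the vendors' device plugins.

```go
package main

import "fmt"

// Illustrative vendor defaults. "nvidia.com/gpu" and "amd.com/gpu" are the
// names advertised by the vendors' device plugins; the table itself is an
// assumption, not LLMKube's real mapping.
var defaultGPUResource = map[string]string{
	"nvidia": "nvidia.com/gpu",
	"amd":    "amd.com/gpu",
}

// resolveGPUResourceName (hypothetical) returns the extended resource to
// request: an explicit override wins, otherwise the vendor default applies.
func resolveGPUResourceName(vendor, override string) (string, error) {
	if override != "" {
		return override, nil
	}
	if name, ok := defaultGPUResource[vendor]; ok {
		return name, nil
	}
	return "", fmt.Errorf("no default GPU resource for vendor %q", vendor)
}

// acceleratorAvailable (hypothetical) reports whether any node advertises at
// least one unit of the given resource. Each map stands in for one node's
// allocatable resources.
func acceleratorAvailable(nodes []map[string]int64, resource string) bool {
	for _, alloc := range nodes {
		if alloc[resource] > 0 {
			return true
		}
	}
	return false
}

func main() {
	// One node exposing its iGPU through a generic device plugin under a
	// made-up resource name.
	nodes := []map[string]int64{{"example.com/dri-render": 1}}

	name, err := resolveGPUResourceName("amd", "example.com/dri-render")
	if err != nil {
		panic(err)
	}
	// Checking the resolved name finds the node; checking the vendor
	// default does not, which is the status inaccuracy described in #710.
	fmt.Println(name, acceleratorAvailable(nodes, name))
	fmt.Println("amd.com/gpu", acceleratorAvailable(nodes, "amd.com/gpu"))
}
```

The point of the sketch is that the scheduling path and the status path should resolve the resource name through the same function, so an override cannot be honored by one and ignored by the other.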

## Alternatives Considered

- **ROCm-first:** rejected for v1. On `gfx1151` today ROCm is Preview-tier and crash-prone (`hipGraphInstantiate` crashes on dense models, plus a bug where the KV cache spills to host memory), needs a pinned kernel, and produces a 6-12GB image. Kept as a follow-on tier.
- **Off-cluster agent (Metal pattern):** unnecessary for Linux+Vulkan, which containerizes cleanly. In-cluster reuses the existing CUDA machinery.

## Additional Context

- Related issues: #395 (scheduling resource name)
- Backends compared on `gfx1151`: Vulkan/RADV vs ROCm/HIP. Vulkan wins on generation throughput (by 25-32% in a 128-run benchmark) and on stability across dense and MoE models. Both backends share the same `/dev/dri/renderD128` Kubernetes exposure.
- Similar features: the existing Apple Metal tier is the precedent for "a non-NVIDIA accelerator as a first-class citizen."
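Since both backends reach the GPU through a DRM render node such as `/dev/dri/renderD128`, a render-device plugin mainly has to pick render nodes out of the `/dev/dri` entries. The Go sketch below shows one way to do that filtering. The helper names are hypothetical, and the check assumes the kernel convention that render-node minors start at 128.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// renderNodeMinor (hypothetical) parses a /dev/dri entry name such as
// "renderD128" and returns its minor number. Entries that are not render
// nodes (for example "card0" or "by-path") return ok == false.
func renderNodeMinor(name string) (minor int, ok bool) {
	const prefix = "renderD"
	if !strings.HasPrefix(name, prefix) {
		return 0, false
	}
	digits := strings.TrimPrefix(name, prefix)
	if digits == "" {
		return 0, false
	}
	for _, r := range digits {
		if r < '0' || r > '9' {
			return 0, false
		}
	}
	n, err := strconv.Atoi(digits)
	// Assumption: render-node minors start at 128.
	if err != nil || n < 128 {
		return 0, false
	}
	return n, true
}

// filterRenderNodes (hypothetical) keeps only render nodes and returns
// their full device paths, preserving input order.
func filterRenderNodes(names []string) []string {
	var paths []string
	for _, name := range names {
		if _, ok := renderNodeMinor(name); ok {
			paths = append(paths, "/dev/dri/"+name)
		}
	}
	return paths
}

func main() {
	// Entry names as they might appear in /dev/dri on a single-GPU node.
	entries := []string{"by-path", "card0", "renderD128"}
	fmt.Println(filterRenderNodes(entries))
}
```

On a real node the entry names would come from reading the `/dev/dri` directory; they are hard-coded here to keep the example self-contained.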

## Priority

- [x] High - Would significantly improve my workflow

## Willingness to Contribute

- [x] Yes, I can submit a PR


