A ROCm 7.2 (HIP) runtime tier for AMD nodes, as a per-model opt-in alongside the Vulkan default. This is the "ROCm proper" follow-on under the AMD epic (#696): for the specific models where ROCm beats Vulkan, allow an InferenceService to select the ROCm runtime image.
Problem Statement
As a fleet operator, I want to run the models that perform better under ROCm/HIP on AMD via the same CRDs, so that I am not locked to Vulkan when ROCm wins.
Vulkan is the right v1 default on gfx1151 (faster generation, far more stable), but ROCm 7.x can edge it out on some small-dense and MoE workloads when the HIP stack is fully built (rocWMMA + hipBLASLt). Once the Vulkan tier is solid, ROCm is worth offering for those cases.
Proposed Solution
A ROCm llama-server image built from rocm/dev-ubuntu-24.04:7.2-complete with -DGPU_TARGETS=gfx1151 -DGGML_HIP_ROCWMMA_FATTN=ON -DGGML_HIP_NO_VMM=ON -DGGML_HIP_MMQ_MFMA=ON (or COPY prebuilt lemonade-sdk/llamacpp-rocm gfx1151 binaries to avoid long HIP builds).
Per-model/per-InferenceService runtime selection so an operator opts a specific model into ROCm.
Re-benchmark ROCm vs Vulkan per model and document which wins where; track ROCm release notes for when gfx1151 leaves Preview and the hipGraph/KV-cache bugs are fixed.
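The per-model comparison in the last item could be recorded with something like the sketch below. It is only an illustration: the `BenchResult` type, the `winners` helper, the 5% margin, and the model names and throughput figures in the usage note are all hypothetical and not taken from any real benchmark run or from this project's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchResult:
    """One benchmark measurement (hypothetical shape, not a project type)."""
    model: str
    runtime: str           # "vulkan" or "rocm"
    tokens_per_sec: float  # generation throughput


def winners(results, min_gain=0.05):
    """Map each model to the runtime worth using.

    ROCm is chosen only when both runtimes were measured and ROCm beats
    Vulkan by more than `min_gain` (an assumed margin); otherwise the
    Vulkan default is kept.
    """
    by_model = {}
    for r in results:
        by_model.setdefault(r.model, {})[r.runtime] = r.tokens_per_sec

    chosen = {}
    for model, rates in by_model.items():
        vulkan = rates.get("vulkan")
        rocm = rates.get("rocm")
        if vulkan is not None and rocm is not None and rocm > vulkan * (1 + min_gain):
            chosen[model] = "rocm"
        else:
            chosen[model] = "vulkan"
    return chosen
```

For example, with made-up numbers, `winners([BenchResult("model-a", "vulkan", 100.0), BenchResult("model-a", "rocm", 120.0)])` maps `model-a` to `"rocm"`, while a model measured on only one runtime stays on Vulkan. Requiring a margin rather than a bare win keeps marginal ROCm results from overriding the more stable default.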
Alternatives Considered
Make ROCm the default: rejected for now given gfx1151 instability (dense-model crashes, KV-cache-to-host). Vulkan stays the default; ROCm is opt-in.
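The "Vulkan stays the default; ROCm is opt-in" rule can be sketched as a small resolver. This is a minimal sketch under stated assumptions: the `runtime` field name, the `resolve_image` helper, and the image references are hypothetical placeholders, not the actual CRD schema or published images.

```python
DEFAULT_RUNTIME = "vulkan"

# Hypothetical image references; real registry paths and tags are not
# specified in this issue.
RUNTIME_IMAGES = {
    "vulkan": "example.invalid/llama-server:vulkan",
    "rocm": "example.invalid/llama-server:rocm-7.2",
}


def resolve_image(spec: dict) -> str:
    """Pick the runtime image for one InferenceService-like spec.

    `spec["runtime"]` is an assumed opt-in field: when it is absent or
    empty the Vulkan default applies, and unknown values are rejected
    rather than silently falling back.
    """
    runtime = spec.get("runtime") or DEFAULT_RUNTIME
    if runtime not in RUNTIME_IMAGES:
        raise ValueError(f"unsupported runtime: {runtime!r}")
    return RUNTIME_IMAGES[runtime]
```

Under these assumptions, `resolve_image({})` returns the Vulkan image and `resolve_image({"runtime": "rocm"})` returns the ROCm one, so existing services are unaffected unless an operator sets the field for a specific model.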
Node device access (part of the Proposed Solution): /dev/dri/render* + /dev/kfd via the device plugin (#395, "feat(crd): make GPU resource name configurable to support AMD/Vulkan/Intel scheduling"); the ROCm node prerequisites extend the node runbook.