[FEATURE] ROCm 7.2 HIP runtime tier for AMD nodes (per-model opt-in, follow-on)

## Feature Description

A ROCm 7.2 (HIP) runtime tier for AMD nodes, as a per-model opt-in alongside the Vulkan default. This is the "ROCm proper" follow-on under the AMD epic (#696): for the specific models where ROCm beats Vulkan, allow an `InferenceService` to select the ROCm runtime image.

## Problem Statement

> As a fleet operator, I want to run the models that perform better under ROCm/HIP on AMD via the same CRDs, so that I am not locked to Vulkan when ROCm wins.

Vulkan is the right v1 default on `gfx1151` (faster generation, far more stable), but with a fully built HIP stack (rocWMMA + hipBLASLt), ROCm 7.x can edge it out on some small dense and MoE workloads. Once the Vulkan tier is solid, ROCm is worth offering for those cases.

## Proposed Solution

- A ROCm `llama-server` image built from `rocm/dev-ubuntu-24.04:7.2-complete` with `-DGPU_TARGETS=gfx1151 -DGGML_HIP_ROCWMMA_FATTN=ON -DGGML_HIP_NO_VMM=ON -DGGML_HIP_MMQ_MFMA=ON` (or COPY prebuilt `lemonade-sdk/llamacpp-rocm` gfx1151 binaries to avoid long HIP builds).
- The pod mounts `/dev/dri/render*` + `/dev/kfd` via the device plugin (#395); the ROCm node prerequisites are added to the node runbook.
- Per-model/per-InferenceService runtime selection so an operator opts a specific model into ROCm.
- Re-benchmark ROCm vs Vulkan per model and document which wins where; track ROCm release notes for when `gfx1151` leaves Preview and the `hipGraph`/KV-cache bugs are fixed.
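
The "document which wins where" step could be made mechanical. Below is a minimal Python sketch, assuming each benchmark run yields one tokens-per-second figure per model and runtime. `BenchResult`, `pick_winners`, and the `min_gain` threshold are hypothetical names introduced for illustration, not part of the existing codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchResult:
    """One benchmark measurement (hypothetical shape)."""
    model: str
    runtime: str  # "vulkan" or "rocm"
    tokens_per_second: float


def pick_winners(results, min_gain=0.05):
    """Return {model: runtime}.

    ROCm is chosen only when it beats Vulkan by more than `min_gain`
    (a fraction, assumed here to absorb benchmark noise). Models with
    incomplete data stay on the Vulkan default.
    """
    by_model = {}
    for r in results:
        by_model.setdefault(r.model, {})[r.runtime] = r.tokens_per_second

    winners = {}
    for model, speeds in by_model.items():
        vulkan = speeds.get("vulkan")
        rocm = speeds.get("rocm")
        if vulkan is None or rocm is None:
            winners[model] = "vulkan"  # incomplete data: keep the default
        elif rocm > vulkan * (1 + min_gain):
            winners[model] = "rocm"
        else:
            winners[model] = "vulkan"
    return winners
```

Biasing ties and near-ties toward Vulkan matches the intent that ROCm is only offered where it clearly wins.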

## Alternatives Considered

- **Make ROCm the default:** rejected for now given `gfx1151` instability (dense-model crashes, KV-cache-to-host). Vulkan stays the default; ROCm is opt-in.
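
To illustrate the opt-in behaviour, here is a minimal Python sketch of how runtime selection might resolve to an image. It assumes the `InferenceService` spec gains a `runtime` field; that field name, the image references, and `resolve_runtime_image` are all hypothetical placeholders, not the project's actual API.

```python
DEFAULT_RUNTIME = "vulkan"

# Placeholder image references (assumptions); real names would come from the build.
RUNTIME_IMAGES = {
    "vulkan": "example.invalid/llama-server:vulkan",
    "rocm": "example.invalid/llama-server:rocm-7.2",
}


def resolve_runtime_image(spec: dict) -> str:
    """Pick the runtime image for an InferenceService-like spec.

    A missing or empty `runtime` field falls back to Vulkan, so ROCm is
    used only when an operator asks for it explicitly.
    """
    runtime = spec.get("runtime") or DEFAULT_RUNTIME
    if runtime not in RUNTIME_IMAGES:
        raise ValueError(f"unsupported runtime: {runtime!r}")
    return RUNTIME_IMAGES[runtime]
```

Rejecting unknown values rather than silently falling back keeps a typo in the spec from putting a model on the wrong runtime.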

## Additional Context

- Related issues: #696 (epic), Vulkan runtime image, node runbook, #395
- This is explicitly a follow-on: do not start until the Vulkan tier (image + node + example) is working in the lab.

## Priority

- [x] Medium - Nice to have

## Willingness to Contribute

- [x] Yes, I can submit a PR

