[FEATURE] Validated AMD (Vulkan) example InferenceService + benchmark

## Feature Description

A validated, end-to-end example of serving a model on the AMD node (Vulkan tier), plus a benchmark entry so the AMD tier has the same "here is a working manifest and here are the numbers" treatment as CUDA and Metal.

## Problem Statement

> As someone evaluating LLMKube on AMD, I want a known-good example manifest and real tokens/sec, so that I can trust the tier works and size my hardware.

The other tiers have validated examples and benchmark numbers. The AMD tier needs the same proof, and it doubles as the acceptance test that #696 actually landed.

## Proposed Solution

- A `Model` + `InferenceService` example manifest targeting the AMD node (vendor `amd`, Vulkan runtime image, GPU layer offload set for the unified-memory budget).
- A documented end-to-end run: deploy, hit the OpenAI-compatible endpoint, confirm GPU offload, record decode/prefill tokens/sec at a couple of context lengths.
- A benchmark entry alongside the existing CUDA/Metal numbers (a MoE model such as a Qwen3 30B-class is a good showcase for the 90GB unified pool).
- Wire it into the heterogeneous-fleet story: this node becomes a real backend tier the gateway (#661) and router can target.

## Alternatives Considered

- Folding this into the runtime-image issue: kept separate so the image PR stays focused and the example carries the reproducible numbers.

## Additional Context

- Related issues: #696 (epic), AMD Vulkan image, node runbook, #661 (gateway can route to this backend)
- Pairs with the homelab AI Gateway lab-acceptance scenario: the AMD node is a candidate fallback/second tier in a heterogeneous failover demo.

## Priority

- [x] High - Would significantly improve my workflow

## Willingness to Contribute

- [x] Yes, I can submit a PR


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[FEATURE] Validated AMD (Vulkan) example InferenceService + benchmark #699

Feature Description

Problem Statement

Proposed Solution

Alternatives Considered

Additional Context

Priority

Willingness to Contribute

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[FEATURE] Validated AMD (Vulkan) example InferenceService + benchmark #699

Description

Feature Description

Problem Statement

Proposed Solution

Alternatives Considered

Additional Context

Priority

Willingness to Contribute

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions