CI: Self-hosted GPU runners needed to fully test VkFFT backends #75

## Summary

The C++ build/compile legs of this module are validated on GitHub-hosted runners (`ubuntu-24.04`, `macos-15`, `windows-2022`), but **FFT correctness cannot be exercised there**. The hosted runners have no real GPU; the only OpenCL ICD available is pocl (CPU), whose VkFFT kernel results diverge from real-GPU output and fail baseline image comparison. As a result, hosted CI runs only the two lightweight checks (`VkFFTBackendKWStyleTest`, `VkFFTBackendInDoxygenGroup`) and skips every functional FFT test.

Full validation requires **self-hosted GPU-enabled runners**.
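
To make the pocl distinction concrete, here is a minimal sketch of how a CI step could decide whether a host has a usable (non-pocl) OpenCL platform. It is not part of this repository. The helper names are hypothetical, the input is assumed to look like `clinfo -l` output (`Platform #N: <name>` lines), and the pocl platform name is assumed to be "Portable Computing Language".

```python
import re

# Assumption: pocl identifies itself with this platform name.
POCL_PLATFORM_NAME = "Portable Computing Language"


def parse_platform_names(clinfo_listing: str) -> list[str]:
    """Extract platform names from text shaped like `clinfo -l` output.

    Assumes each platform appears on a line of the form
    'Platform #0: <name>'; device lines are ignored.
    """
    pattern = re.compile(r"^Platform #\d+:\s*(.+?)\s*$", re.MULTILINE)
    return pattern.findall(clinfo_listing)


def has_real_opencl_icd(platform_names: list[str]) -> bool:
    """Return True if at least one platform is something other than pocl."""
    return any(
        POCL_PLATFORM_NAME.lower() not in name.lower()
        for name in platform_names
    )
```

A hosted runner that lists only the pocl platform would yield `False` here, which matches the current decision to skip the functional FFT tests on hosted CI.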

## Current state

Two workflows already target self-hosted GPU runners but have no online runner to claim them, so they queued indefinitely (12h+) on every push/PR until they were gated:

| Workflow | `runs-on` labels | Needs |
|---|---|---|
| `test-gpu.yml` | `[self-hosted, gpu]` | GPU + OpenCL/CUDA/HIP/Level-Zero/Metal driver |
| `test-notebooks.yml` | `[self-hosted, notebook-gpu]` | GPU + Jupyter/nbmake + `clinfo` device |

These are now gated behind `workflow_dispatch` or a `gpu-ci` PR label so they no longer accumulate stuck queued jobs. They will only do useful work once a matching runner is online.
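
The gate described above can be modelled as a small predicate. This is only an illustrative sketch of the rule as stated in this issue; the real condition lives in the workflow files, which are not reproduced here, and the function name is hypothetical.

```python
def should_run_gpu_job(event_name: str, pr_labels: frozenset[str] = frozenset()) -> bool:
    """Model of the gate: run on manual dispatch or on a PR labelled `gpu-ci`.

    `event_name` uses GitHub Actions event names such as
    'workflow_dispatch', 'pull_request', or 'push'.
    """
    if event_name == "workflow_dispatch":
        # Manual runs via the Actions "Run workflow" button always pass.
        return True
    if event_name == "pull_request":
        # PR runs pass only when the opt-in label is present.
        return "gpu-ci" in pr_labels
    # Plain pushes and other events no longer queue a GPU job.
    return False
```

Under this model an unlabelled PR or an ordinary push never requests a self-hosted runner, which is why the stuck queued jobs stopped accumulating.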

## What's needed

1. **Register self-hosted runner(s)** for `InsightSoftwareConsortium/ITKVkFFTBackend` (or an org runner group this repo can use) with the labels `gpu` and `notebook-gpu`.
2. The runner host should expose at least one real GPU backend so VkFFT can be exercised end-to-end. Coverage goal across the `VKFFT_BACKEND` modes:

   | `VKFFT_BACKEND` | Backend | Requires |
   |---|---|---|
   | 1 | CUDA | NVIDIA GPU + CUDA toolkit |
   | 2 | HIP | AMD GPU + ROCm |
   | 3 | OpenCL | any GPU + real (non-pocl) OpenCL ICD |
   | 4 | Level Zero | Intel GPU + oneAPI Level Zero |
   | 5 | Metal | Apple Silicon + macOS |

   A single NVIDIA host covers backends 1 and 3; full matrix coverage needs additional hosts.
3. Once a runner is live, the functional FFT tests (currently skipped on hosted CI) and the notebook tests will run via the existing gated workflows. To trigger them, add the `gpu-ci` label to a PR or use the Actions "Run workflow" button.
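
For planning which hosts to register, the backend table in step 2 can be turned into a small coverage calculation. The sketch below is hypothetical: `RunnerHost`, the vendor strings, and both functions are illustrative names, and it assumes each host has the matching driver or toolkit installed for its vendor's native backend.

```python
from dataclasses import dataclass

# VKFFT_BACKEND values as listed in the table above.
VKFFT_BACKENDS = {1: "CUDA", 2: "HIP", 3: "OpenCL", 4: "Level Zero", 5: "Metal"}

# Assumption: one native backend per GPU vendor, per the "Requires" column.
NATIVE_BACKEND = {"nvidia": 1, "amd": 2, "intel": 4, "apple": 5}


@dataclass(frozen=True)
class RunnerHost:
    vendor: str                    # "nvidia", "amd", "intel", or "apple"
    real_opencl_icd: bool = False  # True if a non-pocl OpenCL ICD is installed


def covered_backends(hosts: list[RunnerHost]) -> set[int]:
    """Return the set of VKFFT_BACKEND values the given hosts can exercise."""
    covered: set[int] = set()
    for host in hosts:
        native = NATIVE_BACKEND.get(host.vendor.lower())
        if native is not None:
            covered.add(native)
        if host.real_opencl_icd:
            covered.add(3)  # OpenCL works on any GPU with a real ICD
    return covered


def missing_backends(hosts: list[RunnerHost]) -> list[int]:
    """Return the backend values still lacking a host, in ascending order."""
    return sorted(set(VKFFT_BACKENDS) - covered_backends(hosts))
```

For example, `missing_backends([RunnerHost("nvidia", real_opencl_icd=True)])` returns `[2, 4, 5]`, which restates the note above that a single NVIDIA host covers only backends 1 and 3.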

## Why this matters

Without GPU CI, every backend change (e.g. the Level Zero backend added in #73, the OpenCL multi-ICD fix, CUDA 13 API updates) is only smoke-tested for compilation. Regressions in actual FFT output can land undetected because no automated job computes a transform on real hardware and compares against the baseline images.
