Summary
The C++ build/compile legs of this module are validated on GitHub-hosted runners (`ubuntu-24.04`, `macos-15`, `windows-2022`), but FFT correctness cannot be exercised there. The hosted runners have no real GPU. The only OpenCL ICD available is pocl (CPU), and its VkFFT kernel results diverge from real-GPU output and fail baseline image comparison. As a result, hosted CI runs only the two lightweight checks (`VkFFTBackendKWStyleTest`, `VkFFTBackendInDoxygenGroup`) and skips every functional FFT test.
Full validation requires self-hosted GPU-enabled runners.
Current state
Two workflows already target self-hosted GPU runners but have no online runner to claim them, so they queued indefinitely (12h+) on every push/PR until they were gated:
| Workflow | `runs-on` labels | Needs |
|---|---|---|
| `test-gpu.yml` | `[self-hosted, gpu]` | GPU + OpenCL/CUDA/HIP/Level-Zero/Metal driver |
| `test-notebooks.yml` | `[self-hosted, notebook-gpu]` | GPU + Jupyter/nbmake + `clinfo` device |
These are now gated behind `workflow_dispatch` or a `gpu-ci` PR label, so they no longer accumulate stuck queued jobs. They will only do useful work once a matching runner is online.
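The routing and gating described above can be modeled in a few lines. This is a minimal Python sketch, not the workflows' actual `if:` expressions (which are not reproduced here). It assumes GitHub's matching rule for self-hosted runners (a job is claimed only by an online runner that carries every label in its `runs-on` list) and a simplified gate that admits a run on `workflow_dispatch` or on a pull request carrying the `gpu-ci` label. The runner name and label set in the usage are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# runs-on label sets, copied from the workflow table above.
WORKFLOW_LABELS = {
    "test-gpu.yml": {"self-hosted", "gpu"},
    "test-notebooks.yml": {"self-hosted", "notebook-gpu"},
}

GATE_LABEL = "gpu-ci"  # PR label that opts a pull request into GPU CI


@dataclass
class Runner:
    name: str
    labels: set[str]
    online: bool = True


def can_claim(runner: Runner, required: set[str]) -> bool:
    """A runner picks up a job only if it is online and has every requested label."""
    return runner.online and required <= runner.labels


def gate_allows(event_name: str, pr_labels: set[str] | None = None) -> bool:
    """Simplified gate: manual dispatch, or a pull request labeled gpu-ci."""
    if event_name == "workflow_dispatch":
        return True
    return event_name == "pull_request" and GATE_LABEL in (pr_labels or set())


def claimants(workflow: str, runners: list[Runner]) -> list[str]:
    """Names of runners in the pool that could claim this workflow's job."""
    required = WORKFLOW_LABELS[workflow]
    return [r.name for r in runners if can_claim(r, required)]


if __name__ == "__main__":
    # Hypothetical pool: one GPU host registered with both custom labels.
    pool = [Runner("gpu-box-1", {"self-hosted", "linux", "x64", "gpu", "notebook-gpu"})]
    for wf in WORKFLOW_LABELS:
        print(wf, claimants(wf, pool))
    print(gate_allows("push"))                      # → False
    print(gate_allows("pull_request", {"gpu-ci"}))  # → True
```

With an empty pool, `claimants` returns `[]` for both workflows, which is the situation that left jobs queued before the gate was added.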
What's needed
- Register self-hosted runner(s) for `InsightSoftwareConsortium/ITKVkFFTBackend` (or an org runner group this repo can use) with the labels `gpu` and `notebook-gpu`.
- The runner host should expose at least one real GPU backend so VkFFT can be exercised end-to-end. Coverage goal across the `VKFFT_BACKEND` modes:

  | `VKFFT_BACKEND` | Backend | Requires |
  |---|---|---|
  | 1 | CUDA | NVIDIA GPU + CUDA toolkit |
  | 2 | HIP | AMD GPU + ROCm |
  | 3 | OpenCL | any GPU + real (non-pocl) OpenCL ICD |
  | 4 | Level Zero | Intel GPU + oneAPI Level Zero |
  | 5 | Metal | Apple Silicon + macOS |

  A single NVIDIA host covers backends 1 and 3; full matrix coverage needs additional hosts.
- Once a runner is live, the functional FFT tests (currently skipped on hosted CI) and the notebook tests will run through the existing gated workflows: add the `gpu-ci` label to a PR or use the Actions "Run workflow" button.
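The coverage reasoning (which hosts exercise which `VKFFT_BACKEND` modes, and what the runner pool still lacks) reduces to a set-inclusion check. This Python sketch is illustrative only: the capability strings (`nvidia-gpu`, `real-opencl-icd`, and so on) are hypothetical shorthand for the "Requires" column, not anything the build or the workflows read. It assumes, consistent with the note that a single NVIDIA host covers backends 1 and 3, that such a host provides both the CUDA toolkit and a real OpenCL ICD.

```python
from __future__ import annotations

# VKFFT_BACKEND value -> (backend name, host capabilities it requires).
# Capability strings are hypothetical shorthand for the table's "Requires" column.
BACKENDS: dict[int, tuple[str, frozenset[str]]] = {
    1: ("CUDA", frozenset({"nvidia-gpu", "cuda-toolkit"})),
    2: ("HIP", frozenset({"amd-gpu", "rocm"})),
    3: ("OpenCL", frozenset({"gpu", "real-opencl-icd"})),  # pocl does not count
    4: ("Level Zero", frozenset({"intel-gpu", "level-zero"})),
    5: ("Metal", frozenset({"apple-silicon", "macos"})),
}


def covered(host_caps: set[str]) -> list[int]:
    """VKFFT_BACKEND values whose requirements this host satisfies."""
    return sorted(b for b, (_, req) in BACKENDS.items() if req <= host_caps)


def missing(hosts: list[set[str]]) -> list[int]:
    """VKFFT_BACKEND values that no host in the pool can exercise."""
    done: set[int] = set()
    for caps in hosts:
        done.update(covered(caps))
    return sorted(set(BACKENDS) - done)


if __name__ == "__main__":
    nvidia_host = {"gpu", "nvidia-gpu", "cuda-toolkit", "real-opencl-icd"}
    hosted_runner = {"pocl-icd"}  # CPU-only OpenCL, as on GitHub-hosted runners
    print(covered(nvidia_host))    # → [1, 3]
    print(missing([nvidia_host]))  # → [2, 4, 5]
    print(covered(hosted_runner))  # → []
```

Closing the remaining gaps (2, 4, 5) would take an AMD, an Intel, and an Apple Silicon host respectively.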
Why this matters
Without GPU CI, every backend change (e.g. the Level Zero backend added in #73, the OpenCL multi-ICD fix, CUDA 13 API updates) is only smoke-tested for compilation. Regressions in actual FFT output can land undetected because no automated job computes a transform on real hardware and compares against the baseline images.