
COMP: Recover lost CI modernization from #74 (ITK 5.4.6, OpenCL ICD/headers 2025.07.22, OpenCL 3.0) #79

Merged
hjmjohnson merged 7 commits into InsightSoftwareConsortium:main from hjmjohnson:recover/ci-itk-5.4.6-onto-main
Jun 8, 2026


@hjmjohnson

Member

Recovers the CI modernization from #74, which was marked MERGED but never actually reached main. Rebased onto current main (preserving the #76 direction-axis test) so the work is not lost.

What happened to #74

#74 (ci/itk-5.4.6) was opened against #73's branch (enh/vkfft-backend-5-metal), not main, and was merged into that branch on 2026-06-03 (merge commit 9936e83).

Before #73 itself merged, its branch was force-pushed backward (from 9936e83 back to 30b8bbe), which dropped #74's seven commits from the branch tip. When #73 subsequently merged into main, it carried the 30b8bbe state, without any of #74's changes.

The result: GitHub still shows #74 as "MERGED" (it was merged into the branch), but none of its commits are ancestors of main. Verify:

# 9936e83 (the #74 merge) is NOT in main:
git merge-base --is-ancestor 9936e83abb92a27b5ca174add6c5f0af415f0b01 origin/main; echo $?   # -> 1
# main still has the pre-#74 tags:
git show origin/main:.github/workflows/build-test-package.yml | grep -E 'itk-git-tag|opencl-.*-git-tag'
#   itk-git-tag: "v5.3.0"
#   opencl-icd-loader-git-tag: "v2021.04.29"
#   opencl-headers-git-tag:    "v2021.04.29"

The branch ref was deleted, but the commits remained reachable via refs/pull/74/head and were recovered from there.

What this PR restores (the seven #74 commits, rebased onto main)
  • ITK v5.4.6 (itk-git-tag / itk-wheel-tag), runners on macos-15
  • OpenCL ICD-Loader and Headers v2025.07.22; target OpenCL 3.0 (CL_TARGET_OPENCL_VERSION=300)
  • macOS PoCL via Homebrew; Linux PoCL via conda-forge (with Anaconda ToS accept)
  • Device-less OpenCL hardening (itkVkCommon.cxx, itk-module-init.cmake, itkVkDefinitions.h)
  • Compile-only build matrix for CUDA / Level Zero / Metal backends
  • Self-hosted GPU workflows gated behind workflow_dispatch / the gpu-ci label
  • Minimal-ITK-modules build to speed CI; a PoCL-safe FFT round-trip subset on the hosted OpenCL leg

The rebase preserved #76's direction-axis test (itkVkComplexToComplex1DFFTImageFilterSizesTest.cxx) unchanged — #74 never modified that file.

…_BACKEND

Default VKFFT_BACKEND to OpenCL in itkVkDefinitions.h when it is not
defined on the command line, so castxml wrapping invocations do not take
vkFFT.h's Vulkan branch and fail on a missing <vulkan/vulkan.h>.
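A minimal sketch of the defaulting guard described above. It assumes OpenCL is backend 3, the value the hosted build-cxx leg compiles; the helper `SelectedVkFFTBackend` is hypothetical, and the real itkVkDefinitions.h may be organized differently.

```cpp
// Sketch only; the real itkVkDefinitions.h may be organized differently.
// If neither the build system nor a castxml wrapping invocation passed
// -DVKFFT_BACKEND=<n>, fall back to OpenCL so vkFFT.h does not take its
// Vulkan branch.
#ifndef VKFFT_BACKEND
#  define VKFFT_BACKEND 3 // 3 = OpenCL, the backend the hosted build-cxx leg compiles
#endif

// Hypothetical helper, for illustration only: reports the backend in effect.
constexpr int
SelectedVkFFTBackend()
{
  return VKFFT_BACKEND;
}
```

Because the guard is an `#ifndef`, a value passed on the command line still wins; the default applies only when nothing was defined.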

Skip OpenCL platforms that report no devices: querying device count with
num_entries=0 on such a platform (e.g. Apple's deprecated OpenCL
framework on macOS 15) returns CL_DEVICE_NOT_FOUND, which must be
treated as 'try the next platform' rather than a hard failure.
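The platform-skipping rule can be sketched without linking OpenCL. In this illustration the device-count query is a hypothetical callable standing in for `clGetDeviceIDs` called with `num_entries=0`, and the two constants mirror the values of `CL_SUCCESS` and `CL_DEVICE_NOT_FOUND` from `CL/cl.h`. The function name and signature are assumptions, not the code in itkVkCommon.cxx.

```cpp
#include <cstddef>
#include <functional>

// Stand-ins for the OpenCL status codes (CL_SUCCESS == 0,
// CL_DEVICE_NOT_FOUND == -1 in CL/cl.h).
constexpr int kSuccess = 0;
constexpr int kDeviceNotFound = -1;

// Hypothetical query: writes the platform's device count and returns a
// status code, in the manner of clGetDeviceIDs with num_entries = 0.
using DeviceCountQuery = std::function<int(std::size_t platform, unsigned & count)>;

// Scans platforms in order and selects the first one that has a device.
// An empty platform means "try the next one"; any other error is fatal.
int
SelectFirstUsablePlatform(std::size_t platformCount, const DeviceCountQuery & query, std::size_t & selected)
{
  for (std::size_t p = 0; p < platformCount; ++p)
  {
    unsigned  count = 0;
    const int status = query(p, count);
    if (status == kDeviceNotFound || (status == kSuccess && count == 0))
    {
      continue; // platform reports no devices: not a hard failure
    }
    if (status != kSuccess)
    {
      return status; // genuine error: propagate it
    }
    selected = p;
    return kSuccess;
  }
  return kDeviceNotFound; // no platform exposed any device
}
```

The design point is that "device not found" becomes an error only after every platform has been tried.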

itk-module-init.cmake searched only default system paths for ze_loader
and the headers, missing a loader installed to a non-system prefix.
Honor LEVEL_ZERO_ROOT/CMPLR_ROOT with include and lib path suffixes,
matching the top-level CMakeLists.txt.

Add the level_zero/ subdirectory to the include path: VkFFT includes
<ze_api.h> bare while this module uses <level_zero/ze_api.h>, so both
the prefix and its level_zero/ subdir must be reachable.

Update the build/test/package workflow to ITK v5.4.6, the macos-15
runner, OpenCL ICD loader/headers v2025.07.22, Python 3.9, and current
actions/* versions. Build only the ITK modules VkFFTBackend depends on
(ITK_BUILD_DEFAULT_MODULES=OFF plus the declared DEPENDS/COMPILE_DEPENDS/
TEST_DEPENDS) via a shared itk-minimal-modules variable. Point the
freshly built ICD loader at conda's pocl vendor file and restrict the
hosted ctest run to the non-FFT smoke tests (pocl on CPU diverges from
real-GPU VkFFT kernels). Disable the Python wheel jobs pending an
ITKPythonBuilds C++17 wrapping fix. Resolve the dockcross OpenCL loader
library by glob so the wheel script tracks the ICD loader version.

Test GPU and Notebook tests request [self-hosted, gpu] and
[self-hosted, notebook-gpu] runners that are not online for this repo,
so they queued indefinitely on every push/PR. Trigger them only on
workflow_dispatch or when a PR carries the 'gpu-ci' label.

…ackends

The hosted build-cxx leg compiles only VKFFT_BACKEND=3 (OpenCL). Add a
build-backend job that configures and compiles each remaining
module-supported backend (no GPU needed to compile):

  - CUDA (1)       ubuntu-24.04, CUDA toolkit + libcufft + driver stub
  - Level Zero (4) ubuntu-24.04, ze_loader built from source, installed
                   to /usr so VkFFT's hardcoded header/library search and
                   bare ze_loader link resolve
  - Metal (5)      macos-15, builds with BUILD_TESTING=ON and runs ctest;
                   the Apple Silicon runner executes the FFTs on its GPU

ITK is built with only the module's dependency set. HIP (2) is omitted:
the module has no VKFFT_BACKEND=2 CMake branch.
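For reference, the backend numbers used in this commit message can be written out as follows. The numbering follows VkFFT's `VKFFT_BACKEND` convention (0 is Vulkan); the enum and the helper `IsCompiledInCI` are hypothetical names introduced here to summarize which backends the CI compiles according to the text above.

```cpp
// VKFFT_BACKEND values as VkFFT numbers them.
enum class BackendId : int
{
  Vulkan = 0,
  CUDA = 1,
  HIP = 2,
  OpenCL = 3,
  LevelZero = 4,
  Metal = 5
};

// Hypothetical summary of the CI matrix described above: build-cxx compiles
// OpenCL, build-backend adds CUDA, Level Zero and Metal. HIP is omitted
// because the module has no VKFFT_BACKEND=2 CMake branch.
constexpr bool
IsCompiledInCI(BackendId backend)
{
  return backend == BackendId::CUDA || backend == BackendId::OpenCL ||
         backend == BackendId::LevelZero || backend == BackendId::Metal;
}
```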

Make CL_TARGET_OPENCL_VERSION a cache variable defaulting to 300 and use
it in the compile definition, so the value the CI workflows pass is
honored instead of a hardcoded 120. Bump the build-test-package and
test-gpu workflows to opencl-version 300. Verified the OpenCL backend
compiles and links with no new deprecation warnings.
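The values 120 and 300 encode OpenCL 1.2 and 3.0 respectively. A small sketch of that encoding, seen from the C++ side of the compile definition; the struct and `DecodeOpenCLVersion` are hypothetical names used only for illustration, and the fallback `#define` merely mirrors the new cache-variable default.

```cpp
// In a real build the value arrives via -DCL_TARGET_OPENCL_VERSION=<n>;
// this fallback mirrors the new CMake cache default of 300.
#ifndef CL_TARGET_OPENCL_VERSION
#  define CL_TARGET_OPENCL_VERSION 300
#endif

struct OpenCLVersion
{
  int majorVersion;
  int minorVersion;
};

// Hypothetical helper: 120 -> {1, 2}, 300 -> {3, 0}.
constexpr OpenCLVersion
DecodeOpenCLVersion(int encoded)
{
  return { encoded / 100, (encoded % 100) / 10 };
}
```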

pocl (CPU OpenCL) computes VkFFT's size-19 Bluestein inverse incorrectly,
so the hosted OpenCL legs ran only the lint smoke tests. Give the round-trip
tests (ForwardInverse, ForwardInverse1D, HalfHermitian) an optional maxSize
argument (default 20, unchanged for GPU runners) and register PoclSafe
variants capped at size 16 — radix-2/3/5/7 plus Bluestein primes 11/13,
clear of pocl's prime-19 weakness.
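One way the optional maxSize argument could be read is sketched below. The helper name `ParseMaxSize` and the argument position are hypothetical and not taken from the test sources; only the default of 20 and the PoclSafe cap of 16 come from the commit message.

```cpp
#include <cstdlib>

// Hypothetical helper: returns the value of argv[index] as the largest FFT
// size to test, or defaultMaxSize when the argument is absent or malformed.
unsigned int
ParseMaxSize(int argc, char * argv[], int index, unsigned int defaultMaxSize = 20)
{
  if (index >= argc)
  {
    return defaultMaxSize; // argument omitted: keep the full-size behavior
  }
  char *              end = nullptr;
  const unsigned long value = std::strtoul(argv[index], &end, 10);
  if (end == argv[index] || *end != '\0' || value == 0)
  {
    return defaultMaxSize; // not a positive integer: fall back to the default
  }
  return static_cast<unsigned int>(value);
}
```

Under this sketch a PoclSafe registration would pass `16` as the extra argument, while the existing GPU registrations pass nothing and keep running up to size 20.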

The ubuntu leg (conda pocl, verified) now selects 'VkFFTBackend|PoclSafe',
adding genuine FFT correctness coverage on pocl. windows (no OpenCL ICD) and
macos (pocl unverified locally) stay lint-only. Full-size and baseline FFT
correctness still run on the GPU runner.

Verified locally: the capped variants pass under pocl 7.1 (conda-forge) and
on a real GPU; the full-size variants still pass on the GPU and continue to
fail under pocl at size 19.
@hjmjohnson hjmjohnson marked this pull request as ready for review June 8, 2026 13:11
@hjmjohnson hjmjohnson merged commit 70f692d into InsightSoftwareConsortium:main Jun 8, 2026
14 checks passed
