
feat: GPU keep-on-device + kvikio (GDS) reader + pipeline GPU wiring #112

Open
FIrgolitsch wants to merge 13 commits into sphinx-config from pr-m-gpu-kvikio

Conversation

Contributor

@FIrgolitsch FIrgolitsch commented Apr 30, 2026

Stacked PR 15/16. Review order: #115, #97, #98, #99, #100, #101, #108, #106, #107, #87, #116, #110, #111, #40, #112, #113

Base: sphinx-config (#40). Will be retargeted to main as upstream PRs merge.


PR — GPU keep-on-device + kvikio (GDS) reader + pipeline GPU wiring

Extends the GPU stack with end-to-end on-device data flow for the OCT reconstruction pipeline and adds a GPUDirect Storage (GDS) reader as a fast path for reading uncompressed zarr arrays straight into device memory.

GPU keep-on-device

  • New linumpy.gpu.zarr_io with gpu_zarr_context() (uses zarr.config.enable_gpu()) and read_zarr_to_gpu(...) with auto backend selection (kvikio when available, zarr-gpu otherwise).
  • linumpy.gpu.interpolation: device-preserving resize, affine_transform, map_coordinates.
  • New linumpy.gpu.interface with a GPU implementation of find_tissue_interface (no-mask path) using cupyx filters.
  • linumpy.geometry.interface.find_tissue_interface(..., use_gpu=...) and linumpy.mosaic.stacking.find_z_overlap(..., use_gpu=...) now route to GPU when requested.
  • linum_aip.py and linum_resample_mosaic_grid.py use gpu_zarr_context to keep tiles on-device through the slab loop and writer.
  • linum_detect_focal_curvature.py: vectorized roll via take_along_axis (xp dispatch) and --use_gpu/--no-use_gpu.
  • linum_stack_slices_motor.py: --use_gpu/--no-use_gpu plumbed to find_z_overlap.

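The vectorized roll mentioned above can be sketched with NumPy; the real linum_detect_focal_curvature.py code dispatches `xp` to numpy or cupy, and the function and argument names here are illustrative, not linumpy's actual API:

```python
import numpy as np  # xp stands in for the numpy/cupy dispatch module


def roll_columns(img, shifts, xp=np):
    """Roll each column of a 2D array by its own shift, without a Python loop.

    Illustrative re-implementation of the take_along_axis trick: build a
    per-column source-row index and gather along axis 0 in one call.
    """
    n_rows = img.shape[0]
    rows = xp.arange(n_rows)[:, None]               # shape (n_rows, 1)
    # result[i, j] = img[(i - shifts[j]) % n_rows, j], i.e. np.roll per column
    idx = (rows - xp.asarray(shifts)[None, :]) % n_rows
    return xp.take_along_axis(img, idx, axis=0)


a = np.arange(12).reshape(4, 3)
rolled = roll_columns(a, [0, 1, 2])
```

With `xp=cupy` the same code runs entirely on-device, which is what keeps the slab loop from bouncing through host memory.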
kvikio (GDS) reader (prototype)

  • linumpy/gpu/kvikio_zarr.py: GDS reader for raw uncompressed zarr v2 + v3.
    • Refuses incompatible arrays (compressed, filtered, non-C order, mismatched endian) with NotImplementedError.
  • Uses a contiguous scratch buffer as the destination for CuFile.pread.
  • scripts/linum_benchmark_kvikio_zarr.py: benchmark with kvikio and zarr.config.enable_gpu() paths for comparison.
  • read_zarr_to_gpu falls back to zarr-gpu when kvikio is in compat mode, when arrays aren't GDS-compatible, or on any runtime failure.
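The compatibility gate and fallback described above can be sketched in pure Python; `GDSIncompatible`, the `meta` dict keys, and the reader callables are illustrative stand-ins, not linumpy's actual schema:

```python
import numpy as np


class GDSIncompatible(NotImplementedError):
    """Raised when an array cannot go through the raw GDS fast path."""


def check_gds_compatible(meta):
    # Mirror the refusal rules above; `meta` keys are illustrative.
    if meta.get("compressor") is not None:
        raise GDSIncompatible("compressed chunks need nvCOMP")
    if meta.get("filters"):
        raise GDSIncompatible("filters are not supported")
    if meta.get("order", "C") != "C":
        raise GDSIncompatible("non-C order")
    if np.dtype(meta["dtype"]).byteorder not in ("=", "|"):
        raise GDSIncompatible("non-native byte order")


def read_to_gpu(meta, kvikio_read, fallback_read):
    """Try the GDS fast path; fall back to the zarr-gpu path on any failure."""
    try:
        check_gds_compatible(meta)
        return kvikio_read(meta)
    except (GDSIncompatible, RuntimeError):
        return fallback_read(meta)
```

Subclassing NotImplementedError matches the refusal behaviour described for kvikio_zarr.py, and catching RuntimeError as well gives the "any runtime failure" fallback.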

Server / build

  • shell_scripts/server_setup/nvfs_kernel7_patch.sh: nvidia-fs 2.28.4 patch for kernel 7.0; symvers helper now also handles .ko.zst.
  • pyproject.toml: bump ome-zarr to >=0.16.0 (NGFF 0.5).

Nextflow pipeline GPU wiring

  • fix_focal_curvature and stack processes pass --use_gpu/--no-use_gpu from params.use_gpu.
  • nextflow.config: withName: "fix_focal_curvature" gets maxForks = params.use_gpu ? 4 : null.
  • withName: "resample_mosaic_grid": maxForks = params.use_gpu ? 6 : null (measured ~1 GB GPU mem per fork; IO-gated).
  • _run_pipelined: prefetch + GPU compute pipeline; periodic free of cupy memory pool.
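A minimal sketch of the prefetch-plus-compute overlap that _run_pipelined implements, using a one-worker thread pool for I/O. The hook names and signature are illustrative, and the periodic cupy pool free is shown only as a comment:

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipelined(tile_ids, load, compute, free_pool_every=8):
    """Overlap loading of tile i+1 with computation on tile i.

    `load` and `compute` are illustrative hooks, not linumpy's actual API.
    """
    results = []
    with ThreadPoolExecutor(max_workers=1) as io:
        nxt = io.submit(load, tile_ids[0])              # prefetch the first tile
        for i, tid in enumerate(tile_ids):
            tile = nxt.result()                         # wait for prefetched data
            if i + 1 < len(tile_ids):
                nxt = io.submit(load, tile_ids[i + 1])  # prefetch the next tile
            results.append(compute(tile))               # compute while I/O runs
            if (i + 1) % free_pool_every == 0:
                # real code: cupy.get_default_memory_pool().free_all_blocks()
                pass
    return results
```

The single I/O worker keeps reads ordered while still hiding their latency behind the GPU compute of the previous tile.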

FIrgolitsch and others added 13 commits April 29, 2026 22:46
linumpy/gpu/kvikio_zarr.py: read_zarr_v2_to_gpu() loads an uncompressed
zarr v2 array directly into a CuPy array using kvikio.CuFile.pread,
bypassing the host bounce buffer when GDS is active.

scripts/linum_benchmark_kvikio_zarr.py: benchmarks the GDS path against
the conventional zarr.open + numpy + cupy.asarray path; supports
synthetic dataset generation.

Prototype scope: zarr v2, compressor=None, order=C. Compressed chunks
(blosc/lz4) need nvCOMP for on-device decompression and are out of
scope here.
CuFile.pread requires a contiguous device buffer, and out[slices] is
generally a non-contiguous view. Read each chunk into a single reused
chunk-shaped scratch buffer, then copy it into the output.
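A host-side stand-in for that scratch-buffer pattern, assuming for simplicity that all chunks sit back to back in one raw file and that the shape divides evenly into chunks (the real reader opens one zarr chunk file per chunk and uses CuFile.pread into device memory):

```python
from itertools import product

import numpy as np


def read_chunked_raw(f, shape, chunks, dtype=np.float32):
    """Assemble an array from consecutively stored raw C-order chunks.

    The read call needs a contiguous destination, while out[sl] is generally
    a non-contiguous view, so each chunk lands in the reused `scratch`
    buffer first and is then copied into place. Assumes shape is an exact
    multiple of chunks.
    """
    out = np.empty(shape, dtype=dtype)
    scratch = np.empty(chunks, dtype=dtype)          # reused contiguous buffer
    grid = [range(0, s, c) for s, c in zip(shape, chunks)]
    for origin in product(*grid):                    # chunks in row-major order
        f.readinto(scratch)                          # stand-in for CuFile.pread
        sl = tuple(slice(o, o + c) for o, c in zip(origin, chunks))
        out[sl] = scratch                            # copy into the strided view
    return out
```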
…c_grid process

Co-authored-by: Copilot <copilot@github.com>