feat: GPU keep-on-device + kvikio (GDS) reader + pipeline GPU wiring #112
Open
FIrgolitsch wants to merge 13 commits into sphinx-config from …
- linumpy/gpu/kvikio_zarr.py: read_zarr_v2_to_gpu() loads an uncompressed zarr v2 array directly into a CuPy array using kvikio.CuFile.pread, bypassing the host bounce buffer when GDS is active.
- scripts/linum_benchmark_kvikio_zarr.py: benchmarks the GDS path against the conventional zarr.open + numpy + cupy.asarray path; supports synthetic dataset generation.
- Prototype scope: zarr v2, compressor=None, order=C. Compressed chunks (blosc/lz4) need nvCOMP for on-device decompression and are out of scope here.
CuFile.pread requires a contiguous device buffer; out[slices] is generally non-contiguous. Read each chunk into a single reused chunk-shaped scratch and copy into the output.
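The scratch-buffer pattern described in this commit message can be sketched on the host with NumPy. This is a minimal illustration, not the code in linumpy/gpu/kvikio_zarr.py: the helper names `write_raw_chunks` and `read_chunks_via_scratch` are hypothetical, the chunk files follow the zarr v2 "i.j" naming with padded edge chunks, and a plain `file.readinto` stands in for `CuFile.pread` so the example runs without a GPU.

```python
import itertools
import os
import tempfile

import numpy as np


def _chunk_slices(idx, chunks, shape):
    """Return (slices into the full array, slices into the chunk buffer)."""
    full = tuple(slice(i * c, min((i + 1) * c, s))
                 for i, c, s in zip(idx, chunks, shape))
    valid = tuple(slice(0, s.stop - s.start) for s in full)
    return full, valid


def write_raw_chunks(root, data, chunks):
    """Write uncompressed, C-order chunk files named 'i.j' (edge chunks padded)."""
    grid = [-(-s // c) for s, c in zip(data.shape, chunks)]
    for idx in itertools.product(*map(range, grid)):
        block = np.zeros(chunks, dtype=data.dtype)
        full, valid = _chunk_slices(idx, chunks, data.shape)
        block[valid] = data[full]
        block.tofile(os.path.join(root, ".".join(map(str, idx))))


def read_chunks_via_scratch(root, shape, chunks, dtype):
    """Read every chunk into one reused contiguous scratch, then copy it out."""
    out = np.empty(shape, dtype=dtype)
    scratch = np.empty(chunks, dtype=dtype)    # contiguous, chunk-shaped, reused
    raw = scratch.reshape(-1).view(np.uint8)   # flat byte view of the same memory
    grid = [-(-s // c) for s, c in zip(shape, chunks)]
    for idx in itertools.product(*map(range, grid)):
        path = os.path.join(root, ".".join(map(str, idx)))
        with open(path, "rb") as f:
            nbytes = f.readinto(raw)           # stand-in for CuFile.pread(scratch)
        if nbytes != raw.nbytes:
            raise IOError(f"short read on chunk {idx}")
        full, valid = _chunk_slices(idx, chunks, shape)
        out[full] = scratch[valid]             # out[full] may be non-contiguous
    return out


data = np.arange(5 * 7, dtype=np.float32).reshape(5, 7)
with tempfile.TemporaryDirectory() as root:
    write_raw_chunks(root, data, chunks=(2, 3))
    restored = read_chunks_via_scratch(root, data.shape, (2, 3), data.dtype)
```

On the device, `scratch` and `out` would be CuPy arrays and the read would go through kvikio; the copy `out[full] = scratch[valid]` is what handles the non-contiguous destination.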
… in gpu_zarr_context
…focal curvature roll
…c_grid process Co-authored-by: Copilot <copilot@github.com>
This was referenced Apr 30, 2026
PR — GPU keep-on-device + kvikio (GDS) reader + pipeline GPU wiring
Extends the GPU stack with end-to-end on-device data flow for the OCT reconstruction pipeline and adds a GPUDirect Storage (GDS) reader as a fast path for reading uncompressed zarr arrays straight into device memory.
GPU keep-on-device
- linumpy.gpu.zarr_io with gpu_zarr_context() (uses zarr.config.enable_gpu()) and read_zarr_to_gpu(...) with auto backend selection (kvikio when available, zarr-gpu otherwise).
- linumpy.gpu.interpolation: device-preserving resize, affine_transform, map_coordinates.
- linumpy.gpu.interface with a GPU implementation of find_tissue_interface (no-mask path) using cupyx filters.
- linumpy.geometry.interface.find_tissue_interface(..., use_gpu=...) and linumpy.mosaic.stacking.find_z_overlap(..., use_gpu=...) now route to GPU when requested.
- linum_aip.py and linum_resample_mosaic_grid.py use gpu_zarr_context to keep tiles on-device through the slab loop and writer.
- linum_detect_focal_curvature.py: vectorized roll via take_along_axis (xp dispatch) and --use_gpu/--no-use_gpu.
- linum_stack_slices_motor.py: --use_gpu/--no-use_gpu plumbed to find_z_overlap.

kvikio (GDS) reader (prototype)
- linumpy/gpu/kvikio_zarr.py: GDS reader for raw uncompressed zarr v2 + v3.
- … C order, mismatched endian) with NotImplementedError.
- … CuFile.pread.
- scripts/linum_benchmark_kvikio_zarr.py: benchmark with kvikio and zarr.config.enable_gpu() paths for comparison.
- read_zarr_to_gpu falls back to zarr-gpu when kvikio is in compat mode, when arrays aren't GDS-compatible, or on any runtime failure.

Server / build
- shell_scripts/server_setup/nvfs_kernel7_patch.sh: nvidia-fs 2.28.4 patch for kernel 7.0; symvers helper now also handles .ko.zst.
- pyproject.toml: bump ome-zarr to >=0.16.0 (NGFF 0.5).

Nextflow pipeline GPU wiring
- fix_focal_curvature and stack processes pass --use_gpu/--no-use_gpu from params.use_gpu.
- nextflow.config: withName: "fix_focal_curvature" gets maxForks = params.use_gpu ? 4 : null.
- withName: "resample_mosaic_grid": maxForks = params.use_gpu ? 6 : null (measured ~1 GB GPU mem per fork; IO-gated).
- _run_pipelined: prefetch + GPU compute pipeline; periodic free of cupy memory pool.
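The fallback behaviour described for read_zarr_to_gpu (fast path first, zarr-gpu on any failure) could look roughly like the sketch below. The function `read_with_fallback` and both reader callables are hypothetical stand-ins, not the linumpy API; the fast reader here simply raises NotImplementedError to simulate an array that is not GDS-compatible.

```python
import numpy as np


def read_with_fallback(path, readers):
    """Try each (name, reader) pair in order; return (name, array) from the first success."""
    errors = []
    for name, reader in readers:
        try:
            return name, reader(path)
        except Exception as exc:  # any runtime failure triggers the next backend
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


def fake_kvikio_reader(path):
    # Hypothetical fast path: pretend the array uses compressed chunks.
    raise NotImplementedError("compressed chunks are not supported")


def fake_zarr_gpu_reader(path):
    # Hypothetical slow path: always succeeds in this demo.
    return np.zeros((2, 2), dtype=np.float32)


backend, array = read_with_fallback(
    "volume.zarr",  # placeholder path, never opened in this demo
    [("kvikio", fake_kvikio_reader), ("zarr-gpu", fake_zarr_gpu_reader)],
)
```

Collecting the error messages rather than discarding them keeps the reason for each fallback visible if every backend fails.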
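The "vectorized roll via take_along_axis (xp dispatch)" item for linum_detect_focal_curvature.py can be illustrated as follows. This is a sketch under assumptions: the volume is taken to be z-first with one integer shift per (x, y) column, and `get_xp` and `roll_columns` are hypothetical names. It falls back to NumPy when CuPy is not installed.

```python
import numpy as np


def get_xp(arr):
    """Return cupy for device arrays, numpy otherwise (or when cupy is absent)."""
    try:
        import cupy
        return cupy.get_array_module(arr)
    except ImportError:
        return np


def roll_columns(volume, shifts):
    """Roll each column of `volume` along axis 0 by its own shift, without a loop."""
    xp = get_xp(volume)
    nz = volume.shape[0]
    # z has shape (nz, 1, ..., 1) so it broadcasts against the per-column shifts.
    z = xp.arange(nz).reshape((nz,) + (1,) * (volume.ndim - 1))
    # np.roll(a, s)[i] == a[(i - s) % n], applied column-wise.
    src = (z - xp.asarray(shifts)[None, ...]) % nz
    return xp.take_along_axis(volume, src, axis=0)


rng = np.random.default_rng(0)
vol = rng.random((6, 3, 4))
shifts = rng.integers(-5, 6, size=(3, 4))
rolled = roll_columns(vol, shifts)

# Reference result: the per-column loop that the vectorized version replaces.
expected = np.empty_like(vol)
for ix in range(3):
    for iy in range(4):
        expected[:, ix, iy] = np.roll(vol[:, ix, iy], shifts[ix, iy])
```

Because the same function body runs on either array module, the `--use_gpu` flag only has to decide where the input array lives.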
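The prefetch + compute structure named for _run_pipelined can be sketched with a single background loader thread. The function `run_pipelined` and its parameters are illustrative assumptions, not the signature in the PR. With CuPy, `free_pool` would typically be `cupy.get_default_memory_pool().free_all_blocks`; here a counter stands in so the example runs on the host.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def run_pipelined(keys, load, compute, free_every=8, free_pool=None):
    """Overlap loading of item i+1 with computation on item i."""
    keys = list(keys)
    results = []
    if not keys:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load, keys[0])
        for i in range(len(keys)):
            tile = pending.result()                 # wait for the prefetched item
            if i + 1 < len(keys):
                pending = pool.submit(load, keys[i + 1])  # start the next load
            results.append(compute(tile))           # runs while the load proceeds
            if free_pool is not None and (i + 1) % free_every == 0:
                free_pool()                         # periodically release cached blocks
    return results


freed = []
results = run_pipelined(
    range(10),
    load=lambda k: np.full((2, 2), k, dtype=np.float32),
    compute=lambda tile: float(tile.sum()),
    free_every=4,
    free_pool=lambda: freed.append(1),
)
```

One worker is enough when the stage is IO-gated, since only the next item needs to be in flight while the current one is processed.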