Description
I’m opening this issue to discuss and track work on a higher-performance remote reader for NWB/LINDI over S3-compatible storage (Ceph/S3 in our case).
This follows discussions with Ben Dichter and Ryan Ly.
I’m interested in implementing and benchmarking this in lindi. I’ve built similar high-performance readers before (different format, similar constraints), e.g. in this project: https://www.nature.com/articles/s41593-024-01715-2.
Use case
For ML workloads, we repeatedly request time windows from very large ephys files (e.g., ~7 TB NWB, 1024 channels). Current HDF5-style access patterns can trigger many small underlying reads, and on our Ceph/S3 system each range request has high fixed latency (~0.3 s), so request count dominates runtime.
Goal
For a single logical request (time window over one or more channels), reduce S3 range GETs from “many/small” to:
- ~1 request when required bytes are contiguous
- a small number of requests when bytes are split into a few contiguous runs
- parallel requests when multiple discontiguous channel runs are required
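To make the goal concrete, here is a minimal sketch of the kind of range planner implied above. It assumes a simple row-major `(time, channel)` layout in one contiguous buffer with a fixed itemsize; names like `plan_ranges` are illustrative, not existing lindi API:

```python
def plan_ranges(t0, t1, channels, n_channels, itemsize=2):
    """Return sorted (start, stop) byte ranges covering samples[t0:t1, channels],
    assuming row-major (time, channel) layout starting at byte offset 0."""
    # Collapse the requested channel indices into contiguous runs.
    chans = sorted(set(channels))
    runs, run_start = [], chans[0]
    for prev, cur in zip(chans, chans[1:]):
        if cur != prev + 1:
            runs.append((run_start, prev + 1))
            run_start = cur
    runs.append((run_start, chans[-1] + 1))

    # A full-width window over all channels is a single contiguous range (~1 request).
    if runs == [(0, n_channels)]:
        return [(t0 * n_channels * itemsize, t1 * n_channels * itemsize)]

    # Otherwise each (time row, channel run) pair is one contiguous range;
    # these are candidates for coalescing or parallel fetch.
    return [
        ((t * n_channels + c0) * itemsize, (t * n_channels + c1) * itemsize)
        for t in range(t0, t1)
        for (c0, c1) in runs
    ]
```

A full-channel window collapses to one range, while a channel subset yields one short range per time row, which is exactly the case where coalescing and parallelism pay off.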
Requirements
- Minimize number of underlying S3 range requests per query.
- Coalesce adjacent/near-adjacent byte ranges when beneficial.
- Parallelize independent range reads (e.g., discontiguous channel runs).
- Keep correctness identical to current reader behavior.
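The coalescing requirement could look something like the following sketch, which merges sorted byte ranges whose gap is small enough, with an optional cap on over-read per merged range (parameter names `max_gap_bytes`/`max_overread_bytes` mirror the proposal below; this is not existing lindi code):

```python
def coalesce(ranges, max_gap_bytes=0, max_overread_bytes=None):
    """Merge (start, stop) byte ranges whose inter-range gap is <= max_gap_bytes,
    optionally capping the total wasted (over-read) bytes per merged range."""
    merged = []
    overread = 0  # wasted bytes accumulated in the current merged range
    for start, stop in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap_bytes:
            gap = max(0, start - merged[-1][1])
            if max_overread_bytes is None or overread + gap <= max_overread_bytes:
                overread += gap
                merged[-1][1] = max(merged[-1][1], stop)
                continue
        merged.append([start, stop])
        overread = 0
    return [tuple(r) for r in merged]
```

With `max_gap_bytes=0` this only merges exactly adjacent ranges, preserving current behavior; raising it trades a bounded amount of over-read for fewer requests.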
Assumptions
- Dataset chunk/layout is configured to support efficient sequential access for target query patterns.
- Some multi-channel requests may be discontiguous in file layout; these should be parallelized rather than forced into large over-reads.
- For `.lindi.json` generation, we may need explicit per-chunk refs (i.e., avoid `_EXTERNAL_ARRAY_LINK` for target datasets).
Non-goals
- Optimizing arbitrary highly-strided access patterns that require excessive over-read.
- Changing NWB schema semantics.
Proposed deliverables
- Range planner: map a dataset selection to exact byte ranges.
- Coalescing policy: merge ranges using configurable `max_gap_bytes`/`max_overread_bytes`.
- Parallel fetch executor with bounded concurrency.
- Benchmarks + instrumentation:
- request count
- total bytes fetched
- wall time / throughput
- Regression tests for correctness and performance-sensitive paths.
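For the parallel fetch executor and instrumentation, a minimal sketch could pair a bounded thread pool with per-call counters. `get_range(start, stop)` is a hypothetical caller-supplied callable (e.g. wrapping an S3 ranged GET); none of these names are existing lindi API:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_ranges(get_range, ranges, max_workers=8):
    """Fetch (start, stop) byte ranges in parallel with bounded concurrency.
    Returns the chunks in input order plus basic instrumentation."""
    t_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so reassembly stays trivial.
        chunks = list(pool.map(lambda r: get_range(*r), ranges))
    stats = {
        "requests": len(ranges),
        "bytes": sum(len(c) for c in chunks),
        "wall_time_s": time.perf_counter() - t_start,
    }
    return chunks, stats
```

The `stats` dict covers the request-count, bytes-fetched, and wall-time metrics listed above, which should make the benchmarks straightforward to log per query.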
Acceptance criteria (initial)
- Demonstrate substantial reduction in S3 request count for representative ephys window reads.
- Show end-to-end latency improvement on remote object storage.
- Preserve exact returned data vs baseline reader.
Comments on API placement are welcome:
- extend `LindiRemfile`, or
- add a new specialized high-performance reader path.