
Proposal: High-performance S3 range reader for NWB/LINDI (coalesced + parallel reads) #118

@davidparks21


I’m opening this issue to discuss and track work on a higher-performance remote reader for NWB/LINDI over S3-compatible storage (Ceph/S3 in our case).

This follows discussions with Ben Dichter and Ryan Ly.

I’m interested in implementing and benchmarking this in lindi. I’ve built similar high-performance readers before (different format, similar constraints), e.g. for the work described in https://www.nature.com/articles/s41593-024-01715-2. Feedback on the approach is very welcome.

Use case

For ML workloads, we repeatedly request time windows from very large ephys files (e.g., ~7 TB NWB, 1024 channels). Current HDF5-style access patterns can trigger many small underlying reads, and on our Ceph/S3 system each range request has high fixed latency (~0.3 s), so request count dominates runtime.

Goal

For a single logical request (a time window over one or more channels), reduce the number of S3 range GETs from many small requests to:

  • ~1 request when required bytes are contiguous
  • a small number of requests when bytes are split into a few contiguous runs
  • parallel requests when multiple discontiguous channel runs are required

Requirements

  • Minimize number of underlying S3 range requests per query.
  • Coalesce adjacent/near-adjacent byte ranges when beneficial.
  • Parallelize independent range reads (e.g., discontiguous channel runs).
  • Keep correctness identical to current reader behavior.
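The coalescing requirement could be implemented roughly as below. This is a minimal sketch, not existing lindi API; `coalesce_ranges` is a hypothetical name, and `max_gap_bytes` follows the parameter naming proposed in the deliverables.

```python
def coalesce_ranges(ranges, max_gap_bytes=0):
    """Merge (start, end) byte ranges whose inter-range gap is <= max_gap_bytes.

    Merging across a gap trades a bounded over-read for one fewer range GET,
    which is the right trade when per-request latency dominates (e.g. ~0.3 s
    fixed cost per request on Ceph/S3).
    """
    if not ranges:
        return []
    merged = [sorted(ranges)[0]]
    for start, end in sorted(ranges)[1:]:
        last_start, last_end = merged[-1]
        if start - last_end <= max_gap_bytes:
            # Close enough: extend the previous range (absorbing the gap).
            merged[-1] = (last_start, max(last_end, end))
        else:
            merged.append((start, end))
    return merged
```

With `max_gap_bytes=0` only strictly adjacent ranges merge; raising it lets the planner absorb small holes at the cost of fetching unused bytes, which is the knob `max_overread_bytes` would bound globally.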

Assumptions

  • Dataset chunk/layout is configured to support efficient sequential access for target query patterns.
  • Some multi-channel requests may be discontiguous in file layout; these should be parallelized rather than forced
    into large over-reads.
  • For .lindi.json generation, we may need explicit per-chunk refs (i.e., avoid _EXTERNAL_ARRAY_LINK for target
    datasets).
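On the last point, a rough illustration of what explicit per-chunk references look like in the kerchunk/fsspec-style reference format that .lindi.json builds on (a chunk key maps to `[url, byte_offset, byte_length]`). All paths, offsets, and sizes here are invented for illustration:

```python
# Hypothetical per-chunk refs; with _EXTERNAL_ARRAY_LINK the reader instead
# defers to the HDF5 file's own chunk index, which hides per-chunk byte
# ranges from a range planner like the one proposed here.
refs = {
    "acquisition/ElectricalSeries/data/0.0": ["s3://bucket/session.nwb", 4096, 1048576],
    "acquisition/ElectricalSeries/data/1.0": ["s3://bucket/session.nwb", 1052672, 1048576],
}
```

Having every `[url, offset, length]` triple materialized up front is what makes exact byte-range planning possible without extra metadata reads.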

Non-goals

  • Optimizing arbitrary highly-strided access patterns that require excessive over-read.
  • Changing NWB schema semantics.

Proposed deliverables

  1. Range planner: map a dataset selection to exact byte ranges.
  2. Coalescing policy: merge ranges using configurable max_gap_bytes / max_overread_bytes.
  3. Parallel fetch executor with bounded concurrency.
  4. Benchmarks + instrumentation:
    • request count
    • total bytes fetched
    • wall time / throughput
  5. Regression tests for correctness and performance-sensitive paths.
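Deliverable 3 (the parallel fetch executor) might look like the following sketch. `fetch_range` is a placeholder for whatever performs the actual S3 range GET; only the bounded-concurrency structure is the point here.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_ranges_parallel(fetch_range, ranges, max_workers=8):
    """Issue independent range reads concurrently with bounded concurrency.

    fetch_range(start, end) -> bytes performs one underlying range GET.
    Results are returned in the same order as the input ranges, so callers
    can reassemble discontiguous channel runs deterministically.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(fetch_range, start, end) for start, end in ranges]
        return [f.result() for f in futures]
```

Threads are sufficient here because the work is network-bound; `max_workers` bounds in-flight requests so a single query cannot flood the object store.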

Acceptance criteria (initial)

  • Demonstrate substantial reduction in S3 request count for representative ephys window reads.
  • Show end-to-end latency improvement on remote object storage.
  • Preserve exact returned data vs baseline reader.

Comments on API placement are welcome:

  • extend LindiRemfile, or
  • add a new specialized high-performance reader path.
