
Port to iohub 0.3.2 ImageArray API#407

Open
alxndrkalinin wants to merge 7 commits into modular-viscy-staging from port/iohub-0.3.2-api

Conversation

Collaborator

@alxndrkalinin alxndrkalinin commented Apr 22, 2026

Summary

Staging commit a10d4c4 bumped packages/viscy-data to iohub>=0.3.2, which rewrote ImageArray and requires Python ≥3.12. Fresh uv sync breaks ("No solution found"); once forced through, 81 tests fail / 1090 pass against this tip because callers still use the removed .name / .tensorstore(context=...) attributes. This PR ports all callers to the new API while preserving the tensorstore backend (async concurrent reads with data_copy_concurrency / cache_pool_bytes).

After this PR: fresh sync resolves cleanly, 8 fail / 1163 pass — the remaining 8 are the pre-existing out-of-scope failures listed below, not caused or touched by this change.

Commits

  1. chore(deps) — bump requires-python >=3.12 in root + 9 sub-project pyprojects, drop the now-unsupported Python :: 3.11 classifier, regenerate uv.lock (iohub==0.3.2).
  2. refactor(viscy-data) — port ImageArray.name → ImageArray.path at 5 call sites (3 source files). Sliding-window sample-id tuple is emitted as f"/{img.path}" so PredictionWriter._create_image / MaskTestDataset continue to parse the 5-part /A/1/0/array layout via split("/").
  3. refactor(data) — port .tensorstore(context=...) → .native and move tensorstore context config from per-dataset to per-plate (at open_ome_zarr time) via iohub.core.config.TensorStoreConfig.
  4. refactor(data): use context managers for one-shot plate openers — wrap generate_normalization_metadata and run_qc_metrics plate opens in with blocks, per CLAUDE.md's context-manager rule.
  5. refactor(data): drop unused cache_pool_bytes from Dataset classes — remove the no-op param from TripletDataset / MultiExperimentTripletDataset and the three call sites that still passed it. DataModule-level cache_pool_bytes stays (it's the one that actually feeds TensorStoreConfig at plate-open time), so YAML configs are unaffected.
  6. docs(data): align docstrings and error text with .name -> .path port — update HCSStackIndex.image comment, _read_img_window return-docstring, and _get_windows error message to match the /A/1/0/array format the port emits.
  7. fix(docs): address Copilot review typos — Z-sclice → Z-slice in prediction_writer.py; fix stale qc_metrics.run_qc_metrics reference in focus.py comment (real function is generate_qc_metadata).
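The leading-slash detail in commit 2 can be checked with plain strings, no iohub needed: on the new API the path comes back without a leading slash, so prefixing "/" is what keeps the downstream split("/") parsers seeing five parts. A minimal sketch (the literal path is illustrative):

```python
# Sketch of the sample-id path handling described in commit 2.
# ImageArray.path on the new API returns e.g. "A/1/0/array" (no leading
# slash), while PredictionWriter._create_image / MaskTestDataset expect
# the five-part "/A/1/0/array" layout when splitting on "/".
img_path = "A/1/0/array"      # what ImageArray.path now returns
sample_id = f"/{img_path}"    # what the port emits into the sample-id tuple

parts = sample_id.split("/")
print(parts)            # ['', 'A', '1', '0', 'array'] -- five parts
assert len(parts) == 5  # downstream parsing unchanged

# without the prefix, the split would only yield four parts
assert len(img_path.split("/")) == 4
```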

How the tensorstore backend is preserved

iohub 0.3.2 removed per-array .tensorstore(context=...). The context is now set once at plate open:

from iohub.core.config import TensorStoreConfig
with open_ome_zarr(
    path,
    implementation="tensorstore",
    implementation_config=TensorStoreConfig(
        data_copy_concurrency=N,
        cache_pool_bytes=M,
    ),
) as plate:
    handle = plate[pos]["0"].native  # raw ts.TensorStore
    data = handle[sel].read().result()
    # or ts.stack([handle[sel].translate_to[0] for _ in patches]).read().result()

Four plate openers are upgraded to the tensorstore impl — the only ones in the repo whose positions flow into .native[sel].read().result() / ts.stack(...) reads:

Opener — Call site
TripletDataModule._align_tracks_tables_with_positions — packages/viscy-data/src/viscy_data/triplet.py
MultiExperimentIndex._store_cache opener — applications/dynaclr/src/dynaclr/data/index.py
generate_normalization_metadata — packages/viscy-utils/src/viscy_utils/meta_utils.py
run_qc_metrics → generate_qc_metadata — applications/qc/src/qc/qc_metrics.py

SLURM-aware CPU count (SLURM_CPUS_PER_TASK → os.cpu_count() fallback) is preserved. cache_pool_bytes is owned by the DataModule and threaded into TensorStoreConfig at plate-open time. The Dataset classes no longer carry the parameter.

All downstream .oindex[...], .translate_to[...], ts.stack(...).read().result() — including the batched-read pattern documented in applications/dynaclr/CLAUDE.md — work unchanged because .native on a tensorstore-backed plate returns exactly the same raw ts.TensorStore the old .tensorstore(context=ctx) factory returned. Other openers (hcs.py, mmap_cache.py, segmentation.py, cell_index.py, etc.) stay on the default zarr impl — they only use .oindex / img[...] / metadata, which work on both impls.
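The shape semantics of the batched-read pattern can be mimicked without tensorstore; here NumPy arrays stand in for the real ts.TensorStore handles (the array shapes mirror the smoke output, the data is synthetic):

```python
import numpy as np

# NumPy stand-in for the tensorstore batched-read pattern: several
# same-shaped per-patch views stacked into one (N, Y, X) batch, the
# way ts.stack([...]).read().result() batches real handles.
vol = np.zeros((44, 10, 10), dtype=np.float32)  # shape mirrors the smoke slab
patches = [vol[i] for i in range(3)]            # three per-patch views
batch = np.stack(patches)
print(batch.shape)  # (3, 10, 10)
```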

recheck_cached_data="open" is no longer plumbable through iohub and has been dropped; benign for the read-only training/predict paths that were the only consumers.

Smoke verification

handle type: TensorStore
slab dtype/shape: float32 (44, 10, 10)
ts.stack shape: (3, 10, 10)

An HCSDataModule smoke test on real SEC61B zarr builds and iterates through a batch end-to-end.

Not fixed in this PR (pre-existing, unrelated)

These 8 tests are still red on the base branch and this PR does not attempt them:

  • applications/airtable/tests/test_database.py — 2 failures (mock call-count drift, env-vars fixture).
  • applications/dynaclr/tests/test_multi_experiment_integration.py — 4 failures (missing configs/training/{fit,multi_experiment_fit}.yml after the dynaclr-dino / cleanup PRs).
  • applications/dynaclr/tests/test_training_integration.py — 2 failures (one config-missing, one pytorch_metric_learning labels shape regression).

Test plan

  • uv sync --all-packages --all-extras resolves cleanly on a fresh venv.
  • Smoke: open_ome_zarr(..., implementation="tensorstore", implementation_config=TensorStoreConfig(data_copy_concurrency=8)) on SEC61B zarr returns a TensorStore handle with working .read().result() and ts.stack.
  • uv run pytest across packages + applications: 8 failed / 1163 passed / 1 xfailed (vs. 81 failed / 1090 passed pre-port).
  • HCSDataModule setup + batch iteration works end-to-end against real zarr.
  • Full dynacell fit --trainer.fast_dev_run=true on a GPU (deferred — blocked on a /-partition space issue unrelated to this PR).

🤖 Generated with Claude Code

alxndrkalinin and others added 3 commits April 21, 2026 17:02
iohub 0.3.2 only ships wheels for Python >=3.12, so a workspace
that still advertises >=3.11 forces uv to resolve for a Python
where the viscy-data dep cannot be satisfied ("No solution found"
during fresh sync).

Bumps the pin on the root workspace plus every sub-project (four
packages and five applications) and drops the now-unsupported
Python 3.11 classifier from each pyproject. uv.lock shrinks
significantly because large portions of the 3.11-only resolve
graph drop out.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
iohub 0.3.2 removed ImageArray.name; .path returns the equivalent
zarr node path (minus the leading slash on the zarr-python backend
these call sites use). Prefix a literal "/" in the sliding-window
sample-id tuple so downstream consumers that split the path —
notably PredictionWriter._create_image and MaskTestDataset —
continue to see the five-part /A/1/0/array layout they already
parse via split("/").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
iohub 0.3.2 dropped per-array ImageArray.tensorstore(context=...).
The tensorstore context is now set once at plate-open time via
open_ome_zarr(implementation="tensorstore",
implementation_config=TensorStoreConfig(...)), and the raw
ts.TensorStore handle is reachable through ImageArray.native.

Moves context construction (data_copy_concurrency from
SLURM_CPUS_PER_TASK or os.cpu_count(), cache_pool_bytes from the
DataModule config) out of the Dataset layer and onto the plate
opener in each of the four call paths that actually need
tensorstore reads: TripletDataModule, MultiExperimentIndex (for
MultiExperimentTripletDataset), viscy-utils meta_utils grid-sample,
and qc.run_qc_metrics.

Every downstream .oindex / .translate_to / ts.stack(...).read()
call — including the batched-read pattern documented in
applications/dynaclr/CLAUDE.md — keeps working unchanged because
.native returns the same raw ts.TensorStore the old
.tensorstore(context=ctx) factory used to return. Dataset-level
cache_pool_bytes params are retained as no-ops for call-site
compat, documented as such.

The recheck_cached_data="open" kwarg is no longer plumbable through
iohub's API; dropping it is benign for the read-only training and
predict paths that were the only consumers.

qc/tests/test_focus.py::test_focus_slice_metric_call is updated to
open its fixture plate with the tensorstore impl, matching how
FocusSliceMetric is invoked from production
qc_metrics.run_qc_metrics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

Copilot AI left a comment


Pull request overview

Ports the codebase to iohub>=0.3.2’s updated ImageArray API and preserves the tensorstore backend by configuring tensorstore at plate-open time (via TensorStoreConfig) and switching call sites from .tensorstore(context=...) to .native.

Changes:

  • Bump requires-python to >=3.12 across the root project and multiple sub-projects, removing the Python 3.11 classifier.
  • Update ImageArray.name usages to ImageArray.path, and adjust sliding-window sample IDs to keep the existing "/A/1/0/array" parsing behavior.
  • Move tensorstore configuration to open_ome_zarr(..., implementation="tensorstore", implementation_config=TensorStoreConfig(...)) and update downstream reads to use .native.

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 2 comments.

File — Description
pyproject.toml — Raise minimum Python to 3.12; drop 3.11 classifier.
packages/viscy-utils/src/viscy_utils/meta_utils.py — Open plates with tensorstore + TensorStoreConfig; switch sampling to .native.
packages/viscy-utils/src/viscy_utils/callbacks/prediction_writer.py — Replace image.name with image.path in resize debug logging.
packages/viscy-utils/pyproject.toml — Raise minimum Python to 3.12; drop 3.11 classifier.
packages/viscy-transforms/pyproject.toml — Raise minimum Python to 3.12; drop 3.11 classifier.
packages/viscy-models/pyproject.toml — Raise minimum Python to 3.12; drop 3.11 classifier.
packages/viscy-data/src/viscy_data/triplet.py — Switch tensorstore access to .native; configure tensorstore at plate open with TensorStoreConfig.
packages/viscy-data/src/viscy_data/sliding_window.py — Replace img.name with img.path; emit sample IDs as f"/{img.path}" to preserve downstream parsing.
packages/viscy-data/src/viscy_data/segmentation.py — Replace target_img.name with target_img.path in debug logging.
packages/viscy-data/pyproject.toml — Raise minimum Python to 3.12; normalize iohub>=0.3.2 spec formatting.
applications/qc/tests/test_focus.py — Open test plate using tensorstore impl + TensorStoreConfig to support .native.read().result().
applications/qc/src/qc/qc_metrics.py — Open plate using tensorstore impl + TensorStoreConfig for metric execution.
applications/qc/src/qc/focus.py — Switch focus metric reads to .native[...].read().result() and drop direct tensorstore context creation.
applications/qc/pyproject.toml — Raise minimum Python to 3.12; drop 3.11 classifier.
applications/dynaclr/src/dynaclr/data/index.py — Cache plates opened with tensorstore impl + configurable TensorStoreConfig.
applications/dynaclr/src/dynaclr/data/dataset.py — Switch tensorstore handle acquisition to .native; keep cache_pool_bytes as documented no-op.
applications/dynaclr/src/dynaclr/data/datamodule.py — Build and pass a shared TensorStoreConfig (SLURM-aware concurrency + cache pool) into the index.
applications/dynaclr/pyproject.toml — Raise minimum Python to 3.12; drop 3.11 classifier.
applications/dynacell/pyproject.toml — Raise minimum Python to 3.12; drop 3.11 classifier.
applications/cytoland/pyproject.toml — Raise minimum Python to 3.12; drop 3.11 classifier.
applications/airtable/pyproject.toml — Raise minimum Python to 3.12; drop 3.11 classifier.


Comment thread packages/viscy-utils/src/viscy_utils/callbacks/prediction_writer.py Outdated
Comment thread applications/qc/src/qc/focus.py Outdated
alxndrkalinin and others added 4 commits April 21, 2026 22:19
Both generate_normalization_metadata and run_qc_metrics open the plate
in read/write mode, use it through a sizable loop body, then close it
explicitly at the end. A mid-loop exception would leak the write lock
and the tensorstore handle. CLAUDE.md requires context managers for
any external resource — wrap both openers in `with` blocks so cleanup
happens on every exit path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
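The leak this commit closes can be demonstrated with a toy resource (the class name below is illustrative, not from the codebase): an exception in the loop body skips a trailing close() call, but cannot skip a context manager's __exit__.

```python
# Toy resource showing why the openers were wrapped in `with` blocks:
# cleanup in __exit__ runs on every exit path, including exceptions.
class Plate:
    def __init__(self):
        self.closed = False
    def __enter__(self):
        return self
    def __exit__(self, *exc):
        self.closed = True
        return False  # do not swallow the exception

plate = Plate()
try:
    with plate:
        raise RuntimeError("mid-loop failure")
except RuntimeError:
    pass
print(plate.closed)  # True -- cleanup ran despite the exception
```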
The iohub 0.3.2 port moved tensorstore cache-pool config to the
plate-open site (TripletDataModule and DynaCLR DataModule). The
parameter on TripletDataset and MultiExperimentTripletDataset was
retained in the previous commit as a no-op for call-site compat,
but CLAUDE.md explicitly says to delete unused code rather than
carry backwards-compat ballast. Removing it — and the three
Dataset-construction call sites that still pass it — closes the
loop. DataModule-level cache_pool_bytes stays (it's the one that
actually feeds TensorStoreConfig at open time), so existing YAML
configs are unaffected.

Also updates the TripletDataModule docstring for cache_pool_bytes
to describe the new plate-level semantic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
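The ownership move this commit finishes can be sketched without iohub; all class names and the dict stand-in for TensorStoreConfig below are illustrative:

```python
# Sketch of config ownership after this commit: the DataModule owns
# cache_pool_bytes and spends it once at plate-open time; the Dataset
# no longer accepts the parameter at all.
class DataModule:
    def __init__(self, cache_pool_bytes: int):
        self.cache_pool_bytes = cache_pool_bytes
    def open_plate(self) -> dict:
        # stand-in for TensorStoreConfig built at open_ome_zarr time
        return {"cache_pool_bytes": self.cache_pool_bytes}

class Dataset:
    # note: no cache_pool_bytes parameter anymore
    def __init__(self, plate: dict):
        self.plate = plate

dm = DataModule(cache_pool_bytes=2**30)
ds = Dataset(plate=dm.open_plate())
print(ds.plate["cache_pool_bytes"])  # 1073741824
```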
Three text spots still described the pre-port shape:

- HCSStackIndex.image comment still said "name" and showed the old
  four-segment format without the leading slash.
- SlidingWindowDataset._read_img_window docstring still referred to
  the returned string as "image name".
- The IndexError in _get_windows formatted the FOV as img_arr.path
  without the leading slash prefix the read sites prepend, so the
  error message's FOV identifier didn't match what the sample-id
  tuple stores.

Update all three to match the ported format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- prediction_writer.py:63 — debug f-string said "Z-sclice"; fix to
  "Z-slice" to match the parameter docstring a few lines above.
- qc/focus.py:54 — inline comment referenced a non-existent
  qc_metrics.run_qc_metrics; point to the actual orchestrator,
  generate_qc_metadata.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@edyoshikun
Member

Here are some of my findings:

  • @srivarra can we expose some TensorStoreConfig variables? 0.3.2's file_io_concurrency for NFS I/O parallelism, recheck_cached_data for the heavy-load NFS reads of metadata. These are critical for I/O boosts and need to be exposed for us to evaluate.
  • Why do we have TensorStoreConfig instead of just the raw ts.Context (naive question). In VisCy we use the ts.Context. Currently, we do ts.stack([pos1["0"].native, pos2["0"].native, ...]).read().result() which works, but each .native call goes through iohub's Position wrapper and I think it re-parses .zattrs/.zarray on first access. An iohub helper like plate.stack_positions([path1, path2, ...], array="0") -> ts.TensorStore that batches the metadata reads would cut the per-FOV Python overhead. Even better: a plate-level "open all FOVs under this well" call that shares one ts.Context and one metadata read pass.
  • @srivarra do we always need to go through the ImageArray? Can we have Position.array_tensorstore(array="0") return a ts.TensorStore directly without building a wrapper? Or is this because the default backend is zarr-python?

@ieivanov ieivanov requested a review from srivarra April 22, 2026 19:40
@ieivanov
Copy link
Copy Markdown
Contributor

@srivarra is there an opportunity to introduce better iohub defaults that will reduce some of the boilerplate code in this PR?
