Port to iohub 0.3.2 ImageArray API#407
Open
alxndrkalinin wants to merge 7 commits into modular-viscy-staging from
Conversation
iohub 0.3.2 only ships wheels for Python >=3.12, so a workspace
that still advertises >=3.11 forces uv to resolve for a Python
where the viscy-data dep cannot be satisfied ("No solution found"
during fresh sync).
Bumps the pin on the root workspace plus every sub-project (four
packages and five applications) and drops the now-unsupported
Python 3.11 classifier from each pyproject. uv.lock shrinks
significantly because large portions of the 3.11-only resolve
graph drop out.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
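For reference, the per-package edit is the same one-line bump in every pyproject (an illustrative fragment, not a verbatim diff from this PR):

```toml
[project]
requires-python = ">=3.12"
classifiers = [
    # "Programming Language :: Python :: 3.11"  <- dropped: iohub 0.3.2 ships no 3.11 wheels
    "Programming Language :: Python :: 3.12",
]
```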
iohub 0.3.2 removed ImageArray.name; .path returns the equivalent
zarr node path (minus the leading slash on the zarr-python backend
these call sites use). Prefix a literal "/" in the sliding-window
sample-id tuple so downstream consumers that split the path —
notably PredictionWriter._create_image and MaskTestDataset —
continue to see the five-part /A/1/0/array layout they already
parse via split("/").
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
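The path-prefix invariant described above can be sketched in a few lines (the path value is illustrative; in the real code it comes from iohub's ImageArray.path):

```python
# Hypothetical value standing in for ImageArray.path, which on the
# zarr-python backend omits the leading slash (e.g. "A/1/0/array").
img_path = "A/1/0/array"

# The sliding-window sample-id tuple prefixes a literal "/" ...
sample_id = f"/{img_path}"

# ... so downstream consumers that parse via split("/") still see
# five parts: ["", row, column, fov, array_name].
parts = sample_id.split("/")
assert len(parts) == 5
assert parts[1:] == ["A", "1", "0", "array"]
```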
iohub 0.3.2 dropped per-array ImageArray.tensorstore(context=...). The tensorstore context is now set once at plate-open time via open_ome_zarr(implementation="tensorstore", implementation_config=TensorStoreConfig(...)), and the raw ts.TensorStore handle is reachable through ImageArray.native.

Moves context construction (data_copy_concurrency from SLURM_CPUS_PER_TASK or os.cpu_count(), cache_pool_bytes from the DataModule config) out of the Dataset layer and onto the plate opener in each of the four call paths that actually need tensorstore reads: TripletDataModule, MultiExperimentIndex (for MultiExperimentTripletDataset), viscy-utils meta_utils grid-sample, and qc.run_qc_metrics.

Every downstream .oindex / .translate_to / ts.stack(...).read() call — including the batched-read pattern documented in applications/dynaclr/CLAUDE.md — keeps working unchanged because .native returns the same raw ts.TensorStore the old .tensorstore(context=ctx) factory used to return.

Dataset-level cache_pool_bytes params are retained as no-ops for call-site compat, documented as such. The recheck_cached_data="open" kwarg is no longer plumbable through iohub's API; dropping it is benign for the read-only training and predict paths that were the only consumers.

qc/tests/test_focus.py::test_focus_slice_metric_call is updated to open its fixture plate with the tensorstore impl, matching how FocusSliceMetric is invoked from production qc_metrics.run_qc_metrics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor
Pull request overview
Ports the codebase to iohub>=0.3.2’s updated ImageArray API and preserves the tensorstore backend by configuring tensorstore at plate-open time (via TensorStoreConfig) and switching call sites from .tensorstore(context=...) to .native.
Changes:
- Bump `requires-python` to `>=3.12` across the root project and multiple sub-projects, removing the Python 3.11 classifier.
- Update `ImageArray.name` usages to `ImageArray.path`, and adjust sliding-window sample IDs to keep the existing `"/A/1/0/array"` parsing behavior.
- Move tensorstore configuration to `open_ome_zarr(..., implementation="tensorstore", implementation_config=TensorStoreConfig(...))` and update downstream reads to use `.native`.
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| pyproject.toml | Raise minimum Python to 3.12; drop 3.11 classifier. |
| packages/viscy-utils/src/viscy_utils/meta_utils.py | Open plates with tensorstore + TensorStoreConfig; switch sampling to .native. |
| packages/viscy-utils/src/viscy_utils/callbacks/prediction_writer.py | Replace image.name with image.path in resize debug logging. |
| packages/viscy-utils/pyproject.toml | Raise minimum Python to 3.12; drop 3.11 classifier. |
| packages/viscy-transforms/pyproject.toml | Raise minimum Python to 3.12; drop 3.11 classifier. |
| packages/viscy-models/pyproject.toml | Raise minimum Python to 3.12; drop 3.11 classifier. |
| packages/viscy-data/src/viscy_data/triplet.py | Switch tensorstore access to .native; configure tensorstore at plate open with TensorStoreConfig. |
| packages/viscy-data/src/viscy_data/sliding_window.py | Replace img.name with img.path; emit sample IDs as f"/{img.path}" to preserve downstream parsing. |
| packages/viscy-data/src/viscy_data/segmentation.py | Replace target_img.name with target_img.path in debug logging. |
| packages/viscy-data/pyproject.toml | Raise minimum Python to 3.12; normalize iohub>=0.3.2 spec formatting. |
| applications/qc/tests/test_focus.py | Open test plate using tensorstore impl + TensorStoreConfig to support .native.read().result(). |
| applications/qc/src/qc/qc_metrics.py | Open plate using tensorstore impl + TensorStoreConfig for metric execution. |
| applications/qc/src/qc/focus.py | Switch focus metric reads to .native[...] .read().result() and drop direct tensorstore context creation. |
| applications/qc/pyproject.toml | Raise minimum Python to 3.12; drop 3.11 classifier. |
| applications/dynaclr/src/dynaclr/data/index.py | Cache plates opened with tensorstore impl + configurable TensorStoreConfig. |
| applications/dynaclr/src/dynaclr/data/dataset.py | Switch tensorstore handle acquisition to .native; keep cache_pool_bytes as documented no-op. |
| applications/dynaclr/src/dynaclr/data/datamodule.py | Build and pass a shared TensorStoreConfig (SLURM-aware concurrency + cache pool) into the index. |
| applications/dynaclr/pyproject.toml | Raise minimum Python to 3.12; drop 3.11 classifier. |
| applications/dynacell/pyproject.toml | Raise minimum Python to 3.12; drop 3.11 classifier. |
| applications/cytoland/pyproject.toml | Raise minimum Python to 3.12; drop 3.11 classifier. |
| applications/airtable/pyproject.toml | Raise minimum Python to 3.12; drop 3.11 classifier. |
Both generate_normalization_metadata and run_qc_metrics open the plate in read/write mode, use it through a sizable loop body, then close it explicitly at the end. A mid-loop exception would leak the write lock and the tensorstore handle. CLAUDE.md requires context managers for any external resource — wrap both openers in `with` blocks so cleanup happens on every exit path.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
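The leak this commit fixes can be illustrated with a stdlib-only stand-in for the plate handle (all names here are hypothetical; the real opener is iohub's open_ome_zarr):

```python
from contextlib import contextmanager


class FakePlate:
    """Stand-in for an open plate holding a write lock / tensorstore handle."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


@contextmanager
def open_plate():
    """Context-manager opener: close() runs on every exit path."""
    plate = FakePlate()
    try:
        yield plate
    finally:
        plate.close()


# Explicit close leaks when the loop body raises:
leaky = FakePlate()
try:
    raise RuntimeError("mid-loop failure")
    leaky.close()  # never reached
except RuntimeError:
    pass
assert not leaky.closed  # handle leaked

# A with-block releases the handle regardless of how the body exits:
try:
    with open_plate() as plate:
        raise RuntimeError("mid-loop failure")
except RuntimeError:
    pass
assert plate.closed
```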
The iohub 0.3.2 port moved tensorstore cache-pool config to the plate-open site (TripletDataModule and DynaCLR DataModule). The parameter on TripletDataset and MultiExperimentTripletDataset was retained in the previous commit as a no-op for call-site compat, but CLAUDE.md explicitly says to delete unused code rather than carry backwards-compat ballast. Removing it — and the three Dataset-construction call sites that still pass it — closes the loop.

DataModule-level cache_pool_bytes stays (it's the one that actually feeds TensorStoreConfig at open time), so existing YAML configs are unaffected. Also updates the TripletDataModule docstring for cache_pool_bytes to describe the new plate-level semantic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three text spots still described the pre-port shape:
- The HCSStackIndex.image comment still said "name" and showed the old four-segment format without the leading slash.
- The SlidingWindowDataset._read_img_window docstring still referred to the returned string as "image name".
- The IndexError in _get_windows formatted the FOV as img_arr.path without the leading slash prefix the read sites prepend, so the error message's FOV identifier didn't match what the sample-id tuple stores.

Update all three to match the ported format.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- prediction_writer.py:63 — debug f-string said "Z-sclice"; fix to "Z-slice" to match the parameter docstring a few lines above.
- qc/focus.py:54 — inline comment referenced a non-existent qc_metrics.run_qc_metrics; point to the actual orchestrator, generate_qc_metadata.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Member
Here are some of my findings:
Contributor
@srivarra is there an opportunity to introduce better iohub defaults that would reduce some of the boilerplate code in this PR?
Summary
Staging commit `a10d4c4` bumped `packages/viscy-data` to `iohub>=0.3.2`, which rewrote `ImageArray` and requires Python ≥3.12. Fresh `uv sync` breaks ("No solution found"); once forced through, 81 tests fail / 1090 pass against this tip because callers still use the removed `.name` / `.tensorstore(context=...)` attributes. This PR ports all callers to the new API while preserving the tensorstore backend (async concurrent reads with `data_copy_concurrency` / `cache_pool_bytes`).

After this PR: fresh sync resolves cleanly, 8 fail / 1163 pass — the remaining 8 are the pre-existing out-of-scope failures listed below, not caused or touched by this change.
Commits
- `chore(deps)` — bump `requires-python >=3.12` in root + 9 sub-project pyprojects, drop the now-unsupported `Python :: 3.11` classifier, regenerate `uv.lock` (`iohub==0.3.2`).
- `refactor(viscy-data)` — `ImageArray.name` → `ImageArray.path` at 5 call sites (3 source files). Sliding-window sample-id tuple is emitted as `f"/{img.path}"` so `PredictionWriter._create_image` / `MaskTestDataset` continue to parse the 5-part `/A/1/0/array` layout via `split("/")`.
- `refactor(data)` — port `.tensorstore(context=...)` → `.native` and move tensorstore context config from per-dataset to per-plate (at `open_ome_zarr` time) via `iohub.core.config.TensorStoreConfig`.
- `refactor(data): use context managers for one-shot plate openers` — wrap `generate_normalization_metadata` and `run_qc_metrics` plate opens in `with` blocks, per CLAUDE.md's context-manager rule.
- `refactor(data): drop unused cache_pool_bytes from Dataset classes` — remove the no-op param from `TripletDataset` / `MultiExperimentTripletDataset` and the three call sites that still passed it. DataModule-level `cache_pool_bytes` stays (it's the one that actually feeds `TensorStoreConfig` at plate-open time), so YAML configs are unaffected.
- `docs(data): align docstrings and error text with .name -> .path port` — update the `HCSStackIndex.image` comment, the `_read_img_window` return docstring, and the `_get_windows` error message to match the `/A/1/0/array` format the port emits.
- `fix(docs): address Copilot review typos` — `Z-sclice` → `Z-slice` in `prediction_writer.py`; fix the stale `qc_metrics.run_qc_metrics` reference in the `focus.py` comment (the real function is `generate_qc_metadata`).

How the tensorstore backend is preserved
iohub 0.3.2 removed per-array `.tensorstore(context=...)`. The context is now set once at plate open. Four plate openers are upgraded to the tensorstore impl — the only ones in the repo whose positions flow into `.native[sel].read().result()` / `ts.stack(...)` reads:

- `TripletDataModule._align_tracks_tables_with_positions` — `packages/viscy-data/src/viscy_data/triplet.py`
- `MultiExperimentIndex._store_cache` opener — `applications/dynaclr/src/dynaclr/data/index.py`
- `generate_normalization_metadata` — `packages/viscy-utils/src/viscy_utils/meta_utils.py`
- `run_qc_metrics` → `generate_qc_metadata` — `applications/qc/src/qc/qc_metrics.py`

The SLURM-aware CPU count (`SLURM_CPUS_PER_TASK` → `os.cpu_count()` fallback) is preserved. `cache_pool_bytes` is owned by the DataModule and threaded into `TensorStoreConfig` at plate-open time; the Dataset classes no longer carry the parameter.

All downstream `.oindex[...]`, `.translate_to[...]`, and `ts.stack(...).read().result()` calls — including the batched-read pattern documented in `applications/dynaclr/CLAUDE.md` — work unchanged, because `.native` on a tensorstore-backed plate returns exactly the same raw `ts.TensorStore` the old `.tensorstore(context=ctx)` factory returned. Other openers (`hcs.py`, `mmap_cache.py`, `segmentation.py`, `cell_index.py`, etc.) stay on the default zarr impl — they only use `.oindex` / `img[...]` / metadata, which work on both impls. `recheck_cached_data="open"` is no longer plumbable through iohub and has been dropped; this is benign for the read-only training/predict paths that were its only consumers.

Smoke verification
An HCSDataModule smoke test on real SEC61B zarr builds and iterates through a batch end-to-end.
Not fixed in this PR (pre-existing, unrelated)
These 8 tests are still red on the base branch and this PR does not attempt them:
- `applications/airtable/tests/test_database.py` — 2 failures (mock call-count drift, env-vars fixture).
- `applications/dynaclr/tests/test_multi_experiment_integration.py` — 4 failures (missing `configs/training/{fit,multi_experiment_fit}.yml` after the dynaclr-dino / cleanup PRs).
- `applications/dynaclr/tests/test_training_integration.py` — 2 failures (one config-missing, one `pytorch_metric_learning` labels shape regression).

Test plan
- `uv sync --all-packages --all-extras` resolves cleanly on a fresh venv.
- `open_ome_zarr(..., implementation="tensorstore", implementation_config=TensorStoreConfig(data_copy_concurrency=8))` on SEC61B zarr returns a `TensorStore` handle with working `.read().result()` and `ts.stack`.
- `uv run pytest` across packages + applications: 8 failed / 1163 passed / 1 xfailed (vs. 81 failed / 1090 passed pre-port).
- `dynacell fit --trainer.fast_dev_run=true` on a GPU (deferred — blocked on a `/`-partition space issue unrelated to this PR).

🤖 Generated with Claude Code