Refactor applications/DynaCell into a reusable benchmark package #404
Draft
alxndrkalinin wants to merge 164 commits into modular-viscy-staging from
Conversation
When gpu_augmentations include a spatial crop (e.g. BatchedCenterSpatialCropd), the output shape intentionally differs from z_window_size/yx_patch_size. The validation was raising a false ValueError for configs like UNeXt2 (z_window=20 read, gpu crop to 15) and CellDiff (z_window=13 read, crop to 8). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
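A minimal sketch of the relaxed check, under the assumption that the fix special-cases spatially cropping GPU transforms. Class and argument names are taken from the message above; the real datamodule code differs:

```python
# Hypothetical sketch: skip the patch-shape validation whenever a GPU-side
# augmentation performs a spatial crop, since the post-crop shape is then
# intentionally smaller than the shape read from disk.

class BatchedCenterSpatialCropd:
    """Stand-in for the real batched center-crop transform."""

def validate_patch_shape(sample_shape, z_window_size, yx_patch_size, gpu_augmentations):
    """Raise only when no GPU crop can explain a shape mismatch."""
    crops_spatially = any(
        isinstance(t, BatchedCenterSpatialCropd) for t in (gpu_augmentations or [])
    )
    if crops_spatially:
        return  # the GPU crop owns the final shape (e.g. z_window 20 read -> 15)
    expected = (z_window_size, *yx_patch_size)
    if tuple(sample_shape[-3:]) != expected:
        raise ValueError(f"sample shape {tuple(sample_shape[-3:])} != expected {expected}")

# UNeXt2 case from the message: 20 slices read, GPU crop to 15 -> no error raised.
validate_patch_shape((15, 384, 384), 20, (384, 384), [BatchedCenterSpatialCropd()])
```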
z_window_size 15→20: restores the 5-slice Z margin for affine augmentation (20 read → 15 after GPU center crop), matching the original VSCyto3D finetune_3d.py pipeline. batch_size 8→64 with num_samples 2→4: each GPU now processes 16 samples (~13 GB VRAM) instead of 2 (~2 GB), reducing GPU idle time from ~97% to ~84%. LR scaled by sqrt(8) for Adam (0.0002→0.0006). SLURM: mem-per-cpu 12G→20G for /dev/shm mmap buffer headroom. Removed --ckpt_path for fresh training with new hyperparameters. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…l crop z_window 8→13 with spatial_size [13,624,624]: provides 5-slice Z margin (13→8 after GPU center crop) and 112px YX margin (624→512) for affine augmentation artifacts, matching the UNeXt2 two-stage crop strategy. batch_size 4→8: doubles throughput without VRAM pressure. Phase3D normalization: median/iqr→mean/std to match UNeXt2 and the original VSCyto3D pipeline. val_gpu_augmentations: DivisibleCropd→BatchedCenterSpatialCropd because CellDiff's ViT requires exact input_spatial_size [8,512,512] (fixed positional embeddings), not just divisible dimensions. max_epochs 200→10: initial smoke-test run before scaling up. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
crop_at_read was removed in the mmap preload refactor but this error message still referenced it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Spell out full read/input sizes in header comment. Add single-GPU SLURM script for A100/H100 (80GB). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add scratch_dir: /dev/shm and /dev/shm cleanup to SLURM trap, matching the UNeXt2 config. 256G RAM allocation covers the 86 GB mmap buffer in tmpfs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fit_unext2_continue.yml had no ckpt_path, causing the job to train from scratch instead of resuming from epoch=15 (step=10128). Points to last-v1.ckpt which is the actual latest checkpoint. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reduce lr 0.0006→0.0004 and batch_size 64→32 to fit within node RAM budget. Fix val_gpu_augmentations: BatchedDivisibleCropd with k=[1,64,64] left Z at 20 slices (no Z reduction); replace with BatchedCenterSpatialCropd roi_size=[15,384,384] to match the model's expected input. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ViT with 8×512×512 patches is significantly larger than UNeXt2; batch_size=8 OOMs on a single GPU. Reduce to 2 (1 FOV → 2 patches). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previous checkpoints used wrong val aug (BatchedDivisibleCropd, no Z reduction) and different lr/batch_size. Resuming would mix incompatible training dynamics. Start fresh with corrected config. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Single GPU gave 10,816 steps/epoch (2h) due to 32 overlapping z-windows per FOV × 338 FOVs. With 4-GPU DDP at batch=2/GPU, steps drop to 2,704/epoch (~31 min). mmap buffer is OS-shared so RAM stays flat. Reduce num_workers 8→4 (4 workers × 4 ranks = 32 processes for 32 CPUs). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kornia 0.8.x hard-codes padding_mode='zeros' in RandomAffine3D. Add _PaddedRandomAffine3D subclass that overrides apply_transform to pass the user-specified mode to warp_affine3d (which already supports it). Expose padding_mode='border'|'reflection' in BatchedRandAffined for configs where crop/output ratio < √2 and the oversized border cannot absorb large rotations without zero-corner artifacts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
624/512 = 1.22 < √2, so the 56px oversized-crop border cannot absorb rotations larger than ~14°. With rotate_range=±π, large rotations leave visible zero-corner gaps in the training batches (not an issue for UNeXt2 which has ratio 600/384=1.56 > √2). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
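The margin arithmetic above can be checked directly: a centered D×D crop stays inside an S×S source under rotation θ only while D·(|cos θ| + |sin θ|) ≤ S, so the largest safe angle is asin((S/D)/√2) − 45°, and any ratio ≥ √2 absorbs every angle. The helper name is illustrative:

```python
import math

def max_safe_rotation_deg(source: float, crop: float) -> float:
    """Largest rotation (degrees) whose warped footprint stays inside the source."""
    r = source / crop
    if r >= math.sqrt(2):
        return 90.0  # entire rotation range is artifact-free
    return math.degrees(math.asin(r / math.sqrt(2))) - 45.0

print(round(max_safe_rotation_deg(624, 512), 1))  # CellDiff ratio 1.22 -> ~14.5
print(max_safe_rotation_deg(600, 384))            # UNeXt2 ratio 1.56 > sqrt(2) -> 90.0
```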
LightningCLI._parse_ckpt_path merges checkpoint hyper_parameters back into the config before class instantiation, so a checkpoint trained with predict_method='generate' would silently override the user's predict_method='sliding_window' in the YAML config.

Fix by mirroring DynacellUNet: add ckpt_path to __init__, load the state dict there directly, and exclude it (along with all predict-time params) from save_hyperparameters. With no top-level ckpt_path in the config, _parse_ckpt_path is never triggered.

Move ckpt_path and output_store from the predict_gpu.yml recipe into the per-experiment predict config; recipes stay as pure templates. Add sec61b/predict_celldiff.yml as the canonical SEC61B CellDiff predict config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LightningCLI._parse_ckpt_path applies checkpoint hyper_parameters as the highest-priority config layer, overriding values the user explicitly set in the YAML. The correct hierarchy is: base-class defaults → checkpoint hparams → user config.

Snapshot model init_args before the checkpoint merge and restore them after, so any value present in the user's config always wins.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When rotation + zoom-out combine, the backward-warp footprint can exceed the source crop, creating zero-corner artifacts. The new safe_crop_size parameter computes a per-sample scale floor from the sampled rotation angle: s_min = coverage * k(θ) * D / S, where k = |cos θ| + |sin θ|. safe_crop_coverage (default 1.0) relaxes the constraint — 0.9 allows small corners as extra augmentation while eliminating the worst ~30% of artifacts. Also removes padding_mode=border from CellDiff config (the scale clamping makes it unnecessary). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
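The scale floor can be written down directly from the formula in the message; how it plugs into BatchedRandAffined's sampling is omitted, and the function names are illustrative:

```python
import math

# For sampled rotation theta, a backward warp into a D-sized output reads a
# footprint of D * k(theta) / s from the S-sized source crop, so the smallest
# zoom-out scale that stays inside the crop is
#   s_min = coverage * k(theta) * D / S,   k(theta) = |cos theta| + |sin theta|.

def scale_floor(theta, out_size, crop_size, coverage=1.0):
    k = abs(math.cos(theta)) + abs(math.sin(theta))
    return coverage * k * out_size / crop_size

def clamp_scale(sampled_scale, theta, out_size, crop_size, coverage=1.0):
    # per-sample clamp: never zoom out past the artifact-free floor
    return max(sampled_scale, scale_floor(theta, out_size, crop_size, coverage))

# CellDiff YX geometry: 512 output from a 624 crop.
print(scale_floor(0.0, 512, 624))                    # 0.8205...: mild zoom-out is fine
print(round(scale_floor(math.pi / 4, 512, 624), 3))  # at 45 deg the floor exceeds 1
```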
Reorganize into four sections: Project & Context (what is this), Development (how to get running), Project Conventions (patterns here), Engineering Standards (how to write code). Remove duplicate Code Style heading. Consistent structure with dynacell CLAUDE.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`str or None` → `str | None` for consistency with the rest of the file and project conventions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The docstring said validation always runs after gpu_augmentations, but the code skips validation when gpu_augmentations is present (they handle cropping themselves). Update to match actual behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When overwrite=True, existing prediction channels in the output store are silently reused instead of raising FileExistsError. Default False preserves the previous error-on-duplicate behavior. Needed for re-running predictions on the same output store during iteration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SEC61B benchmark configs (fit, predict, SLURM scripts) are paper-specific and belong in the dynacell-paper repo. The hcs_sec61b_3d data recipe hardcodes HPC paths. All preserved on preserve/sec61b-configs branch. Also adds .gitignore for lightning_logs/ and outputs/, and updates README to reference the new config layout. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three-layer config model:
- configs/recipes/ — reusable fragments (model, trainer, data, modes)
- configs/examples/ — generic fit/predict pair per model family

Fix base: references (../recipes/ → ../../recipes/), scrub hardcoded SEC61B paths from celldiff predict, add missing ckpt_path to fnet3d and unetvit3d predict, change preload default to false in generic recipe, update test config discovery path and __main__ docstring.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
UNeXt2 had a model recipe but no generic example. Adds fit.yml and predict.yml following the same pattern as fnet3d/unetvit3d/celldiff examples. Config discovery test now finds 8 leaf configs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Absorb dataset manifest schemas from dynacell-paper and add new benchmark collection and spec schemas per VISCY_HANDOFF.md.
- manifests.py: DatasetManifest, TargetConfig, VoxelSpacing, StoreLocations, SplitDefinition + load_manifest, load_splits, get_target
- collections.py: Provenance, ChannelEntry, CollectionExperiment, BenchmarkCollection + load_collection
- specs.py: BenchmarkSpec + load_benchmark_spec

No registry dict, no import-time side effects. Callers pass paths explicitly. Add pydantic>=2 and omegaconf to base dependencies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Reuse viscy_data.collection.ChannelEntry instead of duplicating it - Extract shared OmegaConf+Pydantic loading to _yaml.load_yaml() - Keep Provenance local (stricter than viscy_data version: required created_at/created_by for benchmark traceability) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Migrate tables, figures, and Hydra CLI entry point for benchmark reporting. tables.py is pure pandas, figures.py preserves Agg backend ordering, cli.py uses parents[3] for config resolution. Tests use module-level pytest.importorskip() for pandas/matplotlib so they skip gracefully without those deps installed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Migrate rewrite_zarr() and load_preprocess_config() — the clearly reusable, dependency-light preprocess helpers. Dataset-specific utilities (selection, segmentation, workflow) remain in dynacell-paper. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lightning subcommands (fit, predict, test, validate) delegate to viscy_utils.cli.main(). Hydra subcommands (evaluate, report) lazily import their entry points. ModuleNotFoundError prints an install hint for the missing extra instead of a raw traceback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full evaluation pipeline: pixel metrics (PCC, SSIM, NRMSE, PSNR), segmentation metrics (Dice, IoU), feature metrics (DINOv3, DynaCLR FID), spectral PCC subpackage, and Hydra CLI entry point.

Heavy deps (segmenter_model_zoo, aicssegmentation, cubic, microssim, transformers, dynaclr, skimage) gated with try/except ImportError. Pipeline imports segmentation/utils lazily inside functions so `import dynacell.evaluation.pipeline` succeeds without heavy deps. eval.yaml uses OmegaConf ??? markers for all required fields so running without overrides gives MissingMandatoryValue, not AttributeError.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
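The optional-dependency gate described above follows a common pattern: try the heavy import once at module load, then raise an actionable install hint only when the gated feature is used. The extra name `eval` matches the message; `transformers` stands in for any of the gated deps:

```python
try:
    import transformers  # heavy optional dependency
except ImportError:
    transformers = None  # feature metrics unavailable, but the module still imports

def require_transformers():
    """Raise a helpful hint if the optional dep is missing, else return it."""
    if transformers is None:
        raise ModuleNotFoundError(
            "transformers is required for feature metrics; "
            "install the extra with: pip install 'dynacell[eval]'"
        )
    return transformers
```

The key property is that `import` of the host module never fails; only calling a gated code path does.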
- eval: all deps for dynacell evaluate (segmentation, metrics, cubic, transformers, dynaclr, etc.) - report: pandas + matplotlib + hydra for dynacell report - preprocess: iohub + tqdm for zarr rewriting utilities Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restructure applications/dynacell/configs/benchmarks/virtual_staining/
so train, predict, and eval leaves for the same benchmark cell live
under one directory `<org>/<train_set>/<model>/`, with shared building
blocks split by consumer:
- `shared/model/` — train/predict building blocks (train_sets,
predict_sets, targets, model_overlays, launcher_profiles)
- `shared/eval/` — HPC-bound Hydra eval groups (target, feature_extractor/dynaclr)
- `<org>/<train_set>/<model>/{train.yml, predict/<predset>.yml, eval/<predset>.yaml}`
Fold `configs/evaluation/` into the benchmark tree and delete the
sibling root. The eval selector renames from `benchmark=<path>` to
`leaf=<path>` to reflect the new filesystem model; leaves are addressed
through a committed symlink tree at `virtual_staining/leaf/` because
Hydra's group resolution requires a physical `leaf/` prefix. Eval
leaves keep `.yaml` (Hydra only discovers `.yaml` for groups); train
and predict leaves stay `.yml`.
`dynacell.__main__` now injects two `hydra.searchpath` roots — the
benchmark tree (for `leaf/`) and `shared/eval/` (for `target/` and
`feature_extractor/dynaclr/`). The `_EXTERNAL_CONFIGS_SUBPATH` constant
becomes `_EXTERNAL_SEARCHPATHS`, and the helper is renamed to
`_external_configs_dirs` (plural, returning a list).
Also fixes a pre-existing bug: the three unetvit3d train leaves under
mito/nucleus/membrane referenced a non-existent `runtime_single_gpu.yml`
launcher profile; they now use `runtime_shared.yml`. Model-overlay
`base:` entries targeting `recipes/` gain an extra `..` segment to
account for the new `shared/model/` depth.
Tests:
- test_benchmark_config_composition.py: path literals, three new
(unetvit3d × {mito,nucleus,membrane}) pairs, new
test_eval_leaf_symlink_resolves parametrised over all 8 eval leaves.
- test_cli_routing.py: three `_external_configs_dir` patches renamed
(plural) and argv selector strings updated to `leaf=`.
- test_submit_benchmark_job.py: parametrised leaf paths.
Docs: virtual_staining/README.md (full rewrite to match new tree),
evaluation/README.md (groups table, external-users section, benchmark
eval leaves section), UNEXT2_VS_FCMAE_CLASSES.md (stale path), dynacell/
README.md (sbatch example path), BENCHMARK_CONFIG_SCHEMA.md (Hydra
search-path contract note about the `leaf/` symlink adapter).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…l/ move
The preceding refactor commit renamed train and predict leaves to
<org>/<train_set>/<model>/{train.yml, predict/<predset>.yml} but the
content updates that prepend the shared/model/ segment to every base:
line were unstaged when that commit landed. This catches the 25 train
and predict leaves up with the new shared/model/ layout, and tightens a
prose comment in fnet3d_paper_fit.yml that still referenced the
pre-refactor shared/targets/ path.
Without this fix a fresh checkout of the reorg commit has train and
predict leaves pointing at shared/train_sets/, shared/targets/, etc. —
paths that no longer exist — so `load_composed_config` would raise
FileNotFoundError on every leaf.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trim the WHAT-narration from docstrings and comments in dynacell.__main__: drop the redundant "two roots are injected" enumeration (the tuple is self-describing), fix the "repo root" claim in _external_configs_dirs (the walk stops at the app's pyproject.toml, not the workspace root), and drop the "joined with commas" line from _inject_external_configs' docstring. Bind ``p = parent / sub`` via walrus inside the existence filter so the path is computed once instead of twice per subpath.

In the composition tests, delete the ``EVAL_LEAVES = PREDICT_LEAVES`` alias (the aliased list was the same object, causing silent coupling) and drive the symlink-integrity test off ``PREDICT_LEAVES`` directly. Drop the ``@package _global_`` header assertion from the symlink test — that check belongs to composition, not to symlink integrity. Add a ``REPO_ROOT`` sanity assert so drift of the ``parents[3]`` magic count fails loudly. Tighten ``TestInjectExternalConfigs`` class docstring to a single line.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the "in-package / repo checkout" shorthand in the config-groups table with concrete source paths, so readers see exactly where each group lives after the reorg: shared/eval/ for HPC-bound groups, leaf/ for the eval-leaf symlink tree, _configs/ for packaged schema. Tighten the "Running an evaluation" intro so it no longer implies all groups live under _configs/, and spell out the two hydra.searchpath roots dynacell.__main__ injects. Add a footgun entry for external users about Hydra's .yaml-only group resolution — saving a group as .yml makes the selector silently fail with MissingConfigException. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Keep the top level of virtual_staining/ biology-only: `er/`, `membrane/`,
`mito/`, `nucleus/` plus README.md. Everything else — shared building
blocks and the leaf/ Hydra symlink adapter — now lives under a single
_internal/ directory whose leading underscore signals "implementation
detail; don't browse here for science."
Layout is now:
virtual_staining/
<org>/<train_set>/<model>/{train.yml, predict/, eval/}
_internal/
shared/
model/ # train/predict building blocks
eval/ # target/, feature_extractor/dynaclr/
leaf/ # symlink tree aliasing canonical eval leaves
dynacell.__main__ injects two hydra.searchpath roots under _internal/
(the tree root for leaf/, and shared/eval/ for target + dynaclr) —
same cardinality as before, just under the hidden support tree.
Train/predict leaves get `_internal/` prepended to every base: path.
Model overlays gain one more `../` to reach configs/recipes/ (now 6
segments deep instead of 5). The 8 leaf/ symlinks are recreated with
6 relative-dot segments (was 5 at virtual_staining/leaf/, now 6 at
virtual_staining/_internal/leaf/).
Docs refreshed across BENCHMARK_CONFIG_SCHEMA.md, virtual_staining/
README.md, and evaluation/README.md: every layout tree, base: example,
group-source column, and search-path reference points at _internal/.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related bugs in predict-path tiling:

P1. DynacellFlowMatching.predict_step routed predict_method='sliding_window' to CELLDiff3DVS.generate_sliding_window(), which is the non-overlapping tiler and does not consume predict_overlap. Any config that set both predict_method='sliding_window' and predict_overlap=[...] silently got non-overlapping tiles, producing different predictions than the user expected. Now raise ValueError when sliding_window is combined with a non-zero overlap, pointing users to predict_method='iterative' (the overlap-anchored velocity tiler) instead. Also refresh the docstring to spell out which methods consume predict_overlap. Fix the membrane predict leaf to match its peers (er, mito): predict_method=iterative. The example config at configs/examples/celldiff/predict.yml gets the same fix plus an inline comment clarifying which methods consume the overlap.

P2. DynacellUNet.predict_sliding_window allocated both accumulator buffers (prediction_sum, prediction_count) with zeros_like(source), so their channel dimension was always in_channels. For models whose out_channels != in_channels (common: 1 phase in -> 2 target out), the first `prediction_sum[slicer] += patch_out` raised a broadcast error. Allocate the accumulators lazily from the first patch output so they take the model's actual out_channels and dtype.

Regression tests added for both paths:
- sliding_window + non-zero overlap must raise ValueError
- predict_sliding_window returns the correct shape when out_channels=2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
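The lazy-accumulator fix (P2) can be sketched with tiling simplified to width-axis slicing; numpy stands in for torch to keep the sketch dependency-free, and the function name is illustrative, not the shipped method:

```python
import numpy as np

def accumulate_patches(source, patches_with_slicers):
    """Average overlapping patch outputs; buffers sized from the first output."""
    prediction_sum = None
    prediction_count = None
    for slicer, patch_out in patches_with_slicers:
        if prediction_sum is None:
            # channel dim comes from the first patch output, not from source,
            # so out_channels != in_channels no longer breaks broadcasting
            out_shape = (source.shape[0], patch_out.shape[1], *source.shape[2:])
            prediction_sum = np.zeros(out_shape, dtype=patch_out.dtype)
            prediction_count = np.zeros(out_shape, dtype=patch_out.dtype)
        prediction_sum[slicer] += patch_out
        prediction_count[slicer] += 1
    return prediction_sum / np.maximum(prediction_count, 1)

src = np.zeros((1, 1, 4, 8, 8))       # 1 phase channel in
patch = np.ones((1, 2, 4, 8, 4))      # 2 target channels out
full = (slice(None),) * 4
out = accumulate_patches(src, [
    (full + (slice(0, 4),), patch),   # left half of the width axis
    (full + (slice(4, 8),), patch),   # right half
])
print(out.shape)  # (1, 2, 4, 8, 8)
```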
The earlier "always append" strategy broke argparse when users placed
diagnostic flags (-c job, --info, --help, ...) after positional
overrides:
$ dynacell evaluate leaf=er/... -c job
dynacell: error: unrecognized arguments: hydra.searchpath=[file://...]
Hydra's argparse declares ``overrides`` as a single positional with
nargs="*". Once that nargs run is interrupted by a flag, any later
positional is reported unrecognized. Appending the injected token at
the end scatters a positional across the flag in the override-first
layout; prepending scatters it across the flag in the flag-first layout.
Fix: insert the token adjacent to an existing positional override when
one is present, so positionals stay contiguous whichever side the
flags sit on. Fall back to append when the user passed no positional
overrides (e.g. bare --help).
Three regression tests cover the three layouts: flag-leading,
flag-trailing, and no-positional. The former "appends always" test is
replaced by the layout-specific pair.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
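The insertion rule above can be sketched with a simple heuristic for "positional override" (contains `=` and doesn't start with `-`); the helper name and heuristic are illustrative, not the shipped code:

```python
def inject_override(argv, token):
    """Insert token adjacent to the first positional override, else append."""
    for i, arg in enumerate(argv):
        if "=" in arg and not arg.startswith("-"):
            # keep Hydra's nargs="*" positional run contiguous
            return argv[:i] + [token, arg] + argv[i + 1:]
    return argv + [token]  # no positional overrides (e.g. bare --help)

tok = "hydra.searchpath=[file:///x]"
print(inject_override(["leaf=er/a", "-c", "job"], tok))
# ['hydra.searchpath=[file:///x]', 'leaf=er/a', '-c', 'job']
print(inject_override(["--info", "leaf=er/a"], tok))
# ['--info', 'hydra.searchpath=[file:///x]', 'leaf=er/a']
print(inject_override(["--help"], tok))
# ['--help', 'hydra.searchpath=[file:///x]']
```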
Tighten the "Running an evaluation" lead so readers see up front that repo-checkout groups live under virtual_staining/_internal/ (the hidden support tree), not loose under virtual_staining/. The detailed bullet list below already spelled this out; the intro now matches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Copilot reviewed 191 out of 201 changed files in this pull request and generated 1 comment.
Comment on lines +65 to +69
    if target_name in ["nucleus", "membrane"]:
        _require_segmenter_model_zoo()
        if seg_model is None:
            raise ValueError("SegModel must be provided for nucleus and membrane segmentation.")
        mask = seg_model.apply_on_single_zstack(img[None, ...])
The ValueError message for missing seg_model says "SegModel must be provided", but the parameter is seg_model and the required type here is the loaded SuperModel. Updating the message to reference seg_model/SuperModel will make the failure clearer for users.
Lightning's SLURMEnvironment rejects --ntasks at runtime (slurm.py:_validate_srun_variables) and demands --ntasks-per-node. Yesterday's fix emitted the wrong directive; all four FCMAE jobs failed within 50 s on the cluster. Switch the sbatch generator, hardware profiles, pre-submit validator, and invariant test to use ntasks_per_node throughout. The enforced invariant is now devices == ntasks_per_node and gpus == nodes × devices. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gated HF repos (DINOv3) need per-user access grants, but the downloaded weights are identical. Before this change every team member who ran `dynacell evaluate compute_feature_metrics=true` re-downloaded ~2 GB of DINOv3 weights into their own `~/.cache/huggingface/`. Even ignoring the network cost, only one team member has HF gate access right now — every other member's runs 403 at model fetch.

Fix: point HF_HOME at a team-shared project-storage dir. First user with gated access downloads once; everyone else reads the shared copy. Two layers so the behavior is automatic regardless of invocation:

1. `dynacell.__main__._maybe_set_shared_hf_cache()` runs before Hydra subcommands (evaluate, precompute-gt, report). It defaults HF_HOME to `/hpc/projects/comp.micro/virtual_staining/models/dynacell/evaluation/hf_cache/` when three conditions hold: the user hasn't set HF_HOME themselves, we're on a repo checkout (same signal as external configs), and the shared dir exists on this machine. Wheel installs and non-HPC machines fall through to the normal ~/.cache/huggingface default.

2. `runtime_shared.yml` also sets HF_HOME for sbatch-rendered fit / predict jobs. Those jobs don't currently load HF models, but having the env var declared in the launcher profile future-proofs any later workflow that does and keeps the team convention visible in config.

Four regression tests cover the auto-setter: user-set wins, wheel install skips, missing shared dir skips, repo checkout + existing dir sets. README documents the mechanism and first-time download contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
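The three-condition auto-setter can be sketched as below; the function and default path follow the message, but this is an illustration, not the shipped helper (note a later commit in this PR switches the variable from HF_HOME to HF_HUB_CACHE):

```python
import os
from pathlib import Path

_SHARED_HF_CACHE = (
    "/hpc/projects/comp.micro/virtual_staining/models/dynacell/evaluation/hf_cache/"
)

def maybe_set_shared_hf_cache(on_repo_checkout: bool, env=os.environ) -> None:
    """Default HF_HOME to the team cache when all three conditions hold."""
    if "HF_HOME" in env:          # user-set value always wins
        return
    if not on_repo_checkout:      # wheel installs keep ~/.cache/huggingface
        return
    if not Path(_SHARED_HF_CACHE).is_dir():  # non-HPC machines fall through
        return
    env["HF_HOME"] = _SHARED_HF_CACHE

# user-set HF_HOME is never overridden
env = {"HF_HOME": "/custom"}
maybe_set_shared_hf_cache(on_repo_checkout=True, env=env)
print(env["HF_HOME"])  # /custom
```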
Three defaults were required on every eval / precompute-gt invocation
even though they are effectively fixed for our benchmark set:
feature_extractor/dinov3=lvd1689m (only DINOv3 we evaluate with)
feature_extractor/dynaclr=default (team DynaCLR checkpoint)
io.gt_cache_dir=/hpc/.../eval_cache/<TARGET> (one dir per target)
Hoist them so the minimal command
dynacell precompute-gt target=er_sec61b predict_set=ipsc_confocal
composes everything needed for all four cache families, and
dynacell evaluate leaf=<...>
carries the same defaults without the redundant override directives.
- eval.yaml defaults list switches
optional feature_extractor/dinov3: null -> lvd1689m
optional feature_extractor/dynaclr: null -> default
The `optional` keyword means wheel installs that don't ship
dynaclr=default (it lives in the repo's shared/eval/ tree) silently
skip the default — wheel users supply their own via --config-dir.
- Each target YAML now carries io.gt_cache_dir for its target's cache
path (SEC61B, TOMM20, nucleus, membrane).
- The 8 canonical eval leaves drop the redundant
override /feature_extractor/dinov3: lvd1689m
override /feature_extractor/dynaclr: default
and the per-leaf io.gt_cache_dir — all now inherited.
- README updated: "Enable feature metrics" and "Priming the cache"
examples show the one-line minimal forms.
To swap variants at call-time nothing changes:
dynacell evaluate ... feature_extractor/dinov3=<other_variant>
dynacell evaluate ... feature_extractor.dynaclr.checkpoint=/other.ckpt
dynacell evaluate ... feature_extractor/dinov3=null # disable
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three issues from /simplify agents:
- Hardcoded HPC path in _SHARED_HF_CACHE makes the module CZ Biohub-
specific; other sites couldn't override without editing source. Read
DYNACELL_SHARED_HF_CACHE at module load and fall back to the CZ
Biohub default only when the env var is absent. Extension story is
now: set DYNACELL_SHARED_HF_CACHE in your shell / sbatch env block
and the rest of the mechanism just works.
- Docstring on _maybe_set_shared_hf_cache named the sibling helper
_external_configs_dirs; rephrase in semantic terms ("external Hydra
searchpaths resolve") so the docstring doesn't rot if the detection
mechanism ever changes.
- Four test methods in TestMaybeSetSharedHfCache imported `os` inline
instead of at module top (violates CLAUDE.md). Plus a redundant
monkeypatch.delenv() cleanup at the end of the last test that
pytest's monkeypatch fixture already handles on teardown. Lift
`import os` to module top, drop the inline imports, drop the
redundant cleanup.
Skipped /simplify nitpicks:
- functools.cache on _external_configs_dirs (6-8 stat calls saved
per CLI invocation; not worth the cognitive cost of a cache that
has to coexist with tests that patch the function reference)
- Parametrize the 4 tests (4 is small enough to leave explicit)
- Inline comment on opaque group names (marginal signal)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Setting HF_HOME to the team-shared project-storage dir inadvertently broke per-user gated-repo authentication: HF_HOME relocates the entire HF directory tree, including the auth token file. The first precompute attempt with DINOv3 enabled came back 401 "Access to model ... is restricted. You must have access to it and be authenticated to access it" — not a missing grant (the user had been granted access, which would have returned 403), but a missing token. HF was looking under $HF_HOME/token, where no token file lives.

Switch to HF_HUB_CACHE, which only relocates the weights/datasets cache and leaves auth state at its default ~/.cache/huggingface/token. Each user's personal token now enforces gate ACLs on their own account; the cache of downloaded weights is shared on project storage as intended.

The _maybe_set_shared_hf_cache helper and its four tests flip from HF_HOME → HF_HUB_CACHE. The runtime_shared.yml launcher env block does the same. README gains a paragraph explaining *why* we chose HF_HUB_CACHE over HF_HOME, and clarifies that subsequent reads from the shared cache are plain disk reads that don't round-trip to HF and so don't need a token at all.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous 4-GPU FCMAE runs OOM-killed before first training step. On gpu-c-1 sstat showed MaxVMSize=103G/rank at startup — 4 ranks × 103G ≈ 412G of virtual memory, plus mmap_preload /dev/shm staging, pushed past the 512G cgroup cap. num_workers=8 per rank meant 32 dataloader workers inflating per-rank VM before the first step. Halving to 4 cuts startup VM roughly in half while keeping prefetch above the single-worker bottleneck. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…odel>/

Regroup the virtual_staining benchmark tree so a training experiment lives in one directory. Paths go from <org>/<train_set>/<model>/ with separate train/predict/eval subdirs to <org>/<model>/<train_set>/ with flat predict__<predict_set>.yml and eval__<predict_set>.yaml filenames.

The old layout assumed each (org, model) cell had exactly one training run. New research requirements — joint ipsc_confocal + a549_mantis training, plus evaluating each trained model on both held-out splits — make train_set an index within a cell rather than a hierarchy. Grouping by train set puts a training run (train + its predicts + its evals) in a single subdir so adding or removing an experiment is one `mkdir` or one `rm -r`.

33 leaf moves (17 train + 8 predict + 8 eval), 8 _internal/leaf symlink rewrites (5-up target instead of 6-up, new eval__<X>.yaml filename), predict base: paths lose one ../ (depth 4 -> 3), tests + tool docstring + 4 READMEs track the new layout. New selector: leaf=er/celldiff/ipsc_confocal/eval__ipsc_confocal. No __main__.py code change — search path roots (_internal/ and _internal/shared/eval/) are unaffected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ext early return, fill eval defaults
- nrmse/psnr/ssim now normalize each image independently (min-max) instead of jointly
- init_cache_context early return (no gt_cache_dir) was missing dinov3/dynaclr fields
- eval.yaml: fill in default dinov3 model name and dynaclr encoder config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
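The independent-normalization change can be illustrated with NRMSE as the example metric; the function names are illustrative, not the evaluation-module API:

```python
import numpy as np

def minmax(img):
    """Per-image min-max normalization to [0, 1]."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)

def nrmse_independent(pred, gt):
    # normalize prediction and ground truth each on their own range,
    # rather than jointly, before comparing intensities
    p, g = minmax(pred), minmax(gt)
    return float(np.sqrt(np.mean((p - g) ** 2)))

rng = np.random.default_rng(0)
gt = rng.random((32, 32))
pred = gt * 10.0 + 5.0   # same structure, very different intensity range
print(nrmse_independent(pred, gt))  # ~0: per-image normalization removes the offset
```

With joint normalization the affine intensity shift in `pred` would dominate the error; normalizing each image on its own range scores structural agreement instead.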
…aves
Stage 1 of the A549 expansion roadmap. Benchmark leaves now declare
`benchmark.dataset_ref: {dataset, target}` instead of hardcoding
`data_path`, `source_channel`, and `target_channel` per target. A
composition-time resolver reads a Pydantic DatasetManifest YAML from
roots configured via (in order) CLI roots → DYNACELL_MANIFEST_ROOTS env
var → entry points under `dynacell.manifest_roots`. The resolver is a
plain callable threaded through `viscy_utils.compose.load_composed_config`
as a keyword-only `resolver=` kwarg so viscy-utils stays generic.
Partial `dataset_ref` is a strict no-op: leaves that inherit only one
half (e.g. `{dataset: aics-hipsc}` from the shared ipsc_confocal
train_set) compose byte-identically until their target fragment is also
migrated. This makes Stage 2 (per-organelle migration) safe to roll out
incrementally.
Migrates the full ER-on-iPSC slice as a proof of end-to-end parity:
targets/er_sec61b.yml, train_sets/ipsc_confocal.yml,
predict_sets/ipsc_confocal.yml, and both ER predict leaves (celldiff,
unetvit3d). Non-ER leaves inherit partial refs and are untouched until
their targets migrate. Shared `source_channel: Phase3D` stays in the
ipsc_confocal fragments; the resolver tolerates matching values and
raises only on mismatch, which keeps non-ER leaves composing correctly
during the staged rollout.
The Lightning branch of dynacell's `main_cli` catches
NoManifestRootsError / ManifestNotFoundError / TargetNotFoundError and
prints the message to stderr with exit code 2 — no stack trace for user
misconfiguration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
A549_EXPANSION_ROADMAP.md sequences the 6-stage rollout that moves the benchmark tree from hardcoded per-target paths to manifest-driven resolution, so iPSC confocal and a549 mantis datasets can coexist without path duplication. DATASET_REF_RESOLVER_SPEC.md is the authoritative spec for the Stage 1 resolver that just landed in 0aa7e1a. Both docs live in-tree so future readers can reconstruct why the resolver exists and how the remaining stages (per-target migration, eval-side resolution, eval target migration, a549 manifest, a549 leaf expansion) should land. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ot fixture Resolve drift from the Stage 1 plan: source_channel moves out of the shared ipsc_confocal train/predict fragments and onto each target fragment (mito/nucleus/membrane), since dataset_ref is the single source of truth for data_path/source_channel/target_channel on migrated ER leaves. Restore the strict collision check that raises whenever a composed fragment also declares a resolved field, dropping the agreement-tolerant transitional policy. Also: cache load_manifest with lru_cache (same policy as _load_yaml_cached); move DatasetManifest.source_channel validation onto the schema; delete the unused _DATA_FIELDS constant; rename a shadowing loop variable in the conflict-details comprehension; switch DYNACELL_MANIFEST_ROOTS parsing to os.pathsep; consolidate the autouse manifest-root fixture into conftest.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mirrors the companion dynacell-paper manifest additions so that Stage 2 resolver tests can splice data_path / target_channel for the two multi-marker targets derived from cell.zarr. No production-side effect — the fixture is consumed only by the autouse test conftest. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Stage 2 of the A549 expansion roadmap. Each target fragment now declares benchmark.dataset_ref.target and drops its hardcoded data_path / source_channel / target_channel; the resolver splices concrete values from the manifest (either the repo-local fixture in tests or the dynacell-paper canonical manifest at runtime). The 6 non-ER predict leaves drop their explicit data.init_args.data_path for the same reason — strict collision would otherwise raise. Tests swap the mito-only partial-ref no-op parity pair for parametrized resolver tests that cover every model in TRAIN_LEAVES / PREDICT_LEAVES composing a migrated fragment (pytest.skip handles missing FCMAE leaves for nucleus/membrane). The roadmap Stage 2 paragraph is updated to match the batched single-PR shape. Requires the dynacell-paper companion PR to land before production nucleus/membrane runs — VisCy CI is self-contained via the fixture. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
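The strict-collision splice could look roughly like this (a flat dict stands in for the composed fragment; the real hook operates on the nested config tree):

```python
def splice_dataset_ref(fragment, resolved):
    """Merge resolver-owned fields into a composed fragment.
    Any field the fragment already declares is a hard error --
    no agreement-tolerant fallback."""
    collisions = sorted(set(fragment) & set(resolved))
    if collisions:
        raise ValueError(
            f"fragment already declares resolver-owned fields: {collisions}"
        )
    merged = dict(fragment)
    merged.update(resolved)
    return merged
```

This is why the 6 non-ER predict leaves had to drop their explicit `data.init_args.data_path`: leaving it in place would trip the collision check once the manifest also supplies it.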
…d organelles Collapses the ER-specific train/predict resolver tests and the parametrized non-ER block into unified table-driven tests keyed off a single _MIGRATED_TARGET_INFO dict. Eliminates duplicate skip/fixture setup and the N=5 model sub-parametrization that existed solely to accommodate pytest.skip for non-existent leaves. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends StoreLocations with gt_cache_dir: Path | None, and extends ResolvedDataset + resolve_dataset_ref to surface cell_segmentation_path and gt_cache_dir alongside the existing manifest fields. Adds a Hydra-side apply_dataset_ref hook in evaluation/_ref_hook.py that mirrors the Lightning-side compose hook: partial refs are no-ops, collisions raise ValueError, and _splice writes io.* + derived pred_channel_name + pixel_metrics.spacing into a DictConfig. Introduced as unused code — the hook is only wired at eval/precompute entry points in the follow-up commit, which keeps this change bisect-safe. Fixture manifest now carries gt_cache_dir on all four targets so the next commit's integration tests can assert resolved values byte-for-byte against the pre-migration hardcoded paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
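A dict-based sketch of what the eval-side splice writes (the real hook targets an OmegaConf `DictConfig`; the `pred_channel_name` derivation rule below is invented purely for illustration):

```python
def splice_eval_fields(cfg, resolved):
    """Write io.* plus derived fields into an eval config.
    Mirrors the compose-side semantics: collisions raise."""
    io = cfg.setdefault("io", {})
    for key in ("data_path", "cell_segmentation_path", "gt_cache_dir"):
        if key in io:
            raise ValueError(f"io.{key} is owned by dataset_ref")
        io[key] = resolved[key]
    # derived values written alongside the raw manifest fields
    cfg["pred_channel_name"] = resolved["target_channel"] + "_pred"
    cfg.setdefault("pixel_metrics", {})["spacing"] = resolved["spacing"]
    return cfg
```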
Wires apply_dataset_ref at the top of evaluate_model and precompute_gt, and extends the __main__ Hydra branch with the same ManifestNotFound/NoManifestRoots/TargetNotFound catch the Lightning branch already has. Sets HYDRA_FULL_ERROR=1 so @hydra.main does not swallow resolver errors before our clean SystemExit(2) catch. Migrates all four eval target YAMLs (er_sec61b, mito_tomm20, nucleus, membrane) plus predict_set/ipsc_confocal.yaml together so io.* and pixel_metrics.spacing come from the manifest via benchmark.dataset_ref. Adds benchmark: null placeholder to eval.yaml so struct-mode merges accept the new key. Nucleus and membrane share cell.zarr but get distinct gt_cache_dir paths, keyed by organelle not channel. Test coverage: integration tests compose every canonical eval leaf and verify splice output (Layer 1) and invoke evaluate_model.__wrapped__ / precompute_gt.__wrapped__ to prove the hook is actually wired at both @hydra.main entry points (Layer 2). Router tests assert clean SystemExit(2) with the expected stderr for all three resolver error classes on both evaluate and precompute-gt. Companion dynacell-paper manifest bump adds gt_cache_dir to the canonical aics-hipsc manifest; safe to merge either order because Pydantic extra='ignore' makes the field optional. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Factors the partial-ref no-op + Pydantic validation pipeline shared by _compose_hook and _ref_hook into a single dataset_ref_from_dict helper on dynacell.data.resolver, so REQUIRED_REF_KEYS lives in exactly one place and both hook surfaces use identical semantics (ValidationError propagates as-is rather than being rewrapped). Restores the caller's struct-mode state in _splice via try/finally so the hook no longer silently unlocks the whole Hydra config for the rest of evaluate_model / precompute_gt. Test cleanup: promotes clear_global_hydra to a shared conftest fixture (deletes the two local copies), drops the redundant DYNACELL_MANIFEST_ROOTS setenv calls now that conftest autouses it, removes the dead _FIXTURE_ROOT module constants, drops the _clear_manifest_cache autouse (the path-keyed lru cache is already stable across tests), and aligns the null-dataset scenario on the pydantic.ValidationError the hook now surfaces. YAML cleanup: deletes the WHAT-narrating header blocks from the four migrated eval target YAMLs, predict_set/ipsc_confocal.yaml, and eval.yaml — the behavior is already visible from dataset_ref + the hook module, so the comments added noise without WHY. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
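The struct-state restore follows a plain save/mutate/restore shape; a dependency-free sketch (a tiny stand-in class replaces OmegaConf's struct flag):

```python
from contextlib import contextmanager

class FakeConfig:
    """Stand-in for a config object with a struct/locked flag."""
    def __init__(self):
        self.struct = True
        self.data = {}

@contextmanager
def temporarily_unlocked(cfg):
    # Save the caller's struct state and restore it in finally, so a
    # failing splice never leaves the config unlocked afterwards.
    previous = cfg.struct
    cfg.struct = False
    try:
        yield cfg
    finally:
        cfg.struct = previous
```

The `finally` is the point of the commit: without it, an exception mid-splice would leave the whole config mutable for the rest of `evaluate_model` / `precompute_gt`.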
…ints/ Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Makes `applications/dynacell` into the reusable `dynacell` package in VisCy. Splits config responsibilities more cleanly:
- `dynacell` keeps the code plus generic model recipes/examples.
- Benchmark-instance configs move to `dynacell-paper`.

This PR also adds opt-in validation loss support for CellDiff flow-matching so benchmark reruns can log a scalar validation curve without forcing extra validation work on every flow-matching job.
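For the aggregate `loss/validate` scalar, a weighted-mean sketch of what a shared aggregation helper might do (the PR's `_aggregate_validation_losses()` may weight differently, e.g. by per-dataloader sample counts):

```python
import math

def aggregate_validation_losses(losses, weights):
    """Weighted mean over per-dataloader validation losses."""
    total = sum(weights)
    if total == 0:
        return math.nan
    return sum(l * w for l, w in zip(losses, weights)) / total
```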
What changed
Config and package reorg
- Moves `applications/dynacell/examples/configs/` into `applications/dynacell/configs/`.
- Splits `configs/recipes/` for reusable fragments from `configs/examples/` for generic runnable examples.
- Removes `lightning_logs/`, SEC61B paper configs, and the hard-coded `hcs_sec61b_3d.yml` recipe.

Absorb reusable benchmark modules from `dynacell-paper`
- `dynacell.data` with path-based manifest/collection/spec loaders and Pydantic schemas.
- `dynacell.reporting` with Hydra config, tables, figures, and tests.
- `dynacell.evaluation` with configs, spectral-PCC tooling, I/O helpers, segmentation, metrics, and tests.
- `dynacell.preprocess` with the reusable `config.py` and `zarr_utils.py` utilities.

Package and CLI shape
- Reworks the `dynacell` CLI so `evaluate` and `report` route to Hydra entry points while `fit/predict/test/validate` still go through `viscy_utils`.
- Ships `_configs` inside the Python package so installed `dynacell evaluate/report` commands resolve config from package data instead of the source tree.
- Makes `dynacell.__init__` lazy so lightweight imports do not pull in the full training stack.
- Adds optional extras for `eval`, `report`, and `preprocess`, plus clearer missing-extra install hints from the CLI router.

Runtime, training, and transform fixes
- Adds `overwrite=` support to `HCSPredictionWriter`, including explicit handling/logging when prediction channels already exist.
- Fixes `VisCyCLI` checkpoint hparam precedence so `predict/test/validate` honor user config over stale checkpoint values, while `fit` resume still keeps checkpoint training hparams.
- Adds `ckpt_path` loading to `DynacellFlowMatching` so predict-time settings come from config rather than checkpoint metadata.
- Adds opt-in `compute_validation_loss` to `DynacellFlowMatching`. When enabled it logs `loss/val/<idx>` and aggregate `loss/validate` while preserving epoch-end ODE sample generation; when disabled it keeps the previous cheaper validation path.
- Shares an `_aggregate_validation_losses()` helper so UNet and flow-matching use the same weighted aggregation logic.
- Extends `BatchedRandAffined` with configurable `padding_mode`, `safe_crop_size`, and `safe_crop_coverage`, plus warnings/tests around unsupported X/Y-rotation guarantees.

Evaluation and reporting follow-up fixes
- Fixes `corr_coef`/PCC computation and adds regression tests.
- Makes cached `evaluate_model()` returns consistent with fresh evaluation output types.
- Respects `use_gpu` in evaluation instead of always taking the CUDA path when visible.
- Raises `FileNotFoundError` for missing preprocess configs instead of failing less clearly.

Benchmark schema + launcher (landed earlier on this branch)
- Adds a `configs/benchmarks/virtual_staining/` layout with shared axes (`train_sets/`, `targets/`, `model_overlays/`, `launcher_profiles/`, `predict_sets/`) and per-leaf train/predict files composed via `viscy_utils.compose.load_composed_config`.
- Adds `applications/dynacell/tools/submit_benchmark_job.py` with `--dry-run`, `--print-script`, `--print-resolved-config`, and `--override`. It composes a leaf, strips the `launcher:` + `benchmark:` reserved keys, freezes the resolved config to `{run_root}/resolved/`, and renders an sbatch script to `{run_root}/slurm/`.
- Updates `_maybe_compose_config` in `viscy_utils/cli.py` so direct `dynacell fit -c <leaf>` also strips those reserved keys.
- Adds a `ckpt_sha256_12` sidecar to the evaluation cache so repeated lookups skip full-file hashing.
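The sidecar idea can be sketched as follows (the sidecar filename and helper are illustrative, not the PR's exact layout):

```python
import hashlib
from pathlib import Path

def ckpt_sha256_12(ckpt: Path) -> str:
    """First 12 hex chars of the checkpoint's SHA-256, cached in a
    sidecar file so repeated cache lookups skip full-file hashing."""
    sidecar = ckpt.with_suffix(ckpt.suffix + ".sha256_12")
    if sidecar.exists():
        return sidecar.read_text().strip()
    digest = hashlib.sha256()
    with open(ckpt, "rb") as fh:
        # hash in 1 MiB chunks so multi-GB checkpoints stream through
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    short = digest.hexdigest()[:12]
    sidecar.write_text(short)
    return short
```

The first lookup pays the full hash once; every later lookup is a single small file read.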
Four commits (
9734d07 → f9b8f1e → 5b7eaae → 3fdb7cf) untangle three layers that were previously mingled inrecipes/trainer/fit_*.yml:runtime_single_gpu.yml → runtime_shared.yml: the file's content (srun +PYTHONUNBUFFERED/NCCL_DEBUG/PYTHONFAULTHANDLERenv vars) has nothing single-GPU-specific, and every leaf — including the 4-GPU UNeXt2 leaf — composed it. Rename + 14 leaf-base updates + schema-doc references.recipes/topology/single_gpu.yml+ddp_4gpu.yml) ownstrainer.accelerator/devices/strategy/num_nodes. Duplicated independently underapplications/dynacell/configs/recipes/topology/andapplications/cytoland/examples/configs/recipes/topology/because CLAUDE.md forbids cross-application imports.recipes/trainer/fit.yml+predict.ymlper app, replacingfit_1gpu/fit_4gpu/fit_fm_4gpu/predict_gpu. Mode invariants only (seed, log cadence, callbacks,WandbLoggerclass pinned with per-app project). Topology and precision are owned by the topology recipes and model overlays respectively. Every benchmark model overlay and example leaf now composes[fit.yml + topology/*.yml]and sets precision + max_epochs explicitly.hardware_h200_single,hardware_gpu_any_long,hardware_4gpu) drop theirtrainer:block; they now own onlylauncher.sbatch.*.strategy: ddp → autoon every single-GPU leaf (Lightning-equivalent at devices=1, just intent-honest); dynacell examplesfnet3d/unetvit3d/unext2/fit.ymland benchmarkunext2.ymlnow pinWandbLogger project=dynacellinstead of falling through to Lightning's default TensorBoard;fit_1gpu-derived paths move fromsave_top_k: 4tosave_top_k: 5.precision: 32-true;examples/celldiff/fit.ymlkeeps FM-style epoch-based checkpointing (save_top_k: -1,every_n_epochs: 10, no monitor); cytoland FCMAE pretraining keepsstrategy: ddp_find_unused_parameters_true; cytoland→dynacell bridge configs keepproject: dynacelloverride over cytoland's default.applications/dynacell/tools/LEGACY/examples_configs/(pre-schema CellDiff configs) is removed outright per CLAUDE.md's "avoid backwards-compatibility hacks" 
rule. The equivalence tests that composed LEGACY (test_*_leaf_matches_legacy,test_fnet3d_paper_leaf_matches_ran_config,test_byte_equivalence_sec61b_train_leaf) are replaced with forward-looking composition sanity tests.~/.claude/plans/vectorized-sleeping-clock.md; commits ran through/review-planand/verify-plansubagent rounds before landing.Dependency and packaging updates
- Pins `microssim` to a compatible Git revision and refreshes `uv.lock` so the new `dynacell[eval]` extra resolves with the current workspace.

Why
This is the VisCy side of the `dynacell-paper` split:
- `dynacell` in VisCy becomes the reusable runtime.
- `dynacell-paper` owns benchmark-instance configs, orchestration, and manuscript-specific assets.

That gives us a cleaner boundary for future work like multi-experiment collections and benchmark-specific runners without keeping paper state in the shared application package.
The flow-matching validation-loss change is intentionally opt-in because it adds extra validation compute and the resulting metric is still stochastic; default flow-matching checkpointing remains epoch-based.
The topology / trainer-recipe cleanup at the end removes a silent-drop-DDP trap: before this change, pairing a mismatched trainer recipe and benchmark hardware profile let the hardware profile's `trainer.devices` win (both layers redundantly set it), which could silently run a DDP training on 1 GPU. Topology is now the single source of truth.

Verification
- `uvx ruff check applications/dynacell/`
- `uvx ruff format --check applications/dynacell/`
- `.venv/bin/python -m pytest applications/dynacell/tests -q`
- `.venv/bin/python -m pytest applications/dynacell/tests/test_data_manifests.py applications/dynacell/tests/test_reporting_tables.py applications/dynacell/tests/test_reporting_tables_extended.py applications/dynacell/tests/test_reporting_figures.py applications/dynacell/tests/test_evaluation_metrics.py applications/dynacell/tests/test_evaluation_pipeline.py applications/dynacell/tests/test_evaluation_io.py applications/dynacell/tests/test_preprocess_config.py applications/dynacell/tests/test_preprocess_zarr_utils.py applications/dynacell/tests/test_cli_routing.py -q`
- `.venv/bin/python -m pytest applications/dynacell/tests/test_engine.py -q`
- `.venv/bin/python -m pytest applications/dynacell/tests/test_training_integration.py::test_celldiff_fm_validation_loss_keeps_generation applications/dynacell/tests/test_training_integration.py::test_celldiff_fm_warmup_cosine_fast_dev_run applications/dynacell/tests/test_training_integration.py::test_celldiff_fm_constant_schedule_fast_dev_run -q`
- `.venv/bin/dynacell fit --print_config -c applications/dynacell/configs/examples/unext2/fit.yml`
- `uv run pytest applications/dynacell/tests/test_benchmark_config_composition.py applications/dynacell/tests/test_submit_benchmark_job.py -v` (benchmark schema + submit-tool tests)
- Composed-leaf diffs show only the intended deltas (`ddp → auto` at devices=1, `WandbLogger` added where previously default TB).
- `uv run dynacell fit -c <benchmark-unext2-leaf> --trainer.fast_dev_run=true --trainer.devices=1 --trainer.strategy=auto` ran one training step cleanly on an A40 interactive node (loss 1.048).
- `submit_benchmark_job.py <benchmark-unext2-leaf>` queued a real 4-GPU DDP job (31122607) via the new schema path.