Refactor applications/DynaCell into a reusable benchmark package #404
Draft
alxndrkalinin wants to merge 164 commits into modular-viscy-staging from
Conversation
When gpu_augmentations include a spatial crop (e.g. BatchedCenterSpatialCropd), the output shape intentionally differs from z_window_size/yx_patch_size. The validation was raising a false ValueError for configs like UNeXt2 (z_window=20 read, gpu crop to 15) and CellDiff (z_window=13 read, crop to 8). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
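A minimal sketch of the relaxed check, under the assumption that the fix special-cases spatially cropping GPU transforms. Class and argument names are taken from the message above; the real datamodule code differs:

```python
# Hypothetical sketch: skip the patch-shape validation whenever a GPU-side
# augmentation performs a spatial crop, since the post-crop shape is then
# intentionally smaller than the shape read from disk.

class BatchedCenterSpatialCropd:
    """Stand-in for the real batched center-crop transform."""

def validate_patch_shape(sample_shape, z_window_size, yx_patch_size, gpu_augmentations):
    """Raise only when no GPU crop can explain a shape mismatch."""
    crops_spatially = any(
        isinstance(t, BatchedCenterSpatialCropd) for t in (gpu_augmentations or [])
    )
    if crops_spatially:
        return  # the GPU crop owns the final shape (e.g. z_window 20 read -> 15)
    expected = (z_window_size, *yx_patch_size)
    if tuple(sample_shape[-3:]) != expected:
        raise ValueError(f"sample shape {tuple(sample_shape[-3:])} != expected {expected}")

# UNeXt2 case from the message: 20 slices read, GPU crop to 15 -> no error raised.
validate_patch_shape((15, 384, 384), 20, (384, 384), [BatchedCenterSpatialCropd()])
```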
z_window_size 15→20: restores the 5-slice Z margin for affine augmentation (20 read → 15 after GPU center crop), matching the original VSCyto3D finetune_3d.py pipeline. batch_size 8→64 with num_samples 2→4: each GPU now processes 16 samples (~13 GB VRAM) instead of 2 (~2 GB), reducing GPU idle time from ~97% to ~84%. LR scaled by sqrt(8) for Adam (0.0002→0.0006). SLURM: mem-per-cpu 12G→20G for /dev/shm mmap buffer headroom. Removed --ckpt_path for fresh training with new hyperparameters. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…l crop z_window 8→13 with spatial_size [13,624,624]: provides 5-slice Z margin (13→8 after GPU center crop) and 112px YX margin (624→512) for affine augmentation artifacts, matching the UNeXt2 two-stage crop strategy. batch_size 4→8: doubles throughput without VRAM pressure. Phase3D normalization: median/iqr→mean/std to match UNeXt2 and the original VSCyto3D pipeline. val_gpu_augmentations: DivisibleCropd→BatchedCenterSpatialCropd because CellDiff's ViT requires exact input_spatial_size [8,512,512] (fixed positional embeddings), not just divisible dimensions. max_epochs 200→10: initial smoke-test run before scaling up. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
crop_at_read was removed in the mmap preload refactor but this error message still referenced it. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Spell out full read/input sizes in header comment. Add single-GPU SLURM script for A100/H100 (80GB). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add scratch_dir: /dev/shm and /dev/shm cleanup to SLURM trap, matching the UNeXt2 config. 256G RAM allocation covers the 86 GB mmap buffer in tmpfs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
fit_unext2_continue.yml had no ckpt_path, causing the job to train from scratch instead of resuming from epoch=15 (step=10128). Points to last-v1.ckpt which is the actual latest checkpoint. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reduce lr 0.0006→0.0004 and batch_size 64→32 to fit within node RAM budget. Fix val_gpu_augmentations: BatchedDivisibleCropd with k=[1,64,64] left Z at 20 slices (no Z reduction); replace with BatchedCenterSpatialCropd roi_size=[15,384,384] to match the model's expected input. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ViT with 8×512×512 patches is significantly larger than UNeXt2; batch_size=8 OOMs on a single GPU. Reduce to 2 (1 FOV → 2 patches). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previous checkpoints used wrong val aug (BatchedDivisibleCropd, no Z reduction) and different lr/batch_size. Resuming would mix incompatible training dynamics. Start fresh with corrected config. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Single GPU gave 10,816 steps/epoch (2h) due to 32 overlapping z-windows per FOV × 338 FOVs. With 4-GPU DDP at batch=2/GPU, steps drop to 2,704/epoch (~31 min). mmap buffer is OS-shared so RAM stays flat. Reduce num_workers 8→4 (4 workers × 4 ranks = 32 processes for 32 CPUs). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Kornia 0.8.x hard-codes padding_mode='zeros' in RandomAffine3D. Add _PaddedRandomAffine3D subclass that overrides apply_transform to pass the user-specified mode to warp_affine3d (which already supports it). Expose padding_mode='border'|'reflection' in BatchedRandAffined for configs where crop/output ratio < √2 and the oversized border cannot absorb large rotations without zero-corner artifacts. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
624/512 = 1.22 < √2, so the 56px oversized-crop border cannot absorb rotations larger than ~14°. With rotate_range=±π, large rotations leave visible zero-corner gaps in the training batches (not an issue for UNeXt2 which has ratio 600/384=1.56 > √2). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
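The margin arithmetic above can be checked directly: a centered D×D crop stays inside an S×S source under rotation θ only while D·(|cos θ| + |sin θ|) ≤ S, so the largest safe angle is asin((S/D)/√2) − 45°, and any ratio ≥ √2 absorbs every angle. The helper name is illustrative:

```python
import math

def max_safe_rotation_deg(source: float, crop: float) -> float:
    """Largest rotation (degrees) whose warped footprint stays inside the source."""
    r = source / crop
    if r >= math.sqrt(2):
        return 90.0  # entire rotation range is artifact-free
    return math.degrees(math.asin(r / math.sqrt(2))) - 45.0

print(round(max_safe_rotation_deg(624, 512), 1))  # CellDiff ratio 1.22 -> ~14.5
print(max_safe_rotation_deg(600, 384))            # UNeXt2 ratio 1.56 > sqrt(2) -> 90.0
```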
LightningCLI._parse_ckpt_path merges checkpoint hyper_parameters back into the config before class instantiation, so a checkpoint trained with predict_method='generate' would silently override the user's predict_method='sliding_window' in the YAML config.

Fix by mirroring DynacellUNet: add ckpt_path to __init__, load the state dict there directly, and exclude it (along with all predict-time params) from save_hyperparameters. With no top-level ckpt_path in the config, _parse_ckpt_path is never triggered.

Move ckpt_path and output_store from the predict_gpu.yml recipe into the per-experiment predict config; recipes stay as pure templates. Add sec61b/predict_celldiff.yml as the canonical SEC61B CellDiff predict config.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
LightningCLI._parse_ckpt_path applies checkpoint hyper_parameters as the highest-priority config layer, overriding values the user explicitly set in the YAML. The correct hierarchy is: base-class defaults → checkpoint hparams → user config.

Snapshot model init_args before the checkpoint merge and restore them after, so any value present in the user's config always wins.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When rotation + zoom-out combine, the backward-warp footprint can exceed the source crop, creating zero-corner artifacts. The new safe_crop_size parameter computes a per-sample scale floor from the sampled rotation angle: s_min = coverage * k(θ) * D / S, where k = |cos θ| + |sin θ|. safe_crop_coverage (default 1.0) relaxes the constraint — 0.9 allows small corners as extra augmentation while eliminating the worst ~30% of artifacts. Also removes padding_mode=border from CellDiff config (the scale clamping makes it unnecessary). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
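The scale floor can be written down directly from the formula in the message; how it plugs into BatchedRandAffined's sampling is omitted, and the function names are illustrative:

```python
import math

# For sampled rotation theta, a backward warp into a D-sized output reads a
# footprint of D * k(theta) / s from the S-sized source crop, so the smallest
# zoom-out scale that stays inside the crop is
#   s_min = coverage * k(theta) * D / S,   k(theta) = |cos theta| + |sin theta|.

def scale_floor(theta, out_size, crop_size, coverage=1.0):
    k = abs(math.cos(theta)) + abs(math.sin(theta))
    return coverage * k * out_size / crop_size

def clamp_scale(sampled_scale, theta, out_size, crop_size, coverage=1.0):
    # per-sample clamp: never zoom out past the artifact-free floor
    return max(sampled_scale, scale_floor(theta, out_size, crop_size, coverage))

# CellDiff YX geometry: 512 output from a 624 crop.
print(scale_floor(0.0, 512, 624))                    # 0.8205...: mild zoom-out is fine
print(round(scale_floor(math.pi / 4, 512, 624), 3))  # at 45 deg the floor exceeds 1
```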
Reorganize into four sections: Project & Context (what is this), Development (how to get running), Project Conventions (patterns here), Engineering Standards (how to write code). Remove duplicate Code Style heading. Consistent structure with dynacell CLAUDE.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
`str or None` → `str | None` for consistency with the rest of the file and project conventions. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The docstring said validation always runs after gpu_augmentations, but the code skips validation when gpu_augmentations is present (they handle cropping themselves). Update to match actual behavior. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When overwrite=True, existing prediction channels in the output store are silently reused instead of raising FileExistsError. Default False preserves the previous error-on-duplicate behavior. Needed for re-running predictions on the same output store during iteration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SEC61B benchmark configs (fit, predict, SLURM scripts) are paper-specific and belong in the dynacell-paper repo. The hcs_sec61b_3d data recipe hardcodes HPC paths. All preserved on preserve/sec61b-configs branch. Also adds .gitignore for lightning_logs/ and outputs/, and updates README to reference the new config layout. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three-layer config model:
- configs/recipes/ — reusable fragments (model, trainer, data, modes)
- configs/examples/ — generic fit/predict pair per model family

Fix base: references (../recipes/ → ../../recipes/), scrub hardcoded SEC61B paths from celldiff predict, add missing ckpt_path to fnet3d and unetvit3d predict, change preload default to false in generic recipe, update test config discovery path and __main__ docstring.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
UNeXt2 had a model recipe but no generic example. Adds fit.yml and predict.yml following the same pattern as fnet3d/unetvit3d/celldiff examples. Config discovery test now finds 8 leaf configs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Absorb dataset manifest schemas from dynacell-paper and add new benchmark collection and spec schemas per VISCY_HANDOFF.md.
- manifests.py: DatasetManifest, TargetConfig, VoxelSpacing, StoreLocations, SplitDefinition + load_manifest, load_splits, get_target
- collections.py: Provenance, ChannelEntry, CollectionExperiment, BenchmarkCollection + load_collection
- specs.py: BenchmarkSpec + load_benchmark_spec

No registry dict, no import-time side effects. Callers pass paths explicitly. Add pydantic>=2 and omegaconf to base dependencies.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Reuse viscy_data.collection.ChannelEntry instead of duplicating it - Extract shared OmegaConf+Pydantic loading to _yaml.load_yaml() - Keep Provenance local (stricter than viscy_data version: required created_at/created_by for benchmark traceability) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Migrate tables, figures, and Hydra CLI entry point for benchmark reporting. tables.py is pure pandas, figures.py preserves Agg backend ordering, cli.py uses parents[3] for config resolution. Tests use module-level pytest.importorskip() for pandas/matplotlib so they skip gracefully without those deps installed. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Migrate rewrite_zarr() and load_preprocess_config() — the clearly reusable, dependency-light preprocess helpers. Dataset-specific utilities (selection, segmentation, workflow) remain in dynacell-paper. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Lightning subcommands (fit, predict, test, validate) delegate to viscy_utils.cli.main(). Hydra subcommands (evaluate, report) lazily import their entry points. ModuleNotFoundError prints an install hint for the missing extra instead of a raw traceback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Full evaluation pipeline: pixel metrics (PCC, SSIM, NRMSE, PSNR), segmentation metrics (Dice, IoU), feature metrics (DINOv3, DynaCLR FID), spectral PCC subpackage, and Hydra CLI entry point.

Heavy deps (segmenter_model_zoo, aicssegmentation, cubic, microssim, transformers, dynaclr, skimage) gated with try/except ImportError. Pipeline imports segmentation/utils lazily inside functions so `import dynacell.evaluation.pipeline` succeeds without heavy deps. eval.yaml uses OmegaConf ??? markers for all required fields so running without overrides gives MissingMandatoryValue, not AttributeError.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
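The optional-dependency gate described above follows a common pattern: try the heavy import once at module load, then raise an actionable install hint only when the gated feature is used. The extra name `eval` matches the message; `transformers` stands in for any of the gated deps:

```python
try:
    import transformers  # heavy optional dependency
except ImportError:
    transformers = None  # feature metrics unavailable, but the module still imports

def require_transformers():
    """Raise a helpful hint if the optional dep is missing, else return it."""
    if transformers is None:
        raise ModuleNotFoundError(
            "transformers is required for feature metrics; "
            "install the extra with: pip install 'dynacell[eval]'"
        )
    return transformers
```

The key property is that `import` of the host module never fails; only calling a gated code path does.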
- eval: all deps for dynacell evaluate (segmentation, metrics, cubic, transformers, dynaclr, etc.) - report: pandas + matplotlib + hydra for dynacell report - preprocess: iohub + tqdm for zarr rewriting utilities Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Restructure applications/dynacell/configs/benchmarks/virtual_staining/
so train, predict, and eval leaves for the same benchmark cell live
under one directory `<org>/<train_set>/<model>/`, with shared building
blocks split by consumer:
- `shared/model/` — train/predict building blocks (train_sets,
predict_sets, targets, model_overlays, launcher_profiles)
- `shared/eval/` — HPC-bound Hydra eval groups (target, feature_extractor/dynaclr)
- `<org>/<train_set>/<model>/{train.yml, predict/<predset>.yml, eval/<predset>.yaml}`
Fold `configs/evaluation/` into the benchmark tree and delete the
sibling root. The eval selector renames from `benchmark=<path>` to
`leaf=<path>` to reflect the new filesystem model; leaves are addressed
through a committed symlink tree at `virtual_staining/leaf/` because
Hydra's group resolution requires a physical `leaf/` prefix. Eval
leaves keep `.yaml` (Hydra only discovers `.yaml` for groups); train
and predict leaves stay `.yml`.
`dynacell.__main__` now injects two `hydra.searchpath` roots — the
benchmark tree (for `leaf/`) and `shared/eval/` (for `target/` and
`feature_extractor/dynaclr/`). The `_EXTERNAL_CONFIGS_SUBPATH` constant
becomes `_EXTERNAL_SEARCHPATHS`, and the helper is renamed to
`_external_configs_dirs` (plural, returning a list).
Also fixes a pre-existing bug: the three unetvit3d train leaves under
mito/nucleus/membrane referenced a non-existent `runtime_single_gpu.yml`
launcher profile; they now use `runtime_shared.yml`. Model-overlay
`base:` entries targeting `recipes/` gain an extra `..` segment to
account for the new `shared/model/` depth.
Tests:
- test_benchmark_config_composition.py: path literals, three new
(unetvit3d × {mito,nucleus,membrane}) pairs, new
test_eval_leaf_symlink_resolves parametrised over all 8 eval leaves.
- test_cli_routing.py: three `_external_configs_dir` patches renamed
(plural) and argv selector strings updated to `leaf=`.
- test_submit_benchmark_job.py: parametrised leaf paths.
Docs: virtual_staining/README.md (full rewrite to match new tree),
evaluation/README.md (groups table, external-users section, benchmark
eval leaves section), UNEXT2_VS_FCMAE_CLASSES.md (stale path), dynacell/
README.md (sbatch example path), BENCHMARK_CONFIG_SCHEMA.md (Hydra
search-path contract note about the `leaf/` symlink adapter).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…l/ move
The preceding refactor commit renamed train and predict leaves to
<org>/<train_set>/<model>/{train.yml, predict/<predset>.yml} but the
content updates that prepend the shared/model/ segment to every base:
line were unstaged when that commit landed. This catches the 25 train
and predict leaves up with the new shared/model/ layout, and tightens a
prose comment in fnet3d_paper_fit.yml that still referenced the
pre-refactor shared/targets/ path.
Without this fix a fresh checkout of the reorg commit has train and
predict leaves pointing at shared/train_sets/, shared/targets/, etc. —
paths that no longer exist — so `load_composed_config` would raise
FileNotFoundError on every leaf.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trim the WHAT-narration from docstrings and comments in dynacell.__main__: drop the redundant "two roots are injected" enumeration (the tuple is self-describing), fix the "repo root" claim in _external_configs_dirs (the walk stops at the app's pyproject.toml, not the workspace root), and drop the "joined with commas" line from _inject_external_configs' docstring. Bind ``p = parent / sub`` via walrus inside the existence filter so the path is computed once instead of twice per subpath.

In the composition tests, delete the ``EVAL_LEAVES = PREDICT_LEAVES`` alias (the aliased list was the same object, causing silent coupling) and drive the symlink-integrity test off ``PREDICT_LEAVES`` directly. Drop the ``@package _global_`` header assertion from the symlink test — that check belongs to composition, not to symlink integrity. Add a ``REPO_ROOT`` sanity assert so drift of the ``parents[3]`` magic count fails loudly. Tighten ``TestInjectExternalConfigs`` class docstring to a single line.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the "in-package / repo checkout" shorthand in the config-groups table with concrete source paths, so readers see exactly where each group lives after the reorg: shared/eval/ for HPC-bound groups, leaf/ for the eval-leaf symlink tree, _configs/ for packaged schema. Tighten the "Running an evaluation" intro so it no longer implies all groups live under _configs/, and spell out the two hydra.searchpath roots dynacell.__main__ injects. Add a footgun entry for external users about Hydra's .yaml-only group resolution — saving a group as .yml makes the selector silently fail with MissingConfigException. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Keep the top level of virtual_staining/ biology-only: `er/`, `membrane/`,
`mito/`, `nucleus/` plus README.md. Everything else — shared building
blocks and the leaf/ Hydra symlink adapter — now lives under a single
_internal/ directory whose leading underscore signals "implementation
detail; don't browse here for science."
Layout is now:
virtual_staining/
<org>/<train_set>/<model>/{train.yml, predict/, eval/}
_internal/
shared/
model/ # train/predict building blocks
eval/ # target/, feature_extractor/dynaclr/
leaf/ # symlink tree aliasing canonical eval leaves
dynacell.__main__ injects two hydra.searchpath roots under _internal/
(the tree root for leaf/, and shared/eval/ for target + dynaclr) —
same cardinality as before, just under the hidden support tree.
Train/predict leaves get `_internal/` prepended to every base: path.
Model overlays gain one more `../` to reach configs/recipes/ (now 6
segments deep instead of 5). The 8 leaf/ symlinks are recreated with
6 relative-dot segments (was 5 at virtual_staining/leaf/, now 6 at
virtual_staining/_internal/leaf/).
Docs refreshed across BENCHMARK_CONFIG_SCHEMA.md, virtual_staining/
README.md, and evaluation/README.md: every layout tree, base: example,
group-source column, and search-path reference points at _internal/.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related bugs in predict-path tiling:

P1. DynacellFlowMatching.predict_step routed predict_method='sliding_window' to CELLDiff3DVS.generate_sliding_window(), which is the non-overlapping tiler and does not consume predict_overlap. Any config that set both predict_method='sliding_window' and predict_overlap=[...] silently got non-overlapping tiles, producing different predictions than the user expected. Now raise ValueError when sliding_window is combined with a non-zero overlap, pointing users to predict_method='iterative' (the overlap-anchored velocity tiler) instead. Also refresh the docstring to spell out which methods consume predict_overlap. Fix the membrane predict leaf to match its peers (er, mito): predict_method=iterative. The example config at configs/examples/celldiff/predict.yml gets the same fix plus an inline comment clarifying which methods consume the overlap.

P2. DynacellUNet.predict_sliding_window allocated both accumulator buffers (prediction_sum, prediction_count) with zeros_like(source), so their channel dimension was always in_channels. For models whose out_channels != in_channels (common: 1 phase in -> 2 target out), the first `prediction_sum[slicer] += patch_out` raised a broadcast error. Allocate the accumulators lazily from the first patch output so they take the model's actual out_channels and dtype.

Regression tests added for both paths:
- sliding_window + non-zero overlap must raise ValueError
- predict_sliding_window returns the correct shape when out_channels=2

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
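The lazy-accumulator fix (P2) can be sketched with tiling simplified to width-axis slicing; numpy stands in for torch to keep the sketch dependency-free, and the function name is illustrative, not the shipped method:

```python
import numpy as np

def accumulate_patches(source, patches_with_slicers):
    """Average overlapping patch outputs; buffers sized from the first output."""
    prediction_sum = None
    prediction_count = None
    for slicer, patch_out in patches_with_slicers:
        if prediction_sum is None:
            # channel dim comes from the first patch output, not from source,
            # so out_channels != in_channels no longer breaks broadcasting
            out_shape = (source.shape[0], patch_out.shape[1], *source.shape[2:])
            prediction_sum = np.zeros(out_shape, dtype=patch_out.dtype)
            prediction_count = np.zeros(out_shape, dtype=patch_out.dtype)
        prediction_sum[slicer] += patch_out
        prediction_count[slicer] += 1
    return prediction_sum / np.maximum(prediction_count, 1)

src = np.zeros((1, 1, 4, 8, 8))       # 1 phase channel in
patch = np.ones((1, 2, 4, 8, 4))      # 2 target channels out
full = (slice(None),) * 4
out = accumulate_patches(src, [
    (full + (slice(0, 4),), patch),   # left half of the width axis
    (full + (slice(4, 8),), patch),   # right half
])
print(out.shape)  # (1, 2, 4, 8, 8)
```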
The earlier "always append" strategy broke argparse when users placed
diagnostic flags (-c job, --info, --help, ...) after positional
overrides:
$ dynacell evaluate leaf=er/... -c job
dynacell: error: unrecognized arguments: hydra.searchpath=[file://...]
Hydra's argparse declares ``overrides`` as a single positional with
nargs="*". Once that nargs run is interrupted by a flag, any later
positional is reported unrecognized. Appending the injected token at
the end scatters a positional across the flag in the override-first
layout; prepending scatters it across the flag in the flag-first layout.
Fix: insert the token adjacent to an existing positional override when
one is present, so positionals stay contiguous whichever side the
flags sit on. Fall back to append when the user passed no positional
overrides (e.g. bare --help).
Three regression tests cover the three layouts: flag-leading,
flag-trailing, and no-positional. The former "appends always" test is
replaced by the layout-specific pair.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
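The insertion rule above can be sketched with a simple heuristic for "positional override" (contains `=` and doesn't start with `-`); the helper name and heuristic are illustrative, not the shipped code:

```python
def inject_override(argv, token):
    """Insert token adjacent to the first positional override, else append."""
    for i, arg in enumerate(argv):
        if "=" in arg and not arg.startswith("-"):
            # keep Hydra's nargs="*" positional run contiguous
            return argv[:i] + [token, arg] + argv[i + 1:]
    return argv + [token]  # no positional overrides (e.g. bare --help)

tok = "hydra.searchpath=[file:///x]"
print(inject_override(["leaf=er/a", "-c", "job"], tok))
# ['hydra.searchpath=[file:///x]', 'leaf=er/a', '-c', 'job']
print(inject_override(["--info", "leaf=er/a"], tok))
# ['--info', 'hydra.searchpath=[file:///x]', 'leaf=er/a']
print(inject_override(["--help"], tok))
# ['--help', 'hydra.searchpath=[file:///x]']
```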
Tighten the "Running an evaluation" lead so readers see up front that repo-checkout groups live under virtual_staining/_internal/ (the hidden support tree), not loose under virtual_staining/. The detailed bullet list below already spelled this out; the intro now matches. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pull request overview
Copilot reviewed 191 out of 201 changed files in this pull request and generated 1 comment.
Comment on lines +65 to +69
    if target_name in ["nucleus", "membrane"]:
        _require_segmenter_model_zoo()
        if seg_model is None:
            raise ValueError("SegModel must be provided for nucleus and membrane segmentation.")
        mask = seg_model.apply_on_single_zstack(img[None, ...])
The ValueError message for missing seg_model says "SegModel must be provided", but the parameter is seg_model and the required type here is the loaded SuperModel. Updating the message to reference seg_model/SuperModel will make the failure clearer for users.
Lightning's SLURMEnvironment rejects --ntasks at runtime (slurm.py:_validate_srun_variables) and demands --ntasks-per-node. Yesterday's fix emitted the wrong directive; all four FCMAE jobs failed within 50 s on the cluster. Switch the sbatch generator, hardware profiles, pre-submit validator, and invariant test to use ntasks_per_node throughout. The enforced invariant is now devices == ntasks_per_node and gpus == nodes × devices. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Gated HF repos (DINOv3) need per-user access grants, but the downloaded weights are identical. Before this change every team member who ran `dynacell evaluate compute_feature_metrics=true` re-downloaded ~2 GB of DINOv3 weights into their own `~/.cache/huggingface/`. Even ignoring the network cost, only one team member has HF gate access right now — every other member's runs 403 at model fetch.

Fix: point HF_HOME at a team-shared project-storage dir. First user with gated access downloads once; everyone else reads the shared copy. Two layers so the behavior is automatic regardless of invocation:

1. `dynacell.__main__._maybe_set_shared_hf_cache()` runs before Hydra subcommands (evaluate, precompute-gt, report). It defaults HF_HOME to `/hpc/projects/comp.micro/virtual_staining/models/dynacell/evaluation/hf_cache/` when three conditions hold: the user hasn't set HF_HOME themselves, we're on a repo checkout (same signal as external configs), and the shared dir exists on this machine. Wheel installs and non-HPC machines fall through to the normal ~/.cache/huggingface default.

2. `runtime_shared.yml` also sets HF_HOME for sbatch-rendered fit / predict jobs. Those jobs don't currently load HF models, but having the env var declared in the launcher profile future-proofs any later workflow that does and keeps the team convention visible in config.

Four regression tests cover the auto-setter: user-set wins, wheel install skips, missing shared dir skips, repo checkout + existing dir sets. README documents the mechanism and first-time download contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
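The three-condition auto-setter can be sketched as below; the function and default path follow the message, but this is an illustration, not the shipped helper (note a later commit in this PR switches the variable from HF_HOME to HF_HUB_CACHE):

```python
import os
from pathlib import Path

_SHARED_HF_CACHE = (
    "/hpc/projects/comp.micro/virtual_staining/models/dynacell/evaluation/hf_cache/"
)

def maybe_set_shared_hf_cache(on_repo_checkout: bool, env=os.environ) -> None:
    """Default HF_HOME to the team cache when all three conditions hold."""
    if "HF_HOME" in env:          # user-set value always wins
        return
    if not on_repo_checkout:      # wheel installs keep ~/.cache/huggingface
        return
    if not Path(_SHARED_HF_CACHE).is_dir():  # non-HPC machines fall through
        return
    env["HF_HOME"] = _SHARED_HF_CACHE

# user-set HF_HOME is never overridden
env = {"HF_HOME": "/custom"}
maybe_set_shared_hf_cache(on_repo_checkout=True, env=env)
print(env["HF_HOME"])  # /custom
```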
Three defaults were required on every eval / precompute-gt invocation
even though they are effectively fixed for our benchmark set:
feature_extractor/dinov3=lvd1689m (only DINOv3 we evaluate with)
feature_extractor/dynaclr=default (team DynaCLR checkpoint)
io.gt_cache_dir=/hpc/.../eval_cache/<TARGET> (one dir per target)
Hoist them so the minimal command
dynacell precompute-gt target=er_sec61b predict_set=ipsc_confocal
composes everything needed for all four cache families, and
dynacell evaluate leaf=<...>
carries the same defaults without the redundant override directives.
- eval.yaml defaults list switches
optional feature_extractor/dinov3: null -> lvd1689m
optional feature_extractor/dynaclr: null -> default
The `optional` keyword means wheel installs that don't ship
dynaclr=default (it lives in the repo's shared/eval/ tree) silently
skip the default — wheel users supply their own via --config-dir.
- Each target YAML now carries io.gt_cache_dir for its target's cache
path (SEC61B, TOMM20, nucleus, membrane).
- The 8 canonical eval leaves drop the redundant
override /feature_extractor/dinov3: lvd1689m
override /feature_extractor/dynaclr: default
and the per-leaf io.gt_cache_dir — all now inherited.
- README updated: "Enable feature metrics" and "Priming the cache"
examples show the one-line minimal forms.
To swap variants at call-time nothing changes:
dynacell evaluate ... feature_extractor/dinov3=<other_variant>
dynacell evaluate ... feature_extractor.dynaclr.checkpoint=/other.ckpt
dynacell evaluate ... feature_extractor/dinov3=null # disable
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three issues from /simplify agents:
- Hardcoded HPC path in _SHARED_HF_CACHE makes the module CZ Biohub-
specific; other sites couldn't override without editing source. Read
DYNACELL_SHARED_HF_CACHE at module load and fall back to the CZ
Biohub default only when the env var is absent. Extension story is
now: set DYNACELL_SHARED_HF_CACHE in your shell / sbatch env block
and the rest of the mechanism just works.
- Docstring on _maybe_set_shared_hf_cache named the sibling helper
_external_configs_dirs; rephrase in semantic terms ("external Hydra
searchpaths resolve") so the docstring doesn't rot if the detection
mechanism ever changes.
- Four test methods in TestMaybeSetSharedHfCache imported `os` inline
instead of at module top (violates CLAUDE.md). Plus a redundant
monkeypatch.delenv() cleanup at the end of the last test that
pytest's monkeypatch fixture already handles on teardown. Lift
`import os` to module top, drop the inline imports, drop the
redundant cleanup.
Skipped /simplify nitpicks:
- functools.cache on _external_configs_dirs (6-8 stat calls saved
per CLI invocation; not worth the cognitive cost of a cache that
has to coexist with tests that patch the function reference)
- Parametrize the 4 tests (4 is small enough to leave explicit)
- Inline comment on opaque group names (marginal signal)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Setting HF_HOME to the team-shared project-storage dir inadvertently broke per-user gated-repo authentication: HF_HOME relocates the entire HF directory tree, including the auth token file. The first precompute attempt with DINOv3 enabled came back 401 "Access to model ... is restricted. You must have access to it and be authenticated to access it" — not a missing grant (the user had been granted access, which would have returned 403), but a missing token. HF was looking under $HF_HOME/token, where no token file lives.

Switch to HF_HUB_CACHE, which only relocates the weights/datasets cache and leaves auth state at its default ~/.cache/huggingface/token. Each user's personal token now enforces gate ACLs on their own account; the cache of downloaded weights is shared on project storage as intended.

The _maybe_set_shared_hf_cache helper and its four tests flip from HF_HOME → HF_HUB_CACHE. The runtime_shared.yml launcher env block does the same. README gains a paragraph explaining *why* we chose HF_HUB_CACHE over HF_HOME, and clarifies that subsequent reads from the shared cache are plain disk reads that don't round-trip to HF and so don't need a token at all.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previous 4-GPU FCMAE runs OOM-killed before first training step. On gpu-c-1 sstat showed MaxVMSize=103G/rank at startup — 4 ranks × 103G ≈ 412G of virtual memory, plus mmap_preload /dev/shm staging, pushed past the 512G cgroup cap. num_workers=8 per rank meant 32 dataloader workers inflating per-rank VM before the first step. Halving to 4 cuts startup VM roughly in half while keeping prefetch above the single-worker bottleneck. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…odel>/

Regroup the virtual_staining benchmark tree so a training experiment lives in one directory. Paths go from <org>/<train_set>/<model>/ with separate train/predict/eval subdirs to <org>/<model>/<train_set>/ with flat predict__<predict_set>.yml and eval__<predict_set>.yaml filenames.

The old layout assumed each (org, model) cell had exactly one training run. New research requirements — joint ipsc_confocal + a549_mantis training, plus evaluating each trained model on both held-out splits — make train_set an index within a cell rather than a hierarchy. Grouping by train set puts a training run (train + its predicts + its evals) in a single subdir so adding or removing an experiment is one `mkdir` or one `rm -r`.

33 leaf moves (17 train + 8 predict + 8 eval), 8 _internal/leaf symlink rewrites (5-up target instead of 6-up, new eval__<X>.yaml filename), predict base: paths lose one ../ (depth 4 -> 3), tests + tool docstring + 4 READMEs track the new layout. New selector: leaf=er/celldiff/ipsc_confocal/eval__ipsc_confocal. No __main__.py code change — search path roots (_internal/ and _internal/shared/eval/) are unaffected.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ext early return, fill eval defaults
- nrmse/psnr/ssim now normalize each image independently (min-max) instead of jointly
- init_cache_context early return (no gt_cache_dir) was missing dinov3/dynaclr fields
- eval.yaml: fill in default dinov3 model name and dynaclr encoder config

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
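The independent-normalization change can be illustrated with NRMSE as the example metric; the function names are illustrative, not the evaluation-module API:

```python
import numpy as np

def minmax(img):
    """Per-image min-max normalization to [0, 1]."""
    lo, hi = img.min(), img.max()
    return (img - lo) / (hi - lo) if hi > lo else np.zeros_like(img)

def nrmse_independent(pred, gt):
    # normalize prediction and ground truth each on their own range,
    # rather than jointly, before comparing intensities
    p, g = minmax(pred), minmax(gt)
    return float(np.sqrt(np.mean((p - g) ** 2)))

rng = np.random.default_rng(0)
gt = rng.random((32, 32))
pred = gt * 10.0 + 5.0   # same structure, very different intensity range
print(nrmse_independent(pred, gt))  # ~0: per-image normalization removes the offset
```

With joint normalization the affine intensity shift in `pred` would dominate the error; normalizing each image on its own range scores structural agreement instead.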
…aves
Stage 1 of the A549 expansion roadmap. Benchmark leaves now declare
`benchmark.dataset_ref: {dataset, target}` instead of hardcoding
`data_path`, `source_channel`, and `target_channel` per target. A
composition-time resolver reads a Pydantic DatasetManifest YAML from
roots configured via (in order) CLI roots → DYNACELL_MANIFEST_ROOTS env
var → entry points under `dynacell.manifest_roots`. The resolver is a
plain callable threaded through `viscy_utils.compose.load_composed_config`
as a keyword-only `resolver=` kwarg so viscy-utils stays generic.
Partial `dataset_ref` is a strict no-op: leaves that inherit only one
half (e.g. `{dataset: aics-hipsc}` from the shared ipsc_confocal
train_set) compose byte-identically until their target fragment is also
migrated. This makes Stage 2 (per-organelle migration) safe to roll out
incrementally.
Migrates the full ER-on-iPSC slice as a proof of end-to-end parity:
targets/er_sec61b.yml, train_sets/ipsc_confocal.yml,
predict_sets/ipsc_confocal.yml, and both ER predict leaves (celldiff,
unetvit3d). Non-ER leaves inherit partial refs and are untouched until
their targets migrate. Shared `source_channel: Phase3D` stays in the
ipsc_confocal fragments; the resolver tolerates matching values and
raises only on mismatch, which keeps non-ER leaves composing correctly
during the staged rollout.
The Lightning branch of dynacell's `main_cli` catches
NoManifestRootsError / ManifestNotFoundError / TargetNotFoundError and
prints the message to stderr with exit code 2 — no stack trace for user
misconfiguration.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
A549_EXPANSION_ROADMAP.md sequences the 6-stage rollout that moves the benchmark tree from hardcoded per-target paths to manifest-driven resolution, so iPSC confocal and a549 mantis datasets can coexist without path duplication. DATASET_REF_RESOLVER_SPEC.md is the authoritative spec for the Stage 1 resolver that just landed in 0aa7e1a. Both docs live in-tree so future readers can reconstruct why the resolver exists and how the remaining stages (per-target migration, eval-side resolution, eval target migration, a549 manifest, a549 leaf expansion) should land. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ot fixture Resolve drift from the Stage 1 plan: source_channel moves out of the shared ipsc_confocal train/predict fragments and onto each target fragment (mito/nucleus/membrane), since dataset_ref is the single source of truth for data_path/source_channel/target_channel on migrated ER leaves. Restore the strict collision check that raises whenever a composed fragment also declares a resolved field, dropping the agreement-tolerant transitional policy. Also: cache load_manifest with lru_cache (same policy as _load_yaml_cached); move DatasetManifest.source_channel validation onto the schema; delete the unused _DATA_FIELDS constant; rename a shadowing loop variable in the conflict-details comprehension; switch DYNACELL_MANIFEST_ROOTS parsing to os.pathsep; consolidate the autouse manifest-root fixture into conftest.py. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Mirrors the companion dynacell-paper manifest additions so that Stage 2 resolver tests can splice data_path / target_channel for the two multi-marker targets derived from cell.zarr. No production-side effect — the fixture is consumed only by the autouse test conftest. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Stage 2 of the A549 expansion roadmap. Each target fragment now declares benchmark.dataset_ref.target and drops its hardcoded data_path / source_channel / target_channel; the resolver splices concrete values from the manifest (either the repo-local fixture in tests or the dynacell-paper canonical manifest at runtime). The 6 non-ER predict leaves drop their explicit data.init_args.data_path for the same reason — strict collision would otherwise raise. Tests swap the mito-only partial-ref no-op parity pair for parametrized resolver tests that cover every model in TRAIN_LEAVES / PREDICT_LEAVES composing a migrated fragment (pytest.skip handles missing FCMAE leaves for nucleus/membrane). The roadmap Stage 2 paragraph is updated to match the batched single-PR shape. Requires the dynacell-paper companion PR to land before production nucleus/membrane runs — VisCy CI is self-contained via the fixture. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
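The strict-collision splice could look roughly like this (a flat dict stands in for the composed fragment; the real hook operates on the nested config tree):

```python
def splice_dataset_ref(fragment, resolved):
    """Merge resolver-owned fields into a composed fragment.
    Any field the fragment already declares is a hard error --
    no agreement-tolerant fallback."""
    collisions = sorted(set(fragment) & set(resolved))
    if collisions:
        raise ValueError(
            f"fragment already declares resolver-owned fields: {collisions}"
        )
    merged = dict(fragment)
    merged.update(resolved)
    return merged
```

This is why the 6 non-ER predict leaves had to drop their explicit `data.init_args.data_path`: leaving it in place would trip the collision check once the manifest also supplies it.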
…d organelles Collapses the ER-specific train/predict resolver tests and the parametrized non-ER block into unified table-driven tests keyed off a single _MIGRATED_TARGET_INFO dict. Eliminates duplicate skip/fixture setup and the N=5 model sub-parametrization that existed solely to accommodate pytest.skip for non-existent leaves. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Extends StoreLocations with gt_cache_dir: Path | None, and extends ResolvedDataset + resolve_dataset_ref to surface cell_segmentation_path and gt_cache_dir alongside the existing manifest fields. Adds a Hydra-side apply_dataset_ref hook in evaluation/_ref_hook.py that mirrors the Lightning-side compose hook: partial refs are no-ops, collisions raise ValueError, and _splice writes io.* + derived pred_channel_name + pixel_metrics.spacing into a DictConfig. Introduced as unused code — the hook is only wired at eval/precompute entry points in the follow-up commit, which keeps this change bisect-safe. Fixture manifest now carries gt_cache_dir on all four targets so the next commit's integration tests can assert resolved values byte-for-byte against the pre-migration hardcoded paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
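A dict-based sketch of what the eval-side splice writes (the real hook targets an OmegaConf `DictConfig`; the `pred_channel_name` derivation rule below is invented purely for illustration):

```python
def splice_eval_fields(cfg, resolved):
    """Write io.* plus derived fields into an eval config.
    Mirrors the compose-side semantics: collisions raise."""
    io = cfg.setdefault("io", {})
    for key in ("data_path", "cell_segmentation_path", "gt_cache_dir"):
        if key in io:
            raise ValueError(f"io.{key} is owned by dataset_ref")
        io[key] = resolved[key]
    # derived values written alongside the raw manifest fields
    cfg["pred_channel_name"] = resolved["target_channel"] + "_pred"
    cfg.setdefault("pixel_metrics", {})["spacing"] = resolved["spacing"]
    return cfg
```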
Wires apply_dataset_ref at the top of evaluate_model and precompute_gt, and extends the __main__ Hydra branch with the same ManifestNotFound/NoManifestRoots/TargetNotFound catch the Lightning branch already has. Sets HYDRA_FULL_ERROR=1 so @hydra.main does not swallow resolver errors before our clean SystemExit(2) catch. Migrates all four eval target YAMLs (er_sec61b, mito_tomm20, nucleus, membrane) plus predict_set/ipsc_confocal.yaml together so io.* and pixel_metrics.spacing come from the manifest via benchmark.dataset_ref. Adds benchmark: null placeholder to eval.yaml so struct-mode merges accept the new key. Nucleus and membrane share cell.zarr but get distinct gt_cache_dir paths, keyed by organelle not channel. Test coverage: integration tests compose every canonical eval leaf and verify splice output (Layer 1) and invoke evaluate_model.__wrapped__ / precompute_gt.__wrapped__ to prove the hook is actually wired at both @hydra.main entry points (Layer 2). Router tests assert clean SystemExit(2) with the expected stderr for all three resolver error classes on both evaluate and precompute-gt. Companion dynacell-paper manifest bump adds gt_cache_dir to the canonical aics-hipsc manifest; safe to merge either order because Pydantic extra='ignore' makes the field optional. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Factors the partial-ref no-op + Pydantic validation pipeline shared by _compose_hook and _ref_hook into a single dataset_ref_from_dict helper on dynacell.data.resolver, so REQUIRED_REF_KEYS lives in exactly one place and both hook surfaces use identical semantics (ValidationError propagates as-is rather than being rewrapped). Restores the caller's struct-mode state in _splice via try/finally so the hook no longer silently unlocks the whole Hydra config for the rest of evaluate_model / precompute_gt. Test cleanup: promotes clear_global_hydra to a shared conftest fixture (deletes the two local copies), drops the redundant DYNACELL_MANIFEST_ROOTS setenv calls now that conftest autouses it, removes the dead _FIXTURE_ROOT module constants, drops the _clear_manifest_cache autouse (the path-keyed lru cache is already stable across tests), and aligns the null-dataset scenario on the pydantic.ValidationError the hook now surfaces. YAML cleanup: deletes the WHAT-narrating header blocks from the four migrated eval target YAMLs, predict_set/ipsc_confocal.yaml, and eval.yaml — the behavior is already visible from dataset_ref + the hook module, so the comments added noise without WHY. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
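The struct-state restore follows a plain save/mutate/restore shape; a dependency-free sketch (a tiny stand-in class replaces OmegaConf's struct flag):

```python
from contextlib import contextmanager

class FakeConfig:
    """Stand-in for a config object with a struct/locked flag."""
    def __init__(self):
        self.struct = True
        self.data = {}

@contextmanager
def temporarily_unlocked(cfg):
    # Save the caller's struct state and restore it in finally, so a
    # failing splice never leaves the config unlocked afterwards.
    previous = cfg.struct
    cfg.struct = False
    try:
        yield cfg
    finally:
        cfg.struct = previous
```

The `finally` is the point of the commit: without it, an exception mid-splice would leave the whole config mutable for the rest of `evaluate_model` / `precompute_gt`.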
…ints/ Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
Makes `applications/dynacell` into the reusable `dynacell` package in VisCy. Splits config responsibilities more cleanly:
- `dynacell` keeps the code plus generic model recipes/examples.
- Benchmark-instance configs move to `dynacell-paper`.

This PR also adds opt-in validation loss support for CellDiff flow-matching so benchmark reruns can log a scalar validation curve without forcing extra validation work on every flow-matching job.
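For the aggregate `loss/validate` scalar, a weighted-mean sketch of what a shared aggregation helper might do (the PR's `_aggregate_validation_losses()` may weight differently, e.g. by per-dataloader sample counts):

```python
import math

def aggregate_validation_losses(losses, weights):
    """Weighted mean over per-dataloader validation losses."""
    total = sum(weights)
    if total == 0:
        return math.nan
    return sum(l * w for l, w in zip(losses, weights)) / total
```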
What changed
Config and package reorg
- Moves `applications/dynacell/examples/configs/` into `applications/dynacell/configs/`.
- Splits `configs/recipes/` for reusable fragments from `configs/examples/` for generic runnable examples.
- Removes `lightning_logs/`, SEC61B paper configs, and the hard-coded `hcs_sec61b_3d.yml` recipe.

Absorb reusable benchmark modules from `dynacell-paper`
- `dynacell.data` with path-based manifest/collection/spec loaders and Pydantic schemas.
- `dynacell.reporting` with Hydra config, tables, figures, and tests.
- `dynacell.evaluation` with configs, spectral-PCC tooling, I/O helpers, segmentation, metrics, and tests.
- `dynacell.preprocess` with the reusable `config.py` and `zarr_utils.py` utilities.

Package and CLI shape
- Reworks the `dynacell` CLI so `evaluate` and `report` route to Hydra entry points while `fit/predict/test/validate` still go through `viscy_utils`.
- Ships `_configs` inside the Python package so installed `dynacell evaluate/report` commands resolve config from package data instead of the source tree.
- Makes `dynacell.__init__` lazy so lightweight imports do not pull in the full training stack.
- Adds optional extras for `eval`, `report`, and `preprocess`, plus clearer missing-extra install hints from the CLI router.

Runtime, training, and transform fixes
- Adds `overwrite=` support to `HCSPredictionWriter`, including explicit handling/logging when prediction channels already exist.
- Fixes `VisCyCLI` checkpoint hparam precedence so `predict/test/validate` honor user config over stale checkpoint values, while `fit` resume still keeps checkpoint training hparams.
- Adds `ckpt_path` loading to `DynacellFlowMatching` so predict-time settings come from config rather than checkpoint metadata.
- Adds opt-in `compute_validation_loss` to `DynacellFlowMatching`. When enabled it logs `loss/val/<idx>` and aggregate `loss/validate` while preserving epoch-end ODE sample generation; when disabled it keeps the previous cheaper validation path.
- Shares an `_aggregate_validation_losses()` helper so UNet and flow-matching use the same weighted aggregation logic.
- Extends `BatchedRandAffined` with configurable `padding_mode`, `safe_crop_size`, and `safe_crop_coverage`, plus warnings/tests around unsupported X/Y-rotation guarantees.

Evaluation and reporting follow-up fixes
- Fixes `corr_coef`/PCC computation and adds regression tests.
- Makes cached `evaluate_model()` returns consistent with fresh evaluation output types.
- Respects `use_gpu` in evaluation instead of always taking the CUDA path when visible.
- Raises `FileNotFoundError` for missing preprocess configs instead of failing less clearly.

Benchmark schema + launcher (landed earlier on this branch)
- Adds a `configs/benchmarks/virtual_staining/` layout with shared axes (`train_sets/`, `targets/`, `model_overlays/`, `launcher_profiles/`, `predict_sets/`) and per-leaf train/predict files composed via `viscy_utils.compose.load_composed_config`.
- Adds `applications/dynacell/tools/submit_benchmark_job.py` with `--dry-run`, `--print-script`, `--print-resolved-config`, and `--override`. It composes a leaf, strips the `launcher:` + `benchmark:` reserved keys, freezes the resolved config to `{run_root}/resolved/`, and renders an sbatch script to `{run_root}/slurm/`.
- Updates `_maybe_compose_config` in `viscy_utils/cli.py` so direct `dynacell fit -c <leaf>` also strips those reserved keys.
- Adds a `ckpt_sha256_12` sidecar to the evaluation cache so repeated lookups skip full-file hashing.
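The sidecar idea can be sketched as follows (the sidecar filename and helper are illustrative, not the PR's exact layout):

```python
import hashlib
from pathlib import Path

def ckpt_sha256_12(ckpt: Path) -> str:
    """First 12 hex chars of the checkpoint's SHA-256, cached in a
    sidecar file so repeated cache lookups skip full-file hashing."""
    sidecar = ckpt.with_suffix(ckpt.suffix + ".sha256_12")
    if sidecar.exists():
        return sidecar.read_text().strip()
    digest = hashlib.sha256()
    with open(ckpt, "rb") as fh:
        # hash in 1 MiB chunks so multi-GB checkpoints stream through
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    short = digest.hexdigest()[:12]
    sidecar.write_text(short)
    return short
```

The first lookup pays the full hash once; every later lookup is a single small file read.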
Four commits (
9734d07 → f9b8f1e → 5b7eaae → 3fdb7cf) untangle three layers that were previously mingled inrecipes/trainer/fit_*.yml:runtime_single_gpu.yml → runtime_shared.yml: the file's content (srun +PYTHONUNBUFFERED/NCCL_DEBUG/PYTHONFAULTHANDLERenv vars) has nothing single-GPU-specific, and every leaf — including the 4-GPU UNeXt2 leaf — composed it. Rename + 14 leaf-base updates + schema-doc references.recipes/topology/single_gpu.yml+ddp_4gpu.yml) ownstrainer.accelerator/devices/strategy/num_nodes. Duplicated independently underapplications/dynacell/configs/recipes/topology/andapplications/cytoland/examples/configs/recipes/topology/because CLAUDE.md forbids cross-application imports.recipes/trainer/fit.yml+predict.ymlper app, replacingfit_1gpu/fit_4gpu/fit_fm_4gpu/predict_gpu. Mode invariants only (seed, log cadence, callbacks,WandbLoggerclass pinned with per-app project). Topology and precision are owned by the topology recipes and model overlays respectively. Every benchmark model overlay and example leaf now composes[fit.yml + topology/*.yml]and sets precision + max_epochs explicitly.hardware_h200_single,hardware_gpu_any_long,hardware_4gpu) drop theirtrainer:block; they now own onlylauncher.sbatch.*.strategy: ddp → autoon every single-GPU leaf (Lightning-equivalent at devices=1, just intent-honest); dynacell examplesfnet3d/unetvit3d/unext2/fit.ymland benchmarkunext2.ymlnow pinWandbLogger project=dynacellinstead of falling through to Lightning's default TensorBoard;fit_1gpu-derived paths move fromsave_top_k: 4tosave_top_k: 5.precision: 32-true;examples/celldiff/fit.ymlkeeps FM-style epoch-based checkpointing (save_top_k: -1,every_n_epochs: 10, no monitor); cytoland FCMAE pretraining keepsstrategy: ddp_find_unused_parameters_true; cytoland→dynacell bridge configs keepproject: dynacelloverride over cytoland's default.applications/dynacell/tools/LEGACY/examples_configs/(pre-schema CellDiff configs) is removed outright per CLAUDE.md's "avoid backwards-compatibility hacks" 
rule. The equivalence tests that composed LEGACY (test_*_leaf_matches_legacy,test_fnet3d_paper_leaf_matches_ran_config,test_byte_equivalence_sec61b_train_leaf) are replaced with forward-looking composition sanity tests.~/.claude/plans/vectorized-sleeping-clock.md; commits ran through/review-planand/verify-plansubagent rounds before landing.Dependency and packaging updates
- Pins `microssim` to a compatible Git revision and refreshes `uv.lock` so the new `dynacell[eval]` extra resolves with the current workspace.

Why
This is the VisCy side of the `dynacell-paper` split:
- `dynacell` in VisCy becomes the reusable runtime.
- `dynacell-paper` owns benchmark-instance configs, orchestration, and manuscript-specific assets.

That gives us a cleaner boundary for future work like multi-experiment collections and benchmark-specific runners without keeping paper state in the shared application package.
The flow-matching validation-loss change is intentionally opt-in because it adds extra validation compute and the resulting metric is still stochastic; default flow-matching checkpointing remains epoch-based.
The topology / trainer-recipe cleanup at the end removes a silent-drop-DDP trap: before this change, pairing a mismatched trainer recipe and benchmark hardware profile let the hardware profile's `trainer.devices` win (both layers redundantly set it), which could silently run a DDP training on 1 GPU. Topology is now the single source of truth.

Verification
- `uvx ruff check applications/dynacell/`
- `uvx ruff format --check applications/dynacell/`
- `.venv/bin/python -m pytest applications/dynacell/tests -q`
- `.venv/bin/python -m pytest applications/dynacell/tests/test_data_manifests.py applications/dynacell/tests/test_reporting_tables.py applications/dynacell/tests/test_reporting_tables_extended.py applications/dynacell/tests/test_reporting_figures.py applications/dynacell/tests/test_evaluation_metrics.py applications/dynacell/tests/test_evaluation_pipeline.py applications/dynacell/tests/test_evaluation_io.py applications/dynacell/tests/test_preprocess_config.py applications/dynacell/tests/test_preprocess_zarr_utils.py applications/dynacell/tests/test_cli_routing.py -q`
- `.venv/bin/python -m pytest applications/dynacell/tests/test_engine.py -q`
- `.venv/bin/python -m pytest applications/dynacell/tests/test_training_integration.py::test_celldiff_fm_validation_loss_keeps_generation applications/dynacell/tests/test_training_integration.py::test_celldiff_fm_warmup_cosine_fast_dev_run applications/dynacell/tests/test_training_integration.py::test_celldiff_fm_constant_schedule_fast_dev_run -q`
- `.venv/bin/dynacell fit --print_config -c applications/dynacell/configs/examples/unext2/fit.yml`
- `uv run pytest applications/dynacell/tests/test_benchmark_config_composition.py applications/dynacell/tests/test_submit_benchmark_job.py -v` (benchmark schema + submit-tool tests)
- Composed-leaf diffs show only the intended deltas (`ddp → auto` at devices=1, `WandbLogger` added where previously default TB).
- `uv run dynacell fit -c <benchmark-unext2-leaf> --trainer.fast_dev_run=true --trainer.devices=1 --trainer.strategy=auto` ran one training step cleanly on an A40 interactive node (loss 1.048).
- `submit_benchmark_job.py <benchmark-unext2-leaf>` queued a real 4-GPU DDP job (31122607) via the new schema path.