Project-MONAI · aylward · May 19, 2026 · May 18, 2026 · May 19, 2026 · May 19, 2026
diff --git a/.agents/agents/testing.md b/.agents/agents/testing.md
@@ -12,14 +12,18 @@ pytest tests that exercise the library's scientific pipelines.
 - `tests/conftest.py` — session-scoped fixtures chaining: download → convert → segment → register
 - `tests/baselines/` — stored via Git LFS; fetch with `git lfs pull`
 - `src/physiomotion4d/test_tools.py` — baseline comparison utilities
-- Markers: `slow`, `requires_gpu`, `requires_data`, `experiment`
+- Markers (all opt-in via `--run-<bucket>`): `slow`, `requires_gpu`,
+  `requires_simpleware`, `experiment`, `tutorial`. Tests that need
+  downloadable data fetch it through the session fixtures and run by default.
 
 ## Run commands (use `py`, not `python`)
 
 ```bash
-py -m pytest tests/ -m "not slow and not requires_data" -v   # fast, recommended
+py -m pytest tests/ -v                                        # fast, recommended (slow/GPU/etc. auto-skipped)
 py -m pytest tests/test_contour_tools.py -v                   # single file
-py -m pytest tests/test_contour_tools.py::TestContourTools -v      # single class
+py -m pytest tests/test_contour_tools.py::TestContourTools -v # single class
+py -m pytest tests/ -v --run-slow                             # opt into slow tests
+py -m pytest tests/ -v --run-gpu --run-slow                   # typical local GPU profile (CI runner adds --run-simpleware --run-experiments --run-tutorials)
 py -m pytest tests/ --create-baselines                        # create missing baselines
 ```
 
@@ -28,7 +32,9 @@ py -m pytest tests/ --create-baselines                        # create missing b
 1. Read the implementation file first; understand the public interface.
 2. Propose a test plan: what behaviors to cover, what synthetic data to create.
 3. Build synthetic `itk.Image` objects or small `pv.PolyData` surfaces — 32–64 voxels/side.
-   Never depend on real data unless unavoidable; mark those `@pytest.mark.requires_data`.
+   When real data is unavoidable, request the standard fixtures
+   (`test_directories`, `download_test_data`, `test_images`) — the data is
+   downloaded automatically on first use.
 4. State image shape and axis order in the test docstring:
    e.g. `"""...image shape: (64, 64, 32), axes: X, Y, Z."""`
 5. Use `test_tools.py` baseline utilities for surface and image regression checks.
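
Reviewer note: the opt-in behaviour described in this file is usually implemented with two pytest hooks in `conftest.py`. The sketch below shows one way this could work; it is not the project's actual `tests/conftest.py`, which this diff does not include. The marker-to-flag mapping is taken from the diff, while the names `BUCKETS` and `skip_reason` are hypothetical.

```python
"""Sketch of opt-in marker gating for a conftest.py (not the project's code)."""
from typing import Iterable, Optional, Set

# Marker name -> command-line flag that enables it (mapping taken from this diff).
BUCKETS = {
    'slow': '--run-slow',
    'requires_gpu': '--run-gpu',
    'requires_simpleware': '--run-simpleware',
    'experiment': '--run-experiments',
    'tutorial': '--run-tutorials',
}


def skip_reason(markers: Iterable[str], enabled_flags: Set[str]) -> Optional[str]:
    """Return a skip message if any gated marker lacks its flag, else None."""
    for marker in markers:
        flag = BUCKETS.get(marker)
        if flag is not None and flag not in enabled_flags:
            return f'needs {flag} (marker: {marker})'
    return None


def pytest_addoption(parser):
    """Register one boolean flag per bucket."""
    for marker, flag in BUCKETS.items():
        parser.addoption(flag, action='store_true', default=False,
                         help=f'run tests marked {marker}')


def pytest_collection_modifyitems(config, items):
    """Attach a skip marker to every collected test whose bucket is not enabled."""
    import pytest  # imported lazily so skip_reason() is usable without pytest

    enabled = {flag for flag in BUCKETS.values() if config.getoption(flag)}
    for item in items:
        reason = skip_reason((m.name for m in item.iter_markers()), enabled)
        if reason:
            item.add_marker(pytest.mark.skip(reason=reason))
```

Under this sketch, a test carrying both `requires_gpu` and `slow` runs only when both flags are passed, which matches the `--run-gpu --run-slow` profile listed in the run commands.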

diff --git a/.agents/skills/test-feature/SKILL.md b/.agents/skills/test-feature/SKILL.md
@@ -1,5 +1,5 @@
 ---
-description: Inspect a PhysioMotion4D implementation and its existing tests, propose a synthetic-data test plan, then create or update pytest tests. Explains how to run them.
+description: Inspect a PhysioMotion4D implementation and its existing tests, propose a real-data-driven test plan with baseline comparisons, then create or update pytest tests. Explains how to run them.
 ---
 
 Write or update tests for the following in PhysioMotion4D:
@@ -9,10 +9,38 @@ $ARGUMENTS
 Instructions:
 1. Read the implementation file(s) to understand the public interface.
 2. Read the existing test file for this module if one exists (e.g. `tests/test_<module>.py`).
-3. Propose a test plan: list the behaviors to cover and the synthetic data to create.
-4. Implement tests using synthetic `itk.Image` objects (32–64 voxels/side) or small
-   `pv.PolyData` surfaces — not real patient data.
-5. State image shape and axis order in every test docstring.
-6. Mark any test that genuinely requires real data with `@pytest.mark.requires_data`.
-7. Show the exact command to run the new tests:
-   `py -m pytest tests/test_<module>.py -v`
+3. Propose a test plan: list the behaviors to cover and the inputs each behavior
+   needs.
+4. **Strongly prefer real (downloaded) test data over synthetic data.** Request
+   the session fixtures (`test_directories`, `download_test_data`,
+   `test_images`) so the standard test datasets are pulled automatically on
+   first use. Real data exercises the production code paths — preprocessing,
+   resampling, dtype handling, world-frame metadata — that synthetic toy
+   volumes silently bypass. Only fall back to synthetic `itk.Image` or
+   `pv.PolyData` inputs when:
+     - the behavior under test is a pure unit (e.g. axis arithmetic, dict
+       routing) where real data adds no signal, or
+     - real data would push the test into a slow / GPU / Simpleware bucket
+       that doesn't fit the test's purpose.
+   When using synthetic inputs anyway, keep volumes ≤64 voxels per side and
+   say so in the docstring.
+5. **When a test produces an image or surface as output, compare against a
+   baseline** using the `test_tools.py` utilities (e.g. `TestTools`) rather
+   than ad-hoc value assertions. Store baselines under `tests/baselines/`
+   (Git LFS-tracked). Run with `--create-baselines` to materialize missing
+   baselines on first use; later runs compare the output against the stored
+   baseline. This catches drift that hand-written numeric thresholds miss.
+6. State image shape and axis order in every test docstring (e.g.
+   `"""...image shape: (X, Y, Z, T) = (64, 64, 32, 1), LPS world frame."""`).
+7. Mark tests that need a GPU, a slow runtime, or a licensed Simpleware
+   install with `@pytest.mark.requires_gpu`, `@pytest.mark.slow`, or
+   `@pytest.mark.requires_simpleware` so they fall into the right opt-in
+   bucket (`--run-gpu`, `--run-slow`, `--run-simpleware`). Tests that just
+   need downloadable data need **no** marker — the fixture chain handles it.
+8. Show the exact command to run the new tests, including any opt-in flags
+   the markers require. Examples:
+   - `py -m pytest tests/test_<module>.py -v`
+   - `py -m pytest tests/test_<module>.py -v --run-slow`
+   - `py -m pytest tests/test_<module>.py -v --run-gpu --run-slow`
+   - `py -m pytest tests/test_<module>.py --create-baselines` (first run, to
+     materialize new baselines)
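
Reviewer note: step 8 asks for the exact run command, including the opt-in flags implied by a test's markers. A minimal sketch of that derivation follows. `run_command` and `FLAG_FOR_MARKER` are hypothetical names used only for illustration; the marker-to-flag pairs come from step 7 and from the CI comments elsewhere in this diff.

```python
"""Sketch: derive the pytest command a set of markers requires (hypothetical)."""
from typing import Iterable, List

# Marker -> opt-in flag, as listed in this diff.
FLAG_FOR_MARKER = {
    'slow': '--run-slow',
    'requires_gpu': '--run-gpu',
    'requires_simpleware': '--run-simpleware',
    'experiment': '--run-experiments',
    'tutorial': '--run-tutorials',
}


def run_command(test_path: str, markers: Iterable[str]) -> str:
    """Build the `py -m pytest` command, adding one flag per gated marker."""
    flags: List[str] = sorted(
        {FLAG_FOR_MARKER[m] for m in markers if m in FLAG_FOR_MARKER}
    )
    return ' '.join(['py', '-m', 'pytest', test_path, '-v', *flags])


print(run_command('tests/test_example.py', ['requires_gpu', 'slow']))
```

Markers without an opt-in flag (for example `parametrize`) are ignored, so a test that only needs downloadable data yields the plain `-v` command.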
diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -8,9 +8,13 @@ name: CI
 # - GPU tests: Self-hosted runners with CUDA support
 # - Code quality: Linting and formatting checks
 #
-# Test markers:
-# - requires_data: Tests that need external data downloads
-# - slow: Tests that are computationally intensive or require GPU
+# Test markers (gated by opt-in pytest flags; default = skip):
+# - slow                -> --run-slow
+# - requires_gpu        -> --run-gpu
+# - requires_simpleware -> --run-simpleware  (also implies GPU)
+# - experiment          -> --run-experiments
+# - tutorial            -> --run-tutorials
+# Tests that need external data download it automatically via fixtures.
 
 on:
   push:
@@ -101,16 +105,16 @@ jobs:
       run: |
         pip list
 
-    - name: Run unit tests (fast, no external data) - Ubuntu
+    - name: Run unit tests (fast, no GPU/slow/experiment) - Ubuntu
       if: matrix.os == 'ubuntu-latest'
       run: |
         xvfb-run -a --server-args="-screen 0 1024x768x24" \
-          pytest tests/ -v -m "not slow and not requires_data and not experiment" --cov=physiomotion4d --cov-report=xml --cov-report=term --cov-report=html
+          pytest tests/ -v --cov=physiomotion4d --cov-report=xml --cov-report=term --cov-report=html
 
-    - name: Run unit tests (fast, no external data) - Windows
+    - name: Run unit tests (fast, no GPU/slow/experiment) - Windows
       if: matrix.os == 'windows-latest'
       run: |
-        pytest tests/ -v -m "not slow and not requires_data and not experiment" --cov=physiomotion4d --cov-report=xml --cov-report=term --cov-report=html
+        pytest tests/ -v --cov=physiomotion4d --cov-report=xml --cov-report=term --cov-report=html
 
     - name: Upload coverage to Codecov
       uses: codecov/codecov-action@v4
@@ -224,13 +228,13 @@ jobs:
 
     - name: Run contour tools tests
       run: |
-        pytest tests/test_contour_tools.py -v -m "not slow" --cov=physiomotion4d --cov-append --cov-report=xml
+        pytest tests/test_contour_tools.py -v --cov=physiomotion4d --cov-append --cov-report=xml
       continue-on-error: true
 
     - name: Run USD conversion tests
       run: |
         xvfb-run -a --server-args="-screen 0 1024x768x24" \
-          pytest tests/test_convert_vtk_to_usd_polymesh.py -v -m "not slow" --cov=physiomotion4d --cov-append --cov-report=xml
+          pytest tests/test_convert_vtk_to_usd_polymesh.py -v --cov=physiomotion4d --cov-append --cov-report=xml
       continue-on-error: true
 
     - name: Run USD utility tests
@@ -242,7 +246,7 @@ jobs:
     - name: Run all integration tests
       run: |
         xvfb-run -a --server-args="-screen 0 1024x768x24" \
-          pytest tests/ -v -m "not slow and not experiment and not requires_gpu"
+          pytest tests/ -v
       continue-on-error: true
 
     - name: Upload coverage to Codecov
@@ -325,8 +329,12 @@ jobs:
         pip list
 
     - name: Run GPU tests
+      # External self-hosted GPU runner: enable every opt-in bucket.
+      # Tests whose host requirements (e.g. a licensed Simpleware install)
+      # aren't met on the runner will runtime-skip cleanly via their
+      # internal availability guards.
       run: |
-        pytest tests/ -v -m "not slow and not experiment" --cov=physiomotion4d --cov-report=xml --cov-report=term --cov-report=html
+        pytest tests/ -v --run-gpu --run-slow --run-simpleware --run-experiments --run-tutorials --cov=physiomotion4d --cov-report=xml --cov-report=term --cov-report=html
       env:
         CUDA_VISIBLE_DEVICES: 0
 
@@ -404,11 +412,14 @@ jobs:
 #   They execute end-to-end workflows that may take multiple hours
 #
 # To run locally:
-#   pytest tests/ -v -m "slow"  # Run all slow tests
-#   pytest tests/test_register_images_ants.py -v
-#   pytest tests/test_register_images_icon.py -v
-#   pytest tests/test_segment_chest_total_segmentator.py -v
+#   pytest tests/ -v --run-slow                 # Default (fast) tests plus all slow tests
+#   pytest tests/ -v --run-gpu --run-slow       # GPU + slow (typical local dev profile)
+#   pytest tests/ -v --run-simpleware --run-gpu --run-slow  # Full Simpleware coverage
+#   pytest tests/test_register_images_ants.py -v --run-slow
+#
+# Self-hosted GPU runner enables ALL buckets:
+#   --run-gpu --run-slow --run-simpleware --run-experiments --run-tutorials
 #
 # To run experiment tests (manual only, extremely slow):
-#   pytest tests/test_experiments.py -v -m experiment
-#   pytest tests/test_experiments.py::test_experiment_heart_gated_ct_to_usd -v
+#   pytest tests/test_experiments.py -v --run-experiments
+#   pytest tests/test_experiments.py::test_experiment_heart_gated_ct_to_usd -v --run-experiments
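
Reviewer note: the workflow comments rely on tests skipping themselves at runtime when a host requirement (such as a licensed Simpleware install) is missing. The guards themselves are not part of this diff, so the following is only a sketch of the general pattern. `missing_requirements` and `skip_unless_available` are hypothetical helpers, and checking for importable modules and executables on `PATH` is an assumption about how availability might be detected.

```python
"""Sketch of a runtime availability guard (hypothetical, not project code)."""
import importlib.util
import shutil
from typing import List, Sequence


def missing_requirements(modules: Sequence[str] = (),
                         executables: Sequence[str] = ()) -> List[str]:
    """List top-level modules and executables that cannot be found on this host."""
    missing = [f'module {name}' for name in modules
               if importlib.util.find_spec(name) is None]
    missing += [f'executable {name}' for name in executables
                if shutil.which(name) is None]
    return missing


def skip_unless_available(modules: Sequence[str] = (),
                          executables: Sequence[str] = ()) -> None:
    """Call at the top of a test or fixture; skips the test instead of failing."""
    import pytest  # lazy import: only needed when the guard actually runs

    missing = missing_requirements(modules, executables)
    if missing:
        pytest.skip('host requirements not met: ' + ', '.join(missing))
```

With a guard like this, enabling every bucket on a runner stays safe: a test whose bucket flag is passed but whose dependency is absent is reported as skipped rather than failed.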
diff --git a/.github/workflows/nightly-health.yml b/.github/workflows/nightly-health.yml
@@ -108,7 +108,7 @@ jobs:
         # The step outcome (success/failure) is still captured and passed downstream.
         continue-on-error: true
         run: |
-          pytest tests/ -v --run-experiments `
+          pytest tests/ -v --run-experiments --run-tutorials --run-slow --run-gpu --run-simpleware `
             --cov=physiomotion4d `
             --cov-report=xml `
             --cov-report=json `

diff --git a/.github/workflows/test-slow.yml b/.github/workflows/test-slow.yml
@@ -72,8 +72,12 @@ jobs:
         "
 
     - name: Run slow tests
+      # Self-hosted GPU runner: enable every opt-in bucket.
+      # Tests whose host requirements (e.g. a licensed Simpleware install)
+      # aren't met on the runner will runtime-skip cleanly via their
+      # internal availability guards.
       run: |
-        pytest tests/ -v -m "slow and not experiment" --cov=physiomotion4d --cov-report=xml --cov-report=term
+        pytest tests/ -v --run-slow --run-gpu --run-simpleware --run-experiments --run-tutorials --cov=physiomotion4d --cov-report=xml --cov-report=term
       env:
         CUDA_VISIBLE_DEVICES: 0
 

diff --git a/AGENTS.md b/AGENTS.md
@@ -29,7 +29,11 @@ Non-Python tools used by contributor workflows:
   `self.log_debug()` — never `print()`. Standalone scripts may use `print()`.
 - Single quotes for strings; double quotes for docstrings. 88-char line limit.
 - Full type hints (`mypy` strict). Use `Optional[X]` not `X | None`.
-- Run `py -m pytest tests/ -m "not slow and not requires_data" -v` to verify changes.
+- Run `py -m pytest tests/ -v` to verify changes. Slow / GPU / Simpleware /
+  experiment / tutorial tests are auto-skipped; opt in per bucket with
+  `--run-slow`, `--run-gpu`, `--run-simpleware`, `--run-experiments`,
+  `--run-tutorials`. The `requires_data` marker no longer exists — tests that
+  need external data download it automatically via the session fixtures.
 - Consult `docs/API_MAP.md` to locate classes and methods before searching manually.
 
 ## Implementation role
@@ -41,11 +45,27 @@ Non-Python tools used by contributor workflows:
 
 ## Testing role
 
-- Prefer synthetic `itk.Image` and small `pv.PolyData` surfaces — not real patient data.
-- State image shape and axis order in every test docstring: e.g. `shape (X, Y, Z, T)`.
-- Keep synthetic volumes ≤64 voxels per side for speed.
-- Mark tests that genuinely need real data with `@pytest.mark.requires_data`.
-- Use `test_tools.py` baseline utilities for surface and image regression checks.
+- **Strongly prefer real (downloaded) test data over synthetic data.** Request
+  the session fixtures (`test_directories`, `download_test_data`,
+  `test_images`) so the standard datasets are fetched automatically on first
+  use. Real data exercises preprocessing, resampling, dtype handling, and
+  world-frame metadata paths that synthetic toy volumes silently bypass.
+- Only fall back to synthetic `itk.Image` or `pv.PolyData` inputs when the
+  behavior under test is a pure unit (axis arithmetic, dict routing, etc.)
+  where real data adds no signal, or when real data would push the test into
+  a slow/GPU/Simpleware bucket that doesn't fit the test's purpose. Keep
+  synthetic volumes ≤64 voxels per side and say so in the docstring.
+- State image shape and axis order in every test docstring: e.g.
+  `shape (X, Y, Z, T) = (64, 64, 32, 1), LPS world frame`.
+- **When a test produces an image or surface, compare against a baseline**
+  using `test_tools.py` utilities (e.g. `TestTools`) and store baselines under
+  `tests/baselines/` (Git LFS-tracked). Run with `--create-baselines` to
+  materialize missing baselines on first use.
+- Mark tests that need a GPU, a slow runtime, or a licensed Simpleware
+  install with `@pytest.mark.requires_gpu`, `@pytest.mark.slow`, or
+  `@pytest.mark.requires_simpleware` so they fall into the right opt-in
+  bucket. Tests that just need downloadable data need no marker — the
+  fixture chain handles it.
 
 ## Documentation role
 

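Reviewer note: the baseline bullets describe a create-or-compare workflow, where `--create-baselines` writes a missing baseline and later runs compare against it. The sketch below illustrates that control flow on a plain list of floats stored as JSON. It is not the `TestTools` implementation, whose API does not appear in this diff; `check_against_baseline` and its parameters are hypothetical.

```python
"""Sketch of create-or-compare baseline logic (not the TestTools implementation)."""
import json
import math
from pathlib import Path
from typing import Sequence


def check_against_baseline(values: Sequence[float], baseline: Path,
                           create_missing: bool = False,
                           abs_tol: float = 1e-6) -> bool:
    """Compare values with a stored baseline, optionally creating it if absent.

    Returns True when the baseline was just created or every value matches
    within abs_tol. Raises FileNotFoundError when the baseline is missing
    and create_missing is False.
    """
    if not baseline.exists():
        if not create_missing:
            raise FileNotFoundError(f'missing baseline: {baseline}')
        baseline.parent.mkdir(parents=True, exist_ok=True)
        baseline.write_text(json.dumps(list(values)))
        return True
    stored = json.loads(baseline.read_text())
    return len(stored) == len(values) and all(
        math.isclose(a, b, rel_tol=0.0, abs_tol=abs_tol)
        for a, b in zip(stored, values)
    )
```

In a real test, `create_missing` would be driven by the `--create-baselines` option, and the comparison would operate on images or surfaces rather than a flat list.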
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -38,23 +38,28 @@ mypy src/ tests/
 # All pre-commit hooks
 pre-commit run --all-files
 
-# Fast tests (recommended for development)
-py -m pytest tests/ -m "not slow and not requires_data" -v
+# Fast tests (recommended for development). Slow / GPU / Simpleware /
+# experiment / tutorial tests are auto-skipped unless their opt-in flag is passed.
+py -m pytest tests/ -v
 
 # Single test file or test by name
 py -m pytest tests/test_contour_tools.py -v
 py -m pytest tests/test_contour_tools.py::test_extract_surface -v
 
-# Skip GPU-dependent tests
-py -m pytest tests/ --ignore=tests/test_segment_chest_total_segmentator.py \
-              --ignore=tests/test_register_images_icon.py
+# Opt-in buckets (each flag enables one marker family)
+py -m pytest tests/ -v --run-slow         # tests marked 'slow'
+py -m pytest tests/ -v --run-gpu          # tests marked 'requires_gpu'
+py -m pytest tests/ -v --run-simpleware   # tests marked 'requires_simpleware'
+py -m pytest tests/ -v --run-experiments  # tests marked 'experiment'
+py -m pytest tests/ -v --run-tutorials    # tests marked 'tutorial'
+
+# Typical local GPU profile. The self-hosted CI GPU runner enables every
+# bucket: --run-gpu --run-slow --run-simpleware --run-experiments --run-tutorials
+py -m pytest tests/ -v --run-gpu --run-slow
 
 # With coverage
 py -m pytest tests/ --cov=src/physiomotion4d --cov-report=html
 
-# Experiment script tests (very slow, opt-in)
-py -m pytest tests/ --run-experiments
-
 # Create missing baselines
 py -m pytest tests/ --create-baselines
 ```
@@ -86,7 +91,9 @@ Regenerate it after any public API change: `py utils/generate_api_map.py`
 - Baselines in `tests/baselines/` via Git LFS — run `git lfs pull` after cloning
 - `tests/conftest.py`: session-scoped fixtures chaining download → convert → segment → register
 - `src/physiomotion4d/test_tools.py`: baseline comparison utilities (`TestTools`, etc.)
-- Markers: `slow`, `requires_gpu`, `requires_data`, `experiment`, `tutorial`
+- Markers (all opt-in via `--run-<bucket>`): `slow`, `requires_gpu`,
+  `requires_simpleware`, `experiment`, `tutorial`. Data-dependent tests no
+  longer use a marker — they pull data through fixtures and run by default.
 - Prefer images from `ROOT/data/test/slicer_heart_small` for tests
 - Prefer storing results in subdirs `./results/<test_name>`
 

diff --git a/README.md b/README.md
@@ -632,27 +632,34 @@ See `docs/contributing.rst` for complete IDE setup instructions.
 PhysioMotion4D includes comprehensive tests covering the complete pipeline from data download to USD generation.
 
 ```bash
-# Run all tests
-pytest tests/
-
-# Run fast tests only (recommended for development)
-pytest tests/ -m "not slow and not requires_data" -v
+# Fast tests (recommended for development).
+# Slow / GPU / Simpleware / experiment / tutorial tests are auto-skipped
+# unless their opt-in flag is passed (see below). Tests that need
+# downloadable data fetch it automatically via the session fixtures.
+pytest tests/ -v
+
+# Opt-in buckets (each flag enables one marker family)
+pytest tests/ -v --run-slow            # tests marked 'slow'
+pytest tests/ -v --run-gpu             # tests marked 'requires_gpu'
+pytest tests/ -v --run-simpleware      # tests marked 'requires_simpleware'
+pytest tests/ -v --run-experiments     # tests marked 'experiment'
+pytest tests/ -v --run-tutorials       # tests marked 'tutorial'
+
+# Typical local GPU profile. The self-hosted CI GPU runner enables every
+# bucket: --run-gpu --run-slow --run-simpleware --run-experiments --run-tutorials
+pytest tests/ -v --run-gpu --run-slow
 
 # Run specific test categories
 pytest tests/test_usd_merge.py -v                           # USD merge functionality
 pytest tests/test_usd_time_preservation.py -v               # Time-varying data preservation
-pytest tests/test_register_images_ants.py -v                # ANTs registration
-pytest tests/test_register_images_greedy.py -v             # Greedy registration
-pytest tests/test_register_images_icon.py -v                # Icon registration
-pytest tests/test_register_time_series_images.py -v         # Time series registration
-pytest tests/test_segment_chest_total_segmentator.py -v     # TotalSegmentator
+pytest tests/test_register_images_ants.py -v --run-slow     # ANTs registration
+pytest tests/test_register_images_greedy.py -v              # Greedy registration
+pytest tests/test_register_images_icon.py -v --run-gpu --run-slow      # Icon registration (GPU)
+pytest tests/test_register_time_series_images.py -v --run-slow         # Time series registration
+pytest tests/test_segment_chest_total_segmentator.py -v --run-slow     # TotalSegmentator
 pytest tests/test_contour_tools.py -v                       # Mesh and contour tools
 pytest tests/test_image_tools.py -v                         # Image processing utilities
-pytest tests/test_transform_tools.py -v                     # Transform operations
-
-# Skip GPU-dependent tests (segmentation and registration)
-pytest tests/ --ignore=tests/test_segment_chest_total_segmentator.py \
-              --ignore=tests/test_register_images_icon.py
+pytest tests/test_transform_tools.py -v --run-slow          # Transform operations
 
 # Run with coverage report
 pytest tests/ --cov=src/physiomotion4d --cov-report=html
@@ -742,9 +749,9 @@ Use `/test-feature` to get a test plan and a complete pytest file using syntheti
 /test-feature RegisterImagesANTs with a pair of small synthetic ITK images
 ```
 
-The agent will state image shapes and axis orders in every test docstring, mark
-any real-data dependency with `@pytest.mark.requires_data`, and show the exact
-run command.
+The agent will state image shapes and axis orders in every test docstring, wire
+real-data dependencies through the session fixtures (so the data is downloaded
+on first use), and show the exact run command.
 
 ---