BASE-Laboratory · jameslehoux · May 18, 2026 · May 18, 2026
diff --git a/notebooks/braggtrack_demo.ipynb b/notebooks/braggtrack_demo.ipynb
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "id": "e65e834e",
    "metadata": {},
-   "source": "# BraggTrack end-to-end demo\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/BASE-Laboratory/BraggTrack/blob/main/notebooks/braggtrack_demo.ipynb)\n\nRuns the full pipeline on the bundled `data/sample_operando/` scans:\n\n1. **Discover** — find the per-scan H5 files.\n2. **Segment (Week 2)** — LoG → h-maxima → seeded watershed → instance features.\n3. **Track physics-only (Week 3)** — Hungarian over a geometry cost with per-axis gating; build a lifecycle DAG.\n4. **Semantic descriptors (Week 4)** — orthogonal MIPs + frozen-encoder embeddings.\n5. **Geometry + semantic tracking (Week 4)** — compose `α · geometry + β · (1 − cos)`.\n6. **α/β ablation** — how the semantic weight shifts tracking metrics.\n7. **Synthetic crossing** — a case where geometry alone fails and semantics recover identity.\n\nFinal section shows the one-line CLI equivalents for each stage.\n\nThis notebook uses the **mock** DINO backend by default, so no PyTorch / HuggingFace weights are required. Set `BRAGGTRACK_DINO_BACKEND=torch` if you have them installed and want real embeddings."
+   "source": "# BraggTrack end-to-end demo\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/BASE-Laboratory/BraggTrack/blob/main/notebooks/braggtrack_demo.ipynb)\n\nRuns the full pipeline on the bundled `data/sample_operando/` scans:\n\n1. **Discover** — find the per-scan H5 files.\n2. **Segment** — LoG → h-maxima → seeded watershed → instance features.\n3. **Track (physics-only)** — Hungarian over a geometry cost with per-axis gating; build a lifecycle DAG.\n4. **Semantic descriptors** — orthogonal MIPs + frozen-encoder embeddings.\n5. **Geometry + semantic tracking** — compose `α · geometry + β · (1 − cos)`.\n6. **α/β ablation** — how the semantic weight shifts tracking metrics.\n7. **Synthetic crossing** — a case where geometry alone fails and semantics recover identity.\n\nFinal section shows the one-line CLI equivalents for each stage.\n\nThis notebook uses the **mock** DINO backend by default, so no PyTorch / HuggingFace weights are required. Set `BRAGGTRACK_DINO_BACKEND=torch` if you have them installed and want real embeddings."
   },
   {
    "cell_type": "markdown",
@@ -103,7 +103,7 @@
    "cell_type": "markdown",
    "id": "d74a25a8",
    "metadata": {},
-   "source": "## 2 — Week 2: classical segmentation\n\n`segment_classical` runs 3-D Gaussian blur → Laplacian → h-maxima seeds → seeded watershed.\n\n### Threshold stabilisation across scans\n\nEach scan produces its own Otsu threshold on the raw intensity histogram.\nIn theory these should be nearly identical for back-to-back operando acquisitions,\nbut minor intensity fluctuations (beam drift, detector warm-up, etc.) cause\nper-frame Otsu to jitter — and because everything downstream (foreground mask → seed\nfloor → watershed) is threshold-sensitive, small jitter produces wildly different\nspot counts.\n\n**Fix:** compute per-frame Otsu thresholds, then pass them through a\nrolling-median smoother (`smooth_thresholds`). The median suppresses isolated\noutliers (beam drops, detector flashes) while still tracking genuine long-term\ndrift. For 500+ frame sequences this runs in O(N·W) on scalar thresholds —\nno need to pool raw volumes in memory.\n\nTwo further knobs that matter for real data:\n\n* `threshold` — **intensity-domain** foreground, now smoothed across scans. Controls the watershed mask.\n* `seed_peak_fraction` / `seed_response_percentile` — **LoG-response-domain** admissibility floor inside the foreground."
+   "source": "## 2 — Classical segmentation\n\n`segment_classical` runs 3-D Gaussian blur → Laplacian → h-maxima seeds → seeded watershed.\n\n### Threshold stabilisation across scans\n\nEach scan produces its own Otsu threshold on the raw intensity histogram.\nIn theory these should be nearly identical for back-to-back operando acquisitions,\nbut minor intensity fluctuations (beam drift, detector warm-up, etc.) cause\nper-frame Otsu to jitter — and because everything downstream (foreground mask → seed\nfloor → watershed) is threshold-sensitive, small jitter produces wildly different\nspot counts.\n\n**Fix:** compute per-frame Otsu thresholds, then pass them through a\nrolling-median smoother (`smooth_thresholds`). The median suppresses isolated\noutliers (beam drops, detector flashes) while still tracking genuine long-term\ndrift. For 500+ frame sequences this runs in O(N·W) on scalar thresholds —\nno need to pool raw volumes in memory.\n\nTwo further knobs that matter for real data:\n\n* `threshold` — **intensity-domain** foreground, now smoothed across scans. Controls the watershed mask.\n* `seed_peak_fraction` / `seed_response_percentile` — **LoG-response-domain** admissibility floor inside the foreground."
   },
   {
    "cell_type": "code",
@@ -229,11 +229,7 @@
    "cell_type": "markdown",
    "id": "6fc3abf4",
    "metadata": {},
-   "source": [
-    "## 3 — Week 3: physics-only tracking\n",
-    "\n",
-    "`PositionShapeCost` combines squared centroid distance with squared eigenvalue distance; `build_tracks` runs pairwise Hungarian assignments and stitches them into a NetworkX `DiGraph` with `TrackEvent` annotations."
-   ]
+   "source": "## 3 — Physics-only tracking\n\n`PositionShapeCost` combines squared centroid distance with squared eigenvalue distance; `build_tracks` runs pairwise Hungarian assignments and stitches them into a NetworkX `DiGraph` with `TrackEvent` annotations."
   },
   {
    "cell_type": "code",
@@ -382,11 +378,7 @@
    "cell_type": "markdown",
    "id": "94bdcacb",
    "metadata": {},
-   "source": [
-    "## 4 — Week 4: multi-view MIPs\n",
-    "\n",
-    "For each spot, crop a padded sub-volume, zero out voxels that don't belong to the instance, and take three maximum-intensity projections — one along each physical axis."
-   ]
+   "source": "## 4 — Multi-view MIPs\n\nFor each spot, crop a padded sub-volume, zero out voxels that don't belong to the instance, and take three maximum-intensity projections — one along each physical axis."
   },
   {
    "cell_type": "code",
@@ -724,35 +716,7 @@
    "cell_type": "markdown",
    "id": "7e605de2",
    "metadata": {},
-   "source": [
-    "## 8 — The same pipeline from the command line\n",
-    "\n",
-    "Every library call above is exposed as a CLI — feed a dataset root and an output directory, get reproducible artifacts under `artifacts/`.\n",
-    "\n",
-    "```bash\n",
-    "# 1. Segment every scan under data/sample_operando/\n",
-    "python -m braggtrack.cli.segment_dataset --outdir artifacts/week2\n",
-    "\n",
-    "# 2. Compute mock multi-view embeddings\n",
-    "python -m braggtrack.cli.embed_dataset --segdir artifacts/week2 --outdir artifacts/week4 --backend mock\n",
-    "\n",
-    "# 3. Track with geometry + semantic cost (β=0.5)\n",
-    "python -m braggtrack.cli.track_dataset artifacts/week2 \\\n",
-    "    --outdir artifacts/week3 \\\n",
-    "    --embedding-dir artifacts/week4 \\\n",
-    "    --cost-alpha 1.0 --cost-beta 0.5\n",
-    "\n",
-    "# 4. Ablate α/β and write a JSON report\n",
-    "python scripts/ablation_week4.py \\\n",
-    "    --indir artifacts/week2 \\\n",
-    "    --embedding-dir artifacts/week4 \\\n",
-    "    --betas 0,0.25,0.5,1.0 \\\n",
-    "    --output artifacts/week4_ablation/report.json\n",
-    "\n",
-    "# 5. Full CI-equivalent check (unit tests + all weekly acceptance gates)\n",
-    "python scripts/ci_report.py\n",
-    "```"
-   ]
+   "source": "## 8 — The same pipeline from the command line\n\nEvery library call above is exposed as a CLI — feed a dataset root and an output directory, get reproducible artifacts under `artifacts/`.\n\n```bash\n# 1. Segment every scan under data/sample_operando/\npython -m braggtrack.cli.segment_dataset --outdir artifacts/segmentation\n\n# 2. Compute mock multi-view embeddings\npython -m braggtrack.cli.embed_dataset --segdir artifacts/segmentation --outdir artifacts/embedding --backend mock\n\n# 3. Track with geometry + semantic cost (β=0.5)\npython -m braggtrack.cli.track_dataset artifacts/segmentation \\\n    --outdir artifacts/tracking \\\n    --embedding-dir artifacts/embedding \\\n    --cost-alpha 1.0 --cost-beta 0.5\n\n# 4. Ablate α/β and write a JSON report\npython scripts/ablation_semantic.py \\\n    --indir artifacts/segmentation \\\n    --embedding-dir artifacts/embedding \\\n    --betas 0,0.25,0.5,1.0 \\\n    --output artifacts/ablation/report.json\n\n# 5. Full CI-equivalent check (unit tests + all acceptance gates)\npython scripts/ci_report.py\n```"
   }
  ],
  "metadata": {