From 47809a670d1513ee622c132ef5ce909d303acce8 Mon Sep 17 00:00:00 2001
From: James Le Houx <james.lehoux@gre.ac.uk>
Date: Mon, 18 May 2026 08:03:45 +0000
Subject: [PATCH] docs: add Colab badge to DINO notebook and remove week
 references from demo notebook

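Add an "Open In Colab" badge to the title cell of
notebooks/dino_segmentation_comparison.ipynb, and drop the "Week N"
labels from the section headings and pipeline overview in
notebooks/braggtrack_demo.ipynb. The CLI section of the demo now writes
to artifacts/segmentation, artifacts/embedding, artifacts/tracking and
artifacts/ablation instead of the week-numbered directories, and refers
to scripts/ablation_semantic.py rather than scripts/ablation_week4.py.

The remaining hunks are serialization-only: some multi-line "source"
arrays in the demo notebook collapse into single strings, and non-ASCII
characters in the DINO notebook are written as \uXXXX escapes. They
carry no content change.

https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk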
---
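Note on the "Threshold stabilisation across scans" text in section 2 of
the demo notebook: the rolling-median smoothing it describes can be
sketched as below. This is a minimal illustration, not the project's
implementation. The notebook names `smooth_thresholds`, but its signature
is not shown in this patch, so `rolling_median`, its `window` parameter,
and the truncated-edge handling are all assumptions made for the example.

```python
from statistics import median


def rolling_median(thresholds, window=5):
    """Centred rolling median over per-frame scalar thresholds.

    Hypothetical stand-in for `smooth_thresholds`; the real signature
    is not shown in this patch. Windows are truncated at the sequence
    edges, so the output has the same length as the input.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    values = list(thresholds)
    # Each output value only looks at `window` scalars, never at raw volumes.
    return [
        median(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


# One isolated outlier (frame 3) among otherwise steady thresholds.
per_frame = [100.0, 101.0, 99.0, 40.0, 100.0, 102.0, 101.0]
smoothed = rolling_median(per_frame, window=3)
print(smoothed)
```

With a window of 3 the isolated drop to 40.0 disappears from the smoothed
sequence, while the slow rise towards the end of the series is preserved.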
 notebooks/braggtrack_demo.ipynb              | 46 +++-----------------
 notebooks/dino_segmentation_comparison.ipynb | 30 ++++++-------
 2 files changed, 20 insertions(+), 56 deletions(-)

diff --git a/notebooks/braggtrack_demo.ipynb b/notebooks/braggtrack_demo.ipynb
index 02b0e49..6aeb29c 100644
--- a/notebooks/braggtrack_demo.ipynb
+++ b/notebooks/braggtrack_demo.ipynb
@@ -4,7 +4,7 @@
    "cell_type": "markdown",
    "id": "e65e834e",
    "metadata": {},
-   "source": "# BraggTrack end-to-end demo\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/BASE-Laboratory/BraggTrack/blob/main/notebooks/braggtrack_demo.ipynb)\n\nRuns the full pipeline on the bundled `data/sample_operando/` scans:\n\n1. **Discover** — find the per-scan H5 files.\n2. **Segment (Week 2)** — LoG → h-maxima → seeded watershed → instance features.\n3. **Track physics-only (Week 3)** — Hungarian over a geometry cost with per-axis gating; build a lifecycle DAG.\n4. **Semantic descriptors (Week 4)** — orthogonal MIPs + frozen-encoder embeddings.\n5. **Geometry + semantic tracking (Week 4)** — compose `α · geometry + β · (1 − cos)`.\n6. **α/β ablation** — how the semantic weight shifts tracking metrics.\n7. **Synthetic crossing** — a case where geometry alone fails and semantics recover identity.\n\nFinal section shows the one-line CLI equivalents for each stage.\n\nThis notebook uses the **mock** DINO backend by default, so no PyTorch / HuggingFace weights are required. Set `BRAGGTRACK_DINO_BACKEND=torch` if you have them installed and want real embeddings."
+   "source": "# BraggTrack end-to-end demo\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/BASE-Laboratory/BraggTrack/blob/main/notebooks/braggtrack_demo.ipynb)\n\nRuns the full pipeline on the bundled `data/sample_operando/` scans:\n\n1. **Discover** — find the per-scan H5 files.\n2. **Segment** — LoG → h-maxima → seeded watershed → instance features.\n3. **Track (physics-only)** — Hungarian over a geometry cost with per-axis gating; build a lifecycle DAG.\n4. **Semantic descriptors** — orthogonal MIPs + frozen-encoder embeddings.\n5. **Geometry + semantic tracking** — compose `α · geometry + β · (1 − cos)`.\n6. **α/β ablation** — how the semantic weight shifts tracking metrics.\n7. **Synthetic crossing** — a case where geometry alone fails and semantics recover identity.\n\nFinal section shows the one-line CLI equivalents for each stage.\n\nThis notebook uses the **mock** DINO backend by default, so no PyTorch / HuggingFace weights are required. Set `BRAGGTRACK_DINO_BACKEND=torch` if you have them installed and want real embeddings."
   },
   {
    "cell_type": "markdown",
@@ -103,7 +103,7 @@
    "cell_type": "markdown",
    "id": "d74a25a8",
    "metadata": {},
-   "source": "## 2 — Week 2: classical segmentation\n\n`segment_classical` runs 3-D Gaussian blur → Laplacian → h-maxima seeds → seeded watershed.\n\n### Threshold stabilisation across scans\n\nEach scan produces its own Otsu threshold on the raw intensity histogram.\nIn theory these should be nearly identical for back-to-back operando acquisitions,\nbut minor intensity fluctuations (beam drift, detector warm-up, etc.) cause\nper-frame Otsu to jitter — and because everything downstream (foreground mask → seed\nfloor → watershed) is threshold-sensitive, small jitter produces wildly different\nspot counts.\n\n**Fix:** compute per-frame Otsu thresholds, then pass them through a\nrolling-median smoother (`smooth_thresholds`). The median suppresses isolated\noutliers (beam drops, detector flashes) while still tracking genuine long-term\ndrift. For 500+ frame sequences this runs in O(N·W) on scalar thresholds —\nno need to pool raw volumes in memory.\n\nTwo further knobs that matter for real data:\n\n* `threshold` — **intensity-domain** foreground, now smoothed across scans. Controls the watershed mask.\n* `seed_peak_fraction` / `seed_response_percentile` — **LoG-response-domain** admissibility floor inside the foreground."
+   "source": "## 2 — Classical segmentation\n\n`segment_classical` runs 3-D Gaussian blur → Laplacian → h-maxima seeds → seeded watershed.\n\n### Threshold stabilisation across scans\n\nEach scan produces its own Otsu threshold on the raw intensity histogram.\nIn theory these should be nearly identical for back-to-back operando acquisitions,\nbut minor intensity fluctuations (beam drift, detector warm-up, etc.) cause\nper-frame Otsu to jitter — and because everything downstream (foreground mask → seed\nfloor → watershed) is threshold-sensitive, small jitter produces wildly different\nspot counts.\n\n**Fix:** compute per-frame Otsu thresholds, then pass them through a\nrolling-median smoother (`smooth_thresholds`). The median suppresses isolated\noutliers (beam drops, detector flashes) while still tracking genuine long-term\ndrift. For 500+ frame sequences this runs in O(N·W) on scalar thresholds —\nno need to pool raw volumes in memory.\n\nTwo further knobs that matter for real data:\n\n* `threshold` — **intensity-domain** foreground, now smoothed across scans. Controls the watershed mask.\n* `seed_peak_fraction` / `seed_response_percentile` — **LoG-response-domain** admissibility floor inside the foreground."
   },
   {
    "cell_type": "code",
@@ -229,11 +229,7 @@
    "cell_type": "markdown",
    "id": "6fc3abf4",
    "metadata": {},
-   "source": [
-    "## 3 — Week 3: physics-only tracking\n",
-    "\n",
-    "`PositionShapeCost` combines squared centroid distance with squared eigenvalue distance; `build_tracks` runs pairwise Hungarian assignments and stitches them into a NetworkX `DiGraph` with `TrackEvent` annotations."
-   ]
+   "source": "## 3 — Physics-only tracking\n\n`PositionShapeCost` combines squared centroid distance with squared eigenvalue distance; `build_tracks` runs pairwise Hungarian assignments and stitches them into a NetworkX `DiGraph` with `TrackEvent` annotations."
   },
   {
    "cell_type": "code",
@@ -382,11 +378,7 @@
    "cell_type": "markdown",
    "id": "94bdcacb",
    "metadata": {},
-   "source": [
-    "## 4 — Week 4: multi-view MIPs\n",
-    "\n",
-    "For each spot, crop a padded sub-volume, zero out voxels that don't belong to the instance, and take three maximum-intensity projections — one along each physical axis."
-   ]
+   "source": "## 4 — Multi-view MIPs\n\nFor each spot, crop a padded sub-volume, zero out voxels that don't belong to the instance, and take three maximum-intensity projections — one along each physical axis."
   },
   {
    "cell_type": "code",
@@ -724,35 +716,7 @@
    "cell_type": "markdown",
    "id": "7e605de2",
    "metadata": {},
-   "source": [
-    "## 8 — The same pipeline from the command line\n",
-    "\n",
-    "Every library call above is exposed as a CLI — feed a dataset root and an output directory, get reproducible artifacts under `artifacts/`.\n",
-    "\n",
-    "```bash\n",
-    "# 1. Segment every scan under data/sample_operando/\n",
-    "python -m braggtrack.cli.segment_dataset --outdir artifacts/week2\n",
-    "\n",
-    "# 2. Compute mock multi-view embeddings\n",
-    "python -m braggtrack.cli.embed_dataset --segdir artifacts/week2 --outdir artifacts/week4 --backend mock\n",
-    "\n",
-    "# 3. Track with geometry + semantic cost (β=0.5)\n",
-    "python -m braggtrack.cli.track_dataset artifacts/week2 \\\n",
-    "    --outdir artifacts/week3 \\\n",
-    "    --embedding-dir artifacts/week4 \\\n",
-    "    --cost-alpha 1.0 --cost-beta 0.5\n",
-    "\n",
-    "# 4. Ablate α/β and write a JSON report\n",
-    "python scripts/ablation_week4.py \\\n",
-    "    --indir artifacts/week2 \\\n",
-    "    --embedding-dir artifacts/week4 \\\n",
-    "    --betas 0,0.25,0.5,1.0 \\\n",
-    "    --output artifacts/week4_ablation/report.json\n",
-    "\n",
-    "# 5. Full CI-equivalent check (unit tests + all weekly acceptance gates)\n",
-    "python scripts/ci_report.py\n",
-    "```"
-   ]
+   "source": "## 8 — The same pipeline from the command line\n\nEvery library call above is exposed as a CLI — feed a dataset root and an output directory, get reproducible artifacts under `artifacts/`.\n\n```bash\n# 1. Segment every scan under data/sample_operando/\npython -m braggtrack.cli.segment_dataset --outdir artifacts/segmentation\n\n# 2. Compute mock multi-view embeddings\npython -m braggtrack.cli.embed_dataset --segdir artifacts/segmentation --outdir artifacts/embedding --backend mock\n\n# 3. Track with geometry + semantic cost (β=0.5)\npython -m braggtrack.cli.track_dataset artifacts/segmentation \\\n    --outdir artifacts/tracking \\\n    --embedding-dir artifacts/embedding \\\n    --cost-alpha 1.0 --cost-beta 0.5\n\n# 4. Ablate α/β and write a JSON report\npython scripts/ablation_semantic.py \\\n    --indir artifacts/segmentation \\\n    --embedding-dir artifacts/embedding \\\n    --betas 0,0.25,0.5,1.0 \\\n    --output artifacts/ablation/report.json\n\n# 5. Full CI-equivalent check (unit tests + all acceptance gates)\npython scripts/ci_report.py\n```"
   }
  ],
  "metadata": {
diff --git a/notebooks/dino_segmentation_comparison.ipynb b/notebooks/dino_segmentation_comparison.ipynb
index a216609..5204d15 100644
--- a/notebooks/dino_segmentation_comparison.ipynb
+++ b/notebooks/dino_segmentation_comparison.ipynb
@@ -3,7 +3,7 @@
   {
    "cell_type": "markdown",
    "id": "ddb2fb4c",
-   "source": "# DINO vs Classical Segmentation Comparison\n\nThis notebook runs both segmentation backends on the bundled `data/sample_operando/` scans and compares their outputs side-by-side.\n\n| Method | How it works | Strengths |\n|--------|-------------|-----------|\n| **Classical** | Otsu threshold → LoG enhancement → h-maxima seeds → seeded watershed → merge nearby | Fast, interpretable, well-tuned for this beamline |\n| **DINO** | DINOv3 patch features → PCA → HDBSCAN clustering → 3D slice stitching → Otsu foreground mask | Learns in feature space — should generalise across beamlines/detectors without re-tuning |\n\nUses the **mock** DINO backend by default (no GPU required). Set `BRAGGTRACK_DINO_BACKEND=torch` for real DINOv3 features.",
+   "source": "# DINO vs Classical Segmentation Comparison\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/BASE-Laboratory/BraggTrack/blob/main/notebooks/dino_segmentation_comparison.ipynb)\n\nThis notebook runs both segmentation backends on the bundled `data/sample_operando/` scans and compares their outputs side-by-side.\n\n| Method | How it works | Strengths |\n|--------|-------------|-----------|\n| **Classical** | Otsu threshold \u2192 LoG enhancement \u2192 h-maxima seeds \u2192 seeded watershed \u2192 merge nearby | Fast, interpretable, well-tuned for this beamline |\n| **DINO** | DINOv3 patch features \u2192 PCA \u2192 HDBSCAN clustering \u2192 3D slice stitching \u2192 Otsu foreground mask | Learns in feature space \u2014 should generalise across beamlines/detectors without re-tuning |\n\nUses the **mock** DINO backend by default (no GPU required). Set `BRAGGTRACK_DINO_BACKEND=torch` for real DINOv3 features.",
    "metadata": {}
   },
   {
@@ -15,7 +15,7 @@
   {
    "cell_type": "code",
    "id": "320c0082",
-   "source": "import os, subprocess, sys\n\n_ON_COLAB = \"google.colab\" in sys.modules or os.environ.get(\"COLAB_RELEASE_TAG\")\n\nif _ON_COLAB:\n    print(\"Colab detected — installing BraggTrack + sample data...\")\n    subprocess.check_call([\n        sys.executable, \"-m\", \"pip\", \"install\", \"-q\",\n        \"braggtrack[notebook] @ git+https://github.com/BASE-Laboratory/BraggTrack.git\",\n    ])\n    if not os.path.isdir(\"data/sample_operando\"):\n        subprocess.check_call([\n            \"git\", \"clone\", \"--depth=1\", \"--filter=blob:none\", \"--sparse\",\n            \"https://github.com/BASE-Laboratory/BraggTrack.git\", \"_braggtrack_repo\",\n        ])\n        subprocess.check_call(\n            [\"git\", \"sparse-checkout\", \"set\", \"data/sample_operando\"],\n            cwd=\"_braggtrack_repo\",\n        )\n        os.makedirs(\"data\", exist_ok=True)\n        os.rename(\"_braggtrack_repo/data/sample_operando\", \"data/sample_operando\")\n        subprocess.check_call([\"rm\", \"-rf\", \"_braggtrack_repo\"])\n    os.environ.setdefault(\"BRAGGTRACK_DATA_ROOT\", os.path.abspath(\"data/sample_operando\"))\n    print(\"Done.\")\nelse:\n    print(\"Local environment — skipping Colab setup.\")",
+   "source": "import os, subprocess, sys\n\n_ON_COLAB = \"google.colab\" in sys.modules or os.environ.get(\"COLAB_RELEASE_TAG\")\n\nif _ON_COLAB:\n    print(\"Colab detected \u2014 installing BraggTrack + sample data...\")\n    subprocess.check_call([\n        sys.executable, \"-m\", \"pip\", \"install\", \"-q\",\n        \"braggtrack[notebook] @ git+https://github.com/BASE-Laboratory/BraggTrack.git\",\n    ])\n    if not os.path.isdir(\"data/sample_operando\"):\n        subprocess.check_call([\n            \"git\", \"clone\", \"--depth=1\", \"--filter=blob:none\", \"--sparse\",\n            \"https://github.com/BASE-Laboratory/BraggTrack.git\", \"_braggtrack_repo\",\n        ])\n        subprocess.check_call(\n            [\"git\", \"sparse-checkout\", \"set\", \"data/sample_operando\"],\n            cwd=\"_braggtrack_repo\",\n        )\n        os.makedirs(\"data\", exist_ok=True)\n        os.rename(\"_braggtrack_repo/data/sample_operando\", \"data/sample_operando\")\n        subprocess.check_call([\"rm\", \"-rf\", \"_braggtrack_repo\"])\n    os.environ.setdefault(\"BRAGGTRACK_DATA_ROOT\", os.path.abspath(\"data/sample_operando\"))\n    print(\"Done.\")\nelse:\n    print(\"Local environment \u2014 skipping Colab setup.\")",
    "metadata": {},
    "execution_count": null,
    "outputs": []
@@ -31,7 +31,7 @@
   {
    "cell_type": "markdown",
    "id": "12606936",
-   "source": "## 1 — Load real data\n\nRead the largest 3D numeric dataset from each H5 file (bypasses the fixed NeXus path shortlist).",
+   "source": "## 1 \u2014 Load real data\n\nRead the largest 3D numeric dataset from each H5 file (bypasses the fixed NeXus path shortlist).",
    "metadata": {}
   },
   {
@@ -45,7 +45,7 @@
   {
    "cell_type": "markdown",
    "id": "61e5cbd4",
-   "source": "## 2 — Run both segmentation methods\n\n### Classical pipeline\nOtsu → LoG → h-maxima → seeded watershed → remove small → fill holes → merge nearby → relabel.",
+   "source": "## 2 \u2014 Run both segmentation methods\n\n### Classical pipeline\nOtsu \u2192 LoG \u2192 h-maxima \u2192 seeded watershed \u2192 remove small \u2192 fill holes \u2192 merge nearby \u2192 relabel.",
    "metadata": {}
   },
   {
@@ -59,7 +59,7 @@
   {
    "cell_type": "markdown",
    "id": "a5cfd4e0",
-   "source": "### DINO pipeline\nDINOv3 patch features → PCA → HDBSCAN → upsample → 3D stitch → Otsu foreground mask → post-process.\n\nThe post-processing (remove small, fill holes, merge nearby, relabel) is identical to keep the comparison fair.",
+   "source": "### DINO pipeline\nDINOv3 patch features \u2192 PCA \u2192 HDBSCAN \u2192 upsample \u2192 3D stitch \u2192 Otsu foreground mask \u2192 post-process.\n\nThe post-processing (remove small, fill holes, merge nearby, relabel) is identical to keep the comparison fair.",
    "metadata": {}
   },
   {
@@ -73,7 +73,7 @@
   {
    "cell_type": "markdown",
    "id": "1b93c867",
-   "source": "## 3 — Spot count comparison",
+   "source": "## 3 \u2014 Spot count comparison",
    "metadata": {}
   },
   {
@@ -95,13 +95,13 @@
   {
    "cell_type": "markdown",
    "id": "dfd3d99c",
-   "source": "## 4 — Visual comparison: tri-axis label projections\n\nSide-by-side label overlays for each scan, projected along all three physical axes (μ, χ, d). Each row is a scan; left column = classical, right column = DINO.",
+   "source": "## 4 \u2014 Visual comparison: tri-axis label projections\n\nSide-by-side label overlays for each scan, projected along all three physical axes (\u03bc, \u03c7, d). Each row is a scan; left column = classical, right column = DINO.",
    "metadata": {}
   },
   {
    "cell_type": "code",
    "id": "569f7b23",
-   "source": "# Build a shared colormap large enough for both methods.\nmax_labels = max(\n    max(int(l.max()) for l in classical_labels),\n    max(int(l.max()) for l in dino_labels),\n) + 1\nrng_cm = np.random.RandomState(42)\nlabel_colors = np.zeros((max_labels, 4))\nlabel_colors[0] = [0, 0, 0, 0]\nfor i in range(1, max_labels):\n    label_colors[i] = [*rng_cm.uniform(0.2, 0.95, 3), 0.65]\nlabel_cmap = ListedColormap(label_colors)\n\naxis_info = [\n    (0, \"MIP along mu\", \"chi\", \"d\"),\n    (1, \"MIP along chi\", \"d\", \"mu\"),\n    (2, \"MIP along d\", \"chi\", \"mu\"),\n]\n\nfig, axes = plt.subplots(len(scans), 6, figsize=(22, len(scans) * 3.5))\n\nfor row, (s, v, c_lab, d_lab) in enumerate(zip(scans, all_volumes, classical_labels, dino_labels)):\n    for col_offset, (method_name, labels) in enumerate([(\"Classical\", c_lab), (\"DINO\", d_lab)]):\n        for ax_idx, (axis_id, title, xlabel, ylabel) in enumerate(axis_info):\n            ax = axes[row, col_offset * 3 + ax_idx]\n            mip = v.max(axis=axis_id)\n            floor = otsu_floor_from_mip(v, axis=axis_id)\n            proj_l = label_projection_by_intensity(v, labels, axis=axis_id, mip_floor=floor)\n\n            vlo, vhi = np.percentile(mip, [1, 99.9])\n            ax.imshow(mip, cmap=\"gray\", vmin=vlo, vmax=vhi)\n            mask = np.ma.masked_where(proj_l == 0, proj_l)\n            ax.imshow(mask, cmap=label_cmap, interpolation=\"nearest\", vmin=0, vmax=max_labels - 1)\n\n            if row == 0:\n                ax.set_title(f\"{method_name}\\n{title}\", fontsize=9)\n            ax.tick_params(labelsize=6)\n            if ax_idx == 0 and col_offset == 0:\n                n_c = int(c_lab.max())\n                n_d = int(d_lab.max())\n                ax.set_ylabel(f\"{s.scan_name}\\nC={n_c} D={n_d}\", fontsize=9)\n\nplt.suptitle(\"Classical (left 3 cols) vs DINO (right 3 cols) — tri-axis label projection\", y=1.01, fontsize=13)\nplt.tight_layout()\nplt.show()",
+   "source": "# Build a shared colormap large enough for both methods.\nmax_labels = max(\n    max(int(l.max()) for l in classical_labels),\n    max(int(l.max()) for l in dino_labels),\n) + 1\nrng_cm = np.random.RandomState(42)\nlabel_colors = np.zeros((max_labels, 4))\nlabel_colors[0] = [0, 0, 0, 0]\nfor i in range(1, max_labels):\n    label_colors[i] = [*rng_cm.uniform(0.2, 0.95, 3), 0.65]\nlabel_cmap = ListedColormap(label_colors)\n\naxis_info = [\n    (0, \"MIP along mu\", \"chi\", \"d\"),\n    (1, \"MIP along chi\", \"d\", \"mu\"),\n    (2, \"MIP along d\", \"chi\", \"mu\"),\n]\n\nfig, axes = plt.subplots(len(scans), 6, figsize=(22, len(scans) * 3.5))\n\nfor row, (s, v, c_lab, d_lab) in enumerate(zip(scans, all_volumes, classical_labels, dino_labels)):\n    for col_offset, (method_name, labels) in enumerate([(\"Classical\", c_lab), (\"DINO\", d_lab)]):\n        for ax_idx, (axis_id, title, xlabel, ylabel) in enumerate(axis_info):\n            ax = axes[row, col_offset * 3 + ax_idx]\n            mip = v.max(axis=axis_id)\n            floor = otsu_floor_from_mip(v, axis=axis_id)\n            proj_l = label_projection_by_intensity(v, labels, axis=axis_id, mip_floor=floor)\n\n            vlo, vhi = np.percentile(mip, [1, 99.9])\n            ax.imshow(mip, cmap=\"gray\", vmin=vlo, vmax=vhi)\n            mask = np.ma.masked_where(proj_l == 0, proj_l)\n            ax.imshow(mask, cmap=label_cmap, interpolation=\"nearest\", vmin=0, vmax=max_labels - 1)\n\n            if row == 0:\n                ax.set_title(f\"{method_name}\\n{title}\", fontsize=9)\n            ax.tick_params(labelsize=6)\n            if ax_idx == 0 and col_offset == 0:\n                n_c = int(c_lab.max())\n                n_d = int(d_lab.max())\n                ax.set_ylabel(f\"{s.scan_name}\\nC={n_c} D={n_d}\", fontsize=9)\n\nplt.suptitle(\"Classical (left 3 cols) vs DINO (right 3 cols) \u2014 tri-axis label projection\", y=1.01, fontsize=13)\nplt.tight_layout()\nplt.show()",
    "metadata": {},
    "execution_count": null,
    "outputs": []
@@ -109,7 +109,7 @@
   {
    "cell_type": "markdown",
    "id": "49b9b0e3",
-   "source": "## 5 — Instance feature comparison\n\nCompare the per-spot properties (voxel count, integrated intensity, centroid, eigenvalues) between the two methods.",
+   "source": "## 5 \u2014 Instance feature comparison\n\nCompare the per-spot properties (voxel count, integrated intensity, centroid, eigenvalues) between the two methods.",
    "metadata": {}
   },
   {
@@ -131,7 +131,7 @@
   {
    "cell_type": "markdown",
    "id": "d3693a3f",
-   "source": "## 6 — Spatial overlap (Dice coefficient)\n\nFor each scan, compute the Dice coefficient between the binary foreground masks produced by the two methods. This measures how much the methods agree on *where* spots are, regardless of how they partition them into instances.",
+   "source": "## 6 \u2014 Spatial overlap (Dice coefficient)\n\nFor each scan, compute the Dice coefficient between the binary foreground masks produced by the two methods. This measures how much the methods agree on *where* spots are, regardless of how they partition them into instances.",
    "metadata": {}
   },
   {
@@ -145,7 +145,7 @@
   {
    "cell_type": "markdown",
    "id": "4b7a8bbc",
-   "source": "## 7 — Centroid scatter: classical vs DINO\n\nPlot the centroids from both methods on the same axes. Matching centroids (spots found by both methods) will overlap; method-unique detections will stand alone.",
+   "source": "## 7 \u2014 Centroid scatter: classical vs DINO\n\nPlot the centroids from both methods on the same axes. Matching centroids (spots found by both methods) will overlap; method-unique detections will stand alone.",
    "metadata": {}
   },
   {
@@ -159,7 +159,7 @@
   {
    "cell_type": "markdown",
    "id": "77d2e6e2",
-   "source": "## 8 — Consistency across scans\n\nA key motivation for DINO-based segmentation is consistency: the same physical spots should produce the same segmentation across consecutive scans. Compare the coefficient of variation (std/mean) of spot counts across scans for each method.",
+   "source": "## 8 \u2014 Consistency across scans\n\nA key motivation for DINO-based segmentation is consistency: the same physical spots should produce the same segmentation across consecutive scans. Compare the coefficient of variation (std/mean) of spot counts across scans for each method.",
    "metadata": {}
   },
   {
@@ -173,13 +173,13 @@
   {
    "cell_type": "markdown",
    "id": "ec6c4ef9",
-   "source": "## 9 — Per-scan feature tables\n\nFull feature tables for both methods on scan 1, for detailed inspection.",
+   "source": "## 9 \u2014 Per-scan feature tables\n\nFull feature tables for both methods on scan 1, for detailed inspection.",
    "metadata": {}
   },
   {
    "cell_type": "code",
    "id": "49e37ae2",
-   "source": "cols = [\"label\", \"voxel_count\", \"integrated_intensity\",\n        \"centroid_mu\", \"centroid_chi\", \"centroid_d\",\n        \"eig_1\", \"eig_2\", \"eig_3\"]\n\nprint(\"=== Classical — scan0001 ===\")\ndisplay(pd.DataFrame(classical_features[0])[cols]) if classical_features[0] else print(\"(no spots)\")\n\nprint(\"\\n=== DINO — scan0001 ===\")\ndisplay(pd.DataFrame(dino_features[0])[cols]) if dino_features[0] else print(\"(no spots)\")",
+   "source": "cols = [\"label\", \"voxel_count\", \"integrated_intensity\",\n        \"centroid_mu\", \"centroid_chi\", \"centroid_d\",\n        \"eig_1\", \"eig_2\", \"eig_3\"]\n\nprint(\"=== Classical \u2014 scan0001 ===\")\ndisplay(pd.DataFrame(classical_features[0])[cols]) if classical_features[0] else print(\"(no spots)\")\n\nprint(\"\\n=== DINO \u2014 scan0001 ===\")\ndisplay(pd.DataFrame(dino_features[0])[cols]) if dino_features[0] else print(\"(no spots)\")",
    "metadata": {},
    "execution_count": null,
    "outputs": []
@@ -187,7 +187,7 @@
   {
    "cell_type": "markdown",
    "id": "a64fefce",
-   "source": "## 10 — Notes and next steps\n\n**What the mock backend shows:** The mock DINO backend produces hash-based random features, so the clustering is not semantically meaningful — it tests the *pipeline plumbing* (slice extraction → PCA → HDBSCAN → stitching → foreground mask) but not the quality of learned representations.\n\n**What changes with real DINOv3 weights:**\n- Set `BRAGGTRACK_DINO_BACKEND=torch` (requires `torch` + `transformers` + GPU)\n- The encoder extracts genuine patch-level features where similar textures cluster together\n- Expect better instance separation without hand-tuned LoG/watershed parameters\n- The same model should work across different beamlines and detectors\n\n**CLI equivalent:**\n```bash\n# Classical\nbraggtrack-segment-dataset --method classical --outdir artifacts/classical\n\n# DINO (mock)\nbraggtrack-segment-dataset --method dino --dino-backend mock --outdir artifacts/dino\n\n# DINO (real weights)\nbraggtrack-segment-dataset --method dino --dino-backend torch --outdir artifacts/dino_real\n```",
+   "source": "## 10 \u2014 Notes and next steps\n\n**What the mock backend shows:** The mock DINO backend produces hash-based random features, so the clustering is not semantically meaningful \u2014 it tests the *pipeline plumbing* (slice extraction \u2192 PCA \u2192 HDBSCAN \u2192 stitching \u2192 foreground mask) but not the quality of learned representations.\n\n**What changes with real DINOv3 weights:**\n- Set `BRAGGTRACK_DINO_BACKEND=torch` (requires `torch` + `transformers` + GPU)\n- The encoder extracts genuine patch-level features where similar textures cluster together\n- Expect better instance separation without hand-tuned LoG/watershed parameters\n- The same model should work across different beamlines and detectors\n\n**CLI equivalent:**\n```bash\n# Classical\nbraggtrack-segment-dataset --method classical --outdir artifacts/classical\n\n# DINO (mock)\nbraggtrack-segment-dataset --method dino --dino-backend mock --outdir artifacts/dino\n\n# DINO (real weights)\nbraggtrack-segment-dataset --method dino --dino-backend torch --outdir artifacts/dino_real\n```",
    "metadata": {}
   }
  ],