From 47809a670d1513ee622c132ef5ce909d303acce8 Mon Sep 17 00:00:00 2001 From: James Le Houx Date: Mon, 18 May 2026 08:03:45 +0000 Subject: [PATCH] docs: add Colab badge to DINO notebook and remove week references from demo notebook https://claude.ai/code/session_015Y9zQk4A8uKJAorKuvBoCk --- notebooks/braggtrack_demo.ipynb | 46 +++----------------- notebooks/dino_segmentation_comparison.ipynb | 30 ++++++------- 2 files changed, 20 insertions(+), 56 deletions(-) diff --git a/notebooks/braggtrack_demo.ipynb b/notebooks/braggtrack_demo.ipynb index 02b0e49..6aeb29c 100644 --- a/notebooks/braggtrack_demo.ipynb +++ b/notebooks/braggtrack_demo.ipynb @@ -4,7 +4,7 @@ "cell_type": "markdown", "id": "e65e834e", "metadata": {}, - "source": "# BraggTrack end-to-end demo\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/BASE-Laboratory/BraggTrack/blob/main/notebooks/braggtrack_demo.ipynb)\n\nRuns the full pipeline on the bundled `data/sample_operando/` scans:\n\n1. **Discover** — find the per-scan H5 files.\n2. **Segment (Week 2)** — LoG → h-maxima → seeded watershed → instance features.\n3. **Track physics-only (Week 3)** — Hungarian over a geometry cost with per-axis gating; build a lifecycle DAG.\n4. **Semantic descriptors (Week 4)** — orthogonal MIPs + frozen-encoder embeddings.\n5. **Geometry + semantic tracking (Week 4)** — compose `α · geometry + β · (1 − cos)`.\n6. **α/β ablation** — how the semantic weight shifts tracking metrics.\n7. **Synthetic crossing** — a case where geometry alone fails and semantics recover identity.\n\nFinal section shows the one-line CLI equivalents for each stage.\n\nThis notebook uses the **mock** DINO backend by default, so no PyTorch / HuggingFace weights are required. Set `BRAGGTRACK_DINO_BACKEND=torch` if you have them installed and want real embeddings." + "source": "# BraggTrack end-to-end demo\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/BASE-Laboratory/BraggTrack/blob/main/notebooks/braggtrack_demo.ipynb)\n\nRuns the full pipeline on the bundled `data/sample_operando/` scans:\n\n1. **Discover** — find the per-scan H5 files.\n2. **Segment** — LoG → h-maxima → seeded watershed → instance features.\n3. **Track (physics-only)** — Hungarian over a geometry cost with per-axis gating; build a lifecycle DAG.\n4. **Semantic descriptors** — orthogonal MIPs + frozen-encoder embeddings.\n5. **Geometry + semantic tracking** — compose `α · geometry + β · (1 − cos)`.\n6. **α/β ablation** — how the semantic weight shifts tracking metrics.\n7. **Synthetic crossing** — a case where geometry alone fails and semantics recover identity.\n\nFinal section shows the one-line CLI equivalents for each stage.\n\nThis notebook uses the **mock** DINO backend by default, so no PyTorch / HuggingFace weights are required. Set `BRAGGTRACK_DINO_BACKEND=torch` if you have them installed and want real embeddings." }, { "cell_type": "markdown", @@ -103,7 +103,7 @@ "cell_type": "markdown", "id": "d74a25a8", "metadata": {}, - "source": "## 2 — Week 2: classical segmentation\n\n`segment_classical` runs 3-D Gaussian blur → Laplacian → h-maxima seeds → seeded watershed.\n\n### Threshold stabilisation across scans\n\nEach scan produces its own Otsu threshold on the raw intensity histogram.\nIn theory these should be nearly identical for back-to-back operando acquisitions,\nbut minor intensity fluctuations (beam drift, detector warm-up, etc.) cause\nper-frame Otsu to jitter — and because everything downstream (foreground mask → seed\nfloor → watershed) is threshold-sensitive, small jitter produces wildly different\nspot counts.\n\n**Fix:** compute per-frame Otsu thresholds, then pass them through a\nrolling-median smoother (`smooth_thresholds`). The median suppresses isolated\noutliers (beam drops, detector flashes) while still tracking genuine long-term\ndrift. For 500+ frame sequences this runs in O(N·W) on scalar thresholds —\nno need to pool raw volumes in memory.\n\nTwo further knobs that matter for real data:\n\n* `threshold` — **intensity-domain** foreground, now smoothed across scans. Controls the watershed mask.\n* `seed_peak_fraction` / `seed_response_percentile` — **LoG-response-domain** admissibility floor inside the foreground." + "source": "## 2 — Classical segmentation\n\n`segment_classical` runs 3-D Gaussian blur → Laplacian → h-maxima seeds → seeded watershed.\n\n### Threshold stabilisation across scans\n\nEach scan produces its own Otsu threshold on the raw intensity histogram.\nIn theory these should be nearly identical for back-to-back operando acquisitions,\nbut minor intensity fluctuations (beam drift, detector warm-up, etc.) cause\nper-frame Otsu to jitter — and because everything downstream (foreground mask → seed\nfloor → watershed) is threshold-sensitive, small jitter produces wildly different\nspot counts.\n\n**Fix:** compute per-frame Otsu thresholds, then pass them through a\nrolling-median smoother (`smooth_thresholds`). The median suppresses isolated\noutliers (beam drops, detector flashes) while still tracking genuine long-term\ndrift. For 500+ frame sequences this runs in O(N·W) on scalar thresholds —\nno need to pool raw volumes in memory.\n\nTwo further knobs that matter for real data:\n\n* `threshold` — **intensity-domain** foreground, now smoothed across scans. Controls the watershed mask.\n* `seed_peak_fraction` / `seed_response_percentile` — **LoG-response-domain** admissibility floor inside the foreground." }, { "cell_type": "code", @@ -229,11 +229,7 @@ "cell_type": "markdown", "id": "6fc3abf4", "metadata": {}, - "source": [ - "## 3 — Week 3: physics-only tracking\n", - "\n", - "`PositionShapeCost` combines squared centroid distance with squared eigenvalue distance; `build_tracks` runs pairwise Hungarian assignments and stitches them into a NetworkX `DiGraph` with `TrackEvent` annotations." - ] + "source": "## 3 — Physics-only tracking\n\n`PositionShapeCost` combines squared centroid distance with squared eigenvalue distance; `build_tracks` runs pairwise Hungarian assignments and stitches them into a NetworkX `DiGraph` with `TrackEvent` annotations." }, { "cell_type": "code", @@ -382,11 +378,7 @@ "cell_type": "markdown", "id": "94bdcacb", "metadata": {}, - "source": [ - "## 4 — Week 4: multi-view MIPs\n", - "\n", - "For each spot, crop a padded sub-volume, zero out voxels that don't belong to the instance, and take three maximum-intensity projections — one along each physical axis." - ] + "source": "## 4 — Multi-view MIPs\n\nFor each spot, crop a padded sub-volume, zero out voxels that don't belong to the instance, and take three maximum-intensity projections — one along each physical axis." }, { "cell_type": "code", @@ -724,35 +716,7 @@ "cell_type": "markdown", "id": "7e605de2", "metadata": {}, - "source": [ - "## 8 — The same pipeline from the command line\n", - "\n", - "Every library call above is exposed as a CLI — feed a dataset root and an output directory, get reproducible artifacts under `artifacts/`.\n", - "\n", - "```bash\n", - "# 1. Segment every scan under data/sample_operando/\n", - "python -m braggtrack.cli.segment_dataset --outdir artifacts/week2\n", - "\n", - "# 2. Compute mock multi-view embeddings\n", - "python -m braggtrack.cli.embed_dataset --segdir artifacts/week2 --outdir artifacts/week4 --backend mock\n", - "\n", - "# 3. Track with geometry + semantic cost (β=0.5)\n", - "python -m braggtrack.cli.track_dataset artifacts/week2 \\\n", - " --outdir artifacts/week3 \\\n", - " --embedding-dir artifacts/week4 \\\n", - " --cost-alpha 1.0 --cost-beta 0.5\n", - "\n", - "# 4. Ablate α/β and write a JSON report\n", - "python scripts/ablation_week4.py \\\n", - " --indir artifacts/week2 \\\n", - " --embedding-dir artifacts/week4 \\\n", - " --betas 0,0.25,0.5,1.0 \\\n", - " --output artifacts/week4_ablation/report.json\n", - "\n", - "# 5. Full CI-equivalent check (unit tests + all weekly acceptance gates)\n", - "python scripts/ci_report.py\n", - "```" - ] + "source": "## 8 — The same pipeline from the command line\n\nEvery library call above is exposed as a CLI — feed a dataset root and an output directory, get reproducible artifacts under `artifacts/`.\n\n```bash\n# 1. Segment every scan under data/sample_operando/\npython -m braggtrack.cli.segment_dataset --outdir artifacts/segmentation\n\n# 2. Compute mock multi-view embeddings\npython -m braggtrack.cli.embed_dataset --segdir artifacts/segmentation --outdir artifacts/embedding --backend mock\n\n# 3. Track with geometry + semantic cost (β=0.5)\npython -m braggtrack.cli.track_dataset artifacts/segmentation \\\n --outdir artifacts/tracking \\\n --embedding-dir artifacts/embedding \\\n --cost-alpha 1.0 --cost-beta 0.5\n\n# 4. Ablate α/β and write a JSON report\npython scripts/ablation_semantic.py \\\n --indir artifacts/segmentation \\\n --embedding-dir artifacts/embedding \\\n --betas 0,0.25,0.5,1.0 \\\n --output artifacts/ablation/report.json\n\n# 5. Full CI-equivalent check (unit tests + all acceptance gates)\npython scripts/ci_report.py\n```" } ], "metadata": { diff --git a/notebooks/dino_segmentation_comparison.ipynb b/notebooks/dino_segmentation_comparison.ipynb index a216609..5204d15 100644 --- a/notebooks/dino_segmentation_comparison.ipynb +++ b/notebooks/dino_segmentation_comparison.ipynb @@ -3,7 +3,7 @@ { "cell_type": "markdown", "id": "ddb2fb4c", - "source": "# DINO vs Classical Segmentation Comparison\n\nThis notebook runs both segmentation backends on the bundled `data/sample_operando/` scans and compares their outputs side-by-side.\n\n| Method | How it works | Strengths |\n|--------|-------------|-----------|\n| **Classical** | Otsu threshold → LoG enhancement → h-maxima seeds → seeded watershed → merge nearby | Fast, interpretable, well-tuned for this beamline |\n| **DINO** | DINOv3 patch features → PCA → HDBSCAN clustering → 3D slice stitching → Otsu foreground mask | Learns in feature space — should generalise across beamlines/detectors without re-tuning |\n\nUses the **mock** DINO backend by default (no GPU required). Set `BRAGGTRACK_DINO_BACKEND=torch` for real DINOv3 features.", + "source": "# DINO vs Classical Segmentation Comparison\n\n[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/BASE-Laboratory/BraggTrack/blob/main/notebooks/dino_segmentation_comparison.ipynb)\n\nThis notebook runs both segmentation backends on the bundled `data/sample_operando/` scans and compares their outputs side-by-side.\n\n| Method | How it works | Strengths |\n|--------|-------------|-----------|\n| **Classical** | Otsu threshold \u2192 LoG enhancement \u2192 h-maxima seeds \u2192 seeded watershed \u2192 merge nearby | Fast, interpretable, well-tuned for this beamline |\n| **DINO** | DINOv3 patch features \u2192 PCA \u2192 HDBSCAN clustering \u2192 3D slice stitching \u2192 Otsu foreground mask | Learns in feature space \u2014 should generalise across beamlines/detectors without re-tuning |\n\nUses the **mock** DINO backend by default (no GPU required). Set `BRAGGTRACK_DINO_BACKEND=torch` for real DINOv3 features.", "metadata": {} }, { @@ -15,7 +15,7 @@ { "cell_type": "code", "id": "320c0082", - "source": "import os, subprocess, sys\n\n_ON_COLAB = \"google.colab\" in sys.modules or os.environ.get(\"COLAB_RELEASE_TAG\")\n\nif _ON_COLAB:\n print(\"Colab detected — installing BraggTrack + sample data...\")\n subprocess.check_call([\n sys.executable, \"-m\", \"pip\", \"install\", \"-q\",\n \"braggtrack[notebook] @ git+https://github.com/BASE-Laboratory/BraggTrack.git\",\n ])\n if not os.path.isdir(\"data/sample_operando\"):\n subprocess.check_call([\n \"git\", \"clone\", \"--depth=1\", \"--filter=blob:none\", \"--sparse\",\n \"https://github.com/BASE-Laboratory/BraggTrack.git\", \"_braggtrack_repo\",\n ])\n subprocess.check_call(\n [\"git\", \"sparse-checkout\", \"set\", \"data/sample_operando\"],\n cwd=\"_braggtrack_repo\",\n )\n os.makedirs(\"data\", exist_ok=True)\n os.rename(\"_braggtrack_repo/data/sample_operando\", \"data/sample_operando\")\n subprocess.check_call([\"rm\", \"-rf\", \"_braggtrack_repo\"])\n os.environ.setdefault(\"BRAGGTRACK_DATA_ROOT\", os.path.abspath(\"data/sample_operando\"))\n print(\"Done.\")\nelse:\n print(\"Local environment — skipping Colab setup.\")", + "source": "import os, subprocess, sys\n\n_ON_COLAB = \"google.colab\" in sys.modules or os.environ.get(\"COLAB_RELEASE_TAG\")\n\nif _ON_COLAB:\n print(\"Colab detected \u2014 installing BraggTrack + sample data...\")\n subprocess.check_call([\n sys.executable, \"-m\", \"pip\", \"install\", \"-q\",\n \"braggtrack[notebook] @ git+https://github.com/BASE-Laboratory/BraggTrack.git\",\n ])\n if not os.path.isdir(\"data/sample_operando\"):\n subprocess.check_call([\n \"git\", \"clone\", \"--depth=1\", \"--filter=blob:none\", \"--sparse\",\n \"https://github.com/BASE-Laboratory/BraggTrack.git\", \"_braggtrack_repo\",\n ])\n subprocess.check_call(\n [\"git\", \"sparse-checkout\", \"set\", \"data/sample_operando\"],\n cwd=\"_braggtrack_repo\",\n )\n os.makedirs(\"data\", exist_ok=True)\n os.rename(\"_braggtrack_repo/data/sample_operando\", \"data/sample_operando\")\n subprocess.check_call([\"rm\", \"-rf\", \"_braggtrack_repo\"])\n os.environ.setdefault(\"BRAGGTRACK_DATA_ROOT\", os.path.abspath(\"data/sample_operando\"))\n print(\"Done.\")\nelse:\n print(\"Local environment \u2014 skipping Colab setup.\")", "metadata": {}, "execution_count": null, "outputs": [] @@ -31,7 +31,7 @@ { "cell_type": "markdown", "id": "12606936", - "source": "## 1 — Load real data\n\nRead the largest 3D numeric dataset from each H5 file (bypasses the fixed NeXus path shortlist).", + "source": "## 1 \u2014 Load real data\n\nRead the largest 3D numeric dataset from each H5 file (bypasses the fixed NeXus path shortlist).", "metadata": {} }, { @@ -45,7 +45,7 @@ { "cell_type": "markdown", "id": "61e5cbd4", - "source": "## 2 — Run both segmentation methods\n\n### Classical pipeline\nOtsu → LoG → h-maxima → seeded watershed → remove small → fill holes → merge nearby → relabel.", + "source": "## 2 \u2014 Run both segmentation methods\n\n### Classical pipeline\nOtsu \u2192 LoG \u2192 h-maxima \u2192 seeded watershed \u2192 remove small \u2192 fill holes \u2192 merge nearby \u2192 relabel.", "metadata": {} }, { @@ -59,7 +59,7 @@ { "cell_type": "markdown", "id": "a5cfd4e0", - "source": "### DINO pipeline\nDINOv3 patch features → PCA → HDBSCAN → upsample → 3D stitch → Otsu foreground mask → post-process.\n\nThe post-processing (remove small, fill holes, merge nearby, relabel) is identical to keep the comparison fair.", + "source": "### DINO pipeline\nDINOv3 patch features \u2192 PCA \u2192 HDBSCAN \u2192 upsample \u2192 3D stitch \u2192 Otsu foreground mask \u2192 post-process.\n\nThe post-processing (remove small, fill holes, merge nearby, relabel) is identical to keep the comparison fair.", "metadata": {} }, { @@ -73,7 +73,7 @@ { "cell_type": "markdown", "id": "1b93c867", - "source": "## 3 — Spot count comparison", + "source": "## 3 \u2014 Spot count comparison", "metadata": {} }, { @@ -95,13 +95,13 @@ { "cell_type": "markdown", "id": "dfd3d99c", - "source": "## 4 — Visual comparison: tri-axis label projections\n\nSide-by-side label overlays for each scan, projected along all three physical axes (μ, χ, d). Each row is a scan; left column = classical, right column = DINO.", + "source": "## 4 \u2014 Visual comparison: tri-axis label projections\n\nSide-by-side label overlays for each scan, projected along all three physical axes (\u03bc, \u03c7, d). Each row is a scan; left column = classical, right column = DINO.", "metadata": {} }, { "cell_type": "code", "id": "569f7b23", - "source": "# Build a shared colormap large enough for both methods.\nmax_labels = max(\n max(int(l.max()) for l in classical_labels),\n max(int(l.max()) for l in dino_labels),\n) + 1\nrng_cm = np.random.RandomState(42)\nlabel_colors = np.zeros((max_labels, 4))\nlabel_colors[0] = [0, 0, 0, 0]\nfor i in range(1, max_labels):\n label_colors[i] = [*rng_cm.uniform(0.2, 0.95, 3), 0.65]\nlabel_cmap = ListedColormap(label_colors)\n\naxis_info = [\n (0, \"MIP along mu\", \"chi\", \"d\"),\n (1, \"MIP along chi\", \"d\", \"mu\"),\n (2, \"MIP along d\", \"chi\", \"mu\"),\n]\n\nfig, axes = plt.subplots(len(scans), 6, figsize=(22, len(scans) * 3.5))\n\nfor row, (s, v, c_lab, d_lab) in enumerate(zip(scans, all_volumes, classical_labels, dino_labels)):\n for col_offset, (method_name, labels) in enumerate([(\"Classical\", c_lab), (\"DINO\", d_lab)]):\n for ax_idx, (axis_id, title, xlabel, ylabel) in enumerate(axis_info):\n ax = axes[row, col_offset * 3 + ax_idx]\n mip = v.max(axis=axis_id)\n floor = otsu_floor_from_mip(v, axis=axis_id)\n proj_l = label_projection_by_intensity(v, labels, axis=axis_id, mip_floor=floor)\n\n vlo, vhi = np.percentile(mip, [1, 99.9])\n ax.imshow(mip, cmap=\"gray\", vmin=vlo, vmax=vhi)\n mask = np.ma.masked_where(proj_l == 0, proj_l)\n ax.imshow(mask, cmap=label_cmap, interpolation=\"nearest\", vmin=0, vmax=max_labels - 1)\n\n if row == 0:\n ax.set_title(f\"{method_name}\\n{title}\", fontsize=9)\n ax.tick_params(labelsize=6)\n if ax_idx == 0 and col_offset == 0:\n n_c = int(c_lab.max())\n n_d = int(d_lab.max())\n ax.set_ylabel(f\"{s.scan_name}\\nC={n_c} D={n_d}\", fontsize=9)\n\nplt.suptitle(\"Classical (left 3 cols) vs DINO (right 3 cols) — tri-axis label projection\", y=1.01, fontsize=13)\nplt.tight_layout()\nplt.show()", + "source": "# Build a shared colormap large enough for both methods.\nmax_labels = max(\n max(int(l.max()) for l in classical_labels),\n max(int(l.max()) for l in dino_labels),\n) + 1\nrng_cm = np.random.RandomState(42)\nlabel_colors = np.zeros((max_labels, 4))\nlabel_colors[0] = [0, 0, 0, 0]\nfor i in range(1, max_labels):\n label_colors[i] = [*rng_cm.uniform(0.2, 0.95, 3), 0.65]\nlabel_cmap = ListedColormap(label_colors)\n\naxis_info = [\n (0, \"MIP along mu\", \"chi\", \"d\"),\n (1, \"MIP along chi\", \"d\", \"mu\"),\n (2, \"MIP along d\", \"chi\", \"mu\"),\n]\n\nfig, axes = plt.subplots(len(scans), 6, figsize=(22, len(scans) * 3.5))\n\nfor row, (s, v, c_lab, d_lab) in enumerate(zip(scans, all_volumes, classical_labels, dino_labels)):\n for col_offset, (method_name, labels) in enumerate([(\"Classical\", c_lab), (\"DINO\", d_lab)]):\n for ax_idx, (axis_id, title, xlabel, ylabel) in enumerate(axis_info):\n ax = axes[row, col_offset * 3 + ax_idx]\n mip = v.max(axis=axis_id)\n floor = otsu_floor_from_mip(v, axis=axis_id)\n proj_l = label_projection_by_intensity(v, labels, axis=axis_id, mip_floor=floor)\n\n vlo, vhi = np.percentile(mip, [1, 99.9])\n ax.imshow(mip, cmap=\"gray\", vmin=vlo, vmax=vhi)\n mask = np.ma.masked_where(proj_l == 0, proj_l)\n ax.imshow(mask, cmap=label_cmap, interpolation=\"nearest\", vmin=0, vmax=max_labels - 1)\n\n if row == 0:\n ax.set_title(f\"{method_name}\\n{title}\", fontsize=9)\n ax.tick_params(labelsize=6)\n if ax_idx == 0 and col_offset == 0:\n n_c = int(c_lab.max())\n n_d = int(d_lab.max())\n ax.set_ylabel(f\"{s.scan_name}\\nC={n_c} D={n_d}\", fontsize=9)\n\nplt.suptitle(\"Classical (left 3 cols) vs DINO (right 3 cols) \u2014 tri-axis label projection\", y=1.01, fontsize=13)\nplt.tight_layout()\nplt.show()", "metadata": {}, "execution_count": null, "outputs": [] @@ -109,7 +109,7 @@ { "cell_type": "markdown", "id": "49b9b0e3", - "source": "## 5 — Instance feature comparison\n\nCompare the per-spot properties (voxel count, integrated intensity, centroid, eigenvalues) between the two methods.", + "source": "## 5 \u2014 Instance feature comparison\n\nCompare the per-spot properties (voxel count, integrated intensity, centroid, eigenvalues) between the two methods.", "metadata": {} }, { @@ -131,7 +131,7 @@ { "cell_type": "markdown", "id": "d3693a3f", - "source": "## 6 — Spatial overlap (Dice coefficient)\n\nFor each scan, compute the Dice coefficient between the binary foreground masks produced by the two methods. This measures how much the methods agree on *where* spots are, regardless of how they partition them into instances.", + "source": "## 6 \u2014 Spatial overlap (Dice coefficient)\n\nFor each scan, compute the Dice coefficient between the binary foreground masks produced by the two methods. This measures how much the methods agree on *where* spots are, regardless of how they partition them into instances.", "metadata": {} }, { @@ -145,7 +145,7 @@ { "cell_type": "markdown", "id": "4b7a8bbc", - "source": "## 7 — Centroid scatter: classical vs DINO\n\nPlot the centroids from both methods on the same axes. Matching centroids (spots found by both methods) will overlap; method-unique detections will stand alone.", + "source": "## 7 \u2014 Centroid scatter: classical vs DINO\n\nPlot the centroids from both methods on the same axes. Matching centroids (spots found by both methods) will overlap; method-unique detections will stand alone.", "metadata": {} }, { @@ -159,7 +159,7 @@ { "cell_type": "markdown", "id": "77d2e6e2", - "source": "## 8 — Consistency across scans\n\nA key motivation for DINO-based segmentation is consistency: the same physical spots should produce the same segmentation across consecutive scans. Compare the coefficient of variation (std/mean) of spot counts across scans for each method.", + "source": "## 8 \u2014 Consistency across scans\n\nA key motivation for DINO-based segmentation is consistency: the same physical spots should produce the same segmentation across consecutive scans. Compare the coefficient of variation (std/mean) of spot counts across scans for each method.", "metadata": {} }, { @@ -173,13 +173,13 @@ { "cell_type": "markdown", "id": "ec6c4ef9", - "source": "## 9 — Per-scan feature tables\n\nFull feature tables for both methods on scan 1, for detailed inspection.", + "source": "## 9 \u2014 Per-scan feature tables\n\nFull feature tables for both methods on scan 1, for detailed inspection.", "metadata": {} }, { "cell_type": "code", "id": "49e37ae2", - "source": "cols = [\"label\", \"voxel_count\", \"integrated_intensity\",\n \"centroid_mu\", \"centroid_chi\", \"centroid_d\",\n \"eig_1\", \"eig_2\", \"eig_3\"]\n\nprint(\"=== Classical — scan0001 ===\")\ndisplay(pd.DataFrame(classical_features[0])[cols]) if classical_features[0] else print(\"(no spots)\")\n\nprint(\"\\n=== DINO — scan0001 ===\")\ndisplay(pd.DataFrame(dino_features[0])[cols]) if dino_features[0] else print(\"(no spots)\")", + "source": "cols = [\"label\", \"voxel_count\", \"integrated_intensity\",\n \"centroid_mu\", \"centroid_chi\", \"centroid_d\",\n \"eig_1\", \"eig_2\", \"eig_3\"]\n\nprint(\"=== Classical \u2014 scan0001 ===\")\ndisplay(pd.DataFrame(classical_features[0])[cols]) if classical_features[0] else print(\"(no spots)\")\n\nprint(\"\\n=== DINO \u2014 scan0001 ===\")\ndisplay(pd.DataFrame(dino_features[0])[cols]) if dino_features[0] else print(\"(no spots)\")", "metadata": {}, "execution_count": null, "outputs": [] @@ -187,7 +187,7 @@ { "cell_type": "markdown", "id": "a64fefce", - "source": "## 10 — Notes and next steps\n\n**What the mock backend shows:** The mock DINO backend produces hash-based random features, so the clustering is not semantically meaningful — it tests the *pipeline plumbing* (slice extraction → PCA → HDBSCAN → stitching → foreground mask) but not the quality of learned representations.\n\n**What changes with real DINOv3 weights:**\n- Set `BRAGGTRACK_DINO_BACKEND=torch` (requires `torch` + `transformers` + GPU)\n- The encoder extracts genuine patch-level features where similar textures cluster together\n- Expect better instance separation without hand-tuned LoG/watershed parameters\n- The same model should work across different beamlines and detectors\n\n**CLI equivalent:**\n```bash\n# Classical\nbraggtrack-segment-dataset --method classical --outdir artifacts/classical\n\n# DINO (mock)\nbraggtrack-segment-dataset --method dino --dino-backend mock --outdir artifacts/dino\n\n# DINO (real weights)\nbraggtrack-segment-dataset --method dino --dino-backend torch --outdir artifacts/dino_real\n```", + "source": "## 10 \u2014 Notes and next steps\n\n**What the mock backend shows:** The mock DINO backend produces hash-based random features, so the clustering is not semantically meaningful \u2014 it tests the *pipeline plumbing* (slice extraction \u2192 PCA \u2192 HDBSCAN \u2192 stitching \u2192 foreground mask) but not the quality of learned representations.\n\n**What changes with real DINOv3 weights:**\n- Set `BRAGGTRACK_DINO_BACKEND=torch` (requires `torch` + `transformers` + GPU)\n- The encoder extracts genuine patch-level features where similar textures cluster together\n- Expect better instance separation without hand-tuned LoG/watershed parameters\n- The same model should work across different beamlines and detectors\n\n**CLI equivalent:**\n```bash\n# Classical\nbraggtrack-segment-dataset --method classical --outdir artifacts/classical\n\n# DINO (mock)\nbraggtrack-segment-dataset --method dino --dino-backend mock --outdir artifacts/dino\n\n# DINO (real weights)\nbraggtrack-segment-dataset --method dino --dino-backend torch --outdir artifacts/dino_real\n```", "metadata": {} } ],