feat(preprocess): add spatially_diverse_sample for representative scan subsets #13

Merged
Garrett Bischof (gwbischof) merged 1 commit into main from feat/spatially-diverse-sample on Jun 9, 2026

@gwbischof

Collaborator

Adds spatially_diverse_sample(x, y, n_target, rng=None) to ptychoml.preprocess.

Why

Bucket a 2-D scan's bounding box into a √n_target × √n_target grid and pick one random point per occupied cell, so the subset covers the scan area instead of clustering. A uniform random sample can clump in one region, where coincidentally symmetric structure could make a wrong orientation score well; sampling per cell guarantees that every occupied cell is represented. Returns sorted indices; pass a seeded rng for reproducibility.
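For readers who want the idea in code, here is a minimal sketch of the grid-bucketing approach described above. It is not the merged implementation: the rounding of √n_target (ceil here), the handling of a zero-width axis, and the shuffle-then-unique trick for picking one point per cell are all assumptions made for illustration.

```python
import numpy as np


def spatially_diverse_sample(x, y, n_target, rng=None):
    """Sketch: sorted indices of a subset with one random point per occupied grid cell."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    if n_target >= n:
        # Nothing to thin out: return every index.
        return np.arange(n)

    # Accepts None, an integer seed, or an existing Generator.
    rng = np.random.default_rng(rng)

    # About sqrt(n_target) cells per axis (ceil is an assumption of this sketch).
    n_side = max(1, int(np.ceil(np.sqrt(n_target))))

    def cell_index(v):
        span = v.max() - v.min()
        if span == 0:
            # Degenerate axis: every point falls in the same row or column.
            return np.zeros(v.size, dtype=int)
        idx = ((v - v.min()) / span * n_side).astype(int)
        # The maximum coordinate would land at n_side; fold it into the last cell.
        return np.minimum(idx, n_side - 1)

    cell = cell_index(y) * n_side + cell_index(x)

    # Shuffle the points, then keep the first one seen in each occupied cell.
    order = rng.permutation(n)
    _, first = np.unique(cell[order], return_index=True)
    return np.sort(order[first])


# Usage: a dense corner cluster plus three lone far corners.
gen = np.random.default_rng(0)
xs = np.concatenate([gen.uniform(0.0, 0.1, 100), [10.0, 0.0, 10.0]])
ys = np.concatenate([gen.uniform(0.0, 0.1, 100), [0.0, 10.0, 10.0]])
subset = spatially_diverse_sample(xs, ys, n_target=16, rng=1)
```

Because only occupied cells contribute, the subset can be smaller than n_target when the scan leaves parts of its bounding box empty. With the ceil rounding assumed here it can also slightly exceed n_target on a densely filled scan.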

Extracted from holoptycho's scripts/detect_orientation.py (_spatially_diverse_sample) so the orientation-detection CLI can import it. It's a generic position-sampling utility — not orientation-specific (reusable for normalization sampling, QC, etc.) — pure numpy, no I/O.

Tests

tests/test_preprocess.py (+5): returns-all when n_target ≥ n, sorted/in-range, one-per-occupied-cell, spread-not-cluster (100-point corner cluster + 3 lone far corners → all far corners chosen, cluster contributes ≤ a few), and seeded determinism.

pixi run --environment ci-py312 test   # 163 passed, 5 skipped
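The properties listed above can be written as a small checker that takes any sampler with the (x, y, n_target, rng) signature. This is a sketch, not the contents of tests/test_preprocess.py: the helper names, the fixture coordinates, the seeds, and the "at most 4 points from the cluster" threshold are hypothetical choices standing in for "≤ a few".

```python
import numpy as np


def make_cluster_with_far_corners(seed=0):
    """100 points packed into one corner plus 3 lone points at the other corners."""
    g = np.random.default_rng(seed)
    x = np.concatenate([g.uniform(0.0, 0.1, 100), [10.0, 0.0, 10.0]])
    y = np.concatenate([g.uniform(0.0, 0.1, 100), [0.0, 10.0, 10.0]])
    return x, y


def check_sampler(sample, n_target=16, max_from_cluster=4):
    """Assert the listed properties for any sample(x, y, n_target, rng=...)."""
    x, y = make_cluster_with_far_corners()
    idx = np.asarray(sample(x, y, n_target, rng=np.random.default_rng(42)))

    # Sorted (strictly increasing, so no duplicates) and in range.
    assert np.all(np.diff(idx) > 0)
    assert idx.min() >= 0 and idx.max() < x.size

    # Spread, not cluster: the three lone corners (indices 100..102) are all
    # chosen, and the dense cluster contributes only a few points.
    assert {100, 101, 102} <= set(idx.tolist())
    assert np.count_nonzero(idx < 100) <= max_from_cluster

    # Seeded determinism: the same seed gives the same subset.
    again = np.asarray(sample(x, y, n_target, rng=np.random.default_rng(42)))
    assert np.array_equal(idx, again)

    # Returns every index when n_target >= n.
    everything = np.asarray(sample(x, y, x.size, rng=np.random.default_rng(0)))
    assert np.array_equal(everything, np.arange(x.size))
```

In a pytest module this would be called as check_sampler(spatially_diverse_sample) after importing the function from ptychoml.preprocess.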

First of the H5 PRs (the orientation-detection CLI imports this); the in-pipeline live autodetect and the CLI itself follow in holoptycho.

…n subsets

Bucket a 2-D scan's bounding box into a ~sqrt(n_target) grid and pick one
random point per occupied cell, so a subset covers the whole scan area
instead of clustering (a uniform random sample can clump where coincidental
symmetry would mis-score an orientation). Returns sorted indices; optional
seeded rng for reproducibility.

Extracted from holoptycho/scripts/detect_orientation.py
(_spatially_diverse_sample) so the orientation-detection CLI imports it; it's
a generic position-sampling utility, not orientation-specific.

Tested in tests/test_preprocess.py: all-when-target>=n, sorted/in-range,
one-per-occupied-cell, spread-not-cluster (far corners always chosen, dense
cluster contributes few), and seeded determinism.

Co-authored-by: Himanshu Goel <4122621+himanshugoel2797@users.noreply.github.com>
@gwbischof Garrett Bischof (gwbischof) merged commit fc99bb6 into main Jun 9, 2026
5 checks passed
@gwbischof Garrett Bischof (gwbischof) deleted the feat/spatially-diverse-sample branch June 9, 2026 16:52