feat(centroids): add return_area + SpatialData persist to get_centroids#1150
Open
timtreis wants to merge 7 commits into
Open
feat(centroids): add return_area + SpatialData persist to get_centroids#1150timtreis wants to merge 7 commits into
timtreis wants to merge 7 commits into
Conversation
Surface the per-instance area that the labels bincount already computes (and geometric area for shapes), and add a SpatialData dispatch that writes centroids/area into the element's annotating table, squidpy-style. - `return_area` on all element overloads: labels return the pixel/voxel count (free, already computed); shapes return `pi*r**2` for circles and `geometry.area` for polygons; points raise. Carried as a feature column on the returned Points element. - Harmonise the three element overload signatures (they previously disagreed on `return_background`). - New `get_centroids(sdata, element_name, ..., persist_as="adata")` dispatch: resolves the element's annotating table and writes `obsm["spatial"]` (x, y[, z]) and `obs["area"]` at the element's rows, in place. Raises (pointing to `persist_as="Points"`) when no table annotates the element. No standalone AnnData is ever created. - Element-level `persist_as="adata"` raises, directing to the SpatialData form (which can resolve the table). - Fast path: the table-writing path applies the coordinate transform to the tiny per-instance centroid array in-memory and short-circuits when it is an identity, avoiding the dask `transform()` round-trip. `coordinate_system=None` returns intrinsic coordinates. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
for more information, see https://pre-commit.ci
13bde10 to
a784941
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## chore/python-3.12-3.14 #1150 +/- ##
==========================================================
- Coverage 92.43% 92.29% -0.14%
==========================================================
Files 51 51
Lines 7814 7906 +92
==========================================================
+ Hits 7223 7297 +74
- Misses 591 609 +18
🚀 New features to boost your workflow:
|
…turn type - narrow transformation to BaseTransformation before to_affine_matrix - assert coordinate_system is not None after _validate_persist_args in the element overloads (None is rejected there for persist_as != 'adata') - cast element-dispatch get_centroids results to DaskDataFrame at the two callers (vectorize, datasets); only the SpatialData overload returns SpatialData
…trim docs Correctness: - _write_centroids_into_table: align instance ids via a single get_indexer; raise on a total miss (instance_key dtype != element index) instead of silently writing an all-NaN obsm['spatial']; partial misses (background, filtered instances) still NaN-fill as documented. - refuse to overwrite obsm['spatial'] when an existing array has a different width (would wipe coordinates of other regions sharing the table). - validate coordinate_system on the persist_as='adata' path too. Cleanup (no behavior change): - collapse the three near-identical element overloads into one stacked singledispatch registration delegating to _intrinsic_centroid_frame. - trim verbose docstrings/comments. Tests: instance_key dtype mismatch raises; dimension mismatch refuses to overwrite; invalid-element message updated.
…ds method - persist_as='adata' gains inplace: bool = True (mirrors sanitize_table): inplace=True mutates the resolved table and returns None; inplace=False copies only that AnnData, writes into the copy, returns it (sdata untouched). No whole-SpatialData copy. - add SpatialData.get_centroids(element_name, ...) method delegating to the module function (mirrors sdata.aggregate), so both sd.get_centroids(sdata, ...) and sdata.get_centroids(...) work. Calls the named _get_centroids_sdata overload directly so the full signature type-checks. - widen return type to DaskDataFrame | AnnData | None; update tests.
…edup scatter Correctness hardening (review round 2): - raise a clear error when the element index is non-unique (was a cryptic pandas InvalidIndexError from get_indexer on e.g. duplicate-index shapes). - obsm width-mismatch guard is now 1-D-safe (compare full .shape instead of indexing .shape[1], which raised IndexError on a 1-D obsm['spatial']). - reword the total-miss error (it's 'no shared instances', not only a dtype bug). Cleanup: - reuse the existing _affine_matrix_multiplication helper in _transform_centroid_coords (drops the inline matmul; gives the dead helper a caller). - collapse the duplicated coords/area NaN-fill into a local _scatter(). - _points_from_centroids: df.assign instead of copy-then-mutate. - name the squidpy storage keys (_SPATIAL_KEY/_AREA_KEY) instead of magic strings.
…test assertion - _resolve_annotating_table returns annotators[0] directly (get_element_annotators already yields set[str]; the str() wrap was noise). - extract _assert_obsm_matches_points test helper to remove two near-identical written-obsm vs element-Points comparison blocks.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
get_centroidsalready computes per-label pixel counts (np.bincount) when finding label centroids, then discards them, we could store asarea. This PR surfaces that area into an element's annotating table, squidpy-style (obsm["spatial"]/obs["area"]), so downstream tools (e.g. squidpy) and repeated renders can reuse them instead of recomputing.stacked on #1151