Vectorise _partition_bumps to fix O(n_chunks x count) dask graph build (#3612) by brendancol · Pull Request #3617 · xarray-contrib/xarray-spatial

brendancol · 2026-07-02T15:48:39Z

Closes #3612.

_partition_bumps assigned each bump to the dask chunk holding its centre pixel by looping over every chunk and rescanning the full locs array with a fresh boolean mask. That is O(n_chunks x n_bumps), and it runs eagerly during graph construction, before any .compute(). On a finely chunked raster the rescan dominates the runtime.

This replaces the double loop with a single np.searchsorted pass that buckets every bump into its chunk, then groups by chunk. Complexity drops to about O(n_bumps log n_chunks). The output is byte-identical to the old code, including chunk-local coordinates and intra-chunk order, so compute results do not change.

Measured on the standalone helper (count = 100k):

n_chunks	old	new	speedup
100	0.036 s	0.010 s	3.4x
1,600	0.505 s	0.021 s	24.5x
6,400	2.156 s	0.041 s	52.1x
25,600	8.325 s	0.097 s	85.5x

Backends

The change is in _partition_bumps, shared by _bump_dask_numpy and _bump_dask_cupy. The numpy and cupy (non-dask) paths do not touch this helper and are unaffected. Verified on a CUDA host: the cupy and dask+cupy tests pass.

Benchmark

Adds benchmarks/benchmarks/bump.py. The dask case is finely chunked and times graph construction only, so a return of the per-chunk rescan would show up as a regression.

Test plan

pytest xrspatial/tests/test_bump.py (19 passed, includes cupy and dask+cupy on GPU)
New test_partition_bumps_matches_bruteforce checks the vectorised partition against the original boolean-mask reference across an irregular chunk layout
Benchmark imports and runs for numpy and dask

Found by /sweep-performance.

…3612) _partition_bumps rescanned the full locs array with a boolean mask once per chunk, so dask graph construction ran in O(n_chunks * n_bumps) before any compute. Bucket each bump into its chunk in a single searchsorted pass instead (byte-identical partitions, ~85x faster at 25.6k chunks). Add a benchmark (benchmarks/benchmarks/bump.py) and a regression test that locks the partition against the original boolean-mask reference. Sweep: performance. Backends: dask+numpy, dask+cupy (shared helper).

brendancol · 2026-07-02T15:50:50Z

PR Review: Vectorise _partition_bumps to fix O(n_chunks x count) dask graph build (#3612)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

xrspatial/bump.py _partition_bumps: the searchsorted(..., side='right') - 1 bucketing assumes every loc is in [0, W) and [0, H). An out-of-range coordinate (x >= W or negative) would map to a chunk index outside the grid and silently land in the wrong (a, b) bucket or a nonexistent key, whereas the old boolean-mask version simply dropped it. This is not reachable from bump() (locs come from np.random.choice(range(w/h)), so they are always in range), and this is a private helper, so it is not a real bug. A one-line comment stating the in-range precondition would make the assumption explicit for future callers.
heights[idx].copy(): fancy indexing already returns a fresh copy, so the trailing .copy() is redundant. The original code had the same redundant .copy() on the masked path, so keeping it preserves style; only worth changing if you are touching the line anyway.

What looks good

The vectorised partition is byte-identical to the original, including chunk-local coordinates and intra-chunk order (stable argsort), so compute results do not change. The new test_partition_bumps_matches_bruteforce locks this against the original boolean-mask reference over an irregular chunk layout with empty chunks.
Empty locs is guarded and returns an all-None dict, matching the old behaviour.
The change is confined to the shared _partition_bumps helper; the numpy and cupy non-dask paths are untouched. dask+numpy and dask+cupy both use the helper.
A benchmark was added; its dask case is finely chunked and times graph construction only, so a return of the per-chunk rescan would register as a regression.

Checklist

Algorithm matches reference (byte-identical to prior partition, verified)
All implemented backends produce consistent results (cupy + dask+cupy pass on GPU)
NaN handling is correct (unchanged; helper does not touch NaN semantics)
Edge cases are covered by tests (irregular chunks, empty chunks)
Dask chunk boundaries handled correctly (half-open assignment preserved)
No premature materialization or unnecessary copies
Benchmark exists
README feature matrix updated (not applicable, no new public function)
Docstrings present and accurate (helper docstring unchanged and still accurate)

brendancol · 2026-07-02T15:51:44Z

Follow-up on the review nits:

Nit 1 (in-range precondition): applied in eb32e89 as a comment on _partition_bumps documenting that callers pass locs clamped to the raster bounds.
Nit 2 (redundant heights[idx].copy()): kept intentionally. The copy keeps the dask graph from holding a view into the caller's heights array; the cosmetic gain is not worth the aliasing risk.

No blockers or suggestions were raised. Tests still green (19 passed, includes cupy + dask+cupy on GPU).

…e-bump-2026-07-02

Document _partition_bumps in-range precondition (review nit, #3612)

eb32e89

Merge remote-tracking branch 'origin/main' into deep-sweep-performanc…

fb5009c

…e-bump-2026-07-02

brendancol merged commit 05fb3f7 into main Jul 2, 2026
10 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Vectorise _partition_bumps to fix O(n_chunks x count) dask graph build (#3612)#3617

Vectorise _partition_bumps to fix O(n_chunks x count) dask graph build (#3612)#3617
brendancol merged 3 commits into
mainfrom
deep-sweep-performance-bump-2026-07-02

brendancol commented Jul 2, 2026

Uh oh!

brendancol commented Jul 2, 2026

Uh oh!

brendancol commented Jul 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

brendancol commented Jul 2, 2026

Backends

Benchmark

Test plan

Uh oh!

brendancol commented Jul 2, 2026

PR Review: Vectorise _partition_bumps to fix O(n_chunks x count) dask graph build (#3612)

Blockers (must fix before merge)

Suggestions (should fix, not blocking)

Nits (optional improvements)

What looks good

Checklist

Uh oh!

brendancol commented Jul 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant