Skip to content

Fix EUCLIDEAN proximity/allocation/direction dropping targets at exact max_distance#3443

Merged
brendancol merged 4 commits into
mainfrom
deep-sweep-accuracy-proximity-2026-06-22-01
Jun 22, 2026
Merged

Fix EUCLIDEAN proximity/allocation/direction dropping targets at exact max_distance#3443
brendancol merged 4 commits into
mainfrom
deep-sweep-accuracy-proximity-2026-06-22-01

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #3442

What changed

  • For distance_metric='EUCLIDEAN', the numpy and dask+numpy cKDTree paths dropped a target sitting exactly at max_distance to NaN, while cupy and dask+cupy (brute force) kept it. The clearest case is max_distance=0: a target pixel is distance 0 from itself and should qualify, but numpy/dask returned NaN there.
  • Root cause: cKDTree's distance_upper_bound is exclusive, and for p=2 it compares squared distances. The old one-ulp widen of 0 underflows when squared, collapsing the bound back to max_distance. _kdtree_query_lowest_index did not widen at all.
  • Routed every cKDTree call through one _inclusive_upper_bound helper that widens by a relative epsilon with an absolute floor, so the bound survives squaring for p=2 and stays inclusive for p=1.

Backend coverage

numpy and dask+numpy were the affected backends and are fixed. cupy and dask+cupy already behaved correctly (brute force); the new tests pin all four together.

Test plan

  • max_distance=0 keeps the target pixel (proximity 0, allocation target value, direction 0) on numpy, dask+numpy, cupy, dask+cupy, for EUCLIDEAN and MANHATTAN
  • exact-distance boundary pixel qualifies on numpy and dask+numpy
  • numpy matches dask+numpy at max_distance=0 for all three functions
  • full test_proximity.py suite passes (543 tests)

…3442)

cKDTree's distance_upper_bound is exclusive. The numpy/dask KDTree paths
widened it by one ulp to emulate the inclusive dist <= max_distance check
the brute-force/CUDA backends use, but for p=2 (EUCLIDEAN) cKDTree squares
the bound internally and nextafter(0, inf)**2 underflows to 0, so a target
exactly at the bound (most visibly a target pixel at max_distance=0) was
dropped to NaN. _kdtree_query_lowest_index passed the raw exclusive bound
with no widening at all.

Route every cKDTree call through a shared _inclusive_upper_bound helper
that widens by a relative epsilon with an absolute floor, robust for p=1
and p=2 including max_distance=0. numpy/dask now match cupy/dask+cupy.
)

Cover max_distance=0 keeping the target pixel and an exact-distance
boundary pixel, across numpy/dask+numpy/cupy/dask+cupy for proximity,
allocation, and direction.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 22, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Fix EUCLIDEAN proximity/allocation/direction dropping targets at exact max_distance

Blockers

None.

Suggestions

None blocking. One thing worth a sentence in the helper docstring: the widen is relative (4*eps*max(|md|, 1.0)), so at large max_distance it could in principle include a target whose true distance is within ~8*eps*md of the bound. That gap is below float64 resolution at that magnitude, so the two distances are not distinguishable anyway and the behavior is correct. Calling that out would save the next reader the same check I just did.

Nits

  • The np.isfinite ... else np.inf guard in _tiled_chunk_query (proximity.py:1146) is now redundant since _inclusive_upper_bound handles inf, but it is pre-existing and harmless. Leave it.

What looks good

  • The root cause is correct: cKDTree squares distance_upper_bound for p=2, and nextafter(0, inf)**2 underflows to 0, so the old widen collapsed at max_distance=0. Verified directly against scipy.
  • One shared helper now feeds all three cKDTree call sites (_process_numpy_kdtree, both queries in _kdtree_query_lowest_index), so the numpy and dask paths can't drift apart again.
  • Checked over-inclusion at md=0, md=1, and md=1000: includes exactly-at-bound, excludes just-below, never pulls in the next real distance.
  • The tie-break in _kdtree_query_lowest_index is preserved: both k=2 neighbors use the same widened bound, so the equidistant test is unaffected.
  • Tests cover all four backends for proximity/allocation/direction at max_distance=0 plus an exact-distance boundary pixel, and pin numpy against dask+numpy.

Checklist

  • Algorithm matches reference (cKDTree exclusive-bound semantics verified against scipy)
  • All implemented backends produce consistent results (numpy/dask/cupy/dask+cupy)
  • NaN handling is correct (inf/unbounded returns inf; non-target stays NaN)
  • Edge cases covered (max_distance=0, exact-distance boundary)
  • Dask chunk boundaries handled (bounded map_overlap + global/tiled KDTree paths all routed through the helper)
  • No premature materialization or unnecessary copies
  • Benchmark not needed (numerical bug fix, no perf-critical change)
  • README not applicable (no new function, no backend change)
  • Docstrings present and accurate

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review after addressing the prior pass.

Disposition of the earlier findings:

  • Suggestion (document the relative-widen behavior in the helper docstring): fixed in 04c6c73. Added a sentence noting that a target within ~8epsmax_distance of the bound could qualify, but the two distances are indistinguishable in float64 at that magnitude, so the result is still correct.
  • Nit (redundant np.isfinite ... else np.inf guard in _tiled_chunk_query): dismissed. It is pre-existing, harmless, and outside the scope of this fix; touching it would be an unrelated change.

No new findings. The follow-up was a docstring-only edit; the 25 regression cases still pass and flake8 is clean on the changed module.

@brendancol brendancol merged commit 4be7b9b into main Jun 22, 2026
10 checks passed
@brendancol brendancol deleted the deep-sweep-accuracy-proximity-2026-06-22-01 branch June 25, 2026 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

EUCLIDEAN proximity/allocation/direction: KDTree backends return NaN at exact max_distance (incl. max_distance=0)

1 participant