Skip to content

proximity: match GPU max_distance precision to the CPU brute-force (#3389)#3391

Merged
brendancol merged 2 commits into
mainfrom
deep-sweep-accuracy-proximity-2026-06-18-wt-ae97104
Jun 19, 2026
Merged

proximity: match GPU max_distance precision to the CPU brute-force (#3389)#3391
brendancol merged 2 commits into
mainfrom
deep-sweep-accuracy-proximity-2026-06-18-wt-ae97104

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #3389

The CPU brute-force search compares a float32-rounded distance against max_distance (_distance returns np.float32(d)), while the CUDA kernel compared the float64 distance. A max_distance sitting in a target's float32 ulp gap then qualified the target on numpy/dask+numpy but rejected it on cupy/dask+cupy for the same input.

  • Round best_dist to float32 in _proximity_cuda_kernel before the best_dist <= max_distance range test, so the in-range decision matches the CPU path.
  • Add two regression tests pinning numpy == cupy at the float32 boundary: GREAT_CIRCLE proximity and EUCLIDEAN allocation (both run the brute-force on CPU and the kernel on GPU).

Backend coverage: the change is in the shared CUDA kernel, so it covers cupy and dask+cupy (which reuses the kernel per chunk). numpy and dask+numpy were already on the float32 side and are unchanged.

A separate, pre-existing edge is out of scope here: the bounded dask+numpy EUCLIDEAN allocation path can still return NaN for a target sitting exactly at a float32-ulp max_distance, via the halo-sizing path rather than the kernel. It predates this change and is noted in the sweep state for follow-up.

Test plan:

  • pytest xrspatial/tests/test_proximity.py (518 passed on a CUDA host, including cupy and dask+cupy)
  • New boundary tests fail before the fix (cupy returns NaN) and pass after

The CPU brute-force _distance casts to float32 before the
best_dist <= max_distance check, while the CUDA kernel compared the
float64 distance. A max_distance landing in a target's float32 ulp gap
qualified the target on numpy/dask+numpy but rejected it on
cupy/dask+cupy. Round best_dist to float32 in the kernel before the
range test so all four backends agree.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 19, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: proximity: match GPU max_distance precision to the CPU brute-force (#3389)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

  • The bounded-dask EUCLIDEAN-allocation float32-ulp edge is recorded in the PR body and the sweep state, but it is out of scope here. A one-line follow-up issue would make it easier to track than a buried note.

What looks good

  • Root cause is correct. _distance returns np.float32(d), so the CPU compares a float32-rounded distance against max_distance, while _proximity_cuda_kernel compared the float64 distance. Rounding best_dist to float32 before the <= max_distance test makes the in-range decision identical on both.
  • The stored PROXIMITY value does not change in practice: out is float32, so the previous float64 best_dist was already cast on store. The fix only moves the cast ahead of the comparison.
  • The tests pin the real divergence. Each sets max_distance strictly between the float32 and float64 distance (asserted with float(np.float32(d64)) < md < d64) and checks numpy against cupy. They fail before the fix (cupy returns NaN) and pass after.
  • Both brute-force output branches of the kernel are covered: GREAT_CIRCLE proximity and EUCLIDEAN allocation.
  • No dispatch change. The fix lives in the shared kernel, so cupy and dask+cupy (which reuses the kernel per chunk) both pick it up.

Checklist

  • Algorithm matches the CPU reference behaviour
  • All implemented backends produce consistent results at the boundary
  • NaN handling correct (in-range decision now matches across backends)
  • Edge case (float32 ulp boundary) covered by tests
  • Dask chunk boundaries unaffected (kernel-level change, dask+cupy reuses it)
  • No premature materialization or unnecessary copies
  • Benchmark not needed (no performance-relevant change)
  • README feature matrix unchanged (no new function, no backend change)
  • Docstrings unaffected (internal kernel comment added)

@brendancol

Copy link
Copy Markdown
Contributor Author

Filed the out-of-scope bounded-dask EUCLIDEAN-allocation float32-ulp edge as #3392 for tracking (review nit).

@brendancol brendancol merged commit 680f764 into main Jun 19, 2026
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

proximity: max_distance comparison precision differs between CPU and GPU backends

1 participant