Drop non-finite points and segments in kde/line_density (#3628)#3636
Open
brendancol wants to merge 2 commits into
Open
Drop non-finite points and segments in kde/line_density (#3628)#3636brendancol wants to merge 2 commits into
brendancol wants to merge 2 commits into
Conversation
A NaN coordinate previously poisoned the auto-computed extent (all backends returned an all-zero grid with NaN coords), made eager cupy gaussian return an all-NaN grid, and was silently skipped by the other backends. Filter non-finite coordinates, weights, and segment endpoints up front, matching the xrspatial.interpolate policy, and raise when a non-empty input has no finite entries left. Documented in both docstrings.
brendancol
commented
Jul 3, 2026
brendancol
left a comment
Contributor
Author
There was a problem hiding this comment.
PR Review: Drop non-finite points and segments in kde/line_density (#3628)
Blockers (must fix before merge)
- none
Suggestions (should fix, not blocking)
- xrspatial/tests/test_kde.py (test_nan_point_auto_extent_not_poisoned): the assertion is only "coords finite and sum > 0". Since the filter runs before the extent is derived, the NaN-containing call should produce a grid identical to the clean two-point call; asserting that equality pins the behavior much harder than sum > 0. Same for the line_density auto-extent test.
- xrspatial/tests/test_kde.py: no test combines a NaN point with bandwidth='silverman'. Before this PR the NaN reached np.std/np.percentile and made the bandwidth NaN; the filter fixes that path too, but nothing guards it against regression.
Nits (optional improvements)
- The GeoDataFrame input path inherits the filter (it runs after points_from_geodataframe), but no test feeds a GeoDataFrame with a NaN-coordinate point through kde. Optional since geopandas empty/invalid geometries may be caught earlier; fine to leave if that turns out to be dead territory.
What looks good
- The filter placement is right: after the length checks (so mismatched inputs still raise on original lengths), before bandwidth selection and extent derivation (so both see only finite data), and before backend dispatch (so all four backends behave identically).
- The empty-input case is preserved:
valid.all()on an empty array is True, sokde([], [], template=...)behaves exactly as before, and the new ValueError only fires when a non-empty input loses everything. - Verified on this CUDA host: numpy, cupy, dask+numpy, and dask+cupy now return identical finite output for a NaN-containing input; before the fix eager cupy gaussian returned a 256-pixel all-NaN grid.
- Policy matches interpolate/_validation.py rather than inventing a new convention, and both docstrings say what happens.
Checklist
- Algorithm matches reference/paper (n/a, input validation change)
- All implemented backends produce consistent results (verified all four on this host)
- NaN handling is correct (that is the PR)
- Edge cases are covered by tests (NaN/Inf coords, NaN weights, all-NaN raise, auto extent, cupy parity, line_density variants)
- Dask chunk boundaries handled correctly (filter is pre-dispatch, no chunk interaction)
- No premature materialization or unnecessary copies (one boolean mask; fancy-index only when something is dropped)
- Benchmark exists or is not needed (validation-only, negligible cost)
- README feature matrix updated (n/a, no API change)
- Docstrings present and accurate (both updated)
…taFrame NaN coverage (#3628)
brendancol
commented
Jul 3, 2026
brendancol
left a comment
Contributor
Author
There was a problem hiding this comment.
PR Review follow-up (#3628)
Disposition of the earlier findings, after 4ab2aae:
- Suggestion (exact-equality auto-extent assertions): fixed for both kde and line_density. The NaN-containing calls are now asserted byte-identical to the finite-only calls, coordinates included.
- Suggestion (silverman + NaN coverage): fixed. New test compares bandwidth='silverman' with and without a NaN point; results must be identical.
- Nit (GeoDataFrame NaN point): fixed. New test feeds a GeoDataFrame containing Point(nan, 0.5) and asserts the output matches the finite-only array-input call. geopandas is importorskip-gated like the interpolate tests.
66 tests pass locally, cupy paths included. No new findings. No blockers.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #3628
kde()now drops points with non-finite coordinates or weights before doing anything else, andline_density()does the same for segments with non-finite endpoints or weights. This is the policyxrspatial.interpolatealready uses (interpolate/_validation.py).ValueErrorwhen a non-empty input has no finite entries left. Truly empty input behaves as before.Backends: the filter runs before dispatch, so numpy, cupy, dask+numpy, and dask+cupy now agree. Verified on this host with CUDA: all four backends return identical finite output for a NaN-containing input.
Test plan:
TestNonFiniteInputsclass: NaN/Inf points, NaN weights, NaN auto-extent, all-NaN raise, cupy gaussian parity, line_density variantstest_kde.pysuite: 64 passed