Skip to content

Drop non-finite points and segments in kde/line_density (#3628)#3636

Open
brendancol wants to merge 2 commits into
mainfrom
issue-3628
Open

Drop non-finite points and segments in kde/line_density (#3628)#3636
brendancol wants to merge 2 commits into
mainfrom
issue-3628

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #3628

  • kde() now drops points with non-finite coordinates or weights before doing anything else, and line_density() does the same for segments with non-finite endpoints or weights. This is the policy xrspatial.interpolate already uses (interpolate/_validation.py).
  • Fixes three inconsistent failure modes from one NaN row: the auto-computed extent going NaN (all-zero grid with NaN coords, every backend), eager cupy gaussian returning an all-NaN grid, and the other backends silently skipping the point.
  • Raises ValueError when a non-empty input has no finite entries left. Truly empty input behaves as before.
  • Both docstrings document the behavior.

Backends: the filter runs before dispatch, so numpy, cupy, dask+numpy, and dask+cupy now agree. Verified on this host with CUDA: all four backends return identical finite output for a NaN-containing input.

Test plan:

  • New TestNonFiniteInputs class: NaN/Inf points, NaN weights, NaN auto-extent, all-NaN raise, cupy gaussian parity, line_density variants
  • The cupy gaussian test fails on main (all-NaN grid), passes with the fix
  • Full test_kde.py suite: 64 passed

A NaN coordinate previously poisoned the auto-computed extent (all
backends returned an all-zero grid with NaN coords), made eager cupy
gaussian return an all-NaN grid, and was silently skipped by the other
backends. Filter non-finite coordinates, weights, and segment endpoints
up front, matching the xrspatial.interpolate policy, and raise when a
non-empty input has no finite entries left. Documented in both
docstrings.

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Drop non-finite points and segments in kde/line_density (#3628)

Blockers (must fix before merge)

  • none

Suggestions (should fix, not blocking)

  • xrspatial/tests/test_kde.py (test_nan_point_auto_extent_not_poisoned): the assertion is only "coords finite and sum > 0". Since the filter runs before the extent is derived, the NaN-containing call should produce a grid identical to the clean two-point call; asserting that equality pins the behavior much harder than sum > 0. Same for the line_density auto-extent test.
  • xrspatial/tests/test_kde.py: no test combines a NaN point with bandwidth='silverman'. Before this PR the NaN reached np.std/np.percentile and made the bandwidth NaN; the filter fixes that path too, but nothing guards it against regression.

Nits (optional improvements)

  • The GeoDataFrame input path inherits the filter (it runs after points_from_geodataframe), but no test feeds a GeoDataFrame with a NaN-coordinate point through kde. Optional since geopandas empty/invalid geometries may be caught earlier; fine to leave if that turns out to be dead territory.

What looks good

  • The filter placement is right: after the length checks (so mismatched inputs still raise on original lengths), before bandwidth selection and extent derivation (so both see only finite data), and before backend dispatch (so all four backends behave identically).
  • The empty-input case is preserved: valid.all() on an empty array is True, so kde([], [], template=...) behaves exactly as before, and the new ValueError only fires when a non-empty input loses everything.
  • Verified on this CUDA host: numpy, cupy, dask+numpy, and dask+cupy now return identical finite output for a NaN-containing input; before the fix eager cupy gaussian returned a 256-pixel all-NaN grid.
  • Policy matches interpolate/_validation.py rather than inventing a new convention, and both docstrings say what happens.

Checklist

  • Algorithm matches reference/paper (n/a, input validation change)
  • All implemented backends produce consistent results (verified all four on this host)
  • NaN handling is correct (that is the PR)
  • Edge cases are covered by tests (NaN/Inf coords, NaN weights, all-NaN raise, auto extent, cupy parity, line_density variants)
  • Dask chunk boundaries handled correctly (filter is pre-dispatch, no chunk interaction)
  • No premature materialization or unnecessary copies (one boolean mask; fancy-index only when something is dropped)
  • Benchmark exists or is not needed (validation-only, negligible cost)
  • README feature matrix updated (n/a, no API change)
  • Docstrings present and accurate (both updated)

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review follow-up (#3628)

Disposition of the earlier findings, after 4ab2aae:

  • Suggestion (exact-equality auto-extent assertions): fixed for both kde and line_density. The NaN-containing calls are now asserted byte-identical to the finite-only calls, coordinates included.
  • Suggestion (silverman + NaN coverage): fixed. New test compares bandwidth='silverman' with and without a NaN point; results must be identical.
  • Nit (GeoDataFrame NaN point): fixed. New test feeds a GeoDataFrame containing Point(nan, 0.5) and asserts the output matches the finite-only array-input call. geopandas is importorskip-gated like the interpolate tests.

66 tests pass locally, cupy paths included. No new findings. No blockers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

kde/line_density NaN inputs: backends disagree and the auto extent collapses to zeros

1 participant