Skip to content

Accept GeoDataFrames in interpolation + expose on .xrs accessor with coregister#3481

Merged
brendancol merged 5 commits into
mainfrom
issue-3480
Jun 24, 2026
Merged

Accept GeoDataFrames in interpolation + expose on .xrs accessor with coregister#3481
brendancol merged 5 commits into
mainfrom
issue-3480

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #3480

Lets the interpolation functions consume a GeoDataFrame the way rasterize does, and exposes them on the .xrs accessor so a surface lands on the caller raster's grid:

  • kde, idw, kriging, and spline accept a GeoDataFrame of Point geometries as the first argument. column picks the value to interpolate (z for idw/kriging/spline, per-point weights for kde), defaulting to the first numeric column. The raw x, y, z array form is unchanged.
  • The four operations are on both .xrs accessor classes. They pass the caller array as template, so grid, chunks, CRS, and attrs are inherited. The existing array-form accessor methods (idw/kriging/spline) keep working; kde is new on the accessor.
  • coregister=True reprojects a GeoDataFrame's points from their CRS into the caller's CRS before interpolating, the vector analog of open_geotiff(coregister=True). With coregister=False (default), a genuine CRS mismatch raises, matching rasterize's check_crs.

So the headline call works:

surface = raster.xrs.kde(points_gdf, coregister=True)

geopandas stays an optional, lazily-imported dependency.

Note on reprojection

The plan floated reusing reproject._projections.transform_points for the point transform. The implementation uses GeoDataFrame.to_crs(target) instead: it is the idiomatic geopandas path, handles datum shifts and axis order through pyproj, and avoids reimplementing coordinate transforms. The coregistered result is bit-identical to interpolating the pre-projected points in the notebook check.

Backends

numpy / dask+numpy verified directly; cupy / dask+cupy inherit from the unchanged ArrayTypeFunctionMapping dispatch (the GeoDataFrame branch only touches input parsing, which runs before backend dispatch). Output backend and chunks follow the caller raster's template.

Test plan

  • GeoDataFrame form matches the array form for all four functions (numpy + dask templates)
  • Accessor methods match the standalone calls and inherit grid/chunks/CRS
  • coregister=True (EPSG:4326 points onto an EPSG:3857 grid) matches pre-projected points
  • CRS mismatch raises without coregister; missing CRS on either side is a no-op
  • Edge cases: non-Point geometry, missing/unknown column, positional y/z with a GeoDataFrame, kde weights+column conflict
  • Legacy array-form accessor signature still works
  • Existing test_interpolation.py / test_kde.py / test_accessor.py pass unchanged
  • User guide notebook executes end to end

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Accept GeoDataFrames in interpolation + expose on .xrs accessor with coregister

Reviewed the full changed files in a worktree on the PR branch. The design mirrors rasterize cleanly and keeps the array path intact. Findings are minor.

Blockers

None.

Suggestions

  • xrspatial/interpolate/_vector.py:38 (_resolve_value_column): an explicit column pointing at a non-numeric column surfaces a raw ValueError: could not convert string to float: 'a' from numpy rather than a function-prefixed message. The auto-pick path already gives a clean "no numeric column" error; consider catching the cast failure and re-raising as f"{func_name}(): column {column!r} is not numeric" so both paths read consistently. Low priority since it still raises and does not corrupt output.

Nits

  • Test coverage exercises numpy and dask templates only. cupy / dask+cupy are not asserted. This is acceptable because the GeoDataFrame branch runs entirely before ArrayTypeFunctionMapping dispatch, so it is backend-independent, but a single cupy smoke test (guarded by cuda_and_cupy_available) would document that explicitly.
  • No benchmark added. Reasonable here: the change is input parsing plus an optional to_crs, not a new compute path, so the existing interpolation benchmarks still cover the hot loop.

What looks good

  • The to_crs-based coregistration is the right call over a hand-rolled transform: it goes through pyproj for datum shifts and axis order, and the notebook check shows the coregistered result is identical to interpolating pre-projected points.
  • Backward compatibility is preserved: the legacy da.xrs.idw(x, y, z) array signature still works, and there is a test pinning it.
  • CRS handling matches rasterize semantics (mismatch raises, missing CRS on either side is a no-op), so the behavior is predictable for anyone who already uses rasterize.
  • Edge cases raise clear errors: MultiPoint, empty frame, missing column, positional y/z with a GeoDataFrame, and the kde weights+column conflict are all covered by tests.
  • Caller chunks are inherited through template, verified for all four functions on a dask-backed grid.

Checklist

  • Algorithm matches reference (no algorithm change; parsing only)
  • All implemented backends produce consistent results (parsing precedes dispatch)
  • NaN handling is correct (existing validate_points filtering unchanged)
  • Edge cases are covered by tests
  • Dask chunk boundaries handled correctly (inherited from template)
  • No premature materialization or unnecessary copies
  • Benchmark not needed
  • README feature matrix updated
  • Docstrings present and accurate

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after 7c4333b)

Re-reviewed the follow-up commit against the prior findings.

  • Suggestion (non-numeric column error) — fixed. _resolve_value_column now catches the cast failure on an explicit column and re-raises f"{func_name}(): column {column!r} is not numeric", matching the clean message the auto-pick path already gave. Covered by test_non_numeric_column_raises.
  • Nit (cupy coverage) — addressed. test_geodataframe_parity_cupy (guarded by cuda_and_cupy_available) asserts the GeoDataFrame branch matches numpy on a cupy template. It runs and passes on a CUDA box.
  • Nit (benchmark) — dismissed with reason: the change is input parsing plus an optional to_crs, not a new compute path, so the existing interpolation benchmarks still cover the hot loop.

No new findings. Remaining items are dismissed-with-reason only.

@brendancol brendancol merged commit 775e418 into main Jun 24, 2026
12 checks passed
@brendancol brendancol deleted the issue-3480 branch June 25, 2026 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Accept GeoDataFrames in the interpolation functions and expose them on the .xrs accessor with coregister

1 participant