Merged
1 change: 1 addition & 0 deletions .claude/sweep-metadata-state.csv
@@ -1,5 +1,6 @@
module,last_inspected,issue,severity_max,categories_found,notes
aspect,2026-05-29,2682,MEDIUM,4;5,"Audited 2026-05-29 (agent-a3b7c82e34312ffcb worktree, branch deep-sweep-metadata-aspect-2026-05-29). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live for aspect/northness/eastness across planar and geodesic methods. Cat 1 attrs, Cat 2 coords, Cat 3 dims, and .name all preserved correctly on every backend: the 3 public functions re-emit coords=agg.coords, dims=agg.dims, attrs=agg.attrs at the xr.DataArray constructor. NEW MEDIUM finding #2682 (Cat 4 + Cat 5): the planar dask backends (_run_dask_numpy, _run_dask_cupy) called map_overlap with a default-dtype meta (np.array(()) / cupy.array(())), so the lazy DataArray advertised float64 while the chunk functions _cpu / _run_cupy cast to and return float32. numpy and cupy backends already reported float32, and the geodesic dask paths already passed dtype=np.float32, so only the two planar dask paths were inconsistent: a backend-inconsistent metadata bug where agg.dtype differs by backend and silently flips float64->float32 on .compute(). Fix in PR #2741: pass dtype=np.float32 / dtype=cupy.float32 to the planar dask meta. northness/eastness derive from aspect so they inherit the corrected dtype. 5 new tests (test_dask_numpy_advertised_dtype_matches_computed parametrized over 4 boundary modes, plus test_dask_cupy_advertised_dtype_matches_computed) assert lazy dtype == computed dtype == float32. Full aspect suite 69 passed. slope.py and curvature.py share the same default-dtype meta pattern on their planar dask paths (out of scope for this aspect-only sweep; likely same inconsistency). No CRITICAL/HIGH/LOW findings."
classify,2026-06-25,3508,MEDIUM,4;5,"Audited 2026-06-25 (agent-a5f16f6137723fc77 worktree, branch deep-sweep-metadata-classify-2026-06-25). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live. All 10 public classifiers (binary/reclassify/quantile/natural_breaks/equal_interval/std_mean/head_tail_breaks/percentiles/maximum_breaks/box_plot) re-emit name=, dims=agg.dims, coords=agg.coords, attrs=agg.attrs at the xr.DataArray constructor, so Cat 1 attrs (res/crs/transform/nodatavals), Cat 2 coords (values+dtype), Cat 3 dims, and .name all preserved and identical across the 4 backends. NEW MEDIUM finding #3508 (Cat 4 + Cat 5): binary() output dtype differed by backend -- _cpu_binary allocated dtype=data.dtype so numpy/dask+numpy returned the input dtype while _run_cupy_binary used dtype='f4'. The docstring documents float32 and every other classifier emits float32 via _bin/_cpu_bin; binary was the only outlier, and for integer input the numpy path returned an integer dtype that can't hold the NaN sentinel. The _cpu_binary float32 fix + verify_dtype=True backend tests + a float64/float32/int32 dtype test landed on main via the duplicate accuracy-sweep PR #3514; PR #3513 was then rebased onto that and is now scoped to the remaining piece: _run_dask_cupy_binary passed an untyped meta=cupy.array(()) (float64) so the lazy dask+cupy array advertised float64 while computing float32 -- the same advertised-vs-computed mismatch class as aspect #2682 / focal #3217. #3513 types the meta as cupy.array((), dtype='f4') and asserts the lazy dtype in test_binary_dask_cupy. Full classify suite passes, GPU paths run live. The sibling classifiers' dask+cupy helpers (_run_dask_cupy_bin and friends) share the same untyped meta and likely the same latent lazy-dtype mismatch (out of scope, follow-up). Cat 4 nodatavals-vs-NaN is the library-wide attrs=agg.attrs convention, not classify-specific (documented, not fixed). No CRITICAL/HIGH/LOW findings."
contour,2026-05-29,2700,HIGH,1;5,"Audited 2026-05-29 (agent-ab7fff484a8f57de2 worktree, branch deep-sweep-metadata-contour-2026-05-29). CUDA available; cupy and dask+cupy paths exercised live. contours() returns a list of (level, ndarray) tuples or a GeoDataFrame, not a DataArray, so Cat 2/3 DataArray checks reinterpreted as coordinate-transform + CRS propagation. Coordinate transform (np.interp over input dims, descending y respected) is correct and identical across all 4 backends (tracing is host-side via _contours_numpy). Cat 4 N/A: library convention is NaN-as-nodata; slope/aspect/curvature/focal do not read attrs['nodatavals'] either, so contour not reading it is consistent, not a bug. NEW HIGH finding #2700 (Cat 1/Cat 5): contours(return_type='geopandas') crashed with 'Assigning CRS to a GeoDataFrame without a geometry column is not supported' whenever the input had attrs['crs'] but the result was empty (flat raster, levels outside data range) because _to_geopandas built gpd.GeoDataFrame([], crs=crs) with no geometry column; separately the all-NaN early-return passed crs=None and silently dropped the CRS. Fix (PR #2708): _to_geopandas builds an empty frame with an explicit geometry column so the CRS attaches; all-NaN early-return forwards agg.attrs['crs']. Both empty paths now return a well-formed empty GeoDataFrame carrying the CRS. 4 new tests in TestGeoDataFrame cover populated-CRS, empty-with-CRS, all-NaN-with-CRS, and empty-without-CRS. Full contour suite 28 passed. numpy-return path emits no DataArray attrs by design (list of tuples)."
corridor,2026-06-22,3446,HIGH,1;5,"Audited 2026-06-22 (agent-a8b2674b815bdfa3f worktree, branch deep-sweep-metadata-corridor-2026-06-22). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live end-to-end for least_cost_corridor across single/threshold/relative/unreachable/pairwise paths. Cat 2 coords (x/y values + float64 dtype) and Cat 3 dims (y,x) preserved on every backend: they flow through cost_distance (coords=raster.coords, dims=raster.dims) and survive xarray's binary intersection. NEW HIGH finding #3446 (Cat 1 + Cat 5): the corridor is cd_a + cd_b where each cost-distance surface carries its SOURCE raster's attrs (cost_distance copies attrs from the source, not friction). xarray's default keep_attrs on binary + keeps only attrs present-and-equal in both operands, so when the source masks are plain marker rasters with no geo-attrs (the common case) the corridor came back with attrs=={} even though the friction surface that defines the grid had res/crs/transform/nodatavals; a downstream slope/clip on the corridor silently lost cellsize/CRS. Secondary Cat 5: .name was None whenever the two sources had different names (cost_distance renames each surface to its source .name; summing differently-named arrays drops the name). Fix (PR on this branch): non-precomputed path re-emits friction.attrs + friction.name on every output via new _apply_geo_metadata helper (single, threshold, all-NaN-unreachable, and pairwise-Dataset paths); precomputed path left on the existing source-derived behaviour since there is no friction to draw from. Only .attrs/.name set -- data values, coords, dims, dtype untouched, dask stays lazy (no compute). 10 new tests (test_corridor_inherits_friction_geo_attrs x4 backends, test_corridor_threshold_keeps_geo_attrs x4 backends, test_corridor_unreachable_keeps_geo_attrs, test_pairwise_inherits_friction_geo_attrs, test_precomputed_keeps_source_attrs_not_friction). Full corridor suite 43 passed. Cat 4 N/A: NaN-as-nodata is the library convention; corridor never reads attrs['nodatavals'] for masking. No CRITICAL/MEDIUM/LOW findings."
cost_distance,2026-06-15,3344,MEDIUM,5,"Audited 2026-06-15 (agent-ad0b84e7f7b212360 worktree, branch deep-sweep-metadata-cost_distance-2026-06-15). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live end-to-end with a rich attrs set (res/crs/transform/nodatavals/_FillValue/units). Cat 1 attrs, Cat 2 coords (values + float64 dtype), and Cat 3 dims (y,x) all preserved and identical across the 4 backends -- public cost_distance() wraps with xr.DataArray(coords=raster.coords, dims=raster.dims, attrs=raster.attrs). NEW MEDIUM finding #3344 (Cat 5): the dask+numpy and dask+cupy backends leaked the internal dask graph name (_trim-<hash> from map_overlap, asarray-<hash> from the dask+cupy convert-back path) into result.name while numpy/cupy returned None; .name was a nondeterministic per-run token that breaks .to_dataset() variable keys and any name-keyed pipeline. Same .name-leak class as proximity #2723 and zonal #2611. Fix (PR #3349 on this branch): return result.rename(raster.name) -- a constructor name= kwarg does not override a named dask array, and name=None is treated as infer-from-data, so .rename() is required. supports_dataset path unaffected (keys by var_name, verified live). New parametrized regression test test_result_name_matches_input over 4 backends x {None, named}; full cost_distance suite 63 passed (post-merge with origin/main). LOW (documented, not fixed): output float32 uses NaN as the unreachable sentinel but input nodatavals/_FillValue (e.g. -9999) are carried through verbatim, so a downstream reader masks a value that never appears -- this is the library-wide attrs=raster.attrs convention shared by proximity/slope/aspect/focal, not a cost_distance-specific bug, so fixing it in isolation would diverge this module from every peer. No CRITICAL/HIGH findings."
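The corridor row's HIGH finding turns on how metadata merges across `cd_a + cd_b`. Below is a minimal pure-Python model of the merge rule as that note describes it (keep only attrs present and equal in both operands; keep `.name` only when both names match), set next to the fix's behaviour of re-emitting the friction surface's attrs and name. The dict-based "arrays", the `merge_attrs`/`merge_name` helpers, and the `res`/`crs` values are hypothetical illustrations, not xarray or xrspatial code.

```python
# Pure-Python model of the metadata merge described in the corridor row.
# merge_attrs / merge_name are hypothetical; they are not xarray functions.

def merge_attrs(a, b):
    """Keep only attrs that are present and equal in both operands."""
    return {k: v for k, v in a.items() if k in b and b[k] == v}


def merge_name(a, b):
    """Keep the name only when both operands share it."""
    return a if a == b else None


# Illustrative friction surface: it defines the grid, so it owns the geo-attrs.
friction = {"name": "friction",
            "attrs": {"res": (30.0, 30.0), "crs": "EPSG:32633"}}

# Each cost-distance surface carries its SOURCE raster's metadata; plain
# marker rasters have no geo-attrs and (here) different names.
cd_a = {"name": "source_a", "attrs": {}}
cd_b = {"name": "source_b", "attrs": {}}

# Before the fix: the corridor only inherits whatever survives the merge.
before = {"name": merge_name(cd_a["name"], cd_b["name"]),
          "attrs": merge_attrs(cd_a["attrs"], cd_b["attrs"])}

# After the fix: the friction surface's attrs and name are re-emitted.
after = {"name": friction["name"], "attrs": dict(friction["attrs"])}

print(before)  # {'name': None, 'attrs': {}}
print(after["attrs"]["crs"])  # EPSG:32633
```

Drawing the metadata from friction rather than from either source is what makes the result independent of how the source masks were built; the precomputed path has no friction input, which is why the note leaves it on the source-derived behaviour.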
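Several rows (classify, corridor, cost_distance) describe the library-wide convention of using NaN as the output nodata marker while carrying `attrs['nodatavals']`/`_FillValue` through verbatim. The small numpy sketch below shows why the cost_distance row rates that LOW: a reader that masks by the advertised sentinel finds nothing, while the missing cells are only visible as NaN. The array values and the `-9999` sentinel are illustrative, not output from a real run.

```python
import numpy as np

# Illustrative float32 output: NaN marks unreachable / nodata cells.
result = np.array([[0.0, 1.5], [np.nan, 3.25]], dtype=np.float32)

# Input attrs carried through unchanged, still advertising -9999 as nodata.
attrs = {"nodatavals": (-9999,), "_FillValue": -9999}

# A reader that masks by the advertised sentinel finds nothing to mask...
mask_from_attrs = result == attrs["nodatavals"][0]

# ...while the cells that are actually missing are only found via NaN.
mask_from_nan = np.isnan(result)

print(int(mask_from_attrs.sum()), int(mask_from_nan.sum()))  # 0 1
```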
3 changes: 2 additions & 1 deletion xrspatial/classify.py
@@ -104,7 +104,8 @@ def _run_cupy_binary(data, values):


 def _run_dask_cupy_binary(data, values_cupy):
-    out = data.map_blocks(lambda da: _run_cupy_binary(da, values_cupy), meta=cupy.array(()),
+    out = data.map_blocks(lambda da: _run_cupy_binary(da, values_cupy),
+                          meta=cupy.array((), dtype='f4'),
                           **_dask_task_name_kwargs('xrspatial.binary'))
     return out

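The `meta=` change in `_run_dask_cupy_binary` hinges on the default dtype of an empty array: `np.array(())` is float64, and the classify row notes `cupy.array(())` is float64 too, so a dask array built with that meta advertises float64 even though every chunk computes to float32. Below is a numpy-only sketch of the consistency the fix restores. `chunk_kernel` and `meta_matches_kernel` are hypothetical stand-ins (numpy substitutes for cupy here), not xrspatial code.

```python
import numpy as np


def chunk_kernel(block):
    # Hypothetical stand-in for _run_cupy_binary: always returns float32.
    return (block > 0).astype(np.float32)


def meta_matches_kernel(meta, kernel, sample):
    """Compare the dtype a meta advertises with the dtype the kernel produces."""
    return meta.dtype == kernel(sample).dtype


sample = np.arange(4, dtype=np.int32).reshape(2, 2)

untyped_meta = np.array(())            # float64 by default
typed_meta = np.array((), dtype="f4")  # mirrors the fix's dtype='f4'

print(untyped_meta.dtype, typed_meta.dtype)                     # float64 float32
print(meta_matches_kernel(untyped_meta, chunk_kernel, sample))  # False
print(meta_matches_kernel(typed_meta, chunk_kernel, sample))    # True
```

Typing the meta keeps the lazy array's advertised dtype equal to what `.compute()` returns, which is exactly what the new assertion in `test_binary_dask_cupy` checks.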
3 changes: 3 additions & 0 deletions xrspatial/tests/test_classify.py
@@ -69,6 +69,9 @@ def test_binary_dask_cupy(result_binary):
     values, expected_result = result_binary
     dask_cupy_agg = input_data(backend='dask+cupy')
     dask_cupy_result = binary(dask_cupy_agg, values)
+    # the lazy dask array must advertise the same dtype it computes, otherwise
+    # a downstream consumer reads float64 metadata for a float32 result
+    assert dask_cupy_result.data.dtype == np.float32
     general_output_checks(dask_cupy_agg, dask_cupy_result, expected_result, verify_dtype=True)


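The classify row notes that `binary()` is documented as float32 and that keying the CPU allocation on `data.dtype` let integer input produce an integer result that cannot hold the NaN sentinel. Below is a hedged numpy sketch of a CPU kernel that allocates float32 regardless of input dtype. `binary_sketch` is hypothetical and only approximates `_cpu_binary`; the NaN passthrough for NaN input is an assumption made for illustration.

```python
import numpy as np


def binary_sketch(data, values):
    """Hypothetical binary classifier: 1.0 where data is in values, else 0.0.

    Always allocates float32 (the documented output dtype) instead of
    data.dtype, so integer input no longer yields an integer result.
    NaN passthrough is an assumption made for this sketch.
    """
    data = np.asarray(data)
    out = np.zeros(data.shape, dtype=np.float32)
    out[np.isin(data, values)] = 1.0
    if np.issubdtype(data.dtype, np.floating):
        # Only floating input can contain NaN; keep it as the nodata marker.
        out[np.isnan(data)] = np.nan
    return out


for dtype in (np.float64, np.float32, np.int32):
    arr = np.array([[1, 2], [3, 4]], dtype=dtype)
    # Input dtype on the left; the result dtype is float32 every time.
    print(dtype.__name__, binary_sketch(arr, [2, 4]).dtype)
```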