diff --git a/.claude/sweep-metadata-state.csv b/.claude/sweep-metadata-state.csv
index 35d2545e2..c38760851 100644
--- a/.claude/sweep-metadata-state.csv
+++ b/.claude/sweep-metadata-state.csv
@@ -1,5 +1,6 @@
 module,last_inspected,issue,severity_max,categories_found,notes
 aspect,2026-05-29,2682,MEDIUM,4;5,"Audited 2026-05-29 (agent-a3b7c82e34312ffcb worktree, branch deep-sweep-metadata-aspect-2026-05-29). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live for aspect/northness/eastness across planar and geodesic methods. Cat 1 attrs, Cat 2 coords, Cat 3 dims, and .name all preserved correctly on every backend: the 3 public functions re-emit coords=agg.coords, dims=agg.dims, attrs=agg.attrs at the xr.DataArray constructor. NEW MEDIUM finding #2682 (Cat 4 + Cat 5): the planar dask backends (_run_dask_numpy, _run_dask_cupy) called map_overlap with a default-dtype meta (np.array(()) / cupy.array(())), so the lazy DataArray advertised float64 while the chunk functions _cpu / _run_cupy cast to and return float32. numpy and cupy backends already reported float32, and the geodesic dask paths already passed dtype=np.float32, so only the two planar dask paths were inconsistent: a backend-inconsistent metadata bug where agg.dtype differs by backend and silently flips float64->float32 on .compute(). Fix in PR #2741: pass dtype=np.float32 / dtype=cupy.float32 to the planar dask meta. northness/eastness derive from aspect so they inherit the corrected dtype. 5 new tests (test_dask_numpy_advertised_dtype_matches_computed parametrized over 4 boundary modes, plus test_dask_cupy_advertised_dtype_matches_computed) assert lazy dtype == computed dtype == float32. Full aspect suite 69 passed. slope.py and curvature.py share the same default-dtype meta pattern on their planar dask paths (out of scope for this aspect-only sweep; likely same inconsistency). No CRITICAL/HIGH/LOW findings."
+classify,2026-06-25,3508,MEDIUM,4;5,"Audited 2026-06-25 (agent-a5f16f6137723fc77 worktree, branch deep-sweep-metadata-classify-2026-06-25). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live. All 10 public classifiers (binary/reclassify/quantile/natural_breaks/equal_interval/std_mean/head_tail_breaks/percentiles/maximum_breaks/box_plot) re-emit name=, dims=agg.dims, coords=agg.coords, attrs=agg.attrs at the xr.DataArray constructor, so Cat 1 attrs (res/crs/transform/nodatavals), Cat 2 coords (values+dtype), Cat 3 dims, and .name all preserved and identical across the 4 backends. NEW MEDIUM finding #3508 (Cat 4 + Cat 5): binary() output dtype differed by backend -- _cpu_binary allocated dtype=data.dtype so numpy/dask+numpy returned the input dtype while _run_cupy_binary used dtype='f4'. The docstring documents float32 and every other classifier emits float32 via _bin/_cpu_bin; binary was the only outlier, and for integer input the numpy path returned an integer dtype that can't hold the NaN sentinel. The _cpu_binary float32 fix + verify_dtype=True backend tests + a float64/float32/int32 dtype test landed on main via the duplicate accuracy-sweep PR #3514; PR #3513 was then rebased onto that and is now scoped to the remaining piece: _run_dask_cupy_binary passed an untyped meta=cupy.array(()) (float64) so the lazy dask+cupy array advertised float64 while computing float32 -- the same advertised-vs-computed mismatch class as aspect #2682 / focal #3217. #3513 types the meta as cupy.array((), dtype='f4') and asserts the lazy dtype in test_binary_dask_cupy. Full classify suite passes, GPU paths run live. The sibling classifiers' dask+cupy helpers (_run_dask_cupy_bin and friends) share the same untyped meta and likely the same latent lazy-dtype mismatch (out of scope, follow-up). Cat 4 nodatavals-vs-NaN is the library-wide attrs=agg.attrs convention, not classify-specific (documented, not fixed). No CRITICAL/HIGH/LOW findings." 
 contour,2026-05-29,2700,HIGH,1;5,"Audited 2026-05-29 (agent-ab7fff484a8f57de2 worktree, branch deep-sweep-metadata-contour-2026-05-29). CUDA available; cupy and dask+cupy paths exercised live. contours() returns a list of (level, ndarray) tuples or a GeoDataFrame, not a DataArray, so Cat 2/3 DataArray checks reinterpreted as coordinate-transform + CRS propagation. Coordinate transform (np.interp over input dims, descending y respected) is correct and identical across all 4 backends (tracing is host-side via _contours_numpy). Cat 4 N/A: library convention is NaN-as-nodata; slope/aspect/curvature/focal do not read attrs['nodatavals'] either, so contour not reading it is consistent, not a bug. NEW HIGH finding #2700 (Cat 1/Cat 5): contours(return_type='geopandas') crashed with 'Assigning CRS to a GeoDataFrame without a geometry column is not supported' whenever the input had attrs['crs'] but the result was empty (flat raster, levels outside data range) because _to_geopandas built gpd.GeoDataFrame([], crs=crs) with no geometry column; separately the all-NaN early-return passed crs=None and silently dropped the CRS. Fix (PR #2708): _to_geopandas builds an empty frame with an explicit geometry column so the CRS attaches; all-NaN early-return forwards agg.attrs['crs']. Both empty paths now return a well-formed empty GeoDataFrame carrying the CRS. 4 new tests in TestGeoDataFrame cover populated-CRS, empty-with-CRS, all-NaN-with-CRS, and empty-without-CRS. Full contour suite 28 passed. numpy-return path emits no DataArray attrs by design (list of tuples)."
 corridor,2026-06-22,3446,HIGH,1;5,"Audited 2026-06-22 (agent-a8b2674b815bdfa3f worktree, branch deep-sweep-metadata-corridor-2026-06-22). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live end-to-end for least_cost_corridor across single/threshold/relative/unreachable/pairwise paths. Cat 2 coords (x/y values + float64 dtype) and Cat 3 dims (y,x) preserved on every backend: they flow through cost_distance (coords=raster.coords, dims=raster.dims) and survive xarray's binary intersection. NEW HIGH finding #3446 (Cat 1 + Cat 5): the corridor is cd_a + cd_b where each cost-distance surface carries its SOURCE raster's attrs (cost_distance copies attrs from the source, not friction). xarray's default keep_attrs on binary + keeps only attrs present-and-equal in both operands, so when the source masks are plain marker rasters with no geo-attrs (the common case) the corridor came back with attrs=={} even though the friction surface that defines the grid had res/crs/transform/nodatavals; a downstream slope/clip on the corridor silently lost cellsize/CRS. Secondary Cat 5: .name was None whenever the two sources had different names (cost_distance renames each surface to its source .name; summing differently-named arrays drops the name). Fix (PR on this branch): non-precomputed path re-emits friction.attrs + friction.name on every output via new _apply_geo_metadata helper (single, threshold, all-NaN-unreachable, and pairwise-Dataset paths); precomputed path left on the existing source-derived behaviour since there is no friction to draw from. Only .attrs/.name set -- data values, coords, dims, dtype untouched, dask stays lazy (no compute). 10 new tests (test_corridor_inherits_friction_geo_attrs x4 backends, test_corridor_threshold_keeps_geo_attrs x4 backends, test_corridor_unreachable_keeps_geo_attrs, test_pairwise_inherits_friction_geo_attrs, test_precomputed_keeps_source_attrs_not_friction). Full corridor suite 43 passed. Cat 4 N/A: NaN-as-nodata is the library convention; corridor never reads attrs['nodatavals'] for masking. No CRITICAL/MEDIUM/LOW findings."
 cost_distance,2026-06-15,3344,MEDIUM,5,"Audited 2026-06-15 (agent-ad0b84e7f7b212360 worktree, branch deep-sweep-metadata-cost_distance-2026-06-15). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live end-to-end with a rich attrs set (res/crs/transform/nodatavals/_FillValue/units). Cat 1 attrs, Cat 2 coords (values + float64 dtype), and Cat 3 dims (y,x) all preserved and identical across the 4 backends -- public cost_distance() wraps with xr.DataArray(coords=raster.coords, dims=raster.dims, attrs=raster.attrs). NEW MEDIUM finding #3344 (Cat 5): the dask+numpy and dask+cupy backends leaked the internal dask graph name (_trim- from map_overlap, asarray- from the dask+cupy convert-back path) into result.name while numpy/cupy returned None; .name was a nondeterministic per-run token that breaks .to_dataset() variable keys and any name-keyed pipeline. Same .name-leak class as proximity #2723 and zonal #2611. Fix (PR #3349 on this branch): return result.rename(raster.name) -- a constructor name= kwarg does not override a named dask array, and name=None is treated as infer-from-data, so .rename() is required. supports_dataset path unaffected (keys by var_name, verified live). New parametrized regression test test_result_name_matches_input over 4 backends x {None, named}; full cost_distance suite 63 passed (post-merge with origin/main). LOW (documented, not fixed): output float32 uses NaN as the unreachable sentinel but input nodatavals/_FillValue (e.g. -9999) are carried through verbatim, so a downstream reader masks a value that never appears -- this is the library-wide attrs=raster.attrs convention shared by proximity/slope/aspect/focal, not a cost_distance-specific bug, so fixing it in isolation would diverge this module from every peer. No CRITICAL/HIGH findings."
diff --git a/xrspatial/classify.py b/xrspatial/classify.py
index 7731847c7..2a3d59e48 100644
--- a/xrspatial/classify.py
+++ b/xrspatial/classify.py
@@ -104,7 +104,8 @@ def _run_cupy_binary(data, values):


 def _run_dask_cupy_binary(data, values_cupy):
-    out = data.map_blocks(lambda da: _run_cupy_binary(da, values_cupy), meta=cupy.array(()),
+    out = data.map_blocks(lambda da: _run_cupy_binary(da, values_cupy),
+                          meta=cupy.array((), dtype='f4'),
                           **_dask_task_name_kwargs('xrspatial.binary'))
     return out

diff --git a/xrspatial/tests/test_classify.py b/xrspatial/tests/test_classify.py
index f0bbe0c84..494f75cc9 100644
--- a/xrspatial/tests/test_classify.py
+++ b/xrspatial/tests/test_classify.py
@@ -69,6 +69,9 @@ def test_binary_dask_cupy(result_binary):
     values, expected_result = result_binary
     dask_cupy_agg = input_data(backend='dask+cupy')
     dask_cupy_result = binary(dask_cupy_agg, values)
+    # the lazy dask array must advertise the same dtype it computes, otherwise
+    # a downstream consumer reads float64 metadata for a float32 result
+    assert dask_cupy_result.data.dtype == np.float32
     general_output_checks(dask_cupy_agg, dask_cupy_result, expected_result, verify_dtype=True)
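
Note on the root cause of the meta change above: an array built from an empty tuple with no dtype defaults to float64, so a meta written as `cupy.array(())` disagrees with a chunk function that returns float32. The sketch below reproduces that comparison on the CPU only. It assumes NumPy as a stand-in for CuPy; `chunk_fn` and `meta_matches_chunk` are hypothetical names made up for illustration and are not part of xrspatial or dask.

```python
import numpy as np


def chunk_fn(block):
    """Hypothetical stand-in for a chunk function that always returns float32."""
    out = np.empty(block.shape, dtype='f4')
    out[...] = np.where(block > 0, 1.0, 0.0)
    return out


def meta_matches_chunk(meta, fn, sample):
    """Hypothetical helper: True when the meta dtype equals the dtype fn computes."""
    return meta.dtype == fn(sample).dtype


sample = np.arange(6, dtype='f8').reshape(2, 3)

untyped_meta = np.array(())            # no dtype given, so NumPy picks float64
typed_meta = np.array((), dtype='f4')  # explicit float32, as in the fix

print(untyped_meta.dtype, meta_matches_chunk(untyped_meta, chunk_fn, sample))
print(typed_meta.dtype, meta_matches_chunk(typed_meta, chunk_fn, sample))
```

The first line prints `float64 False` and the second `float32 True`. The new assertion in `test_binary_dask_cupy` makes the equivalent check against the real lazy dask+cupy array.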