Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions .claude/sweep-metadata-state.csv
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
module,last_inspected,issue,severity_max,categories_found,notes
aspect,2026-05-29,2682,MEDIUM,4;5,"Audited 2026-05-29 (agent-a3b7c82e34312ffcb worktree, branch deep-sweep-metadata-aspect-2026-05-29). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live for aspect/northness/eastness across planar and geodesic methods. Cat 1 attrs, Cat 2 coords, Cat 3 dims, and .name all preserved correctly on every backend: the 3 public functions re-emit coords=agg.coords, dims=agg.dims, attrs=agg.attrs at the xr.DataArray constructor. NEW MEDIUM finding #2682 (Cat 4 + Cat 5): the planar dask backends (_run_dask_numpy, _run_dask_cupy) called map_overlap with a default-dtype meta (np.array(()) / cupy.array(())), so the lazy DataArray advertised float64 while the chunk functions _cpu / _run_cupy cast to and return float32. numpy and cupy backends already reported float32, and the geodesic dask paths already passed dtype=np.float32, so only the two planar dask paths were inconsistent: a backend-inconsistent metadata bug where agg.dtype differs by backend and silently flips float64->float32 on .compute(). Fix in PR #2741: pass dtype=np.float32 / dtype=cupy.float32 to the planar dask meta. northness/eastness derive from aspect so they inherit the corrected dtype. 5 new tests (test_dask_numpy_advertised_dtype_matches_computed parametrized over 4 boundary modes, plus test_dask_cupy_advertised_dtype_matches_computed) assert lazy dtype == computed dtype == float32. Full aspect suite 69 passed. slope.py and curvature.py share the same default-dtype meta pattern on their planar dask paths (out of scope for this aspect-only sweep; likely same inconsistency). No CRITICAL/HIGH/LOW findings."
contour,2026-05-29,2700,HIGH,1;5,"Audited 2026-05-29 (agent-ab7fff484a8f57de2 worktree, branch deep-sweep-metadata-contour-2026-05-29). CUDA available; cupy and dask+cupy paths exercised live. contours() returns a list of (level, ndarray) tuples or a GeoDataFrame, not a DataArray, so Cat 2/3 DataArray checks reinterpreted as coordinate-transform + CRS propagation. Coordinate transform (np.interp over input dims, descending y respected) is correct and identical across all 4 backends (tracing is host-side via _contours_numpy). Cat 4 N/A: library convention is NaN-as-nodata; slope/aspect/curvature/focal do not read attrs['nodatavals'] either, so contour not reading it is consistent, not a bug. NEW HIGH finding #2700 (Cat 1/Cat 5): contours(return_type='geopandas') crashed with 'Assigning CRS to a GeoDataFrame without a geometry column is not supported' whenever the input had attrs['crs'] but the result was empty (flat raster, levels outside data range) because _to_geopandas built gpd.GeoDataFrame([], crs=crs) with no geometry column; separately the all-NaN early-return passed crs=None and silently dropped the CRS. Fix (PR #2708): _to_geopandas builds an empty frame with an explicit geometry column so the CRS attaches; all-NaN early-return forwards agg.attrs['crs']. Both empty paths now return a well-formed empty GeoDataFrame carrying the CRS. 4 new tests in TestGeoDataFrame cover populated-CRS, empty-with-CRS, all-NaN-with-CRS, and empty-without-CRS. Full contour suite 28 passed. numpy-return path emits no DataArray attrs by design (list of tuples)."
corridor,2026-06-22,3446,HIGH,1;5,"Audited 2026-06-22 (agent-a8b2674b815bdfa3f worktree, branch deep-sweep-metadata-corridor-2026-06-22). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live end-to-end for least_cost_corridor across single/threshold/relative/unreachable/pairwise paths. Cat 2 coords (x/y values + float64 dtype) and Cat 3 dims (y,x) preserved on every backend: they flow through cost_distance (coords=raster.coords, dims=raster.dims) and survive xarray's binary intersection. NEW HIGH finding #3446 (Cat 1 + Cat 5): the corridor is cd_a + cd_b where each cost-distance surface carries its SOURCE raster's attrs (cost_distance copies attrs from the source, not friction). xarray's default keep_attrs on binary + keeps only attrs present-and-equal in both operands, so when the source masks are plain marker rasters with no geo-attrs (the common case) the corridor came back with attrs=={} even though the friction surface that defines the grid had res/crs/transform/nodatavals; a downstream slope/clip on the corridor silently lost cellsize/CRS. Secondary Cat 5: .name was None whenever the two sources had different names (cost_distance renames each surface to its source .name; summing differently-named arrays drops the name). Fix (PR on this branch): non-precomputed path re-emits friction.attrs + friction.name on every output via new _apply_geo_metadata helper (single, threshold, all-NaN-unreachable, and pairwise-Dataset paths); precomputed path left on the existing source-derived behaviour since there is no friction to draw from. Only .attrs/.name set -- data values, coords, dims, dtype untouched, dask stays lazy (no compute). 10 new tests (test_corridor_inherits_friction_geo_attrs x4 backends, test_corridor_threshold_keeps_geo_attrs x4 backends, test_corridor_unreachable_keeps_geo_attrs, test_pairwise_inherits_friction_geo_attrs, test_precomputed_keeps_source_attrs_not_friction). Full corridor suite 43 passed. Cat 4 N/A: NaN-as-nodata is the library convention; corridor never reads attrs['nodatavals'] for masking. No CRITICAL/MEDIUM/LOW findings."
cost_distance,2026-06-15,3344,MEDIUM,5,"Audited 2026-06-15 (agent-ad0b84e7f7b212360 worktree, branch deep-sweep-metadata-cost_distance-2026-06-15). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live end-to-end with a rich attrs set (res/crs/transform/nodatavals/_FillValue/units). Cat 1 attrs, Cat 2 coords (values + float64 dtype), and Cat 3 dims (y,x) all preserved and identical across the 4 backends -- public cost_distance() wraps with xr.DataArray(coords=raster.coords, dims=raster.dims, attrs=raster.attrs). NEW MEDIUM finding #3344 (Cat 5): the dask+numpy and dask+cupy backends leaked the internal dask graph name (_trim-<hash> from map_overlap, asarray-<hash> from the dask+cupy convert-back path) into result.name while numpy/cupy returned None; .name was a nondeterministic per-run token that breaks .to_dataset() variable keys and any name-keyed pipeline. Same .name-leak class as proximity #2723 and zonal #2611. Fix (PR #3349 on this branch): return result.rename(raster.name) -- a constructor name= kwarg does not override a named dask array, and name=None is treated as infer-from-data, so .rename() is required. supports_dataset path unaffected (keys by var_name, verified live). New parametrized regression test test_result_name_matches_input over 4 backends x {None, named}; full cost_distance suite 63 passed (post-merge with origin/main). LOW (documented, not fixed): output float32 uses NaN as the unreachable sentinel but input nodatavals/_FillValue (e.g. -9999) are carried through verbatim, so a downstream reader masks a value that never appears -- this is the library-wide attrs=raster.attrs convention shared by proximity/slope/aspect/focal, not a cost_distance-specific bug, so fixing it in isolation would diverge this module from every peer. No CRITICAL/HIGH findings."
focal,2026-06-10,3217,MEDIUM,4;5,"Re-audited 2026-06-10 (agent-ad0d55a894c6abc60 worktree, branch deep-sweep-metadata-focal-2026-06-10). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live for mean, apply, focal_stats, hotspots. Cats 1-3 clean: attrs (res/crs/nodatavals/_FillValue/unit), coords (values, dtype, coord attrs), dims, .name, 3D per-band path, and hotspots unit=% all preserved and identical across the 4 backends. NEW MEDIUM finding #3217 (Cat 4 + Cat 5): (a) mean() hardcoded float32 on the GPU paths (_mean_cupy cupy.asarray(dtype=float32), _mean_dask_cupy astype(float32)) while numpy/dask+numpy returned float64 (mean() casts astype(float) before dispatch), so float64 input silently lost precision on cupy/dask+cupy; dask+cupy also advertised float64 (untyped meta) but computed float32. (b) apply()/focal_stats() dask paths passed untyped meta (np.array(()) / cupy.array(())) to map_overlap, so for float32/int input the lazy DataArray advertised float64 but computed the promoted float32 (#2805 typed the chunk fns but not the meta). Same class as aspect #2682 and proximity #2723. Fix: the mean() GPU dtype half landed on main first via duplicate issue #3214/PR #3221 (_promote_float contract: float dtypes preserved, ints->float32, GPU bit-exact vs CPU in float64); PR #3226 (branch deep-sweep-metadata-focal-2026-06-10-01) types every map_overlap meta with data.dtype and aligns tests to the _promote_float contract; 25 new parametrized regression tests (4 backends x 3 dtypes mean; dask backends x 3 dtypes apply/focal_stats; exact CPU/GPU parity). Full focal suite 258 passed. No other CRITICAL/HIGH/MEDIUM/LOW findings."
geotiff,2026-06-09,3116,HIGH,2;3,"Re-audited 2026-06-09 (agent-ae89ff94a64e3ee8f worktree, branch deep-sweep-metadata-geotiff-2026-06-09). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live. Focus: surfaces changed since the 2026-05-18 audit (unpack rename + GPU/dask+GPU support #3075, pack=True #3065/#3079, masked int->float promotion #2994, bbox= reads, rioxarray param alignment #2963, no-georef VRT coord synthesis #2824, GeoTransform omission #2971). Live probes: unpack attrs (scale_factor/add_offset/mask_and_scale_dtype/nodata/masked_nodata), masked=True promotion, default masked=False, bbox window+transform shift, multi-band band=N, dims/name/coords (incl. coord dtype) all identical across the 4 backends; nodata_pixels_present absent on dask paths is the documented lazy contract, not a bug. pack->unpack round trips verified on numpy/dask/gpu-write; pack of a cupy-backed read raises via the known cupy+xarray xp.astype incompat (see memory cupy_where_astype_incompat; dependency-pin fix, raises loudly, not a metadata bug). VRT reads (full/masked/window/bbox) and no-georef TIFF reads agree across the 4 backends. NEW HIGH finding #3116 (Cat 2+3): to_geotiff(non_georef_da, out.vrt, tile_size=N) wrote a corrupt index for arrays spanning >1 tile -- write_vrt derives placement from each source GeoTransform and non-georef tiles all carry the identity transform, so rasterX/YSize collapsed to one tile and every DstRect landed at the origin; reads silently returned a single tile (24x32 in -> 16x16 out). Gap left by #2966/#2971 (tests only covered one non-georef source). Fix: _write_vrt_tiled threads per-tile pixel offsets through _build_vrt -> write_vrt via internal dst_offsets kwarg; write_vrt refuses >1 all-non-georef sources without explicit placement and rejects dst_offsets alongside georeferenced sources. 18 new tests in tests/vrt/test_non_georef_placement_3116.py incl. 4-backend round trip, dask-backed and plain-ndarray writes, XML DstRect assertions, georef placement regression, and the write_vrt error contract. Full vrt suite 520 passed; write+round-trip suites 1292 passed."
Expand Down
36 changes: 32 additions & 4 deletions xrspatial/corridor.py
Original file line number Diff line number Diff line change
Expand Up @@ -38,19 +38,41 @@ def _scalar_to_float(da_scalar):
return float(data)


def _apply_geo_metadata(
result: xr.DataArray,
geo_source: xr.DataArray | None,
) -> xr.DataArray:
"""Re-emit the grid-defining raster's attrs/name on the corridor output.

The corridor is ``cd_a + cd_b``, and each cost-distance surface carries
the attrs of its *source* raster (see ``cost_distance``). Source rasters
are usually plain marker masks with no geo-attrs, so the xarray binary
sum drops ``res``/``crs``/``transform``/``nodatavals`` even when the
friction surface that defines the grid had them. When a grid-defining
raster is available (the non-precomputed case, where *friction* is it),
copy its attrs and name back onto the result.
"""
if geo_source is None:
return result
result = result.copy()
result.attrs = dict(geo_source.attrs)
return result.rename(geo_source.name)


def _compute_corridor(
cd_a: xr.DataArray,
cd_b: xr.DataArray,
threshold: float | None,
relative: bool,
geo_source: xr.DataArray | None = None,
) -> xr.DataArray:
"""Sum two cost-distance surfaces, normalize, and optionally threshold."""
corridor = cd_a + cd_b
corridor_min = _scalar_to_float(corridor.min())

if not np.isfinite(corridor_min):
# Sources are mutually unreachable -- return all-NaN.
return corridor * np.nan
return _apply_geo_metadata(corridor * np.nan, geo_source)

normalized = corridor - corridor_min

Expand All @@ -74,7 +96,7 @@ def _compute_corridor(
data = np.where(data <= cutoff, data, np.nan)
normalized = normalized.copy(data=data)

return normalized
return _apply_geo_metadata(normalized, geo_source)


def least_cost_corridor(
Expand Down Expand Up @@ -185,6 +207,9 @@ def least_cost_corridor(
# ------------------------------------------------------------------
if precomputed:
cd_surfaces = list(sources)
# No friction surface to draw grid metadata from -- keep the
# existing source-derived attrs/name behaviour.
geo_source = None
else:
cd_surfaces = [
cost_distance(
Expand All @@ -193,13 +218,16 @@ def least_cost_corridor(
)
for src in sources
]
# The friction surface defines the grid (res/crs/transform), so its
# attrs and name should ride through to the corridor output.
geo_source = friction

# ------------------------------------------------------------------
# Two-source case: return a single DataArray
# ------------------------------------------------------------------
if len(cd_surfaces) == 2 and not pairwise:
return _compute_corridor(
cd_surfaces[0], cd_surfaces[1], threshold, relative
cd_surfaces[0], cd_surfaces[1], threshold, relative, geo_source
)

# ------------------------------------------------------------------
Expand All @@ -209,7 +237,7 @@ def least_cost_corridor(
for i, j in itertools.combinations(range(len(cd_surfaces)), 2):
name = f"corridor_{i}_{j}"
corridors[name] = _compute_corridor(
cd_surfaces[i], cd_surfaces[j], threshold, relative
cd_surfaces[i], cd_surfaces[j], threshold, relative, geo_source
)

return xr.Dataset(corridors)
124 changes: 124 additions & 0 deletions xrspatial/tests/test_corridor.py
Original file line number Diff line number Diff line change
Expand Up @@ -374,3 +374,127 @@ def test_single_source_in_list_raises():
src = _make_raster(np.ones((3, 3)))
with pytest.raises(ValueError, match="at least 2"):
least_cost_corridor(friction, sources=[src])


# -----------------------------------------------------------------------
# Metadata propagation (issue #3446)
# -----------------------------------------------------------------------


_GEO_ATTRS = {
"res": (1.0, 1.0),
"crs": "EPSG:4326",
"transform": (1.0, 0.0, 0.0, 0.0, -1.0, 0.0),
"nodatavals": (-9999.0,),
}


def _make_geo_raster(data, attrs, name, backend="numpy"):
"""Build a raster with explicit attrs/name on top of ``_make_raster``."""
raster = _make_raster(data, backend=backend, chunks=data.shape)
raster.attrs = dict(attrs)
raster.name = name
return raster


@pytest.mark.parametrize(
"backend", ["numpy", "dask+numpy", "cupy", "dask+cupy"]
)
def test_corridor_inherits_friction_geo_attrs(backend):
"""Corridor carries friction's geo-attrs/name even when sources have none."""
n = 7
friction = _make_geo_raster(
np.ones((n, n)), _GEO_ATTRS, "friction", backend=backend
)
sa_d = np.zeros((n, n))
sa_d[3, 0] = 1.0
sb_d = np.zeros((n, n))
sb_d[3, 6] = 1.0
# Source masks deliberately carry no attrs and no name.
sa = _make_geo_raster(sa_d, {}, None, backend=backend)
sb = _make_geo_raster(sb_d, {}, None, backend=backend)

result = least_cost_corridor(friction, sa, sb)

assert dict(result.attrs) == _GEO_ATTRS
assert result.name == "friction"
assert result.dims == ("y", "x")
assert list(result.coords) == ["y", "x"]


@pytest.mark.parametrize(
"backend", ["numpy", "dask+numpy", "cupy", "dask+cupy"]
)
def test_corridor_threshold_keeps_geo_attrs(backend):
"""Thresholded corridor still carries friction's geo-attrs."""
n = 7
friction = _make_geo_raster(
np.ones((n, n)), _GEO_ATTRS, "friction", backend=backend
)
sa_d = np.zeros((n, n))
sa_d[3, 0] = 1.0
sb_d = np.zeros((n, n))
sb_d[3, 6] = 1.0
sa = _make_geo_raster(sa_d, {}, None, backend=backend)
sb = _make_geo_raster(sb_d, {}, None, backend=backend)

result = least_cost_corridor(friction, sa, sb, threshold=0.5)

assert dict(result.attrs) == _GEO_ATTRS
assert result.name == "friction"


def test_corridor_unreachable_keeps_geo_attrs():
"""All-NaN corridor (unreachable sources) still carries geo-attrs."""
n = 5
fr_d = np.ones((n, n))
fr_d[:, 2] = np.nan # impenetrable wall
friction = _make_geo_raster(fr_d, _GEO_ATTRS, "friction")
sa_d = np.zeros((n, n))
sa_d[2, 0] = 1.0
sb_d = np.zeros((n, n))
sb_d[2, 4] = 1.0
sa = _make_geo_raster(sa_d, {}, None)
sb = _make_geo_raster(sb_d, {}, None)

result = least_cost_corridor(friction, sa, sb)

assert np.all(np.isnan(_compute(result)))
assert dict(result.attrs) == _GEO_ATTRS


def test_pairwise_inherits_friction_geo_attrs():
"""Every variable in a pairwise Dataset carries friction's geo-attrs."""
n = 7
friction = _make_geo_raster(np.ones((n, n)), _GEO_ATTRS, "friction")
sources = []
for r, c in [(0, 0), (0, 6), (6, 3)]:
s = np.zeros((n, n))
s[r, c] = 1.0
sources.append(_make_geo_raster(s, {}, None))

result = least_cost_corridor(friction, sources=sources, pairwise=True)

assert isinstance(result, xr.Dataset)
for name in result.data_vars:
assert dict(result[name].attrs) == _GEO_ATTRS


def test_precomputed_keeps_source_attrs_not_friction():
"""Precomputed path leaves the source-derived attrs alone (no friction)."""
n = 7
friction = _make_geo_raster(np.ones((n, n)), _GEO_ATTRS, "friction")
sa_d = np.zeros((n, n))
sa_d[3, 0] = 1.0
sb_d = np.zeros((n, n))
sb_d[3, 6] = 1.0
src_attrs = {"res": (1.0, 1.0), "crs": "EPSG:3857"}
cd_a = _make_geo_raster(sa_d, src_attrs, "cd")
cd_b = _make_geo_raster(sb_d, src_attrs, "cd")

result = least_cost_corridor(friction, cd_a, cd_b, precomputed=True)

# Friction's attrs must NOT leak into a precomputed corridor; the
# matching source attrs survive xarray's binary-op intersection.
assert dict(result.attrs) == src_attrs
assert "EPSG:4326" not in result.attrs.get("crs", "")
Loading