Merged
2 changes: 1 addition & 1 deletion .claude/sweep-metadata-state.csv
@@ -5,7 +5,7 @@ contour,2026-05-29,2700,HIGH,1;5,"Audited 2026-05-29 (agent-ab7fff484a8f57de2 wo
corridor,2026-06-22,3446,HIGH,1;5,"Audited 2026-06-22 (agent-a8b2674b815bdfa3f worktree, branch deep-sweep-metadata-corridor-2026-06-22). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live end-to-end for least_cost_corridor across single/threshold/relative/unreachable/pairwise paths. Cat 2 coords (x/y values + float64 dtype) and Cat 3 dims (y,x) preserved on every backend: they flow through cost_distance (coords=raster.coords, dims=raster.dims) and survive xarray's binary intersection. NEW HIGH finding #3446 (Cat 1 + Cat 5): the corridor is cd_a + cd_b where each cost-distance surface carries its SOURCE raster's attrs (cost_distance copies attrs from the source, not friction). xarray's default keep_attrs on binary + keeps only attrs present-and-equal in both operands, so when the source masks are plain marker rasters with no geo-attrs (the common case) the corridor came back with attrs=={} even though the friction surface that defines the grid had res/crs/transform/nodatavals; a downstream slope/clip on the corridor silently lost cellsize/CRS. Secondary Cat 5: .name was None whenever the two sources had different names (cost_distance renames each surface to its source .name; summing differently-named arrays drops the name). Fix (PR on this branch): non-precomputed path re-emits friction.attrs + friction.name on every output via new _apply_geo_metadata helper (single, threshold, all-NaN-unreachable, and pairwise-Dataset paths); precomputed path left on the existing source-derived behaviour since there is no friction to draw from. Only .attrs/.name set -- data values, coords, dims, dtype untouched, dask stays lazy (no compute). 10 new tests (test_corridor_inherits_friction_geo_attrs x4 backends, test_corridor_threshold_keeps_geo_attrs x4 backends, test_corridor_unreachable_keeps_geo_attrs, test_pairwise_inherits_friction_geo_attrs, test_precomputed_keeps_source_attrs_not_friction). Full corridor suite 43 passed. 
Cat 4 N/A: NaN-as-nodata is the library convention; corridor never reads attrs['nodatavals'] for masking. No CRITICAL/MEDIUM/LOW findings."
cost_distance,2026-06-15,3344,MEDIUM,5,"Audited 2026-06-15 (agent-ad0b84e7f7b212360 worktree, branch deep-sweep-metadata-cost_distance-2026-06-15). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live end-to-end with a rich attrs set (res/crs/transform/nodatavals/_FillValue/units). Cat 1 attrs, Cat 2 coords (values + float64 dtype), and Cat 3 dims (y,x) all preserved and identical across the 4 backends -- public cost_distance() wraps with xr.DataArray(coords=raster.coords, dims=raster.dims, attrs=raster.attrs). NEW MEDIUM finding #3344 (Cat 5): the dask+numpy and dask+cupy backends leaked the internal dask graph name (_trim-<hash> from map_overlap, asarray-<hash> from the dask+cupy convert-back path) into result.name while numpy/cupy returned None; .name was a nondeterministic per-run token that breaks .to_dataset() variable keys and any name-keyed pipeline. Same .name-leak class as proximity #2723 and zonal #2611. Fix (PR #3349 on this branch): return result.rename(raster.name) -- a constructor name= kwarg does not override a named dask array, and name=None is treated as infer-from-data, so .rename() is required. supports_dataset path unaffected (keys by var_name, verified live). New parametrized regression test test_result_name_matches_input over 4 backends x {None, named}; full cost_distance suite 63 passed (post-merge with origin/main). LOW (documented, not fixed): output float32 uses NaN as the unreachable sentinel but input nodatavals/_FillValue (e.g. -9999) are carried through verbatim, so a downstream reader masks a value that never appears -- this is the library-wide attrs=raster.attrs convention shared by proximity/slope/aspect/focal, not a cost_distance-specific bug, so fixing it in isolation would diverge this module from every peer. No CRITICAL/HIGH findings."
focal,2026-06-10,3217,MEDIUM,4;5,"Re-audited 2026-06-10 (agent-ad0d55a894c6abc60 worktree, branch deep-sweep-metadata-focal-2026-06-10). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live for mean, apply, focal_stats, hotspots. Cats 1-3 clean: attrs (res/crs/nodatavals/_FillValue/unit), coords (values, dtype, coord attrs), dims, .name, 3D per-band path, and hotspots unit=% all preserved and identical across the 4 backends. NEW MEDIUM finding #3217 (Cat 4 + Cat 5): (a) mean() hardcoded float32 on the GPU paths (_mean_cupy cupy.asarray(dtype=float32), _mean_dask_cupy astype(float32)) while numpy/dask+numpy returned float64 (mean() casts astype(float) before dispatch), so float64 input silently lost precision on cupy/dask+cupy; dask+cupy also advertised float64 (untyped meta) but computed float32. (b) apply()/focal_stats() dask paths passed untyped meta (np.array(()) / cupy.array(())) to map_overlap, so for float32/int input the lazy DataArray advertised float64 but computed the promoted float32 (#2805 typed the chunk fns but not the meta). Same class as aspect #2682 and proximity #2723. Fix: the mean() GPU dtype half landed on main first via duplicate issue #3214/PR #3221 (_promote_float contract: float dtypes preserved, ints->float32, GPU bit-exact vs CPU in float64); PR #3226 (branch deep-sweep-metadata-focal-2026-06-10-01) types every map_overlap meta with data.dtype and aligns tests to the _promote_float contract; 25 new parametrized regression tests (4 backends x 3 dtypes mean; dask backends x 3 dtypes apply/focal_stats; exact CPU/GPU parity). Full focal suite 258 passed. No other CRITICAL/HIGH/MEDIUM/LOW findings."
geotiff,2026-06-09,3116,HIGH,2;3,"Re-audited 2026-06-09 (agent-ae89ff94a64e3ee8f worktree, branch deep-sweep-metadata-geotiff-2026-06-09). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live. Focus: surfaces changed since the 2026-05-18 audit (unpack rename + GPU/dask+GPU support #3075, pack=True #3065/#3079, masked int->float promotion #2994, bbox= reads, rioxarray param alignment #2963, no-georef VRT coord synthesis #2824, GeoTransform omission #2971). Live probes: unpack attrs (scale_factor/add_offset/mask_and_scale_dtype/nodata/masked_nodata), masked=True promotion, default masked=False, bbox window+transform shift, multi-band band=N, dims/name/coords (incl. coord dtype) all identical across the 4 backends; nodata_pixels_present absent on dask paths is the documented lazy contract, not a bug. pack->unpack round trips verified on numpy/dask/gpu-write; pack of a cupy-backed read raises via the known cupy+xarray xp.astype incompat (see memory cupy_where_astype_incompat; dependency-pin fix, raises loudly, not a metadata bug). VRT reads (full/masked/window/bbox) and no-georef TIFF reads agree across the 4 backends. NEW HIGH finding #3116 (Cat 2+3): to_geotiff(non_georef_da, out.vrt, tile_size=N) wrote a corrupt index for arrays spanning >1 tile -- write_vrt derives placement from each source GeoTransform and non-georef tiles all carry the identity transform, so rasterX/YSize collapsed to one tile and every DstRect landed at the origin; reads silently returned a single tile (24x32 in -> 16x16 out). Gap left by #2966/#2971 (tests only covered one non-georef source). Fix: _write_vrt_tiled threads per-tile pixel offsets through _build_vrt -> write_vrt via internal dst_offsets kwarg; write_vrt refuses >1 all-non-georef sources without explicit placement and rejects dst_offsets alongside georeferenced sources. 18 new tests in tests/vrt/test_non_georef_placement_3116.py incl. 
4-backend round trip, dask-backed and plain-ndarray writes, XML DstRect assertions, georef placement regression, and the write_vrt error contract. Full vrt suite 520 passed; write+round-trip suites 1292 passed."
geotiff,2026-07-01,3595,MEDIUM,1,"Re-audited 2026-07-01 (agent-adb0e639731d1209c worktree, branch deep-sweep-metadata-geotiff-2026-07-01). CUDA available; all 4 backends (numpy/cupy/dask+numpy/dask+cupy) run live. Focus: surfaces changed since the 2026-06-09 audit -- symbology sidecars (#3538/#3546), categorical PAM sidecar backends (#3519), xarray engine (#3375/#3377/#3380), pack/nodata attr changes (#3277/#3325/#3128). Live probes all clean: 4-backend read parity (attrs/coords/dims/name/dtype incl. coord dtype), engine open_dataset vs open_geotiff parity (attrs/coords/values, chunks={}, masked=True, var name), color_ramp PAM stats + QML byte-identical across numpy/dask/cupy/dask+cupy write inputs (global dask stats via one fused dask.compute, not per-chunk), nodata excluded from stats, categorical sidecar attrs attach on all 4 read paths + engine, stats-PAM never fakes category attrs (thematic gate). #3128 64-bit sentinel eager fix verified merged. NEW MEDIUM finding #3595 (Cat 1): to_geotiff left the previous file's PAM .aux.xml behind when the new write emitted no sidecar, so open_geotiff attached the overwritten file's category_names/category_colors to the new pixels and GDAL/QGIS stretched with stale STATISTICS_*; GDAL avoids this via GDALDriver::QuietDelete. Fix on this branch: _write_sidecars removes a pre-existing <path>.aux.xml on every successful string-path write (all 4 write paths: eager, dask streaming, GPU dispatch, VRT) before re-creating it; .qml deliberately kept (QGIS user styling persists across data updates; only a new color_ramp write replaces it); docstring documents the refresh. 10 new tests in tests/write/test_stale_sidecar_overwrite_3595.py (cat->plain, ramp->plain keeps qml, ramp->cat, cat->ramp, multiband-symbology no-op still removes, foreign sidecar on fresh path, bare ndarray, dask, VRT, GPU). Note: VRT writer refuses same-path overwrites (tiles-dir guard) so its stale case is foreign-sidecar only. 
Write suite 1213 passed, round-trip+attrs 63, rasterize-categorical+release-gates 182. LOW (documented, not fixed): read_pam_sidecar parses the first PAMRasterBand element and _attach_category_attrs applies it regardless of the band=N requested, so a foreign multiband sidecar with a band-1 thematic RAT labels a band=2 read; xrspatial's own writer only emits band-1 single-band RATs so in-library round-trips are unaffected. No CRITICAL/HIGH findings."
interpolate,2026-06-12,3288,MEDIUM,5,kriging K_inv-None fallback was numpy-backed on all backends and misnamed the variance raster; fixed via #3288. All 4 backends verified end-to-end on GPU host. LOW (documented only): template nodatavals/_FillValue copied verbatim while fill_value is the actual output sentinel; tests codify attrs==template.attrs
mcda,2026-06-10,3147,HIGH,1,"constrain() dropped all attrs (res/crs/nodatavals) whenever exclude non-empty (xr.where takes attrs from scalar fill); fixed via attrs restore, tests for numpy/dask/dask+cupy. All other mcda funcs keep attrs/coords/dims on all 4 backends. Out-of-scope crashes noted for backend-parity: owa broken on cupy (numpy order-weights x cupy) and on dask (da.sort does not exist); sensitivity monte_carlo crashes on cupy/dask+cupy (.values on cupy); xr.where compute on cupy/dask+cupy hits known cupy13.6/xarray2025.12 incompat."
multispectral,2026-06-20,3429,MEDIUM,2;3,"true_color() hardcoded y/x dims + dropped extra coords; fixed PR #3434 (all 4 backends verified, CUDA available)"
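
Reviewer note on the geotiff #3116 entry above: the log says the fix threads per-tile pixel offsets through to the VRT writer because non-georeferenced tiles all share the identity transform. The snippet below is a minimal sketch of how such offsets could be derived. It is not the library's code: `tile_dst_offsets` is a hypothetical helper, and reading "24x32" as 24 rows by 32 columns with a tile size of 16 is an assumption taken from the "24x32 in -> 16x16 out" remark.

```python
from typing import List, Tuple

# (x_off, y_off, width, height) of one tile inside the mosaic, in pixels.
Rect = Tuple[int, int, int, int]


def tile_dst_offsets(height: int, width: int, tile_size: int) -> List[Rect]:
    """Hypothetical helper: destination rectangle for every tile, row-major."""
    rects: List[Rect] = []
    for y_off in range(0, height, tile_size):
        for x_off in range(0, width, tile_size):
            # Edge tiles are clipped to the array bounds.
            rects.append((
                x_off,
                y_off,
                min(tile_size, width - x_off),
                min(tile_size, height - y_off),
            ))
    return rects


# Assumed shape: 24 rows x 32 columns, tile_size=16.
rects = tile_dst_offsets(height=24, width=32, tile_size=16)

# The mosaic extent follows from the placements, not from any one tile.
mosaic_width = max(x + w for x, _, w, _ in rects)
mosaic_height = max(y + h for _, y, _, h in rects)
```

With explicit placements like these, the mosaic size no longer collapses to a single tile when every source reports the same transform.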
36 changes: 35 additions & 1 deletion xrspatial/geotiff/_writers/eager.py
@@ -152,6 +152,14 @@ def to_geotiff(data: xr.DataArray | np.ndarray,
``docs/source/user_guide/attrs_contract.rst`` for the key
definitions.

Every successful write to a string path also refreshes the PAM
``<path>.aux.xml`` sidecar: a sidecar already at that path (from a
previous write, or a foreign tool) is removed first, and a new one
is written only when this write carries its own categories or
statistics (#3595). This matches GDAL's behaviour when creating a
dataset over an existing path. A ``.qml`` style file is never
removed; see ``color_ramp``.

Parameters
----------
data : xr.DataArray or np.ndarray
@@ -442,7 +450,12 @@ def to_geotiff(data: xr.DataArray | np.ndarray,
(``gpu=True``) and VRT (``.vrt``) write paths execute a dask source
a second time for the statistics (see ``color_ramp_range`` to skip
that). Ignored when ``pack=True``, whose on-disk packed values would
not match a ramp built from the logical values.
not match a ramp built from the logical values. Every string-path
write refreshes the PAM ``.aux.xml``: a sidecar left by a previous
write at the same path is removed first, and a new one is written
only when this write carries its own categories or statistics
(#3595). A pre-existing ``.qml`` is kept unless ``color_ramp``
replaces it -- QGIS treats it as user styling that persists across
data updates.
color_ramp_range : tuple of (float, float) or None, default None
[advanced] Explicit ``(min, max)`` for the ``color_ramp`` stretch.
Skips the statistics reduction -- useful for a dask source on the
@@ -534,6 +547,27 @@ def to_geotiff(data: xr.DataArray | np.ndarray,
else _resolve_nodata_attr(data.attrs))

    def _write_sidecars():
        if isinstance(path, str):
            # A pre-existing PAM sidecar describes whatever file this write
            # just replaced, and ``open_geotiff`` merges it back onto attrs,
            # so leaving it behind hands the old file's categories /
            # statistics to the new pixels (#3595). GDAL's GTiff driver
            # removes the PAM sidecar when creating a dataset over an
            # existing path (``GDALDriver::QuietDelete``); match that. The
            # ``write_*_sidecar`` calls below re-create it when this write
            # carries its own categories or statistics. The ``.qml`` style
            # sidecar is deliberately left alone: QGIS treats it as user
            # styling that persists across data updates, so only a new
            # ``color_ramp=`` write replaces it.
            from .._pam import sidecar_path
            try:
                os.remove(sidecar_path(path))
            except OSError:
                # Missing sidecar is the normal case; a locked one (e.g.
                # PermissionError on Windows) is swallowed too, matching
                # QuietDelete: the pixel write already succeeded, so a
                # leftover sidecar beats failing the whole write.
                pass
        if _cat_names:
            from .._pam import write_pam_sidecar
            write_pam_sidecar(path, _cat_names, _cat_colors)
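
Reviewer note: the remove-then-recreate pattern in this hunk can be exercised in isolation with only the standard library. The snippet below is a minimal sketch, not the library's implementation: `aux_sidecar_path` and `refresh_sidecar` are hypothetical stand-ins for `sidecar_path` and `_write_sidecars`, and the XML strings are placeholders rather than real PAM content.

```python
import os
import tempfile
from typing import Optional


def aux_sidecar_path(path: str) -> str:
    # Hypothetical stand-in for the library helper; assumes "<path>.aux.xml".
    return path + ".aux.xml"


def refresh_sidecar(path: str, new_xml: Optional[str]) -> None:
    """Drop any sidecar at ``path``, then write a new one only if given."""
    try:
        os.remove(aux_sidecar_path(path))
    except OSError:
        # Covers the missing-file case and a locked file alike: the sidecar
        # is best-effort metadata, so the failure is deliberately ignored.
        pass
    if new_xml is not None:
        with open(aux_sidecar_path(path), "w", encoding="utf-8") as fh:
            fh.write(new_xml)


with tempfile.TemporaryDirectory() as tmp:
    tif = os.path.join(tmp, "out.tif")
    aux = aux_sidecar_path(tif)

    # A sidecar left behind by an earlier write at the same path.
    with open(aux, "w", encoding="utf-8") as fh:
        fh.write("<PAMDataset>stale</PAMDataset>")

    # A write that carries no categories or statistics removes it.
    refresh_sidecar(tif, None)
    stale_removed = not os.path.exists(aux)

    # Refreshing again with nothing on disk is a silent no-op.
    refresh_sidecar(tif, None)
    still_absent = not os.path.exists(aux)

    # A write that does carry metadata re-creates the sidecar.
    refresh_sidecar(tif, "<PAMDataset>fresh</PAMDataset>")
    with open(aux, encoding="utf-8") as fh:
        fresh_text = fh.read()
```

Catching `OSError` rather than only `FileNotFoundError` mirrors the trade-off described in the hunk's comment: a leftover sidecar is preferred over failing a write whose pixels are already on disk.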