From 9fcee08e9cad2a91d277981fd8db5d7e671d6f3e Mon Sep 17 00:00:00 2001 From: Brendan Collins Date: Wed, 1 Jul 2026 22:52:20 -0400 Subject: [PATCH] Fix to_geotiff docstring tier for extra_tags pass-through (#3593) The release-contract tier list said extra_tags pass-through was [advanced] (no kwarg gate), but SUPPORTED_FEATURES['writer.extra_tags'] is 'experimental' and _validate_write_rich_tag_optin rejects a fresh DataArray carrying attrs['extra_tags'] without allow_experimental_codecs=True. Move the bullet to [experimental], name the opt-in and the round-trip exemption, and add a drift-guard test so the docstring cannot disagree with the tier map again. Also updates the api-consistency sweep state CSV for the geotiff row. --- .claude/sweep-api-consistency-state.csv | 2 +- xrspatial/geotiff/_writers/eager.py | 21 ++++++++++++------- .../geotiff/tests/unit/test_signatures.py | 21 +++++++++++++++++++ 3 files changed, 36 insertions(+), 8 deletions(-) diff --git a/.claude/sweep-api-consistency-state.csv b/.claude/sweep-api-consistency-state.csv index 09218bb3c..143e7c0d8 100644 --- a/.claude/sweep-api-consistency-state.csv +++ b/.claude/sweep-api-consistency-state.csv @@ -1,7 +1,7 @@ module,last_inspected,issue,severity_max,categories_found,notes classify,2026-06-20,3398,MEDIUM,1;3,"Sweep 2026-06-20 (deep-sweep-api-consistency-classify-2026-06-20). 1 MEDIUM Cat 1 finding filed as #3398 and fixed on this branch. (MEDIUM Cat 1 positional-order drift) natural_breaks ordered its params (agg, num_sample, name, k) while the other two classifiers that take the same trio order them (agg, k, num_sample, name): quantile(agg, k=4, num_sample, name), maximum_breaks(agg, k=5, num_sample, name). So natural_breaks(raster, 5) silently set num_sample=5 instead of k=5. Fix reorders natural_breaks to (agg, k=5, num_sample=20000, name) and adds a _natural_breaks_legacy_order shim: when k= is a keyword AND a second positional is present (the only way pre-1.0 callers passed num_sample, since k was last and always keyword), the positional is treated as the old num_sample with a DeprecationWarning. Keeps the one example notebook call natural_breaks(raster, 20000, k=4) working. Bundled trivial Cat 3 fix in same PR: binary() was the only public classifier with no type hints -- added agg: xr.DataArray, name: Optional[str], -> xr.DataArray to match the other 9. Tests: test_natural_breaks_positional_k_matches_siblings (new positional k == keyword k) and test_natural_breaks_legacy_positional_num_sample_warns (legacy order warns + maps identically). Full test_classify.py (now 91) + test_validation.py pass. Cat 4 considered NOT a finding: quantile k=4 (quartiles) vs k=5 (quintiles) elsewhere is the documented PySAL/mapclassify convention, not drift. No Cat 2 return drift (all 10 publics return xr.DataArray/Dataset via @supports_dataset, coords/dims/attrs preserved). No Cat 5 orphan API (all 10 re-exported in __init__.py; no __all__ but consistent with module convention). Cross-cutting, notes only: first-arg agg (classify family) vs raster (reproject/rasterize/polygonize) is library-wide drift, out of per-module scope. cuda-validated: CUDA_AVAILABLE=True on this host; natural_breaks new order + legacy shim smoke-tested on numpy AND cupy entry points (both warn + remap), dataset path binds name correctly, binary verified on cupy." focal,2026-06-10,3215;3216,MEDIUM,3;4,"Sweep 2026-06-10 (deep-sweep-api-consistency-focal-2026-06-10). 2 MEDIUM findings filed, fixed on branches -01/-02 off this one. (#3215, MEDIUM Cat 4 cross-backend default parity, branch -01) apply() default func=_calc_mean is an @ngjit CPU function but the cupy/dask+cupy paths launch func as a CUDA kernel via _focal_stats_func_cupy func[griddim, blockdim], so apply(cupy_agg, kernel) raises TypeError 'CPUDispatcher' object is not subscriptable (dask+cupy builds the graph and fails at compute). Prior 2026-05-29 sweep dispositioned this LOW as 'documented in the docstring', but the docstring covers explicit funcs -- the default itself is unusable on 2 of 4 backends. Fix: func=None sentinel resolved per backend (_calc_mean CPU, _focal_mean_cuda GPU), explicit-func behavior unchanged; same PR adds the missing name= param to the apply() docstring (signature has name='focal_apply'; mean/focal_stats/hotspots document theirs). (#3216, MEDIUM Cat 3, branch -02) hotspots() docstring lists 3 backends but dask_cupy_func=_hotspots_dask_cupy is dispatched and works; kernel param documented as binary ('values of 1 indicate the kernel') while hotspots accepts weighted kernels and the Gi* formula in the same docstring uses weights w_ij (apply/focal_stats reject non-binary via _validate_binary_kernel, hotspots deliberately does not). Docs-only fix. LOW documented, not fixed: among the 4 focal publics only mean() has @supports_dataset (Dataset-support drift; feature gap, not an API bug). Cross-cutting, notes only per template: emerging_hotspots(raster=), viewshed(raster=), calc_cellsize(raster) still use raster while focal standardized on agg with a DeprecationWarning shim (#2689/PR #2699); library-wide first-arg drift, belongs to those modules' sweeps. No Cat 1 in-module (agg canonical, raster alias warns, both-args raises). No Cat 2 return drift (mean/apply/hotspots 2D same-type, focal_stats 3D (stats,y,x) as documented). No Cat 5 orphan API (apply/focal_stats/hotspots documented in focal.rst autosummary and consumed via xrspatial.focal module path; only mean re-exported top-level; emerging_hotspots top-level vs hotspots module-level asymmetry noted, additive export would be a design call, not filed). cuda-validated: CUDA_AVAILABLE=True on this host; mean/apply/focal_stats/hotspots smoke-tested on cupy with kwarg parity; the apply default crash reproduced on GPU; hotspots weighted-kernel acceptance verified empirically." -geotiff,2026-06-12,3263;3265,MEDIUM,3;5,"Re-sweep 2026-06-12 (deep-sweep-api-consistency-geotiff-2026-06-12); prior pass 2026-06-09 (#3086). Scope: surface changes since 2026-06-09 (pack/unpack fixes #3171-#3241, SUPPORTED_FEATURES reader.unpack/writer.pack/reader.coregister, coregister docs #3248) plus a fresh 5-category pass on open_geotiff/to_geotiff. 2 MEDIUM findings filed and fixed on branches -01/-02 off this one. (#3263, MEDIUM Cat 3, PR #3269, branch -01) open_geotiff unpack docstring said 'A source without scale / offset metadata is a no-op', but unpack=True folds into the masking gate (_finalize_eager_read: mask_and_scale implies masking, rioxarray parity), so a sentinel-bearing uint16 source still comes back float64 with NaN holes; verified identical on all 4 backends (not a parity bug), only a source with neither scale/offset nor a sentinel reads unchanged. Docs-only fix + test_unpack_noop_doc_3263.py pinning wording (scoped to the unpack paragraph) and behavior. (#3265, MEDIUM Cat 5, PR #3273, branch -02) exception-export drift: VRTUnsupportedError (raised 10+ times in _vrt_validation.py on public .vrt reads, documented in geotiff_safe_io.rst which steered users to the private _errors module), CloudSizeLimitError (importable but not in __all__, sibling UnsafeURLError IS exported), and PixelSafetyLimitError (raised by the [stable] max_pixels cap, only importable from _layout/_reader) were the only 3 exceptions raised on public open_geotiff paths missing from the public surface (other 17 exported). Additive fix: import + __all__ + :class: roles in safe_io doc + trigger-point docs naming the exceptions in max_pixels/max_cloud_bytes param docs and geotiff.rst; test_exception_exports_3265.py pins export, identity with private definitions, and a functional max_pixels raise. Clean elsewhere: docstring/signature parity exact on both publics (programmatic check + 218 existing contract tests); no Cat 1 (signatures unchanged since 2026-06-09; pack/unpack pair deliberate), no Cat 2 (DataArray / path returns unchanged), no Cat 4 (shared allow_* defaults match reader/writer; gpu False-vs-None auto-detect documented). SUPPORTED_FEATURES tiers (reader.unpack/writer.pack/reader.coregister experimental) agree with docstring tier markers. coregister= itself lives on accessor.py (excluded module) -- only its SUPPORTED_FEATURES registration is in geotiff, consistent. cuda-validated: CUDA_AVAILABLE=True; open_geotiff smoke-tested with identical kwargs on numpy/cupy/dask/dask+cupy (cpu/gpu pixel parity), to_geotiff gpu=True, cupy pack=True write (#3240 fix confirmed), deprecated aliases mask_and_scale/name/mask_nodata all warn. Both PRs reviewed (COMMENTED) with findings fixed in follow-up commits c14844a8/af3c8a66; branches up to date with origin/main; left for user merge per REVIEW_REQUIRED." +geotiff,2026-07-01,3593,MEDIUM,3,"Sweep 2026-07-01 (deep-sweep-api-consistency-geotiff-2026-07-01). 1 MEDIUM Cat 3 finding filed as #3593 and fixed on this branch: to_geotiff docstring's release-contract tier list put extra_tags pass-through under [advanced] (no kwarg gate) while SUPPORTED_FEATURES['writer.extra_tags']='experimental' and _validate_write_rich_tag_optin raises ValueError on a fresh DataArray with attrs['extra_tags'] unless allow_experimental_codecs=True (verified empirically). Fix moves the bullet to [experimental], names the opt-in and the round-trip exemption, and adds a drift-guard test in tests/unit/test_signatures.py. Clean elsewhere: signature/docstring param parity is exact on open_geotiff (28 params) and to_geotiff (24 params); deprecated aliases (name/mask_nodata/mask_and_scale, plot_geotiff) all emit DeprecationWarning; no default drift between to_geotiff and _write_geotiff_gpu (identical signatures minus gpu); no orphan API (examples/docs only import __all__ names); xarray backend declares open_dataset_parameters correctly. cuda-validated: CUDA_AVAILABLE=True; smoke-tested all 4 read paths (eager/dask/gpu/dask+gpu) and cupy write round-trip accept the same public kwargs. Cross-cutting, notes only: reproject/merge use chunk_size= and name= where geotiff/templates use chunks= (dask/xarray convention) and open_geotiff deprecated name= for default_name= (rioxarray parity); offender is reproject, out of per-module scope, not filed here." hydro-d8,2026-05-29,2709,HIGH,1;5,"Sweep 2026-05-29 (deep-sweep-api-consistency-hydro-d8-2026-05-29). Scope = the 13 D8-variant files only; dinf/mfd read for reference but not modified. 1 HIGH Cat 1 + 1 MEDIUM Cat 5 fixed in this branch (#2709, PR #2716). HIGH Cat 1: stream_order_d8 named its strahler/shreve selector `ordering` while sibling stream_order_dinf/stream_order_mfd use `method`; both names live in the public API and the __init__.py _StreamOrderDispatch special-cases the drift (translates ordering->method for non-d8). Fix adds `method` as an accepted alias on stream_order_d8 (case-insensitive; takes precedence; conflicting ordering+method raises ValueError), keeping `ordering` working so the out-of-scope dispatcher (passes ordering=) and existing callers are unaffected. Full rename to `method` deferred because deprecating `ordering` would warn on every stream_order(routing='d8') call via the dispatcher I cannot touch in this scope. MEDIUM Cat 5: basins_d8 (watershed_d8.py) is a backward-compat wrapper whose docstring said 'use basin instead' but emitted no warning; added DeprecationWarning(stacklevel=2). Tests added for alias parity/precedence/conflict/case-insensitivity and for the basins_d8 warning. Findings documented but NOT filed per template: (LOW Cat 1 cross-module, out of scope) dinf siblings name the first arg `flow_dir_dinf` (stream_link/flow_path/hand/watershed_dinf) while all D8 funcs use the cleaner `flow_dir`; D8 is the better convention so no D8 change -- the drift lives in the dinf files. (LOW Cat 4 defensive-validation drift) hand_d8 validates np.isfinite(threshold) but stream_link_d8/stream_order_d8 (same threshold: float = 100 param) do not; not user-facing signature surprise, document only. No Cat 2 return drift (every D8 public fn returns xr.DataArray with coords/dims/attrs preserved; Dataset in -> Dataset out via @supports_dataset). No Cat 3 missing-hints beyond fill_d8 z_limit (optional, no hint) which mirrors its sibling style. All 13 D8 funcs are re-exported in xrspatial/hydro/__init__.py (no orphan API). cuda-validated: CUDA_AVAILABLE=True on this host; method-alias parity smoke-tested on a cupy DataArray. CI: ubuntu/windows/3.12 GitHub Actions green; macOS-3.14 + ReadTheDocs slow but no failures. NOTE: the /review-pr review comment could not be posted to GitHub (auto-mode permission denial on gh pr review); review findings were applied to code instead (case-insensitive conflict check + str|None hint, commit f8467320)." interpolate,2026-06-12,3285,MEDIUM,2,"Sweep 2026-06-12 (deep-sweep-api-consistency-interpolate-2026-06-12). Scope: idw/_idw.py, kriging/_kriging.py, spline/_spline.py, shared _validation.py. 1 MEDIUM Cat 2 finding filed as #3285, fixed on branch -01 off this one: kriging(return_variance=True) singular-matrix fallback (_kriging.py:499) returns prediction, prediction.copy() so the variance DataArray keeps the prediction's name instead of f'{name}_variance' (normal path :523 names it correctly); reproduced by monkeypatching _build_kriging_matrix to None; anything keying on .name (xr.merge, Dataset build) silently collapses the pair. One-line fix + regression test on the singular path. Clean elsewhere: Cat 1 in-module exact (idw/kriging/spline share x, y, z, template positionals and name= default ''; template matches kde's template=); docstring/signature parity exact on all 3 publics (every param documented, Returns sections match incl. kriging's tuple); Cat 4 no default drift (power=2.0, k=None, fill_value=nan, variogram_model='spherical', nlags=15, smoothing=0.0, all single-owner params); Cat 5 no orphan API (all 3 re-exported in xrspatial/__init__.py and autosummaried in docs/source/reference/interpolation.rst; tests touch private helpers only via module paths). Cross-cutting, notes only per template: fill_value (idw) vs fill (rasterize) for the uncovered-pixel value is library-wide drift (idw matches numpy's fill_value convention, left alone); public functions are untyped module-wide (consistent internally, drifts from typed kde/rasterize/proximity siblings -- annotation pass would span the whole module, LOW, not filed); kde's keyword-only style is the library minority so interpolate's positional style matches the rasterize/proximity majority. GPU k-nearest rejection (NotImplementedError) is deliberate and documented in the k param docstring. cuda-validated: CUDA_AVAILABLE=True on this host; idw/kriging/spline smoke-tested with full kwargs on numpy AND cupy DataArrays (variance name parity confirmed on both), dask+numpy and dask+cupy graph construction verified without compute." mcda,2026-06-10,3148,HIGH,1;2;3;5,"Sweep 2026-06-10 (deep-sweep-api-consistency-mcda-2026-06-10). Fixed in this branch (#3148): (HIGH Cat 1) owa() named its criterion-weight dict criterion_weights while wlc/wpm/sensitivity use weights (same semantics, same _validate_weights); renamed to weights with keyword-only criterion_weights deprecation shim (DeprecationWarning; both names -> TypeError; positional callers untouched). (MEDIUM Cat 2) boolean_overlay annotated criteria as dict-only while every sibling combiner takes xr.Dataset; Dataset already worked via the Mapping interface -- now annotated/documented as xr.Dataset | dict. (MEDIUM Cat 3) ahp_weights docstring Raises claimed ValueError on incomplete comparisons but code warns (UserWarning) and defaults missing pairs to 1 -- docstring now documents Warns behaviour. (MEDIUM Cat 5) ConsistencyResult returned by public ahp_weights but absent from xrspatial/mcda __all__ and docs/source/reference/mcda.rst -- exported and documented. Documented, NOT fixed here: (MEDIUM Cat 2, deferred to parallel sweep-metadata sibling to avoid duplicate PR) constrain() drops attrs via xr.where while the other nine public functions preserve them. (LOW Cat 2) ahp_weights returns (weights, ConsistencyResult) tuple vs rank_weights bare dict -- intentional, documented in both docstrings, no fix. (LOW Cat 4) name=None inherit-input-name (standardize/constrain) vs literal-name defaults (combiners) -- defensible split, document only. Pre-existing backend bugs surfaced by the mandated cupy smoke (accuracy/test-coverage lane, recorded in #3148 body): owa fails on cupy (numpy order-weights array mixed into cupy multiply, combine.py ~336-340) and on ANY dask backend at graph construction (da.sort does not exist, combine.py:356, despite the owa MemoryError message recommending dask); sensitivity(method=monte_carlo) fails on cupy (template.values implicit-conversion guard). constrain on cupy blocked by the known library-wide cupy 13.6 + xarray xr.where astype incompat (dependency-pin issue), not mcda-specific. cuda-validated: CUDA_AVAILABLE=True; all 10 public functions smoke-tested on cupy DataArrays; owa weights=/criterion_weights= shim verified on numpy AND cupy entry points (cupy execution stops at the pre-existing mixed-array bug, signature acceptance confirmed)." diff --git a/xrspatial/geotiff/_writers/eager.py b/xrspatial/geotiff/_writers/eager.py index dfa148cbe..6b1c50c60 100644 --- a/xrspatial/geotiff/_writers/eager.py +++ b/xrspatial/geotiff/_writers/eager.py @@ -82,11 +82,17 @@ def to_geotiff(data: xr.DataArray | np.ndarray, (``SUPPORTED_FEATURES['writer.overviews']``): the ``overview_levels`` and ``overview_resampling`` knobs and the pyramid bytes themselves. Also explicit ``bigtiff=True``; - ``photometric=`` overrides; ``extra_tags`` pass-through. + ``photometric=`` overrides. * [experimental] GPU dispatch via ``gpu=True``; ``compression`` in ``{'lerc', 'jpeg2000', 'j2k', 'lz4'}`` behind the explicit ``allow_experimental_codecs=True`` opt-in; - ``allow_unparseable_crs=True``. + ``allow_unparseable_crs=True``; ``attrs['extra_tags']`` / + ``attrs['gdal_metadata_xml']`` pass-through + (``SUPPORTED_FEATURES['writer.extra_tags']`` / + ``['writer.gdal_metadata_xml']``), gated behind the same + ``allow_experimental_codecs=True`` opt-in -- except for attrs + that came from an xrspatial read, whose round-trip stays + flag-free. * [internal-only] ``compression='jpeg'`` behind ``allow_internal_only_jpeg=True``. The produced files do not round-trip through libtiff / GDAL / rasterio; the path exists for @@ -309,11 +315,12 @@ def to_geotiff(data: xr.DataArray | np.ndarray, * An ``int`` -- written verbatim into Photometric for advanced callers (e.g. ``3`` for Palette, ``5`` for CMYK). - A user-supplied ``extra_tags`` entry of ``(TAG_PHOTOMETRIC, - ...)`` or ``(TAG_EXTRA_SAMPLES, ...)`` overrides the writer's - chosen value; only these two tag ids are overridable so other - auto-emitted tags such as ``ImageWidth`` or ``StripOffsets`` - remain protected. + A user-supplied ``attrs['extra_tags']`` entry of + ``(TAG_PHOTOMETRIC, ...)`` or ``(TAG_EXTRA_SAMPLES, ...)`` + overrides the writer's chosen value (the pass-through itself + rides the experimental tier; see the tier list above); only + these two tag ids are overridable so other auto-emitted tags + such as ``ImageWidth`` or ``StripOffsets`` remain protected. allow_experimental_codecs : bool [experimental] Opt in to the Tier 3 experimental codecs ``'lerc'``, diff --git a/xrspatial/geotiff/tests/unit/test_signatures.py b/xrspatial/geotiff/tests/unit/test_signatures.py index 832523eb2..63be1539d 100644 --- a/xrspatial/geotiff/tests/unit/test_signatures.py +++ b/xrspatial/geotiff/tests/unit/test_signatures.py @@ -823,6 +823,27 @@ def test_validate_write_rich_tag_optin_exempts_round_trip(): ) +def test_to_geotiff_docstring_extra_tags_tier_matches_supported_features(): + """The tier list in ``to_geotiff``'s docstring must agree with + ``SUPPORTED_FEATURES`` on the rich-tag pass-through. The docstring + used to list ``extra_tags`` under ``[advanced]`` (which carries no + kwarg gate) while ``writer.extra_tags`` sits at ``experimental`` + and ``_validate_write_rich_tag_optin`` rejects a fresh DataArray + without ``allow_experimental_codecs=True`` (#3593). + """ + assert g.SUPPORTED_FEATURES['writer.extra_tags'] == 'experimental' + assert g.SUPPORTED_FEATURES['writer.gdal_metadata_xml'] == 'experimental' + # Only inspect the release-contract tier list at the top of the + # docstring, not the parameter docs further down (``photometric`` + # legitimately mentions the extra_tags override there). + tier_list = to_geotiff.__doc__.split('Parameters')[0] + advanced = tier_list.split('[advanced]')[1].split('* [')[0] + assert 'extra_tags' not in advanced + experimental = tier_list.split('[experimental]')[1].split('* [')[0] + assert 'extra_tags' in experimental + assert 'allow_experimental_codecs' in experimental + + # --- Read end-to-end: write an experimental-codec file, then assert the # read side refuses without the matching opt-in and succeeds with it. ---