Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .claude/sweep-security-state.csv
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ fire,2026-04-25,,,,,"Clean. Despite the module's size hint, fire.py is purely pe
flood,2026-05-03,1437,MEDIUM,3,,Re-audit 2026-05-03. MEDIUM Cat 3 fixed in PR #1438 (travel_time and flood_depth_vegetation now validate mannings_n DataArray values are finite and strictly positive via _validate_mannings_n_dataarray helper). No remaining unfixed findings. Other categories clean: every allocation is same-shape as input; no flat index math; NaN propagation explicit in every backend; tan_slope clamped by _TAN_MIN; no CUDA kernels; no file I/O; every public API calls _validate_raster on DataArray inputs.
focal,2026-06-10,3222,MEDIUM,1;6,3223,"Two MEDIUM findings, both fixed via rockout. Cat 6 (#3222): mean() GPU paths (_mean_cupy ~261, _mean_dask_cupy ~194) force float32 while CPU computes float64 (astype(float)); max abs diff 0.5 on values ~1e7; same class as #2769 which only covered apply()/focal_stats(). Cat 1 (#3223): _check_kernel_vs_raster_memory budgets 4 B/cell ('float32 internals') but #2805 made internals preserve float64, so the guard underestimates 2x and a float64 combo can pass at ~100% of available RAM. Clean elsewhere: Cat 2 no int32 flat-index math; Cat 3 all divisions guarded (num>0, w_sum>0, var<0 clamp, variance_term where-guard, global_std==0 validated eagerly + lazily via _gistar_validate_lazy), NaN checks use v!=v idiom; Cat 4 all 10 CUDA kernels have bounds guards, validated under compute-sanitizer memcheck on shapes (1,1)/(7,1)/(1,7)/(97,89): 0 errors; Cat 5 no file I/O; all public APIs call _validate_raster."
geodesic,2026-04-27,1283,HIGH,1,,"HIGH (fixed PR #1285): slope(method='geodesic') and aspect(method='geodesic') stack a (3, H, W) float64 array (data, lat, lon) before dispatch with no memory check. A large lat/lon-tagged raster passed to either function would OOM. Fixed by adding _check_geodesic_memory(rows, cols) in xrspatial/geodesic.py (mirrors morphology._check_kernel_memory): budgets 56 bytes/cell (24 stacked float64 + 4 float32 output + 24 padded copy + slack) and raises MemoryError when > 50% of available RAM; called from slope.py and aspect.py inside the geodesic branch before dispatch. No other findings: 6 CUDA kernels all have bounds guards (e.g. _run_gpu_geodesic_aspect at geodesic.py:395), custom 16x16 thread blocks avoid register spill, no shared memory, _validate_raster runs upstream in slope/aspect, all backends cast to float32, slope_mag < 1e-7 flat threshold prevents arctan2 NaN propagation, curvature correction uses hardcoded WGS84 R."
geotiff,2026-06-09,3104,MEDIUM,3;6,,"Re-audit pass 20 2026-06-09 (deep-sweep). MEDIUM Cat 3/6: unpack=True accepted SCALE=0 / non-finite SCALE-OFFSET from GDAL_METADATA (silent data destruction on read: all pixels become offset or NaN) and _pack divided by scale_factor with no guard (pack=True round-trip wrote an all-sentinel file, confirmed by repro). Issue #3104, fixed on deep-sweep-security-geotiff-2026-06-09: reject in _extract_scale_offset (covers numpy+dask, both share it) plus _pack guard for hand-edited attrs; tests tests/read/test_scale_zero_3104.py. Audited 191 commits since b5bd2658 incl. reader split into _sources/_decode/_encode/_nodata/_overview, pack/unpack (#3064/#3075), sidecar origin threading (#3027), streaming write budget (#3008), _overview_kernels.py ngjit reducers (bounds-checked, clean), GPU overview path (cupy ufuncs, no raw kernels). Carried-forward guards verified post-refactor: SSRF UnsafeURLError+ALLOW_PRIVATE_HOSTS now in _sources.py, DOCTYPE rejection _safe_xml, realpath containment _vrt, decompression-bomb margins _compression, MAX_IFDS/MAX_IFD_ENTRY_COUNT/validate_tile_layout _header, max_pixels on all backends, _check_gpu_memory tile caps, sidecar max_cloud_bytes (#2121) with SSRF-validated _HTTPSource probes. CUDA available; no Cat 4 suspects required kernel execution."
geotiff,2026-06-12,3264,MEDIUM,3;6,,"Re-audit pass 21 2026-06-12 (deep-sweep). MEDIUM Cat 3/6: _pack filled NaN holes on the float64 buffer then cast, so 64-bit sentinels above 2**53 wrapped (INT64_MAX->INT64_MIN, UINT64_MAX->0) while GDAL_NODATA kept the original value; masked re-read returned holes as valid pixels; the nodata kwarg was float-validated so INT64_MAX was rejected outright. Issue #3264, fixed on deep-sweep-security-geotiff-2026-06-12: _pack_restore_int fills at native width after the cast (matches eager/GPU writers' dtype.type(nodata)), kwarg check compares as ints; tests tests/write/test_pack_64bit_sentinel_3264.py incl. gpu and dask+gpu legs. Audited the 15 commits since 7ccec772 (#3104 fix): pack nodata kwarg threading #3174, band-subset SCALE rewrite #3175, float32 width #3239, cupy pack fix #3240, GPU streaming writer #3241 (validated on GPU: 1x1/Nx1/1xN/prime shapes, 3D lazy moveaxis, streaming_buffer_bytes=1, all byte-exact round trips, bounded device memory), compression_level gate #3176 (normalized codec, pre-dispatch), VRT dst_offsets placement #3135 (internal-only, validated non-negative ints), native-width 64-bit masking #3128. _stream_row_bands boundaries are tile-aligned so per-band GPU compression cannot zero-pad mid-image. CUDA available; GPU paths exercised, no Cat 4 findings."
glcm,2026-04-24,1257,HIGH,1,,"HIGH (fixed #1257): glcm_texture() validated window_size only as >= 3 and distance only as >= 1, with no upper bound on either. _glcm_numba_kernel iterates range(r-half, r+half+1) for every pixel, so window_size=1_000_001 on a 10x10 raster ran ~10^14 loop iterations with all neighbors failing the interior bounds check (CPU DoS). On the dask backends depth = window_size // 2 + distance drove map_overlap padding, so a huge window also caused oversize per-chunk allocations (memory DoS). Fixed by adding max_val caps in the public entrypoint: window_size <= max(3, min(rows, cols)) and distance <= max(1, window_size // 2). One cap covers every backend because cupy and dask+cupy call through to the CPU kernel after cupy.asnumpy. No other HIGH findings: levels is already capped at 256 so the per-pixel np.zeros((levels, levels)) matrix in the kernel is bounded to 512 KB. No CUDA kernels. No file I/O. Quantization clips to [0, levels-1] before the kernel and NaN maps to -1 which the kernel filters with i_val >= 0. Entropy log(p) and correlation p / (std_i * std_j) are both guarded. All four backends use _validate_raster and cast to float64 before quantizing. MEDIUM (unfixed, Cat 1): the per-pixel np.zeros((levels, levels)) allocation inside the hot loop is a perf issue (levels=256 -> 512 KB alloc+free per pixel) but not a security issue because levels is bounded. Could be hoisted out of the loop or replaced with an in-place clear, but that is an efficiency concern, not security."
gpu_rtx,2026-04-29,1308,HIGH,1,,"HIGH (fixed #1308 / PR #1310): hillshade_rtx (gpu_rtx/hillshade.py:184) and viewshed_gpu (gpu_rtx/viewshed.py:269) allocated cupy device buffers sized by raster shape with no memory check. create_triangulation (mesh_utils.py:23-24) adds verts (12 B/px) + triangles (24 B/px) = 36 B/px; hillshade_rtx adds d_rays(32) + d_hits(16) + d_aux(12) + d_output(4) = 64 B/px (100 B/px total); viewshed_gpu adds d_rays(32) + d_hits(16) + d_visgrid(4) + d_vsrays(32) = 84 B/px (120 B/px total). A 30000x30000 raster asked for 90-108 GB of VRAM before cupy surfaced an opaque allocator error. Fixed by adding gpu_rtx/_memory.py with _available_gpu_memory_bytes() and _check_gpu_memory(func_name, h, w) helpers (cost_distance #1262 / sky_view_factor #1299 pattern, 120 B/px budget covers worst case, raises MemoryError when required > 50% of free VRAM, skips silently when memGetInfo() unavailable). Wired into both entry points after the cupy.ndarray type check and before create_triangulation. 9 new tests in test_gpu_rtx_memory.py (5 helper-unit + 4 end-to-end gated on has_rtx). All 81 existing hillshade/viewshed tests still pass. Cat 4 clean: all CUDA kernels (hillshade.py:25/62/106, viewshed.py:32/74/116, mesh_utils.py:50) have bounds guards; no shared memory, no syncthreads needed. MEDIUM not fixed (Cat 6): hillshade_rtx and viewshed_gpu do not call _validate_raster directly but parent hillshade() (hillshade.py:252) and viewshed() (viewshed.py:1707) already validate, so input validation runs before the gpu_rtx entry point - defense-in-depth, not exploitable. MEDIUM not fixed (Cat 2): mesh_utils.py:64-68 cast mesh_map_index to int32 in the triangle index buffer; overflows at H*W > 2.1B vertices (~46341x46341+) but the new memory guard rejects rasters that large first - documentation/clarity item rather than exploitable. MEDIUM not fixed (Cat 3): mesh_utils.py:19 scale = maxDim / maxH divides by zero on an all-zero raster, propagating inf/NaN into mesh vertex z-coords; separate follow-up. LOW not fixed (Cat 5): mesh_utils.write() opens user-supplied path without canonicalization but its only call site (mesh_utils.py:38-39) sits behind if False: in create_triangulation, not reachable in production."
hillshade,2026-04-27,,,,,"Clean. Cat 1: only allocation is the output np.empty(data.shape) at line 32 (cupy at line 165) and a _pad_array with hardcoded depth=1 (line 62) -- bounded by caller, no user-controlled amplifier. Azimuth/altitude are scalars and don't drive size. Cat 2: numba kernel uses range(1, rows-1) with simple (y, x) indexing; numba range loops promote to int64. Cat 3: math.sqrt(1.0 + xx_plus_yy) is always >= 1.0 (no neg sqrt, no div-by-zero); NaN elevation propagates correctly through dz_dx/dz_dy -> shaded -> output (the shaded < 0.0 / shaded > 1.0 clamps don't fire on NaN). Azimuth validated to [0, 360], altitude to [0, 90]. Cat 4: _gpu_calc_numba (line 107) guards both grid bounds and 3x3 stencil reads via i > 0 and i < shape[0]-1 and j > 0 and j < shape[1]-1; no shared memory. Cat 5: no file I/O. Cat 6: hillshade() calls _validate_raster (line 252) and _validate_scalar for both azimuth (253) and angle_altitude (254); all four backend paths cast to float32; tests parametrize int32/int64/float32/float64."
Expand Down
88 changes: 77 additions & 11 deletions xrspatial/geotiff/_attrs.py
Original file line number Diff line number Diff line change
Expand Up @@ -1710,7 +1710,10 @@ def _resolve(name, default):


def _pack_fill_nan(chunk, fill):
"""NaN -> sentinel fill for ``_pack``, at the buffer level.
"""NaN -> sentinel fill for ``_pack``'s float targets, at the buffer
level. Integer targets route through :func:`_pack_restore_int`
instead, which assigns the sentinel at the target dtype's native
width (issue #3264).

``DataArray.fillna`` routes through xarray's ``duck_array_ops.where``,
which breaks on cupy-backed arrays (issue #3112): eager cupy hits
Expand All @@ -1730,6 +1733,45 @@ def _pack_fill_nan(chunk, fill):
return out


def _pack_restore_int(chunk, fill, tgt_name):
"""NaN -> sentinel fill plus integer restore for ``_pack``, at the
target dtype's native width.

Filling the float64 buffer first and casting afterwards corrupts
64-bit sentinels above 2**53: ``float(INT64_MAX)`` rounds to 2**63,
the cast overflows, and the holes land on INT64_MIN (0 for
UINT64_MAX) while the GDAL_NODATA tag still declares the original
sentinel, so a masked re-read returns them as valid pixels (issue
#3264). Instead, park the holes at zero for the round + cast, then
assign the sentinel as a Python int -- exact at any width, the same
way the eager and GPU writers fill via ``dtype.type(nodata)``.
Works on numpy and cupy chunks alike: ``np.isnan`` dispatches on
the array type and the masked assignments take Python scalars.

Runs :func:`_pack_guard_int_range` between the round and the cast:
this is the only integer restore the sentinel path takes, and the
chunk is already integer-typed by the time ``_pack``'s body-level
guard block runs, so the range / finiteness check (issue #3260)
must happen here. ``+/-Inf`` is not NaN and therefore survives the
hole mask; the parked holes hold 0.0, in range for every integer
dtype, so they pass the guard.
"""
tgt = np.dtype(tgt_name)
if chunk.dtype.kind != 'f':
# Integer buffers cannot hold NaN; nothing to fill.
return chunk.astype(tgt)
mask = np.isnan(chunk)
out = chunk.copy()
out[mask] = 0.0
out = out.round()
info = np.iinfo(tgt)
_pack_guard_int_range(out, tgt.name, float(info.min),
float(int(info.max) + 1))
out = out.astype(tgt)
out[mask] = int(fill)
return out


def _pack_guard_int_range(chunk, tgt_name, lo, hi_excl):
"""Per-chunk range / finiteness guard for ``_pack``'s integer restore.

Expand Down Expand Up @@ -1886,11 +1928,19 @@ def _pack(data, nodata=None):
# The attrs sentinel fits the source dtype by construction (the
# reader validated it against GDAL_NODATA). The kwarg has no such
# guarantee: an out-of-range or fractional value would wrap /
# round in the ``astype`` below, silently recreating the
# fill-vs-tag disagreement this override exists to prevent.
# round in the fill below, silently recreating the fill-vs-tag
# disagreement this override exists to prevent. Compare as
# integers, not through float(): float64 cannot represent 64-bit
# values above 2**53, so a float round-trip rejected INT64_MAX
# even though it fits int64 exactly (issue #3264).
info = np.iinfo(tgt)
fv = float(nodata_kwarg)
if not (info.min <= fv <= info.max) or fv != int(fv):
try:
iv = int(nodata_kwarg)
representable = (iv == nodata_kwarg
and info.min <= iv <= info.max)
except (OverflowError, ValueError):
representable = False # inf / nan
if not representable:
raise ValueError(
f"pack=True: nodata={nodata_kwarg!r} cannot be represented "
f"in the packed integer dtype {tgt.name} (valid range "
Expand All @@ -1904,11 +1954,24 @@ def _pack(data, nodata=None):
# GDAL_NODATA tag that still declares the sentinel, leaving an
# inconsistent file that non-masking readers misread (#3078).
# Not ``fillna``: that routes through xarray's where(), which
# crashes on cupy-backed arrays (#3112). ``_pack_fill_nan``
# fills at the buffer level on all four backends; dask backings
# keep the fill lazy via the same map_blocks shape as the
# no-sentinel guard below.
if hasattr(out.data, 'dask'):
# crashes on cupy-backed arrays (#3112). The helpers fill at the
# buffer level on all four backends; dask backings keep the fill
# lazy via the same map_blocks shape as the no-sentinel guard
# below.
if tgt.kind in ('i', 'u'):
# Integer restore: fill, round, and cast in one step so the
# sentinel is assigned at the target dtype's native width.
# Filling the float64 buffer first corrupts 64-bit sentinels
# above 2**53 -- INT64_MAX wrapped to INT64_MIN in the cast
# while GDAL_NODATA kept declaring INT64_MAX (issue #3264).
if hasattr(out.data, 'dask'):
out = out.copy(data=out.data.map_blocks(
_pack_restore_int, nodata, tgt.name,
dtype=tgt, meta=out.data._meta.astype(tgt)))
else:
out = out.copy(
data=_pack_restore_int(out.data, nodata, tgt.name))
elif hasattr(out.data, 'dask'):
out = out.copy(data=out.data.map_blocks(
_pack_fill_nan, nodata,
dtype=out.dtype, meta=out.data._meta))
Expand Down Expand Up @@ -1939,7 +2002,10 @@ def _pack(data, nodata=None):
# cupy take the same path here.
_pack_guard_no_nan(out.data, tgt.name)

if tgt.kind in ('i', 'u'):
# The integer-with-sentinel path above already rounded and cast at
# the chunk level; its ``out`` is integer-typed here, so the round
# is skipped and the astype is a no-op cast.
if tgt.kind in ('i', 'u') and out.dtype.kind == 'f':
out = out.round()
# Out-of-range and non-finite packed values would silently wrap
# in the astype below, exactly the corruption the NaN guard
Expand Down
Loading
Loading