diff --git a/.claude/sweep-security-state.csv b/.claude/sweep-security-state.csv
index 6a0cac80b..fbe6425d5 100644
--- a/.claude/sweep-security-state.csv
+++ b/.claude/sweep-security-state.csv
@@ -18,7 +18,7 @@ fire,2026-04-25,,,,,"Clean. Despite the module's size hint, fire.py is purely pe
 flood,2026-05-03,1437,MEDIUM,3,,Re-audit 2026-05-03. MEDIUM Cat 3 fixed in PR #1438 (travel_time and flood_depth_vegetation now validate mannings_n DataArray values are finite and strictly positive via _validate_mannings_n_dataarray helper). No remaining unfixed findings. Other categories clean: every allocation is same-shape as input; no flat index math; NaN propagation explicit in every backend; tan_slope clamped by _TAN_MIN; no CUDA kernels; no file I/O; every public API calls _validate_raster on DataArray inputs.
 focal,2026-06-10,3222,MEDIUM,1;6,3223,"Two MEDIUM findings, both fixed via rockout. Cat 6 (#3222): mean() GPU paths (_mean_cupy ~261, _mean_dask_cupy ~194) force float32 while CPU computes float64 (astype(float)); max abs diff 0.5 on values ~1e7; same class as #2769 which only covered apply()/focal_stats(). Cat 1 (#3223): _check_kernel_vs_raster_memory budgets 4 B/cell ('float32 internals') but #2805 made internals preserve float64, so the guard underestimates 2x and a float64 combo can pass at ~100% of available RAM. Clean elsewhere: Cat 2 no int32 flat-index math; Cat 3 all divisions guarded (num>0, w_sum>0, var<0 clamp, variance_term where-guard, global_std==0 validated eagerly + lazily via _gistar_validate_lazy), NaN checks use v!=v idiom; Cat 4 all 10 CUDA kernels have bounds guards, validated under compute-sanitizer memcheck on shapes (1,1)/(7,1)/(1,7)/(97,89): 0 errors; Cat 5 no file I/O; all public APIs call _validate_raster."
 geodesic,2026-04-27,1283,HIGH,1,,"HIGH (fixed PR #1285): slope(method='geodesic') and aspect(method='geodesic') stack a (3, H, W) float64 array (data, lat, lon) before dispatch with no memory check. A large lat/lon-tagged raster passed to either function would OOM. Fixed by adding _check_geodesic_memory(rows, cols) in xrspatial/geodesic.py (mirrors morphology._check_kernel_memory): budgets 56 bytes/cell (24 stacked float64 + 4 float32 output + 24 padded copy + slack) and raises MemoryError when > 50% of available RAM; called from slope.py and aspect.py inside the geodesic branch before dispatch. No other findings: 6 CUDA kernels all have bounds guards (e.g. _run_gpu_geodesic_aspect at geodesic.py:395), custom 16x16 thread blocks avoid register spill, no shared memory, _validate_raster runs upstream in slope/aspect, all backends cast to float32, slope_mag < 1e-7 flat threshold prevents arctan2 NaN propagation, curvature correction uses hardcoded WGS84 R."
-geotiff,2026-06-12,3264,MEDIUM,3;6,,"Re-audit pass 21 2026-06-12 (deep-sweep). MEDIUM Cat 3/6: _pack filled NaN holes on the float64 buffer then cast, so 64-bit sentinels above 2**53 wrapped (INT64_MAX->INT64_MIN, UINT64_MAX->0) while GDAL_NODATA kept the original value; masked re-read returned holes as valid pixels; the nodata kwarg was float-validated so INT64_MAX was rejected outright. Issue #3264, fixed on deep-sweep-security-geotiff-2026-06-12: _pack_restore_int fills at native width after the cast (matches eager/GPU writers' dtype.type(nodata)), kwarg check compares as ints; tests tests/write/test_pack_64bit_sentinel_3264.py incl. gpu and dask+gpu legs. Audited the 15 commits since 7ccec772 (#3104 fix): pack nodata kwarg threading #3174, band-subset SCALE rewrite #3175, float32 width #3239, cupy pack fix #3240, GPU streaming writer #3241 (validated on GPU: 1x1/Nx1/1xN/prime shapes, 3D lazy moveaxis, streaming_buffer_bytes=1, all byte-exact round trips, bounded device memory), compression_level gate #3176 (normalized codec, pre-dispatch), VRT dst_offsets placement #3135 (internal-only, validated non-negative ints), native-width 64-bit masking #3128. _stream_row_bands boundaries are tile-aligned so per-band GPU compression cannot zero-pad mid-image. CUDA available; GPU paths exercised, no Cat 4 findings."
+geotiff,2026-07-01,3590,MEDIUM,3,,"Re-audit pass 22 2026-07-01 (deep-sweep): delta since 2026-06-12 = xarray engine (#3375/#3377/#3380), PAM RAT sidecar r/w (#3483/#3522), symbology sidecars (#3538/#3546), hardening (#3324 compression_level type-check, #3327 gdal_metadata dict gate, #3332 degenerate pixel size, #3372 predictor-codec reject), chunked GPU single-parse (#3374), _CloudSource cat_file race fix (#3361). MEDIUM Cat 3: _parse_rat int(float(Value cell)) raises OverflowError on '1e400'/'inf'; it subclasses ArithmeticError not ValueError so it escaped read_pam_sidecar's fail-closed except tuple and crashed open_geotiff for any local source with an adversarial/corrupt .aux.xml -- same contract class as #3520/#3522. Issue #3590, fixed on deep-sweep-security-geotiff-2026-07-01 (OverflowError added to tuple + test test_non_finite_rat_value_returns_empty). _safe_xml DOCTYPE rejection reviewed (sound, incl. UTF-16/32 BOM probe); _symbology writes only hardcoded ramps/numeric stats (no injection); _xarray_backend clean. LOW (documented only): read_pam_sidecar slurps the whole .aux.xml with no size cap (adjacent-file memory DoS). No new GPU kernels in delta; CUDA available, repro and fix validated locally."
 glcm,2026-04-24,1257,HIGH,1,,"HIGH (fixed #1257): glcm_texture() validated window_size only as >= 3 and distance only as >= 1, with no upper bound on either. _glcm_numba_kernel iterates range(r-half, r+half+1) for every pixel, so window_size=1_000_001 on a 10x10 raster ran ~10^14 loop iterations with all neighbors failing the interior bounds check (CPU DoS). On the dask backends depth = window_size // 2 + distance drove map_overlap padding, so a huge window also caused oversize per-chunk allocations (memory DoS). Fixed by adding max_val caps in the public entrypoint: window_size <= max(3, min(rows, cols)) and distance <= max(1, window_size // 2). One cap covers every backend because cupy and dask+cupy call through to the CPU kernel after cupy.asnumpy. No other HIGH findings: levels is already capped at 256 so the per-pixel np.zeros((levels, levels)) matrix in the kernel is bounded to 512 KB. No CUDA kernels. No file I/O. Quantization clips to [0, levels-1] before the kernel and NaN maps to -1 which the kernel filters with i_val >= 0. Entropy log(p) and correlation p / (std_i * std_j) are both guarded. All four backends use _validate_raster and cast to float64 before quantizing. MEDIUM (unfixed, Cat 1): the per-pixel np.zeros((levels, levels)) allocation inside the hot loop is a perf issue (levels=256 -> 512 KB alloc+free per pixel) but not a security issue because levels is bounded. Could be hoisted out of the loop or replaced with an in-place clear, but that is an efficiency concern, not security."
 gpu_rtx,2026-04-29,1308,HIGH,1,,"HIGH (fixed #1308 / PR #1310): hillshade_rtx (gpu_rtx/hillshade.py:184) and viewshed_gpu (gpu_rtx/viewshed.py:269) allocated cupy device buffers sized by raster shape with no memory check. create_triangulation (mesh_utils.py:23-24) adds verts (12 B/px) + triangles (24 B/px) = 36 B/px; hillshade_rtx adds d_rays(32) + d_hits(16) + d_aux(12) + d_output(4) = 64 B/px (100 B/px total); viewshed_gpu adds d_rays(32) + d_hits(16) + d_visgrid(4) + d_vsrays(32) = 84 B/px (120 B/px total). A 30000x30000 raster asked for 90-108 GB of VRAM before cupy surfaced an opaque allocator error. Fixed by adding gpu_rtx/_memory.py with _available_gpu_memory_bytes() and _check_gpu_memory(func_name, h, w) helpers (cost_distance #1262 / sky_view_factor #1299 pattern, 120 B/px budget covers worst case, raises MemoryError when required > 50% of free VRAM, skips silently when memGetInfo() unavailable). Wired into both entry points after the cupy.ndarray type check and before create_triangulation. 9 new tests in test_gpu_rtx_memory.py (5 helper-unit + 4 end-to-end gated on has_rtx). All 81 existing hillshade/viewshed tests still pass. Cat 4 clean: all CUDA kernels (hillshade.py:25/62/106, viewshed.py:32/74/116, mesh_utils.py:50) have bounds guards; no shared memory, no syncthreads needed. MEDIUM not fixed (Cat 6): hillshade_rtx and viewshed_gpu do not call _validate_raster directly but parent hillshade() (hillshade.py:252) and viewshed() (viewshed.py:1707) already validate, so input validation runs before the gpu_rtx entry point - defense-in-depth, not exploitable. MEDIUM not fixed (Cat 2): mesh_utils.py:64-68 cast mesh_map_index to int32 in the triangle index buffer; overflows at H*W > 2.1B vertices (~46341x46341+) but the new memory guard rejects rasters that large first - documentation/clarity item rather than exploitable. MEDIUM not fixed (Cat 3): mesh_utils.py:19 scale = maxDim / maxH divides by zero on an all-zero raster, propagating inf/NaN into mesh vertex z-coords; separate follow-up. LOW not fixed (Cat 5): mesh_utils.write() opens user-supplied path without canonicalization but its only call site (mesh_utils.py:38-39) sits behind if False: in create_triangulation, not reachable in production."
 hillshade,2026-04-27,,,,,"Clean. Cat 1: only allocation is the output np.empty(data.shape) at line 32 (cupy at line 165) and a _pad_array with hardcoded depth=1 (line 62) -- bounded by caller, no user-controlled amplifier. Azimuth/altitude are scalars and don't drive size. Cat 2: numba kernel uses range(1, rows-1) with simple (y, x) indexing; numba range loops promote to int64. Cat 3: math.sqrt(1.0 + xx_plus_yy) is always >= 1.0 (no neg sqrt, no div-by-zero); NaN elevation propagates correctly through dz_dx/dz_dy -> shaded -> output (the shaded < 0.0 / shaded > 1.0 clamps don't fire on NaN). Azimuth validated to [0, 360], altitude to [0, 90]. Cat 4: _gpu_calc_numba (line 107) guards both grid bounds and 3x3 stencil reads via i > 0 and i < shape[0]-1 and j > 0 and j < shape[1]-1; no shared memory. Cat 5: no file I/O. Cat 6: hillshade() calls _validate_raster (line 252) and _validate_scalar for both azimuth (253) and angle_altitude (254); all four backend paths cast to float32; tests parametrize int32/int64/float32/float64."
diff --git a/xrspatial/geotiff/_pam.py b/xrspatial/geotiff/_pam.py
index 0a257b101..b3211b1dc 100644
--- a/xrspatial/geotiff/_pam.py
+++ b/xrspatial/geotiff/_pam.py
@@ -172,7 +172,8 @@ def read_pam_sidecar(path):
         if colors is not None:
             out['category_colors'] = colors
         return out
-    except (OSError, ValueError, TypeError, IndexError, ParseError):
+    except (OSError, ValueError, TypeError, IndexError, OverflowError,
+            ParseError):
         # A missing, malformed, or foreign sidecar is non-fatal auxiliary
         # metadata, not a read error -- never let it break open_geotiff.
         # IndexError covers a thematic RAT whose carries fewer
@@ -182,7 +183,11 @@ def read_pam_sidecar(path):
         # adjacent sidecar. ParseError covers a truncated or otherwise
         # non-well-formed sidecar: safe_fromstring raises it (a SyntaxError
         # subclass, so not covered by the types above) and it would
-        # likewise escape and crash the read.
+        # likewise escape and crash the read. OverflowError covers a Value
+        # cell whose text parses to infinity ("1e400", "inf"):
+        # int(float(...)) in _parse_rat raises it, and it subclasses
+        # ArithmeticError rather than ValueError, so it too would escape
+        # and crash the read (issue #3590).
         return {}
 
 
diff --git a/xrspatial/tests/test_rasterize_categorical_3482.py b/xrspatial/tests/test_rasterize_categorical_3482.py
index 4cce25899..d7818dba8 100644
--- a/xrspatial/tests/test_rasterize_categorical_3482.py
+++ b/xrspatial/tests/test_rasterize_categorical_3482.py
@@ -282,6 +282,30 @@ def test_short_row_thematic_rat_returns_empty(self, tmp_path):
         # Must never raise; worst case returns {}.
         assert read_pam_sidecar(path) == {}
 
+    def test_non_finite_rat_value_returns_empty(self, tmp_path):
+        """A RAT Value cell that parses to infinity must not crash the read.
+
+        _parse_rat converts the Value cell with int(float(text)); a cell
+        such as "1e400" or "inf" makes int() raise OverflowError, which
+        subclasses ArithmeticError rather than ValueError and so escaped
+        the read_pam_sidecar except tuple, crashing the open_geotiff call
+        that reads the sidecar for any local string source (issue #3590).
+        """
+        from xrspatial.geotiff._pam import read_pam_sidecar
+        path = str(tmp_path / 'inf_value_3590.tif')
+        with open(path + '.aux.xml', 'w') as fh:
+            fh.write(''
+                     ''
+                     'Value1'
+                     '5'
+                     'Class2'
+                     '2'
+                     '1e400water'
+                     ''
+                     '')
+        # Must never raise; worst case returns {}.
+        assert read_pam_sidecar(path) == {}
+
     def test_non_well_formed_xml_sidecar_returns_empty(self, tmp_path):
         """A truncated / non-well-formed .aux.xml must not crash the read.
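The following is a minimal standalone sketch of the failure mode behind issue #3590. It assumes only what the audit note and the updated comment describe: a RAT Value cell is converted with int(float(text)) inside a fail-closed except tuple. parse_value, safe_parse, escapes_overflow, OLD_CATCH and NEW_CATCH are hypothetical names for illustration, not xrspatial code. The sketch shows that float() reads '1e400' and 'inf' as infinity, that int() then raises OverflowError, and that only a tuple listing OverflowError keeps that exception from escaping.

```python
from xml.etree.ElementTree import ParseError

# Hypothetical stand-ins for illustration only; these are not the
# xrspatial _parse_rat / read_pam_sidecar implementations.


def parse_value(text):
    """Convert a RAT Value cell the way the audit note describes."""
    return int(float(text))


# Fail-closed exception tuples before and after the #3590 fix.
OLD_CATCH = (OSError, ValueError, TypeError, IndexError, ParseError)
NEW_CATCH = (OSError, ValueError, TypeError, IndexError, OverflowError,
             ParseError)


def safe_parse(text, catch):
    """Return the parsed value, or None if the error type is in `catch`."""
    try:
        return parse_value(text)
    except catch:
        return None


def escapes_overflow(text, catch):
    """True if parsing `text` raises an OverflowError that `catch` misses."""
    try:
        safe_parse(text, catch)
    except OverflowError:
        return True
    return False


# OverflowError derives from ArithmeticError, not ValueError, so a tuple
# that lists ValueError but not OverflowError lets it through.
assert issubclass(OverflowError, ArithmeticError)
assert not issubclass(OverflowError, ValueError)
# ParseError is a SyntaxError subclass, which is why it is listed explicitly.
assert issubclass(ParseError, SyntaxError)

for cell in ("5", "nan", "water", "1e400", "inf"):
    print(f"{cell!r:8} escapes old tuple: {escapes_overflow(cell, OLD_CATCH)}"
          f"  escapes new tuple: {escapes_overflow(cell, NEW_CATCH)}")
```

In this sketch only the infinite cells ('1e400', 'inf') escape the old tuple. A 'nan' cell is unaffected because int(float('nan')) raises ValueError, which both tuples already catch. The same holds for non-numeric text such as 'water', where float() itself raises ValueError.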