Skip to content

Commit e5cff2a

Browse files
committed
Merge remote-tracking branch 'origin/main' into deep-sweep-style-diffusion-2026-06-20
# Conflicts: # .claude/sweep-style-state.csv
2 parents dbd998b + dfd1a37 commit e5cff2a

14 files changed

Lines changed: 807 additions & 79 deletions

.claude/sweep-accuracy-state.csv

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -6,7 +6,7 @@ contour,2026-05-01,,,,"Marching squares correct: NaN check uses self-inequality,
66
corridor,2026-05-01,,LOW,1,"LOW: corridor inherits float32 from cost_distance; for very large accumulated costs, normalized = corridor - corridor_min loses precision near min (intrinsic to upstream dtype, not corridor itself). NaN handling correct (skipna min, np.isfinite check before normalize). All 4 backends route through pure xarray arithmetic; threshold uses dask/cupy/numpy where with try/except dispatch. No CRIT/HIGH issues."
77
cost_distance,2026-06-16,3369,CRITICAL,5,"CRITICAL heap overflow (#3369/this PR): numba Dijkstra kernels _cost_distance_kernel + _cost_distance_tile_kernel sized the binary min-heap at height*width, but a lazy-deletion heap pushes a pixel on every improving relaxation, so push count exceeds h*w on non-uniform friction. _heap_push then writes OOB -> heap corruption, SIGABRT (exit 134, 'corrupted size vs prev_size') on iterative dask path; UB on numpy path. Reference heapq Dijkstra hits 44 pushes on a 6x6=36 grid. Fix: max_heap = h*w*(n_neighbors+1), tile kernel adds +2*(w+h)+4 for phase-2 boundary seeds. Verified: cupy relax kernel (parallel Bellman-Ford) does NOT use this heap, GPU path unaffected. CUDA available; numpy/cupy/dask+numpy/dask+cupy all agree post-fix over 30+40 random adversarial grids; 88 module tests pass (4 new regression tests). Cats 1-4 clean: dist float64 / out float32 fine; inf/nan/zero friction all impassable (tested); bounds guards use >=h/>=w; planar algorithm, no curvature expected. Supersedes prior #1191 (cupy max_iterations h+w->h*w, fixed PR #1192)."
88
curvature,2026-03-30T15:00:00Z,,,,Formula matches ArcGIS reference. Backends consistent. No issues found.
9-
dasymetric,2026-04-14T12:00:00Z,,,,Mass conservation correct. Weighted/binary/limiting_variable all verified. Pycnophylactic Tobler algorithm correct.
9+
dasymetric,2026-06-20,3403,MEDIUM,2;5,"Cat2/Cat5: disaggregate(limiting_variable) silently dropped a zone's whole value when no pixel could absorb it (all pixels in a cap-0 class, e.g. all zero-weight) - returned finite zeros summing to 0 instead of NaN, violating the documented conservation property; weighted method returns NaN for the same input. Fix #3403/this PR: set zone result to NaN when n_overflow==0, matching weighted. Also flagged (LOW, not fixed, documented only): non-finite +Inf weights are not sanitized like negatives are - an Inf weight poisons the whole zone (wsum=inf -> NaN/0, mass lost), consistent across numpy/cupy/dask backends (not a divergence). Cat1/Cat3/Cat4 clean: weighted division is by sum of positives (no cancellation), no neighborhood stencil off-by-one (pycnophylactic shifts are symmetric and bounds-guarded), no geodesic/curvature math. CUDA available; cupy + dask+cupy parity verified (62 tests pass incl. cross-backend). Backends: cupy + dask paths delegate to the numpy core so the fix covers all four."
1010
diffusion,2026-05-01,,LOW,1;2;5,"LOW: no Kahan summation across long iterations (drift over 100k steps, standard for explicit Euler); lap=n+s+w+e-4*val has catastrophic cancellation for nearly-uniform large values; res=0 in attrs causes div-by-zero (no guard); dask+cupy boundary='nan' relies on dask accepting cp.nan as fill. CPU/GPU NaN handling consistent (np.isnan vs val!=val). depth=1 matches stencil radius. Memory guards, CFL check, step cap all in place. No CRIT/HIGH."
1111
edge_detection,2026-05-01,,,,Thin wrappers around convolve_2d with fixed Sobel/Prewitt/Laplacian kernels; no issues found
1212
emerging_hotspots,2026-04-30,,MEDIUM,2;3,MEDIUM: threshold_90 uses int() (truncation) instead of ceil() so n_times=11 requires only 9/11 (81.8%) instead of 90%. MEDIUM: NaN time steps produce gi_bin=0 which classifier counts as 'non-significant' rather than missing; threshold_90 uses full n_times not valid count. LOW: 'global_std == 0' check does not catch NaN std for fully/mostly NaN inputs.

.claude/sweep-api-consistency-state.csv

Lines changed: 15 additions & 14 deletions
Large diffs are not rendered by default.

.claude/sweep-performance-state.csv

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -3,13 +3,13 @@ aspect,2026-05-29,SAFE,compute-bound,1,2688,"dask+cupy geodesic densified full l
33
balanced_allocation,2026-04-16T12:00:00Z,WILL OOM,memory-bound,8,1114,"Re-audit 2026-04-16 after PR 1203 float32 fix. 8 HIGH found (friction.compute L339, argmin.compute in iter loop L182, double all_nan recompute L206, stacked cost_surfaces allocation). Covered by existing documented limitation on #1114. Not refiled."
44
bilateral,2026-03-31T18:00:00Z,SAFE,compute-bound,0,,
55
bump,2026-04-16T12:00:00Z,SAFE,compute-bound,0,1206,Re-audit 2026-04-16: fix verified SAFE. No HIGH findings. MEDIUM: CuPy backend runs CPU kernel then transfers to GPU (documented limitation).
6-
classify,2026-04-16T18:00:00Z,SAFE,compute-bound,0,fixed-in-tree,"Fixed-in-tree 2026-04-16: _run_dask_head_tail_breaks now persists data_clean once and fuses mean+head_count per iter (912ms -> 339ms, 0.37x IMPROVED); added _run_dask_box_plot that samples via _generate_sample_indices instead of boolean fancy indexing on dask array; _run_dask_cupy_box_plot likewise. 85 existing classify tests pass."
6+
classify,2026-06-20,RISKY,graph-bound,1,3412,"Re-audit 2026-06-20 (CUDA host). 1 HIGH: _generate_sample_indices >10M branch used RandomState.choice(replace=False) which builds a full arange(num_data) permutation -> O(num_data) host alloc (160MB for 20M pop, OOM at 30TB) despite docstring claiming O(num_sample). Backed dask/dask+cupy natural_breaks/maximum_breaks/quantile/percentiles/box_plot. Fixed via np.random.default_rng().choice (Floyd, O(num_sample), still deterministic); peak 160MB->0.4MB. Other paths SAFE: head_tail_breaks already persists+fuses; box_plot samples; cupy kernels low-register; no .values/np.asarray-on-dask/.compute-in-loop. 93 classify tests pass incl GPU."
77
contour,2026-03-31T18:00:00Z,SAFE,compute-bound,0,,
88
convolution,2026-03-31T18:00:00Z,SAFE,compute-bound,0,,
99
corridor,2026-03-31T18:00:00Z,SAFE,compute-bound,0,,
1010
cost_distance,2026-06-15,RISKY,memory-bound,1,3342,"Perf sweep 2026-06-15. HIGH: bounded map_overlap branch in _cost_distance_dask gated on full dims (pad>=height/width) not chunk size; pad>chunk collapses to single chunk (#880-class OOM, verified npartitions=1 at chunks=10/pad=96). Fixed: compare pad vs max chunk dim, route to iterative when pad>=chunk (matches GPU path L484). dask+cupy path already correct. Register count 37 (no pressure). nanmin().compute() L478/L1149 intentional scalar. iterative tile_cache full-dataset materialization is documented MemoryError-guarded design (#1118). All 56 tests pass incl GPU."
1111
curvature,2026-03-31T18:00:00Z,SAFE,compute-bound,0,,
12-
dasymetric,2026-03-31T18:00:00Z,SAFE,memory-bound,0,1126,Memory guard added to validate_disaggregation. Core disaggregate uses map_blocks.
12+
dasymetric,2026-06-20,SAFE,compute-bound,0,3408,"MEDIUM: disaggregate() looped per-zone building full-shape boolean masks (O(n_zones x n_pixels)); vectorized via searchsorted + np.add.at to O(n_pixels + n_zones) in PR for #3408 (numpy core, dask per-chunk sums, distribute step). cupy is documented CPU fallback (not flagged); eager zone-sum dask.compute() is required global reduction, bounded per chunk (not flagged). CUDA+cupy validated end-to-end on host."
1313
diffusion,2026-03-31T18:00:00Z,WILL OOM,memory-bound,2,1116,Scalar diffusivity now passed as float to chunks. DataArray diffusivity passed as dask array via map_overlap.
1414
edge_detection,2026-03-31T18:00:00Z,SAFE,compute-bound,0,,
1515
emerging_hotspots,2026-03-31T18:00:00Z,SAFE,compute-bound,0,,
@@ -28,7 +28,7 @@ interpolate_spline,2026-06-04,SAFE,compute-bound,0,,"scope=spline-only. Audited
2828
kde,2026-04-14T12:00:00Z,SAFE,compute-bound,0,,Graph construction serialized per-tile. _filter_points_to_tile scans all points per tile. No HIGH findings.
2929
mahalanobis,2026-03-31T18:00:00Z,SAFE,compute-bound,0,,False positive. Numpy path materializes by design. Dask path uses lazy reductions + map_blocks.
3030
mcda,2026-06-10,SAFE,memory-bound,2,3150,"2 HIGH fixed in PR #3158: owa() dask path crashed (da.sort does not exist; memory guard pointed users at the crashing path) and wpm validation ran one compute() per criterion. MEDIUM fixed in PR #3159 (#3151): cupy piecewise + dask+cupy piecewise/categorical raised TypeError via np.asarray on cupy chunks. MEDIUM fixed in PR #3160 (#3152): monte_carlo sensitivity materialized full dask dataset (now chunk-bounded map_blocks, ~8 tasks/chunk at n_samples=1000) and crashed on cupy via per-sample .values; constrain() deep copy dropped. LOW documented, not fixed: fuzzy_overlay builds ones via layers[0]*0+1; _categorical does one full-array pass per mapping key. Verdict SAFE assumes the 3 PRs merge (pre-fix: WILL OOM for MC-on-dask, owa dask broken). GPU paths validated on CUDA host (cupy 13.6)."
31-
morphology,2026-03-31T18:00:00Z,SAFE,compute-bound,0,,
31+
morphology,2026-06-20,SAFE,compute-bound,1,3401,memory guard fired on full lazy-dask shape (false MemoryError); skip guard for dask-backed inputs; eager numpy/cupy guard preserved
3232
multispectral,2026-05-02,SAFE,compute-bound,0,,"Re-audit 2026-05-02 after PRs 1292 (true_color memory guard) and 1301 (validate_arrays in true_color). Verified SAFE. No HIGH. MEDIUM: da.stack in _true_color_dask/_true_color_dask_cupy at L1702/L1731 creates (1,1,1,1) chunks along band axis (4 bands so impact is minor, scheduling overhead not OOM). LOW: np.zeros((h,w,4)) at L1681 then full overwrite -- np.empty would suffice. All 17 indices use plain map_blocks with no halo; 8192x8192 ndvi graph is 80 tasks, evi/arvi/ebbi 112 tasks."
3333
normalize,2026-03-31T18:00:00Z,SAFE,compute-bound,0,1124,Boolean indexing replaced with lazy nanmin/nanmax/nanmean/nanstd.
3434
pathfinding,2026-04-15T12:00:00Z,SAFE,compute-bound,0,false-positive,Downgraded. CuPy .get() is required -- A* has no GPU kernel. Per-pixel .compute() is only 2 calls for start/goal validation. seg.values in multi_stop_search collects already-computed results for stitching.

.claude/sweep-style-state.csv

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,9 @@
11
module,last_inspected,issue,severity_max,categories_found,notes
22
aspect,2026-05-29,2683,MEDIUM,1,E402+E305 line 38: from xrspatial.geodesic import block sat below _geodesic_cuda_dims; moved up with top-of-file imports. E501 lines 219/263: wrapped two _run_gpu_geodesic_aspect kernel-launch calls (101/109 chars). Cat 4 isort reviewed but NOT applied: slope.py/curvature.py use one-import-per-line for xrspatial.utils so raw isort would make aspect inconsistent. Cat 2/3/5 grep clean. PR #2740. 82 aspect+geodesic tests pass.
3+
classify,2026-06-20,3402,HIGH,3;4,"F401 unused not_implemented_func import; isort ordering (cmath group, dataset_support before utils). No Cat 1/2/5. Fixed in rockout PR."
34
contour,2026-05-29,2698,HIGH,3,"F821 line 557: contours() return annotation ""gpd.GeoDataFrame"" referenced gpd not bound at module scope (only imported inside _to_geopandas). Fixed via TYPE_CHECKING-guarded import geopandas as gpd, matching polygonize.py. No runtime change; geopandas stays optional. isort clean. Cat 1/2/4/5 clean. 24 contour tests pass. PR open."
45
cost_distance,2026-06-15,3339,HIGH,1;3;4;5,"Cat 3 F401: removed unused 'from functools import partial' (L32, refactor leftover, not re-exported). Cat 5 mutable default target_values: list = [] was found here too, but the parallel api-consistency PR #3348 fixed it first (same None-sentinel normalization, merged to main); on reconciliation this PR's duplicate source change and its two target_values tests were dropped, so PR #3350 is now lint-only. Cat 1: E302 L63 (1->2 blank lines before @ngjit _heap_push; placed blanks between comment divider and decorator to satisfy both E302 and isort) + E127 x5 (_cost_distance_dask_cupy L461-462, _cost_distance_dask_iterative L1005-1007 signature continuation over-indent). Cat 4 isort: dataset_support before utils, utils from-import reflowed to 100 cols. Cat 2 grep clean. flake8+isort clean on module after fix; cost_distance tests pass (CUDA available). Pre-existing test-file lint (F401 L314/L504, E201 L449, isort drift) left untouched - out of module scope. PR open."
6+
dasymetric,2026-06-20,3411,HIGH,1;3;4,"F401 is_dask_cupy unused (HIGH); E128 line 323 continuation indent (MED); isort utils import block (MED). Fixed + flake8/isort clean, 61 tests pass. PR #3411. Issue-create was denied by auto-mode classifier (judged LOW); PR went through."
57
diffusion,2026-06-20,3421,HIGH,3;4,F401 unused not_implemented_func (HIGH); isort import-ordering drift (MEDIUM); no Cat1/2/5. Fixed in PR off issue 3421.
68
fire,2026-06-19,3395,HIGH,3;4,Cat3 F401 math.log unused + F841 S_T unused local; Cat4 isort utils import reorder. No Cat1/2/5. Fixed in #3395 rockout.
79
focal,2026-05-29,2731,HIGH,3;4;5,"F401 not_implemented_func (import line 36, unused, not re-exported). isort: stdlib reorder (import math before from-imports), dropped stray blank lines in import groups, alphabetised+rewrapped convolution/utils from-imports, moved dataset_support import into order. Cat 5: mutable default excludes=[np.nan] in mean() (line 238) -> None sentinel, resolved to [np.nan] in body; never mutated so behaviour preserved; regression test test_mean_default_excludes_does_not_leak added. Cat 1/2 clean. 115 focal tests pass. PR pending."

0 commit comments

Comments
 (0)