flood: make mannings_n DataArray validation lazy-safe (#3503) #3507

Merged: brendancol merged 3 commits into main from deep-sweep-performance-flood-2026-06-25-01 on Jun 26, 2026

@brendancol

Fixes #3503.

Problem

_validate_mannings_n_dataarray in xrspatial/flood.py validated the roughness raster via np.asarray(mannings_n.values). .values materializes the whole array into host memory: a full graph compute for dask, a full device-to-host copy for cupy. This happened during validation, before the lazy result graph was built, so a large mannings_n raster OOMed the client and defeated lazy evaluation in travel_time() and flood_depth_vegetation().

A 2048x2048 dask mannings_n (chunks 512) with a per-block tripwire showed all 16 chunks computed during the travel_time(...) call alone, before any result graph existed.
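The tripwire measurement can be reproduced in outline with plain dask, without xrspatial. This is only a sketch: `tripwire`, `calls`, and `lazy_n` are illustrative names rather than the PR's test code, and `np.asarray` on the dask array stands in for what the old `.values` path amounted to.

```python
import numpy as np
import dask.array as da

calls = []  # one entry per real block that dask evaluates


def tripwire(block):
    # Ignore zero-size probe arrays; record only real chunks.
    if block.size:
        calls.append(block.shape)
    return block


# 2048x2048 roughness raster in 512x512 chunks: 4 x 4 = 16 blocks.
# Passing `meta` keeps dask from calling `tripwire` while building the graph.
lazy_n = da.full((2048, 2048), 0.035, dtype="float64", chunks=512).map_blocks(
    tripwire, dtype="float64", meta=np.array((), dtype="float64")
)

calls_after_build = len(calls)        # building the graph computes nothing
eager = np.asarray(lazy_n)            # forces a full compute of every chunk
calls_after_materialize = len(calls)  # one tripwire hit per chunk

print(calls_after_build, calls_after_materialize)
```

A lazy-safe validation path should leave the counter at its post-build value until the caller explicitly computes the result.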

Fix

Validate without .values:

  • dask-backed: skip the eager value check (values cannot be inspected without computing the graph); invalid roughness surfaces as inf/NaN on the lazy path, consistent with the non-DataArray behavior.
  • cupy-backed: reduce on device (cp.isfinite(data).all(), (data > 0).all()), transferring only a scalar bool instead of the whole array.
  • numpy: unchanged.
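The three branches above can be sketched as a single dispatch on the backing array type. This is a minimal illustration, not the code in xrspatial/flood.py: the function name `validate_roughness_sketch`, the error message, and the assumption that it receives the DataArray's underlying `.data` object are all hypothetical, and the numpy branch is assumed to apply the same finite-and-positive check described for cupy.

```python
import numpy as np

# dask and cupy are optional here; the sketch degrades to numpy-only.
try:
    import dask.array as da
except ImportError:
    da = None
try:
    import cupy as cp
except ImportError:
    cp = None


def validate_roughness_sketch(data):
    """Check roughness values without materializing lazy or device arrays."""
    # dask-backed (including dask+cupy): values cannot be inspected without
    # computing the graph, so the eager check is skipped entirely.
    if da is not None and isinstance(data, da.Array):
        return

    if cp is not None and isinstance(data, cp.ndarray):
        # Reduce on device; bool() transfers only a scalar to the host.
        ok = bool(cp.isfinite(data).all()) and bool((data > 0).all())
    else:
        # numpy (or array-like) path: already in host memory.
        arr = np.asarray(data)
        ok = bool(np.isfinite(arr).all()) and bool((arr > 0).all())

    if not ok:
        raise ValueError("mannings_n values must be finite and positive")


validate_roughness_sketch(np.array([[0.03, 0.05]]))  # valid: returns None
```

Checking the dask case first matters for dask+cupy inputs, which are dask arrays whose chunks happen to live on the GPU.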

Tests

  • Existing numpy/cupy/dask/dask+cupy parity and validation tests still pass (90 passed locally, CUDA available).
  • New regression tests assert a tripwired dask mannings_n is not materialized during travel_time / flood_depth_vegetation validation.
  • Verified the cupy branch end-to-end on GPU: valid roughness stays on device, zero/NaN rejected.

Performance context

  • OOM verdict: RISKY before fix (validation materialized large dask/cupy roughness rasters), SAFE after.
  • Bottleneck: compute-bound.
  • Affected backends: dask+numpy, dask+cupy, cupy. numpy unaffected.
  • Found by the performance sweep (Cat 1: dask materialization via .values).

_validate_mannings_n_dataarray called np.asarray(mannings_n.values),
which materializes the whole roughness raster into host memory: a full
graph compute for dask and a full device->host copy for cupy. This ran
during validation, before any lazy graph was built, so a large mannings_n
raster OOMed the client and defeated lazy evaluation in travel_time() and
flood_depth_vegetation().

Validate without .values:
- dask-backed: skip the eager value check (cannot inspect values without
  computing the graph); invalid roughness surfaces as inf/NaN on the lazy
  path, same as before for non-DataArray inputs.
- cupy-backed: reduce on device (cp.isfinite / >0 .all()), transferring
  only a scalar bool.
- numpy: unchanged.

Adds regression tests asserting a tripwired dask mannings_n is not
materialized during travel_time/flood_depth_vegetation validation.
…e-flood-2026-06-25-01

# Conflicts:
#	xrspatial/tests/test_flood.py
@brendancol brendancol merged commit 85575e4 into main Jun 26, 2026
9 of 10 checks passed

Development

Successfully merging this pull request may close these issues.

flood: mannings_n DataArray validation materializes whole dask raster (.values OOM)