Skip to content

Fix perlin dask OOM by dropping redundant dask.persist (#3469)#3473

Merged
brendancol merged 4 commits into
mainfrom
deep-sweep-performance-perlin-2026-06-23
Jun 23, 2026
Merged

Fix perlin dask OOM by dropping redundant dask.persist (#3469)#3473
brendancol merged 4 commits into
mainfrom
deep-sweep-performance-perlin-2026-06-23

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #3469

What changed

  • Removed dask.persist(data) from _perlin_dask_numpy and _perlin_dask_cupy. Both paths now reduce straight off the lazy noise array.
  • The min/ptp (and min/max on GPU) reductions already go out in one dask.compute call, which shares the noise subgraph between them, so each chunk is computed once and freed after both reductions read it. The persist was redundant and forced every chunk resident at once, which OOMs on large inputs.

Backends

numpy and cupy single-GPU paths are untouched. dask+numpy and dask+cupy lose the persist; output is unchanged.

Test plan

  • xrspatial/tests/test_perlin.py passes, including the cupy and dask+cupy GPU tests on a CUDA host (13 passed).
  • Added test_perlin_dask_does_not_persist_whole_array, which monkeypatches dask.persist to raise and confirms the dask path no longer calls it while still matching the numpy reference.

The dask+numpy and dask+cupy paths persisted the full noise array before
normalizing, forcing every chunk resident at once and OOMing on large
inputs. The min/ptp (min/max) reductions already share the noise subgraph
within a single dask.compute call, so the persist is redundant. Remove it
so peak memory stays bounded by chunk size.

Adds a regression test that fails if dask.persist is reintroduced.

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Fix perlin dask OOM by dropping redundant dask.persist (#3469)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • The fix swaps memory for compute, and that swap is worth spelling out in the PR so nobody "optimizes" the persist back in later. Without it, the noise subgraph runs twice: once for the min/ptp reductions in dask.compute, then again when the caller materializes the returned normalized array. For an OOM fix that's the right call (the noise is point-wise and deterministic, so the recompute is exact and cheap), but the code comment only covers the memory side. A half-sentence noting the recompute is intentional would save the next reader the trace.

Nits (optional improvements)

  • test_perlin_dask_does_not_persist_whole_array only covers the dask+numpy path. The dask+cupy path got the same edit but no persist guard. Folding a monkeypatch.setattr(dask, "persist", ...) assertion into test_perlin_dask_gpu would pin down both. Not blocking, since the edit is symmetric and the GPU path isn't always available in CI.

What looks good

  • Symmetric across both dask backends; numpy and cupy single-GPU are untouched.
  • The premise checks out: the min and ptp graphs reference the same named noise keys, so one dask.compute over both shares the subgraph and the persist really is dead weight.
  • Output is unchanged. The noise is deterministic, so recomputing yields identical values, and the regression test confirms the dask result still matches the numpy reference.
  • The regression test fails loudly if dask.persist comes back, which is the right guard for this class of fix.

Checklist

  • Algorithm matches reference: not affected (normalization unchanged)
  • All implemented backends consistent: yes, dask+numpy and dask+cupy tests pass
  • NaN handling correct: not affected
  • Edge cases covered: existing coverage retained
  • Dask chunk boundaries: perlin is point-wise, no overlap
  • No premature materialization: improved (persist removed)
  • Benchmark exists or not needed: benchmarks/benchmarks/perlin.py exists
  • README feature matrix: not applicable, no API change
  • Docstrings: unchanged

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after 67fa8d2)

Both findings from the first pass are addressed:

  • Suggestion (recompute not documented): the comment in both dask paths now states the returned lazy array recomputes the noise on materialization and that the extra pass is intentional, since the noise is point-wise and deterministic.
  • Nit (dask+cupy not guarded): test_perlin_dask_gpu now monkeypatches dask.persist to raise, so the GPU path is covered by the same guard as the CPU path.

13 perlin tests pass locally, including the cupy and dask+cupy GPU tests on a CUDA host. No new findings.

@brendancol brendancol merged commit 124fc27 into main Jun 23, 2026
10 checks passed
@brendancol brendancol deleted the deep-sweep-performance-perlin-2026-06-23 branch June 25, 2026 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

perlin dask backends OOM: dask.persist materializes whole noise array before normalizing

1 participant