from_template: lazy dask grids skip the cell cap when chunks is supplied#3556
Merged
Conversation
…3554) Supplying chunks now promotes an eager backend to its dask variant (numpy->dask+numpy, cupy->dask+cupy) and returns a lazy grid. The 500M-cell cap applies only to the eager backends, which materialize the whole array; a dask grid is a lazy chunk graph, so the cap is skipped. The eager cap is unchanged.
brendancol
commented
Jun 27, 2026
brendancol
left a comment
Contributor
Author
There was a problem hiding this comment.
PR Review: lazy dask grids skip the cell cap when chunks is supplied
Blockers
None.
Suggestions
- Dropping the cap for dask leaves that path with no guard at all, and the new footgun is chunk count rather than memory. With a fixed
chunks=512, the old 500M-cell cap also bounded the task graph at ~1900 chunks. An over-fine resolution now builds an unbounded graph:from_template('conus', resolution=1, chunks=512)is ~50 million chunks, and graph construction alone hangs the client even though no data is materialized. The lazy grid is cheap; building tens of millions of tasks is not. Worth a sanity check on chunk count (not cells), or at least a docstring note that an over-fine resolution on a dask backend can build a huge task graph. (templates.py:390-402)
Nits
- The Examples block in the
from_templatedocstring does not show the chunks/lazy path. A one-line example would make the new behavior discoverable. (templates.py Examples)
What looks good
- Minimal and surgical. The cap is now backend-aware, the stale comment that claimed it applied to dask is fixed, and the error message points users at
chunks=. - Tests cover the three new paths (numpy promotion, explicit dask without chunks, cupy promotion) plus the eager cap staying intact. The over-cap region is picked to keep the chunk count small so the tests run fast.
Checklist
- Algorithm matches reference/paper: n/a (grid construction)
- All implemented backends consistent: yes, promotion verified for numpy and cupy
- NaN handling correct: yes, fill unchanged
- Edge cases covered: eager cap retained; lazy paths tested
- Dask chunk boundaries: n/a (empty grid)
- No premature materialization: yes, da.full stays lazy
- Benchmark: not needed
- README matrix: not needed (no backend support change)
- Docstrings present and accurate: yes
brendancol
commented
Jun 27, 2026
brendancol
left a comment
Contributor
Author
There was a problem hiding this comment.
Follow-up review (after docstring commit)
Disposition of the first-pass findings:
- Nit (docstring example): fixed in 587dfe2 -- added an Examples entry for the lazy chunked path.
- Suggestion (dask task-graph footgun): split. The documentation half is fixed in 587dfe2 (the
chunksparam now warns that a very fine resolution builds a large task graph and suggests coarser resolution or larger chunks). The actual chunk-count guard is a separate design decision (threshold,chunks='auto'handling, raise vs warn) and reintroduces a gate the issue asked to remove for dask, so it is deferred to #3557.
The follow-up commit is docstring-only; no logic change. The new Examples line evaluates (type(agg.data).__name__ == 'Array') and the cap tests still pass.
Remaining open item is the deferred guard (#3557). Nothing blocking on this PR.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #3554
Supplying
chunksnow gives back a lazy dask grid instead of silently dropping the chunk spec, and the 500M-cell cap only guards the eager backends.chunksis passed, an eager backend is promoted to its dask variant (numpytodask+numpy,cupytodask+cupy), so the result is lazy and chunked.da.fullchunk graph, so it skips the cap. The eager cap is unchanged:from_template('conus', resolution=1)on numpy still raises.chunksdefault changes from'auto'toNoneas a "not supplied" sentinel; the dask paths still use'auto'when it is omitted.Backend coverage: numpy and cupy stay eager and capped; dask+numpy and dask+cupy are lazy and uncapped. The promotion paths exercise both dask variants.
Test plan:
from_template('new_england', chunks=512, resolution=10)returns a lazy dask array with chunksize (512, 512) and no cap errorbackend='dask+numpy'over the cap is lazy without chunkschunkspromotes cupy to dask+cupy on GPUfrom_template('conus', resolution=1)still raises