Fill pack=True nodata holes at the integer dtype's native width#3277
Merged
Conversation
brendancol
commented
Jun 12, 2026
brendancol
left a comment
Contributor
Author
There was a problem hiding this comment.
PR Review: Fill pack=True nodata holes at the integer dtype's native width
Blockers (must fix before merge)
None found.
Suggestions (should fix, not blocking)
- No test covers the pre-v5 fallback (
_attrs.py:1857-1867: integer-typedattrs['nodata'], nomask_and_scale_dtype) combined with a >2**53 sentinel. That path derives the target dtype from the sentinel and flows through the new_pack_restore_int; I verified by hand that it restores INT64_MAX exactly on numpy and dask, but nothing pins it. A small test intest_pack_64bit_sentinel_3264.pybuilding the DataArray attrs directly would close the gap.
Nits (optional improvements)
-
_pack_fill_nan(_attrs.py:1712) now only serves float pack targets since integer targets route through_pack_restore_int, but its docstring still reads as the general fill. One sentence pointing at the split would save the next reader a trip through_pack. - Behavior note, not a defect: with hand-edited attrs carrying a sentinel outside the target dtype's range,
out[mask] = int(fill)raises OverflowError from numpy instead of the old silent wrap. Fail-closed is the better behavior, but the error comes from numpy's assignment rather than a pack-specific message. Fine to leave as-is; flagging so it's a known tradeoff.
What looks good
- The fill now matches the eager and GPU writers (
arr.dtype.type(nodata)/ Python int assignment), so all three sentinel-restore sites agree on native-width semantics. - Dask stays lazy: the fill and cast ride in
map_blockswith correctdtype/meta, so #3235's single-compute behavior is preserved. No.compute()or.valuesadded anywhere. - The kwarg range check comparing as integers fixes the INT64_MAX rejection without loosening anything: fractional, out-of-range, NaN, and inf are all still refused, and the test pins each one.
- Tests escalate RuntimeWarning around the pack write, so the old failure mode (numpy's "invalid value encountered in cast" as the only signal) cannot regress silently.
- All four backends covered for the round trip, plus cupy unit coverage of the new helper.
Checklist
- Algorithm matches reference (native-width fill, mirrors #3098's approach on the read side)
- All implemented backends produce consistent results (numpy, dask, gpu, dask+gpu legs in the new test)
- NaN handling is correct (holes parked at zero only for the cast, then overwritten exactly)
- Edge cases are covered by tests (INT64_MAX, UINT64_MAX, kwarg accept/reject, helper units)
- Dask chunk boundaries handled correctly (per-chunk fill, no cross-chunk state)
- No premature materialization or unnecessary copies
- Benchmark exists or is not needed (bug fix, no perf-relevant change)
- README feature matrix updated (not applicable)
- Docstrings present and accurate (see nit on
_pack_fill_nanscope)
…_fill_nan docstring (#3264)
brendancol
commented
Jun 12, 2026
brendancol
left a comment
Contributor
Author
There was a problem hiding this comment.
Follow-up review (d6bae2b)
Both actionable findings from the first pass are addressed:
- The pre-v5 fallback now has coverage:
test_pack_pre_v5_fallback_64bit_sentinelbuilds the attrs shape directly (integer-typednodata, nomask_and_scale_dtype), runs numpy and dask legs, computes the dask result inside the RuntimeWarning escalation so the chunk execution is covered, and checks both the exact INT64_MAX hole and the untouched valid pixels. _pack_fill_nan's docstring now says it only serves float targets and points at_pack_restore_intfor the integer path.
The third item (OverflowError instead of silent wrap for hand-edited out-of-range attrs sentinels) was a behavior note, left as-is by design: fail-closed beats the old silent wrap, and the legitimate read path can't reach it.
No new findings. Nothing blocking.
…eotiff-2026-06-12 # Conflicts: # xrspatial/geotiff/_attrs.py
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #3264.
_packfilled NaN holes on the float64 buffer (out[isnan] = float(fill)) and cast to the recorded integer dtype afterwards. float64 can't hold integers above 2**53 exactly, so an int64 file with GDAL_NODATA = INT64_MAX came back from an unpack -> pack round trip with its holes set to INT64_MIN (uint64/UINT64_MAX holes came back as 0) while the tag still declared the original sentinel. Amasked=Truere-read then returned the holes as valid pixels._pack_restore_intnow does the fill, round, and cast in one chunk-level step: park the holes at zero for the cast, then assign the sentinel as a Python int, which is exact at any width. This matches how the eager and GPU writers already fill (dtype.type(nodata)).nodata=kwarg range check compared throughfloat(), which rejectednodata=INT64_MAXas out of range even though it fits int64. It now compares as integers and still rejects fractional, out-of-range, and non-finite values._pack_fill_nanpath, unchanged.Backend coverage: the fix is in shared
_packcode; round trips verified on numpy, dask, gpu, and dask+gpu, plus unit tests for the new helper on numpy and cupy chunks.Test plan:
xrspatial/geotiff/tests/write/test_pack_64bit_sentinel_3264.py(new; int64/uint64 round trips on all four backends, kwarg accepts INT64_MAX, kwarg still rejects unrepresentable values, helper unit tests; pack writes run with RuntimeWarning escalated so the old failure mode can't pass silently)Also includes the security-sweep state CSV update for the geotiff module.