Skip to content

Fill pack=True nodata holes at the integer dtype's native width#3277

Merged
brendancol merged 4 commits into
mainfrom
deep-sweep-security-geotiff-2026-06-12
Jun 12, 2026
Merged

Fill pack=True nodata holes at the integer dtype's native width#3277
brendancol merged 4 commits into
mainfrom
deep-sweep-security-geotiff-2026-06-12

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #3264.

_pack filled NaN holes on the float64 buffer (out[isnan] = float(fill)) and cast to the recorded integer dtype afterwards. float64 can't hold integers above 2**53 exactly, so an int64 file with GDAL_NODATA = INT64_MAX came back from an unpack -> pack round trip with its holes set to INT64_MIN (uint64/UINT64_MAX holes came back as 0) while the tag still declared the original sentinel. A masked=True re-read then returned the holes as valid pixels.

  • _pack_restore_int now does the fill, round, and cast in one chunk-level step: park the holes at zero for the cast, then assign the sentinel as a Python int, which is exact at any width. This matches how the eager and GPU writers already fill (dtype.type(nodata)).
  • The nodata= kwarg range check compared through float(), which rejected nodata=INT64_MAX as out of range even though it fits int64. It now compares as integers and still rejects fractional, out-of-range, and non-finite values.
  • Float pack targets keep the existing _pack_fill_nan path, unchanged.

Backend coverage: the fix is in shared _pack code; round trips verified on numpy, dask, gpu, and dask+gpu, plus unit tests for the new helper on numpy and cupy chunks.

Test plan:

  • xrspatial/geotiff/tests/write/test_pack_64bit_sentinel_3264.py (new; int64/uint64 round trips on all four backends, kwarg accepts INT64_MAX, kwarg still rejects unrepresentable values, helper unit tests; pack writes run with RuntimeWarning escalated so the old failure mode can't pass silently)
  • full geotiff suite: 6415 passed, 68 skipped, 1 xfailed

Also includes the security-sweep state CSV update for the geotiff module.

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 12, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Fill pack=True nodata holes at the integer dtype's native width

Blockers (must fix before merge)

None found.

Suggestions (should fix, not blocking)

  • No test covers the pre-v5 fallback (_attrs.py:1857-1867: integer-typed attrs['nodata'], no mask_and_scale_dtype) combined with a >2**53 sentinel. That path derives the target dtype from the sentinel and flows through the new _pack_restore_int; I verified by hand that it restores INT64_MAX exactly on numpy and dask, but nothing pins it. A small test in test_pack_64bit_sentinel_3264.py building the DataArray attrs directly would close the gap.

Nits (optional improvements)

  • _pack_fill_nan (_attrs.py:1712) now only serves float pack targets since integer targets route through _pack_restore_int, but its docstring still reads as the general fill. One sentence pointing at the split would save the next reader a trip through _pack.
  • Behavior note, not a defect: with hand-edited attrs carrying a sentinel outside the target dtype's range, out[mask] = int(fill) raises OverflowError from numpy instead of the old silent wrap. Fail-closed is the better behavior, but the error comes from numpy's assignment rather than a pack-specific message. Fine to leave as-is; flagging so it's a known tradeoff.

What looks good

  • The fill now matches the eager and GPU writers (arr.dtype.type(nodata) / Python int assignment), so all three sentinel-restore sites agree on native-width semantics.
  • Dask stays lazy: the fill and cast ride in map_blocks with correct dtype/meta, so #3235's single-compute behavior is preserved. No .compute() or .values added anywhere.
  • The kwarg range check comparing as integers fixes the INT64_MAX rejection without loosening anything: fractional, out-of-range, NaN, and inf are all still refused, and the test pins each one.
  • Tests escalate RuntimeWarning around the pack write, so the old failure mode (numpy's "invalid value encountered in cast" as the only signal) cannot regress silently.
  • All four backends covered for the round trip, plus cupy unit coverage of the new helper.

Checklist

  • Algorithm matches reference (native-width fill, mirrors #3098's approach on the read side)
  • All implemented backends produce consistent results (numpy, dask, gpu, dask+gpu legs in the new test)
  • NaN handling is correct (holes parked at zero only for the cast, then overwritten exactly)
  • Edge cases are covered by tests (INT64_MAX, UINT64_MAX, kwarg accept/reject, helper units)
  • Dask chunk boundaries handled correctly (per-chunk fill, no cross-chunk state)
  • No premature materialization or unnecessary copies
  • Benchmark exists or is not needed (bug fix, no perf-relevant change)
  • README feature matrix updated (not applicable)
  • Docstrings present and accurate (see nit on _pack_fill_nan scope)

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (d6bae2b)

Both actionable findings from the first pass are addressed:

  • The pre-v5 fallback now has coverage: test_pack_pre_v5_fallback_64bit_sentinel builds the attrs shape directly (integer-typed nodata, no mask_and_scale_dtype), runs numpy and dask legs, computes the dask result inside the RuntimeWarning escalation so the chunk execution is covered, and checks both the exact INT64_MAX hole and the untouched valid pixels.
  • _pack_fill_nan's docstring now says it only serves float targets and points at _pack_restore_int for the integer path.

The third item (OverflowError instead of silent wrap for hand-edited out-of-range attrs sentinels) was a behavior note, left as-is by design: fail-closed beats the old silent wrap, and the legitimate read path can't reach it.

No new findings. Nothing blocking.

…eotiff-2026-06-12

# Conflicts:
#	xrspatial/geotiff/_attrs.py
@brendancol brendancol merged commit c8aea14 into main Jun 12, 2026
7 checks passed
@brendancol brendancol deleted the deep-sweep-security-geotiff-2026-06-12 branch June 25, 2026 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

to_geotiff(pack=True) corrupts 64-bit nodata sentinels above 2**53

1 participant