Skip to content

Make binary() output float32 on all backends#3514

Merged
brendancol merged 2 commits into
xarray-contrib:mainfrom
brendancol:deep-sweep-accuracy-classify-2026-06-25-01
Jun 26, 2026
Merged

Make binary() output float32 on all backends#3514
brendancol merged 2 commits into
xarray-contrib:mainfrom
brendancol:deep-sweep-accuracy-classify-2026-06-25-01

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Summary

binary() returned the input dtype on the numpy and dask+numpy backends but always float32 on cupy and dask+cupy. This allocates float32 in _cpu_binary so the result dtype matches across all four backends, which is what the cupy path and every other classifier already do.

Why

_cpu_binary allocated np.empty(data.shape, dtype=data.dtype). For a float64 input the numpy and dask results were float64 while the cupy results were float32. Integer input was worse: the array cannot store the NaN that binary writes for non-finite cells.

Changes

  • _cpu_binary allocates a float32 output.
  • Added test_binary_output_dtype_float32 and test_binary_dask_output_dtype_float32.

Testing

  • pytest xrspatial/tests/test_classify.py passes (104 tests, numpy/dask/cupy/dask+cupy).

Closes #3511.

_cpu_binary allocated np.empty(..., dtype=data.dtype), so binary()
returned the input dtype on numpy/dask+numpy while cupy/dask+cupy always
returned float32. Allocate float32 in _cpu_binary so all four backends
agree and so the NaN sentinel for non-finite cells is representable on
integer input. Matches _cpu_bin and the other classifiers.

Add cross-backend dtype regression tests.
Address PR review: the existing binary backend tests called
general_output_checks without verify_dtype, so the cupy/dask+cupy output
dtype was unasserted. Pass verify_dtype=True (matching the reclassify
tests) to lock the float32 contract across all four backends.
@brendancol

Copy link
Copy Markdown
Contributor Author

PR Review: Make binary() output float32 on all backends

Blockers

None.

Suggestions

  • The four existing backend tests (test_binary_numpy, test_binary_dask_numpy, test_binary_cupy, test_binary_dask_cupy in xrspatial/tests/test_classify.py) call general_output_checks without verify_dtype=True. The two new tests assert float32 only on numpy and dask+numpy, so the cupy and dask+cupy output dtype is not checked anywhere. Passing verify_dtype=True to those four tests locks the float32 contract on all four backends without adding new test functions, which is what the reclassify tests already do.

Nits

  • The new comment says the input dtype "could not hold the NaN sentinel for integer input." That holds in principle, but an integer raster has no non-finite cells, so binary never actually wrote NaN into an int output. The cross-backend consistency point alone justifies the change.
  • test_binary_dask_output_dtype_float32 uses one chunk layout. Output dtype is chunk-independent here, so a second layout adds little.

What looks good

  • One-line change plus a comment. Values are unchanged, only the output dtype.
  • Brings numpy/dask in line with the cupy 'f4' allocation and with the docstring example, which already prints dtype=float32 (classify.py line 161).
  • Regression tests cover numpy (three input dtypes) and dask.
  • Full classify suite passes: 104 tests across numpy, dask, cupy, dask+cupy.

Duplicate

PR #3513 (metadata sweep) fixes the same binary() dtype bug. I am not merging or closing either here; flagging for maintainer reconciliation. This PR keeps its fix self-consistent.

Checklist

  • Classification logic untouched (dtype-only change)
  • Backends produce consistent dtype
  • NaN handling correct
  • Edge cases: integer/float32/float64 input covered
  • Dask chunk boundaries: dtype is chunk-independent
  • No premature materialization or copies
  • No benchmark needed (dtype-only)
  • Docstring documents float32

@brendancol brendancol merged commit 2d5c97b into xarray-contrib:main Jun 26, 2026
10 checks passed
brendancol added a commit to brendancol/xarray-spatial that referenced this pull request Jun 26, 2026
…array-contrib#3508)

The _cpu_binary float32 fix and the verify_dtype=True backend tests landed
on main via the duplicate accuracy-sweep PR xarray-contrib#3514. This change is the
remaining piece: _run_dask_cupy_binary passed meta=cupy.array(()), which
defaults to float64, so the lazy dask+cupy array advertised float64 while
_run_cupy_binary computes float32. The merged tests only check the computed
dtype (general_output_checks computes before asserting), so the mismatch
went unnoticed.

Pass meta=cupy.array((), dtype='f4') and assert the lazy dtype in
test_binary_dask_cupy. Same advertised-vs-computed class as aspect xarray-contrib#2682 and
focal xarray-contrib#3217. Also records the classify metadata-propagation sweep state.
brendancol added a commit that referenced this pull request Jun 26, 2026
…3508) (#3513)

The _cpu_binary float32 fix and the verify_dtype=True backend tests landed
on main via the duplicate accuracy-sweep PR #3514. This change is the
remaining piece: _run_dask_cupy_binary passed meta=cupy.array(()), which
defaults to float64, so the lazy dask+cupy array advertised float64 while
_run_cupy_binary computes float32. The merged tests only check the computed
dtype (general_output_checks computes before asserting), so the mismatch
went unnoticed.

Pass meta=cupy.array((), dtype='f4') and assert the lazy dtype in
test_binary_dask_cupy. Same advertised-vs-computed class as aspect #2682 and
focal #3217. Also records the classify metadata-propagation sweep state.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

binary() output dtype differs across backends (float64 on numpy/dask vs float32 on cupy)

1 participant