binary(): numpy/dask return input dtype while cupy returns float32 #3508

Description

@brendancol

binary() returns a different output dtype depending on the array backend. The numpy and dask+numpy paths return the input dtype; the cupy and dask+cupy paths always return float32. Switching backends in a pipeline therefore silently changes the output dtype even though the values are the same.

Reproduce

import numpy as np, xarray as xr, dask.array as da, cupy
from xrspatial.classify import binary

data = np.array([[np.nan, 1., 2.], [3., 4., np.inf]], dtype=np.float64)

print(binary(xr.DataArray(data), [1, 2]).dtype)                       # float64
print(binary(xr.DataArray(da.from_array(data)), [1, 2]).dtype)        # float64
print(binary(xr.DataArray(cupy.asarray(data)), [1, 2]).data.dtype)    # float32

Cause

_cpu_binary allocates its output with dtype=data.dtype, while _run_cupy_binary uses dtype='f4'.

Why it matters

The docstring example shows float32 output, and every other classifier in classify.py (reclassify, quantile, natural_breaks, and so on) emits float32 on all four backends. binary is the only one that doesn't. For integer input the numpy path returns an integer dtype, which can't hold the NaN sentinel the docstring promises for non-finite cells.
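The integer-input problem can be checked with plain numpy, independent of xrspatial. The snippet below is only an illustration of the dtype constraint; the variable names are made up for this example and nothing here is taken from classify.py.

```python
import numpy as np

# An integer buffer, like the one an input-dtype allocation would
# produce for integer input.
int_out = np.zeros(3, dtype=np.int32)
try:
    int_out[0] = np.nan      # NaN has no integer representation
    int_error = None
except ValueError as exc:
    int_error = exc          # numpy refuses the assignment

# A float32 buffer accepts the NaN sentinel without complaint.
float_out = np.zeros(3, dtype=np.float32)
float_out[0] = np.nan

print(type(int_error).__name__, np.isnan(float_out[0]))
```

Integer input never contains non-finite cells itself, so the failure is not triggered by the data; the point is that an integer result cannot follow the float32-with-NaN contract that the other backends and the docstring describe.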

Fix

Allocate the numpy output in _cpu_binary as float32 instead of data.dtype, so the four backends agree and the NaN sentinel is always representable.
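A minimal numpy-only sketch of the intended behavior follows. `binary_sketch` is a hypothetical stand-in written for this issue, not the actual `_cpu_binary` implementation; it assumes the semantics described above (1 for listed values, 0 otherwise, NaN for non-finite cells).

```python
import numpy as np

def binary_sketch(data, values):
    """Hypothetical stand-in for the numpy path; not xrspatial's code."""
    data = np.asarray(data)
    # Allocate float32 regardless of the input dtype, so NaN is representable.
    out = np.zeros(data.shape, dtype=np.float32)
    # Cells whose value is in `values` become 1; everything else stays 0.
    out[np.isin(data, values)] = 1.0
    # Non-finite cells (NaN, +inf, -inf) become NaN.
    out[~np.isfinite(data)] = np.nan
    return out

float_in = np.array([[np.nan, 1., 2.], [3., 4., np.inf]], dtype=np.float64)
int_in = np.array([[0, 1, 2], [3, 4, 5]], dtype=np.int32)

float_res = binary_sketch(float_in, [1, 2])
int_res = binary_sketch(int_in, [1, 2])
print(float_res.dtype, int_res.dtype)
```

With this allocation both float64 and int32 input produce float32 output, which is the dtype the cupy paths already return. A regression test could assert the same dtype across all four backends for one shared input.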

Found by the metadata-propagation sweep on classify.
