Skip to content

Fix cost_distance leaking dask task name as output .name#3349

Merged
brendancol merged 5 commits into
mainfrom
deep-sweep-metadata-cost_distance-2026-06-15
Jun 15, 2026
Merged

Fix cost_distance leaking dask task name as output .name#3349
brendancol merged 5 commits into
mainfrom
deep-sweep-metadata-cost_distance-2026-06-15

Conversation

@brendancol

Copy link
Copy Markdown
Contributor

Closes #3344

cost_distance returned a DataArray whose .name depended on the backend: numpy and cupy gave name=None, but dask+numpy and dask+cupy leaked the internal dask graph name (_trim-..., asarray-...) as the user-visible .name. That token is random per run, so .to_dataset() and any name-keyed logic became nondeterministic on the dask paths.

  • Force the result name to the input raster's name with .rename(raster.name). Passing name= to the constructor isn't enough because xarray treats name=None as "infer from data" and lets the dask token through.
  • Add a parametrized regression test over all four backends and both named and unnamed input.

Backends: numpy, cupy, dask+numpy, dask+cupy all verified (CUDA host).

Test plan:

  • test_result_name_matches_input passes on all four backends (8 cases, no skips on this GPU host)
  • Full test_cost_distance.py suite passes (63 passed)

The dask+numpy and dask+cupy backends surfaced the internal dask graph
name (e.g. _trim-... from map_overlap, asarray-... from dask+cupy) as
the returned DataArray .name, while numpy and cupy returned name=None.
Force the result name to the input raster's name with .rename() so all
four backends agree. Passing name= to the constructor is insufficient
because xarray treats name=None as 'infer from data'.

Adds a parametrized regression test across all four backends and both
named and unnamed inputs.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 15, 2026

@brendancol brendancol left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Fix cost_distance leaking dask task name as output .name

Blockers

None.

Suggestions

None.

Nits

None.

What looks good

  • The fix is minimal and correct. .rename(raster.name) is the right tool: I confirmed that passing name=None to the xr.DataArray constructor does not override a dask array's internal name, but .rename(None) does force it, so the constructor-only approach would not have fixed the unnamed-input case.
  • Only .name changes; data, coords, dims, and attrs are untouched.
  • The regression test runs all four backends and both named and unnamed inputs (8 cases). On this CUDA host none of them skip, so the cupy and dask+cupy paths are genuinely exercised.
  • The supports_dataset path is unaffected: a Dataset input calls the function per variable with raster.name already equal to the variable name, so the rename keeps the variable key. Verified by running a Dataset through the dask backend.
  • The inline comment explains the non-obvious reason .rename is needed instead of a constructor name=, which will save the next reader a debugging session.

Checklist

  • Behavior matches intent (name consistent across backends)
  • All four backends produce consistent .name
  • No data/coords/dims/attrs regression
  • Edge cases covered (named + unnamed input, all backends)
  • No premature materialization or extra copies (rename is metadata-only)
  • Benchmark not needed (metadata-only fix)
  • README / docs not applicable (no API change)

@brendancol brendancol merged commit 5a5c298 into main Jun 15, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cost_distance leaks dask task name as output .name on dask backends

1 participant