test: add Dask chunk grid benchmark scaffold by ehsanestaji · Pull Request #2465 · scverse/anndata

ehsanestaji · 2026-05-21T11:11:05Z

Summary

This adds an exploratory benchmark scaffold for #2036 so we can compare virtual Dask chunk choices against HDF5/Zarr on-disk chunk layouts before changing AnnData defaults.

The benchmark runner:

- creates dense X arrays with controlled HDF5/Zarr chunks and optional Zarr v3 shards
- reads X lazily through anndata.experimental.read_elem_lazy
- varies virtual Dask chunks, worker counts, and thread/process settings
- records runtime/package metadata, store size, task count, elapsed time, result shape/size, and coarse process/worker RSS readings
- includes array-level workloads and a Scanpy-style scanpy_normalize_log1p workload
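
To make the chunk trade-off concrete, here is a rough sketch (not code from this PR) of how a virtual chunk choice maps onto an on-disk chunk layout. The helper names `normalize_chunks` and `block_count` are hypothetical, and the sketch assumes that the default virtual chunks mirror the on-disk chunks and that `-1` means "the whole axis". It counts array blocks only; the task count of a real Dask graph is typically larger because each workload adds its own tasks per block.

```python
import math


def normalize_chunks(shape, chunks):
    """Resolve -1 (whole axis) entries and clamp chunk sizes to the array shape."""
    return tuple(dim if c == -1 else min(c, dim) for dim, c in zip(shape, chunks))


def block_count(shape, chunks):
    """Number of blocks a chunking produces: product of ceil(dim / chunk) per axis."""
    resolved = normalize_chunks(shape, chunks)
    return math.prod(math.ceil(dim / c) for dim, c in zip(shape, resolved))


shape = (3000, 800)      # array shape used in the local grid
on_disk = (250, 800)     # small on-disk chunks
virtual = (1000, -1)     # virtual Dask chunks written as 1000x-1

# Assumption: default virtual chunks equal the on-disk chunks.
blocks_default = block_count(shape, on_disk)
blocks_virtual = block_count(shape, virtual)

# How many on-disk chunks one virtual block has to read.
disk_chunks_per_block = block_count(normalize_chunks(shape, virtual), on_disk)

print(blocks_default, blocks_virtual, disk_chunks_per_block)
```

Under these assumptions the 250x800 layout yields 12 blocks by default and 3 blocks with 1000x-1, with each larger block covering 4 on-disk chunks. That is the kind of reduction the task-count column is meant to capture.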

This also adds a small notebook for summarizing the generated CSV and README instructions for smoke/larger-grid runs. Generated benchmark outputs are ignored under benchmarks/results.
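
As an illustration of the kind of summary the notebook is meant to support, the sketch below computes speedups of a candidate chunk choice over the default from CSV rows. It is not the notebook's code: the column names (`store_format`, `disk_chunks`, `dask_chunks`, `n_workers`, `workload`, `elapsed_s`) are assumptions about the CSV schema, and the sample timings are made-up placeholders, not benchmark results.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows with hypothetical column names; the timings are invented
# purely to exercise the function and are NOT measurements from this PR.
SAMPLE_CSV = """store_format,disk_chunks,dask_chunks,n_workers,workload,elapsed_s
hdf5,250x800,default,1,sum_axis0,2.0
hdf5,250x800,1000x-1,1,sum_axis0,1.0
zarr,250x800,default,1,sum_axis0,3.0
zarr,250x800,1000x-1,1,sum_axis0,2.0
"""


def speedups(csv_text, baseline="default"):
    """Return {(config..., dask_chunks): baseline_elapsed / candidate_elapsed}."""
    groups = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["store_format"], row["disk_chunks"], row["n_workers"], row["workload"])
        groups[key][row["dask_chunks"]] = float(row["elapsed_s"])

    out = {}
    for key, timings in groups.items():
        if baseline not in timings:
            continue  # nothing to compare against for this configuration
        for choice, elapsed in timings.items():
            if choice != baseline:
                out[key + (choice,)] = timings[baseline] / elapsed
    return out


result = speedups(SAMPLE_CSV)
for key, ratio in sorted(result.items()):
    print(key, f"{ratio:.2f}x")
```

Grouping by every column except the Dask chunk choice keeps each comparison within one store format, on-disk layout, worker count, and workload, so a ratio above 1 means the candidate chunking was faster than the default for that configuration.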

Local signal

A modest local grid produced 32 rows: one 3000x800 array, HDF5 and Zarr stores, on-disk chunks of 250x800 and 1000x800, default Dask chunks vs. 1000x-1 (where -1 spans the full axis), 1 and 2 workers, and the sum_axis0 and scanpy_normalize_log1p workloads. For the small on-disk chunks (250x800), 1000x-1 reduced task counts and improved timings in the 1-worker Scanpy-style case by about 1.16x for HDF5 and 1.26x for Zarr.
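
The row count follows from the full cross product of the grid dimensions. A minimal sketch of that enumeration, with hypothetical parameter names that are not taken from the benchmark script:

```python
from itertools import product

# Hypothetical parameter names; the values mirror the local grid described above.
grid = {
    "store_format": ["hdf5", "zarr"],
    "disk_chunks": [(250, 800), (1000, 800)],
    "dask_chunks": ["default", (1000, -1)],
    "n_workers": [1, 2],
    "workload": ["sum_axis0", "scanpy_normalize_log1p"],
}

# One benchmark configuration per combination of grid values.
rows = [dict(zip(grid, combo)) for combo in product(*grid.values())]

print(len(rows))  # 2 * 2 * 2 * 2 * 2 = 32
```

The array shape is fixed at 3000x800, so it does not multiply the row count.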

These numbers are only an initial local smoke signal; the intent is to make the benchmark/review path available before proposing default behavior changes.

Checks

- ruff check benchmarks/scripts/dask_chunk_grid.py tests/test_dask_chunk_grid_script.py
- .venv/bin/python -m pytest tests/test_dask_chunk_grid_script.py -q
- python3 -m json.tool benchmarks/notebooks/dask_chunk_grid_analysis.ipynb
- git diff --check
- a tiny end-to-end benchmark smoke run, which wrote 8 CSV result rows

review-notebook-app · 2026-05-21T11:11:11Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB


codecov · 2026-05-21T11:12:46Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.60%. Comparing base (829abb6) to head (95bf7a1).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #2465   +/-   ##
=======================================
  Coverage   85.60%   85.60%           
=======================================
  Files          49       49           
  Lines        7671     7671           
=======================================
  Hits         6567     6567           
  Misses       1104     1104

ehsanestaji · 2026-05-21T11:13:19Z

I updated the title to a semantic PR title. I don’t seem to have permission to add labels on this repo, but the current validation failures look like triage metadata rather than code failures. Could a maintainer please add the appropriate labels, likely no milestone and skip-gpu-ci? benchmark, type: dask array, and performance 🐌 may also fit this benchmark-only PR.

benchmarks: add dask chunk grid exploration

8f71d0e

ehsanestaji mentioned this pull request May 21, 2026

feat: sensible chunk defaults for dask #2036

Open

3 tasks

[pre-commit.ci] auto fixes from pre-commit.com hooks

95bf7a1

for more information, see https://pre-commit.ci

ehsanestaji changed the title from "Benchmarks for Dask chunk default tradeoffs" to "test: add Dask chunk grid benchmark scaffold" on May 21, 2026

Zethson added this to the 0.12.17 milestone May 21, 2026

Zethson added performance 🐌 skip-gpu-ci type: dask array labels May 21, 2026

ehsanestaji wants to merge 2 commits into scverse:main from ehsanestaji:explore/anndata-2036-dask-chunk-defaults
