test: add Dask chunk grid benchmark scaffold #2465
Draft
ehsanestaji wants to merge 2 commits into
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@           Coverage Diff           @@
##             main    #2465   +/-   ##
=======================================
  Coverage   85.60%   85.60%
  Files          49       49
  Lines        7671     7671
=======================================
  Hits         6567     6567
  Misses       1104     1104
Author
I updated the title to a semantic PR title. I don’t seem to have permission to add labels on this repo, but the current validation failures look like triage metadata rather than code failures. Could a maintainer please add the appropriate labels, likely …?
Summary
This adds an exploratory benchmark scaffold for #2036 so we can compare virtual Dask chunk choices against HDF5/Zarr on-disk chunk layouts before changing AnnData defaults.
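To make the comparison concrete, here is a minimal sketch (not the PR's code) of how a virtual Dask chunk shape relates to an on-disk chunk layout. It counts the Dask blocks an array splits into and how many on-disk chunks one aligned block spans, using the `3000x800` shape and chunk sizes reported in the local grid. The helper names are hypothetical, and treating `-1` as "the full axis" is an assumption borrowed from Dask's chunk convention.

```python
import math


def resolve_chunks(shape, chunks):
    """Replace -1 with the full axis length (assumed Dask-style convention)."""
    return tuple(s if c == -1 else min(c, s) for s, c in zip(shape, chunks))


def block_count(shape, chunks):
    """Number of blocks the array is split into for a given chunk shape."""
    chunks = resolve_chunks(shape, chunks)
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunks))


def disk_chunks_per_block(shape, dask_chunks, disk_chunks):
    """On-disk chunks covered by one full-size Dask block that starts on a chunk boundary."""
    dask_chunks = resolve_chunks(shape, dask_chunks)
    return math.prod(math.ceil(d / k) for d, k in zip(dask_chunks, disk_chunks))


shape = (3000, 800)
disk = (250, 800)

print(block_count(shape, disk))                         # 12 blocks if Dask mirrors the disk chunks
print(block_count(shape, (1000, -1)))                   # 3 blocks with 1000x-1
print(disk_chunks_per_block(shape, (1000, -1), disk))   # 4 on-disk chunks per block
```

Fewer, larger blocks mean fewer tasks for the scheduler, at the cost of each task reading several on-disk chunks; whether that trade pays off is what the benchmark is meant to measure.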
The benchmark runner:
- … `X` arrays with controlled HDF5/Zarr chunks and optional Zarr v3 shards
- … `X` lazily through `anndata.experimental.read_elem_lazy`
- … `scanpy_normalize_log1p` workload

This also adds a small notebook for summarizing the generated CSV and README instructions for smoke/larger-grid runs. Generated benchmark outputs are ignored under `benchmarks/results`.

Local signal
A modest local grid (`3000x800`, HDF5/Zarr, on-disk chunks `250x800` and `1000x800`, default vs `1000x-1`, 1/2 workers, `sum_axis0` and `scanpy_normalize_log1p`) produced 32 rows. For small on-disk chunks (`250x800`), `1000x-1` reduced task counts and improved timings in the 1-worker Scanpy-style case by about 1.16x for HDF5 and 1.26x for Zarr.

These numbers are only an initial local smoke signal; the intent is to make the benchmark/review path available before proposing default behavior changes.
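As a sanity check on the row count, the grid described above can be enumerated as a Cartesian product of its five axes. This is only an illustrative sketch: the function name, dictionary keys, and value spellings are hypothetical and are not taken from `benchmarks/scripts/dask_chunk_grid.py`.

```python
from itertools import product


def build_grid():
    """Enumerate one benchmark configuration per combination of grid axes."""
    formats = ["hdf5", "zarr"]
    disk_chunks = [(250, 800), (1000, 800)]   # on-disk chunk shapes
    dask_chunks = ["default", (1000, -1)]     # virtual Dask chunk choices
    workers = [1, 2]
    workloads = ["sum_axis0", "scanpy_normalize_log1p"]

    return [
        {
            "format": fmt,
            "disk_chunks": disk,
            "dask_chunks": dask,
            "workers": n_workers,
            "workload": workload,
        }
        for fmt, disk, dask, n_workers, workload in product(
            formats, disk_chunks, dask_chunks, workers, workloads
        )
    ]


grid = build_grid()
print(len(grid))  # 2 * 2 * 2 * 2 * 2 = 32 rows
```

Each configuration yields one CSV row, which matches the 32 rows reported for the local run.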
Checks
- `ruff check benchmarks/scripts/dask_chunk_grid.py tests/test_dask_chunk_grid_script.py`
- `.venv/bin/python -m pytest tests/test_dask_chunk_grid_script.py -q`
- `python3 -m json.tool benchmarks/notebooks/dask_chunk_grid_analysis.ipynb`
- `git diff --check`