-
Notifications
You must be signed in to change notification settings - Fork 65
Sub-chunk slicing for uncompressed ManifestArrays (part of #86) #996
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
TomNicholas
merged 7 commits into
zarr-developers:main
from
TomNicholas:subchunk_uncompressed_axis0
May 18, 2026
Merged
Changes from all commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
b3c5dce
feat: axis-0 sub-chunk slicing on uncompressed ManifestArrays
TomNicholas 39b1c6e
feat: generalize sub-chunk slicing to any largest-stride storage axis
TomNicholas db27141
refactor: extract _compute_sub_chunk_axis_selection helper
TomNicholas 14118c1
refactor: narrow _is_sub_chunk_slice's return type with TypeGuard
TomNicholas ae5c12f
test: end-to-end netCDF3 sub-chunk slice through icechunk
TomNicholas 20d4e65
test: tidy fixtures and decorators around the sub-chunk integration test
TomNicholas d18b331
Merge branch 'main' into subchunk_uncompressed_axis0
TomNicholas File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -18,15 +18,16 @@ | |
| ManifestStore, | ||
| ) | ||
| from virtualizarr.manifests.utils import create_v3_array_metadata | ||
| from virtualizarr.parsers import HDFParser, ZarrParser | ||
| from virtualizarr.parsers import HDFParser, NetCDF3Parser, ZarrParser | ||
| from virtualizarr.parsers.kerchunk.translator import manifestgroup_from_kerchunk_refs | ||
| from virtualizarr.tests import ( | ||
| has_fastparquet, | ||
| has_icechunk, | ||
| has_kerchunk, | ||
| requires_icechunk, | ||
| requires_kerchunk, | ||
| requires_network, | ||
| requires_zarr_python, | ||
| requires_scipy, | ||
| slow_test, | ||
| ) | ||
| from virtualizarr.tests.utils import PYTEST_TMP_DIRECTORY_URL_PREFIX | ||
|
|
@@ -179,7 +180,6 @@ def roundtrip_as_in_memory_icechunk( | |
| ) | ||
|
|
||
|
|
||
| @requires_zarr_python | ||
| @pytest.mark.parametrize( | ||
| "roundtrip_func", | ||
| [ | ||
|
|
@@ -551,3 +551,49 @@ def test_roundtrip_dataset_with_multiple_compressors(): | |
| ) as observed, | ||
| ): | ||
| xr.testing.assert_allclose(expected, observed) | ||
|
|
||
|
|
||
| @requires_scipy | ||
| @requires_icechunk | ||
| def test_subchunk_slice_netcdf3_through_icechunk_roundtrip( | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This integration test tests the actual use case |
||
| netcdf3_file, local_registry | ||
| ): | ||
| # End-to-end check of the uncompressed sub-chunk slicing path: parse a netCDF3 | ||
| # file (whose variables become single multi-row chunks), .isel within a chunk, | ||
| # write to icechunk, read back, and confirm bytes match a direct netCDF3 read + | ||
| # slice on the same file. | ||
| data = np.arange(32, dtype=np.float64).reshape(8, 4) | ||
| nc_path = netcdf3_file(xr.Dataset({"foo": (["time", "x"], data)})) | ||
| nc_url = f"file://{nc_path}" | ||
|
|
||
| with open_virtual_dataset( | ||
| url=nc_url, parser=NetCDF3Parser(), registry=local_registry | ||
| ) as vds: | ||
| # netCDF3 puts all rows in one source chunk; this slice is sub-chunk on axis 0. | ||
| sliced_vds = vds.isel(time=slice(1, 3)) | ||
|
|
||
| storage = icechunk.Storage.new_in_memory() | ||
| config = icechunk.RepositoryConfig.default() | ||
| container = icechunk.VirtualChunkContainer( | ||
| url_prefix=PYTEST_TMP_DIRECTORY_URL_PREFIX, | ||
| store=icechunk.local_filesystem_store(PYTEST_TMP_DIRECTORY_URL_PREFIX), | ||
| ) | ||
| config.set_virtual_chunk_container(container) | ||
| repo = icechunk.Repository.create( | ||
| storage=storage, | ||
| config=config, | ||
| authorize_virtual_chunk_access={PYTEST_TMP_DIRECTORY_URL_PREFIX: None}, | ||
| ) | ||
| session = repo.writable_session("main") | ||
| sliced_vds.vz.to_icechunk(session.store) | ||
| session.commit("sub-chunk slice") | ||
|
|
||
| read_session = repo.readonly_session("main") | ||
| with ( | ||
| xr.open_zarr( | ||
| read_session.store, zarr_format=3, consolidated=False | ||
| ) as roundtripped, | ||
| xr.open_dataset(nc_path) as direct, | ||
| ): | ||
| expected = direct.isel(time=slice(1, 3)) | ||
| xrt.assert_identical(roundtripped.load(), expected.load()) | ||
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by cleanup - zarr is a required dep so this is outdated