zarr-developers · TomNicholas · May 18, 2026
diff --git a/docs/data_structures.md b/docs/data_structures.md
@@ -244,8 +244,9 @@ NotImplementedError: ManifestArrays can't be converted into numpy arrays or pand
 The whole point is to manipulate references to the data without actually loading any data.
 
 !!! note
-    You also cannot currently index into a `ManifestArray`, as arbitrary indexing would require loading data values to create the new array.
-    We could imagine supporting indexing without loading data when slicing only along chunk boundaries, but this has not yet been implemented (see [GH issue #51](https://github.com/zarr-developers/VirtualiZarr/issues/51)).
+    You can index into a `ManifestArray` as long as the selection aligns with chunk boundaries — slicing through the interior of a chunk would require loading the chunk's bytes, which a virtual array deliberately cannot do.
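+    Chunk-aligned integer and slice indexing is supported, including mixed integer + slice indexers. Integer indexers drop the indexed axis as in numpy, and are only chunk-aligned when the chunk size along that axis is 1. Slices must have step 1 and start and stop on chunk boundaries, except that the stop may equal the axis length so that a partial final chunk can be selected. Misaligned selections raise `SubChunkIndexingError`.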
+    Arbitrary fancy indexing (e.g. with a boolean mask or integer array) is not supported, since it would generally require loading data.
 
 ## Zarr Groups
 

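The alignment rule in the note above can be sketched for a single axis as follows. This is a minimal illustration under stated assumptions, not VirtualiZarr's implementation: `to_chunk_selection` is a hypothetical helper, and `SubChunkIndexingError` is redefined locally as a stand-in for the library's exception. The helper translates an element-level indexer into an indexer over chunk positions, or raises if the selection would split a chunk.

```python
class SubChunkIndexingError(ValueError):
    """Local stand-in for VirtualiZarr's exception of the same name (assumption)."""


def to_chunk_selection(indexer, chunk_size, axis_len):
    """Hypothetical helper: map an element-level indexer on one axis to chunk positions.

    Returns an int (the axis is dropped) or a slice of chunk indices (the axis is kept).
    Raises SubChunkIndexingError if the selection would split a chunk.
    """
    if isinstance(indexer, int) and not isinstance(indexer, bool):
        # A single element is a whole chunk only when every chunk holds one element.
        if chunk_size != 1:
            raise SubChunkIndexingError(
                f"integer index {indexer} would split a chunk of size {chunk_size}"
            )
        if not -axis_len <= indexer < axis_len:
            raise IndexError(f"index {indexer} is out of bounds for length {axis_len}")
        return indexer % axis_len  # normalise negative indices
    if isinstance(indexer, slice):
        start, stop, step = indexer.indices(axis_len)  # resolves None and negative bounds
        if step != 1:
            raise SubChunkIndexingError("only slices with step 1 are supported")
        if start % chunk_size != 0:
            raise SubChunkIndexingError(f"slice start {start} is not on a chunk boundary")
        # The stop may also equal the axis length, which selects a partial final chunk.
        if stop % chunk_size != 0 and stop != axis_len:
            raise SubChunkIndexingError(f"slice stop {stop} is not on a chunk boundary")
        # Ceiling division on the stop so that a partial final chunk is included.
        return slice(start // chunk_size, -(-stop // chunk_size))
    # Booleans, arrays and anything else would generally require loading data.
    raise SubChunkIndexingError(f"unsupported indexer type: {type(indexer).__name__}")


print(to_chunk_selection(slice(0, 20), chunk_size=10, axis_len=25))  # slice(0, 2, None)
print(to_chunk_selection(slice(20, None), chunk_size=10, axis_len=25))  # slice(2, 3, None)
print(to_chunk_selection(-1, chunk_size=1, axis_len=5))  # 4
```

On an axis of length 25 with chunks of 10, `slice(20, None)` maps to the partial final chunk, whereas `slice(5, 20)` or the integer `3` would raise because each would cut through a chunk.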
diff --git a/docs/faq.md b/docs/faq.md
@@ -174,7 +174,7 @@ Users of Kerchunk may find the following comparison table useful, which shows wh
 | Renaming dimensions              | ❌                                                                                                                                  | `xarray.Dataset.rename_dims`                                                                                                                          |
 | Renaming manifest file paths | `kerchunk.utils.rename_target`                                                                                                                                  | `vds.vz.rename_paths`                                                                                                                          |
 | Splitting uncompressed data into chunks | `kerchunk.utils.subchunk`                                                                                                                                  | `xarray.Dataset.chunk` (❌ Not yet implemented - see [PR #199](https://github.com/zarr-developers/VirtualiZarr/pull/199))
-| Selecting specific chunks | ❌                                                                                                                                  | `xarray.Dataset.isel` (❌ Not yet implemented - see [issue #51](https://github.com/zarr-developers/VirtualiZarr/issues/51))                                                                                                                          |
+| Selecting specific chunks | ❌                                                                                                                                  | `xarray.Dataset.isel` (✅ chunk-aligned selections only)                                                                                                                          |
 **Parallelization**                                                      |                                                                                                                                     |                                                                                                                                                  |
 | Parallelized generation of references                                    | Wrapping kerchunk's opener inside `dask.delayed`                                                                                    | Wrapping `open_virtual_dataset` inside `dask.delayed`
 | Parallelized combining of references (tree-reduce)                       | `kerchunk.combine.auto_dask`                                                                                                        | Wrapping `ManifestArray` objects within `dask.array.Array` objects inside `xarray.Dataset` to use dask's `concatenate` (⚠️ Untested, but also unnecessary)                         |

diff --git a/docs/releases.md b/docs/releases.md
@@ -1,5 +1,18 @@
 # Release notes
 
+## Unreleased
+
+### New Features
+
+- `ManifestArray` now supports chunk-aligned integer and slice indexing along each axis, including multi-chunk slices, mixed integer + slice indexers, and selections that include a partial final chunk. Integer indexers drop the indexed axis (numpy / array-API semantics) and are legal only when `chunk_size == 1` along that axis; slice indexers preserve the axis. This makes `xarray.Dataset.isel` work end-to-end on virtual datasets for any chunk-aligned selection. Indexers that would split individual chunks raise a new `SubChunkIndexingError` (a `ValueError` subclass). This is a permanent constraint of a virtual array, not a missing feature. Previously, misaligned slices were silently treated as no-ops, and integer indexing always raised `NotImplementedError`. Closes [#51](https://github.com/zarr-developers/VirtualiZarr/issues/51), supersedes [#499](https://github.com/zarr-developers/VirtualiZarr/pull/499).
+  By [Tom Nicholas](https://github.com/TomNicholas).
+
+### Bug fixes
+
+### Documentation
+
+### Internal changes
+
 ## v2.6.1 (3rd May 2026)
 
 Adds end-to-end support for inlined chunk references in `ChunkManifest` (read via Kerchunk parsers, write via Kerchunk and Icechunk writers), plus Zarr-Python 3.2.0 compatibility and several bug fixes.

diff --git a/docs/scaling.md b/docs/scaling.md
@@ -308,6 +308,37 @@ for i, batch in enumerate(file_batches):
 
 Notice this workflow could also be used for appending data only as it becomes available, e.g. by replacing the for loop with a cron job.
 
+### Splitting a single large virtual dataset across commits
+
+A single Icechunk commit cannot include more than 50 million chunk references at once.
+If a single source (typically a massive Zarr store opened via [`ZarrParser`][virtualizarr.parsers.ZarrParser]) produces a virtual dataset whose arrays together exceed that limit, you can't write it in one transaction, even though all the references are already in memory.
+
+In that case you can split the virtual dataset into chunk-aligned slices along one dimension (often `time`) and commit each slice separately, passing `append_dim` on every commit after the first. Chunk-aligned slicing on a `ManifestArray` (and therefore on the variables of a virtual `xarray.Dataset`) only subsets the manifest, so this is cheap: no chunks are loaded.
+
+```python
+import icechunk as ic
+import virtualizarr as vz
+from virtualizarr.parsers import ZarrParser
+# Parse the giant Zarr store once; it exceeds 50M refs in total, but its `time` axis is chunked.
+vds = vz.open_virtual_dataset(<zarr_store>, parser=ZarrParser(), registry=registry)
+
+chunk_size_time = vds.chunksizes["time"]  # must align the splits to chunk boundaries
+step = chunk_size_time * N  # pick N so that each slice has < 50M refs
+
+repo = ic.Repository.open(<storage>)  # an icechunk Storage object, e.g. from ic.s3_storage(...)
+
+for i, start in enumerate(range(0, vds.sizes["time"], step)):
+    session = repo.writable_session("main")
+    slice_vds = vds.isel(time=slice(start, min(start + step, vds.sizes["time"])))
+    append_dim = "time" if i > 0 else None
+    slice_vds.vz.to_icechunk(session.store, append_dim=append_dim)
+    session.commit(f"wrote virtual references for time slice {i}")
+```
+
+If the slice boundaries don't align with chunk edges along that axis, the indexing call raises `SubChunkIndexingError`.
+
+(You can also subset the Dataset to specific variables and commit those separately if necessary.)
+
 ### Retries
 
 Sometimes an [`open_virtual_dataset`][virtualizarr.open_virtual_dataset] call might fail for a transient reason, such as a failed HTTP response from a server.

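The loop above asks you to pick `N` so that each slice stays under the 50 million reference limit. One rough way to do that arithmetic, sketched here under simplifying assumptions, is to count how many chunk references one chunk step along `time` adds across all variables. `n_chunks`, `time_chunks_per_commit` and the variable shapes below are hypothetical. Variables without the split dimension are counted in full for every slice, since `isel` leaves them whole, and the result is an upper-bound estimate because a manifest may omit missing chunks.

```python
def n_chunks(shape, chunks, skip_axis=None):
    """Number of chunks in a chunk grid, optionally ignoring one axis."""
    total = 1
    for axis, (length, size) in enumerate(zip(shape, chunks)):
        if axis != skip_axis:
            total *= -(-length // size)  # ceiling division counts partial chunks
    return total


def time_chunks_per_commit(variables, dim="time", limit=50_000_000):
    """Largest N such that a slice spanning N chunks along `dim` holds at most `limit` refs.

    `variables` maps a name to (dims, shape, chunks); this description is hypothetical.
    """
    per_step = 0  # refs added by each chunk along `dim`, summed over variables
    fixed = 0  # refs from variables without `dim`, included whole in every slice
    for dims, shape, chunks in variables.values():
        if dim in dims:
            per_step += n_chunks(shape, chunks, skip_axis=dims.index(dim))
        else:
            fixed += n_chunks(shape, chunks)
    if per_step == 0:
        raise ValueError(f"no variable has dimension {dim!r}")
    n = (limit - fixed) // per_step
    if n < 1:
        raise ValueError("one chunk along this dimension already exceeds the limit")
    return n


# Made-up shapes: two hourly 3-D variables and one static 2-D variable.
variables = {
    "temp": (("time", "lat", "lon"), (87600, 7200, 14400), (1, 100, 100)),
    "precip": (("time", "lat", "lon"), (87600, 7200, 14400), (1, 100, 100)),
    "elevation": (("lat", "lon"), (7200, 14400), (100, 100)),
}
N = time_chunks_per_commit(variables)
print(N)  # 2410
```

With these invented shapes each time chunk adds 2 × 72 × 144 = 20,736 references, so a commit can hold 2410 time chunks, and with a time chunk size of 1 the 87,600 steps need 37 commits.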
diff --git a/virtualizarr/manifests/array.py b/virtualizarr/manifests/array.py
@@ -155,8 +155,6 @@ def __array_function__(self, func, types, args, kwargs) -> Any:
 
         return MANIFESTARRAY_HANDLED_ARRAY_FUNCTIONS[func](*args, **kwargs)
 
-    # Everything beyond here is basically just to make this array class wrappable by xarray #
-
     def __array_ufunc__(self, ufunc, method, *inputs, **kwargs) -> Any:
         """We have to define this in order to convince xarray that this class is a duckarray, even though we will never support ufuncs."""
         if ufunc == np.isnan:
@@ -227,16 +225,37 @@ def __getitem__(
         /,
     ) -> "ManifestArray":
         """
-        Perform numpy-style indexing on this ManifestArray.
+        Index into this ManifestArray, returning a new ManifestArray view over a subset of chunks.
+
+        Supports only chunk-aligned selections. A ManifestArray only stores references to where
+        each chunk's bytes live, never their decoded values, so any indexer that would split into
+        the interior of a chunk would require loading the underlying data — which defeats the
+        point of a virtual array. Selections that would do so raise ``SubChunkIndexingError``
+        (a ``ValueError`` subclass); this is a permanent constraint, not a missing feature.
 
-        Only supports limited indexing, because in general you cannot slice inside of a compressed chunk.
-        Mainly required because Xarray uses this instead of expand dims (by passing Nones) and often will index with a no-op.
+        Supported indexers (and tuples thereof):
 
-        Could potentially support indexing with slices aligned along chunk boundaries, but currently does not.
+        - ``Ellipsis`` and ``None`` — no-ops and new-axis insertion.
+        - ``slice`` with ``step == 1`` whose start and stop land on chunk boundaries
+          (``stop == axis_length`` is also allowed, so a partial final chunk can be selected).
+          Slice indexers preserve the axis.
+        - ``int`` — drops the indexed axis, following numpy / array-API semantics. Only legal
+          when ``chunk_size == 1`` along that axis; otherwise picking a single element would
+          require splitting a chunk.
+
+        Anything else — fancy indexing with arrays, misaligned slices, ``step != 1`` —
+        raises ``SubChunkIndexingError`` or ``NotImplementedError``.
 
         Parameters
         ----------
         key
+            A basic indexer or tuple of basic indexers, one per array axis (with ``Ellipsis``
+            and ``None`` allowed as per the array API).
+
+        Returns
+        -------
+        ManifestArray
+            A new array whose ``ChunkManifest`` references only the selected chunks.
         """
         return index(self, key)
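The docstring above allows `Ellipsis` and `None` alongside one basic indexer per axis. One way to reconcile the two is to normalize the key first, so that exactly one non-`None` entry remains per array axis; each of those entries can then be checked against the chunk-alignment rules, and each `None` becomes a new length-1 axis. The sketch below shows only the normalization step. `expand_key` is a hypothetical helper, not code taken from VirtualiZarr's `index` function.

```python
def expand_key(key, ndim):
    """Hypothetical helper: expand a basic indexer to exactly `ndim` non-None entries.

    A single Ellipsis becomes as many full slices as needed, missing trailing axes are
    filled with full slices, and None (new-axis) entries are kept in place.
    """
    if not isinstance(key, tuple):
        key = (key,)
    n_ellipsis = sum(k is Ellipsis for k in key)
    if n_ellipsis > 1:
        raise IndexError("an index can only have a single ellipsis ('...')")
    n_real = sum(k is not None and k is not Ellipsis for k in key)
    if n_real > ndim:
        raise IndexError(f"too many indices: array is {ndim}-dimensional, got {n_real}")
    fill = (slice(None),) * (ndim - n_real)
    if n_ellipsis == 0:
        return key + fill  # missing trailing axes are implicitly fully selected
    pos = next(i for i, k in enumerate(key) if k is Ellipsis)
    return key[:pos] + fill + key[pos + 1 :]


print(expand_key((None, ..., slice(0, 10)), ndim=3))
# (None, slice(None, None, None), slice(None, None, None), slice(0, 10, None))
```

Identity comparisons (`is Ellipsis`, `is not None`) are used instead of `in` or `==` so that array-valued entries in a key would not trigger elementwise comparison before being rejected later.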