Merged
1 change: 1 addition & 0 deletions Cargo.lock

Some generated files are not rendered by default.

1 change: 1 addition & 0 deletions Cargo.toml
@@ -23,6 +23,7 @@ abi3-py311 = ["pyo3/abi3-py311"]
arrow-array = "59"
arrow-buffer = "59"
arrow-schema = "59"
async-trait = "0.1"
bytes = "1.12.0"
dlpark = { git = "https://github.com/kylebarron/dlpark", rev = "31c6f49c064e634326c97172d39a00acecd854b6", features = [
"pyo3",
131 changes: 131 additions & 0 deletions dev-docs/specs/2026-06-25-read-only-array-design.md
@@ -0,0 +1,131 @@
# Read-only array via a `ReadOnly` storage adapter

**Date:** 2026-06-25
**Status:** Approved, ready for implementation plan

## Problem

`PyArray` wraps `Array<dyn ReadableWritableListableStorageTraits>` — the maximal
zarrs storage trait — and all write methods (`store_chunk`, `store_encoded_chunk`,
`erase_chunk`, `erase_metadata`, `compact_chunk`) live directly on it.

We want a `read_only()` method that returns an array which **raises at runtime** if
the user attempts to mutate it. The motivation is guarding against *accidental*
mutation of an array to which the user actually has write access.

zarrs' built-in `Array::readable()` downgrades to `Array<dyn ReadableStorageTraits>`,
a strictly less-capable *type* that `PyArray::new` will not accept (compile error:
`expected trait ReadableWritableListableStorageTraits, found trait
ReadableStorageTraits`). A compile-time-distinct read-only type is also the wrong
model for our future needs: stores like read-only S3 (via `ObjectStore`) or future
Python-protocol-backed stores only know they are read-only **at runtime**, not at
compile time.

## Decision

Introduce a `ReadOnly<T>` storage **adapter** that wraps a readable + listable store
and **satisfies the maximal `ReadableWritableListableStorageTraits`** by implementing
its write methods as runtime errors. This keeps the entire existing stack (one
`PyArray` type over one storage trait) unchanged — no new pyclass, no type surgery —
while making writes fail at runtime instead of compile time.

This works because zarrs provides a blanket impl: any `T: Readable + Writable +
Listable + 'static` automatically implements `ReadableWritableListableStorageTraits`
(`zarrs_storage/src/storage_sync.rs:254`). So an adapter that *delegates* reads and
lists and *errors* on writes transparently fills the maximal-trait slot.
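
The blanket-impl mechanism can be sketched in isolation. The traits below (`Readable`, `Writable`, `Listable`, `ReadableWritableListable`) and the `Fixed` store are hypothetical, simplified stand-ins invented for this illustration, not the zarrs definitions; only the shape of the blanket impl is meant to match.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-ins for the zarrs storage traits.
trait Readable {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}
trait Writable {
    fn set(&self, key: &str, value: &[u8]) -> Result<(), String>;
}
trait Listable {
    fn list(&self) -> Vec<String>;
}

// The "maximal" trait has no methods of its own, only supertraits.
trait ReadableWritableListable: Readable + Writable + Listable {}

// Blanket impl: any type implementing all three gets the maximal trait for free.
impl<T: Readable + Writable + Listable + 'static> ReadableWritableListable for T {}

// A toy store (assumption for this sketch) whose writes always fail at runtime.
struct Fixed;

impl Readable for Fixed {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        (key == "a").then(|| vec![1u8])
    }
}
impl Writable for Fixed {
    fn set(&self, _key: &str, _value: &[u8]) -> Result<(), String> {
        Err("read only".to_string())
    }
}
impl Listable for Fixed {
    fn list(&self) -> Vec<String> {
        vec!["a".to_string()]
    }
}

// `Fixed` never names the maximal trait, yet it coerces into the trait object.
fn as_maximal(store: Arc<Fixed>) -> Arc<dyn ReadableWritableListable> {
    store
}
```

Because the coercion in `as_maximal` compiles, a store whose writes fail only at runtime can occupy the same type slot as a fully writable one, which is the property the adapter relies on.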

Rejected alternative — a separate `ReadOnlyArray` pyclass over
`Array<dyn ReadableStorageTraits>`: gives compile-time guarantees but (a) requires
duplicating/factoring all read + metadata methods into shared macros for a second
type, and (b) cannot represent stores whose read-only-ness is only known at runtime,
which is the more important real-world case.

## Components

### 1. `ReadOnly<T>` — sync adapter

New file: `src/storage/read_only.rs`.

```rust
use std::sync::Arc;

pub struct ReadOnly<T: ?Sized>(Arc<T>);

impl<T: ?Sized> ReadOnly<T> {
    pub fn new(inner: Arc<T>) -> Self { Self(inner) }
}
```

Trait impls (over `T: ?Sized + ReadableListableStorageTraits` as appropriate):

- `ReadableStorageTraits` — delegate every method to `self.0`
  (`get_partial_many`, `size_key`, `supports_get_partial`).
- `ListableStorageTraits` — delegate every method to `self.0`
  (`list`, `list_prefix`, `list_dir`, `size_prefix`).
- `WritableStorageTraits` — **every** mutating method returns
  `Err(StorageError::ReadOnly)`, and the capability query reports `false`:
  - `set` → `Err(StorageError::ReadOnly)`
  - `set_partial_many` → `Err(StorageError::ReadOnly)`
  - `erase` → `Err(StorageError::ReadOnly)`
  - `erase_prefix` → `Err(StorageError::ReadOnly)`
  - `supports_set_partial` → `false`

`StorageError::ReadOnly` already exists with the message "a write operation was
attempted on a read only store" (`zarrs_storage/src/lib.rs:170`). It flows through
the existing `ZarristaError` conversion and surfaces as a Python exception.

The blanket impl then yields `ReadableWritableListableStorageTraits` for free.
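
As a minimal sketch of the delegate-reads, error-on-writes pattern described above: the traits, the `StorageError` enum, and `MemoryStore` here are hypothetical stand-ins defined only for this example, with far fewer methods than the real zarrs traits.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for zarrs' `StorageError`; only the variant used here.
#[derive(Debug, PartialEq)]
enum StorageError {
    ReadOnly,
}

// Hypothetical, reduced versions of the storage traits.
trait Readable {
    fn get(&self, key: &str) -> Result<Option<Vec<u8>>, StorageError>;
}
trait Listable {
    fn list(&self) -> Result<Vec<String>, StorageError>;
}
trait Writable {
    fn set(&self, key: &str, value: &[u8]) -> Result<(), StorageError>;
    fn erase(&self, key: &str) -> Result<(), StorageError>;
}
trait ReadableListable: Readable + Listable {}
impl<T: Readable + Listable> ReadableListable for T {}

// The adapter: `?Sized` lets it wrap an `Arc<dyn ReadableListable>`.
struct ReadOnly<T: ?Sized>(Arc<T>);

impl<T: ?Sized> ReadOnly<T> {
    fn new(inner: Arc<T>) -> Self {
        Self(inner)
    }
}

// Reads and lists are forwarded to the wrapped store unchanged.
impl<T: ?Sized + Readable> Readable for ReadOnly<T> {
    fn get(&self, key: &str) -> Result<Option<Vec<u8>>, StorageError> {
        self.0.get(key)
    }
}
impl<T: ?Sized + Listable> Listable for ReadOnly<T> {
    fn list(&self) -> Result<Vec<String>, StorageError> {
        self.0.list()
    }
}

// Writes never reach the wrapped store; they fail at runtime instead.
impl<T: ?Sized> Writable for ReadOnly<T> {
    fn set(&self, _key: &str, _value: &[u8]) -> Result<(), StorageError> {
        Err(StorageError::ReadOnly)
    }
    fn erase(&self, _key: &str) -> Result<(), StorageError> {
        Err(StorageError::ReadOnly)
    }
}

// A toy writable store to wrap (assumption for this sketch).
#[derive(Default)]
struct MemoryStore(Mutex<BTreeMap<String, Vec<u8>>>);

impl Readable for MemoryStore {
    fn get(&self, key: &str) -> Result<Option<Vec<u8>>, StorageError> {
        Ok(self.0.lock().unwrap().get(key).cloned())
    }
}
impl Listable for MemoryStore {
    fn list(&self) -> Result<Vec<String>, StorageError> {
        Ok(self.0.lock().unwrap().keys().cloned().collect())
    }
}
impl Writable for MemoryStore {
    fn set(&self, key: &str, value: &[u8]) -> Result<(), StorageError> {
        self.0.lock().unwrap().insert(key.to_string(), value.to_vec());
        Ok(())
    }
    fn erase(&self, key: &str) -> Result<(), StorageError> {
        self.0.lock().unwrap().remove(key);
        Ok(())
    }
}
```

Note that the `Writable` impl places no bound on `T` at all: the adapter never calls into the inner store for writes, so the inner store does not need to be writable.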

### 2. `AsyncReadOnly<T>` — async adapter

Lives in the same file as the sync adapter (`src/storage/read_only.rs`). Mirrors
`ReadOnly` against the async traits (`AsyncReadableStorageTraits`,
`AsyncListableStorageTraits`, `AsyncWritableStorageTraits`), which expose the same
methods as `async fn`. Write methods return `Err(StorageError::ReadOnly)`. The async
blanket impl (`storage_async.rs`) yields `AsyncReadableWritableListableStorageTraits`.

### 3. `PyArray::read_only` (sync)

Replace the current non-compiling body in `src/array/sync.rs`:

```rust
fn read_only(&self) -> Self {
    let inner = self.inner.storage().readable_listable(); // RWL -> RL
    let storage = Arc::new(ReadOnly::new(inner)); // RL -> faked RWL
    Self::new(Arc::new(self.inner.with_storage(storage)))
}
```

- `Array::storage()` (`zarrs/src/array.rs:710`) → `Arc<dyn RWL>`.
- `.readable_listable()` (`storage_sync.rs:248`) downgrades to `Arc<dyn RL>`.
- `Array::with_storage(storage)` (`zarrs/src/array.rs:420`) rebuilds the `Array`
with the same metadata over the new storage.
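
The three steps above (fetch the storage, downgrade it, rebuild the array over the wrapped storage) can be sketched end to end with toy types. Everything here is an assumption made for illustration: `Array`, `Memory`, the two-method traits, and the `readable()` downgrade are simplified stand-ins that only imitate the shape of `Array::storage()`, `readable_listable()`, and `Array::with_storage()`.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical, reduced storage traits.
trait Readable {
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}
trait Writable {
    fn set(&self, key: &str, value: Vec<u8>) -> Result<(), &'static str>;
}

// Maximal trait with a downgrade method, imitating `readable_listable()`.
trait ReadableWritable: Readable + Writable {
    fn readable(self: Arc<Self>) -> Arc<dyn Readable>;
}
impl<T: Readable + Writable + 'static> ReadableWritable for T {
    fn readable(self: Arc<Self>) -> Arc<dyn Readable> {
        self
    }
}

// Adapter: forwards reads, rejects writes.
struct ReadOnly(Arc<dyn Readable>);

impl Readable for ReadOnly {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.0.get(key)
    }
}
impl Writable for ReadOnly {
    fn set(&self, _key: &str, _value: Vec<u8>) -> Result<(), &'static str> {
        Err("read only")
    }
}

// Toy in-memory store.
#[derive(Default)]
struct Memory(Mutex<HashMap<String, Vec<u8>>>);

impl Readable for Memory {
    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.0.lock().unwrap().get(key).cloned()
    }
}
impl Writable for Memory {
    fn set(&self, key: &str, value: Vec<u8>) -> Result<(), &'static str> {
        self.0.lock().unwrap().insert(key.to_string(), value);
        Ok(())
    }
}

// Toy array: metadata (just a shape here) plus a handle to maximal storage.
struct Array {
    shape: Vec<u64>,
    storage: Arc<dyn ReadableWritable>,
}

impl Array {
    fn storage(&self) -> Arc<dyn ReadableWritable> {
        self.storage.clone()
    }
    // Same metadata, different storage.
    fn with_storage(&self, storage: Arc<dyn ReadableWritable>) -> Array {
        Array { shape: self.shape.clone(), storage }
    }
    fn read_only(&self) -> Array {
        let inner = self.storage().readable(); // maximal -> read-only handle
        let storage = Arc::new(ReadOnly(inner)); // read-only -> "faked" maximal
        self.with_storage(storage)
    }
    fn store_chunk(&self, key: &str, bytes: Vec<u8>) -> Result<(), &'static str> {
        self.storage.set(key, bytes)
    }
    fn retrieve_chunk(&self, key: &str) -> Option<Vec<u8>> {
        self.storage.get(key)
    }
}
```

In this sketch the read-only array shares the underlying store with the original, so writes made through the original remain visible through the view; only writes issued through the view itself are rejected.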

### 4. `PyAsyncArray::read_only` (async)

Same shape in `src/array/async.rs`, using `AsyncReadOnly` and the async
`readable_listable()`.

## Data flow

- Reads (`__getitem__`, `retrieve_array_subset`, `retrieve_chunk`,
`retrieve_encoded_chunk`) and metadata accessors: pass through the adapter
untouched.
- Writes (`store_chunk`, `store_encoded_chunk`, `erase_chunk`, `erase_metadata`,
`compact_chunk`): hit the adapter's erroring write methods and raise
`StorageError::ReadOnly` → Python exception.

## Out of scope (YAGNI for this pass)

- A `read_only` boolean / introspection property on the array. Future direction:
expose `array.store`, and put `is_read_only` on that store object instead.
- Wrapping genuinely read-only stores at the `PySyncStorage`/`PyAsyncStorage`
boundary (S3, Python-protocol stores). The `ReadOnly` adapter built here is the
reusable mechanism that will enable it; the boundary wiring is a separate change.

## Testing

- `read_only()` returns an array that still reads correctly (a subset read through
  the read-only array equals the same read on the source array).
- Each write method on a read-only array raises (Python-level assertion that the
expected exception type is raised) for both sync and async.
- A normal (non-read-only) array still writes successfully — no regression.
32 changes: 32 additions & 0 deletions python/zarrista/_array.pyi
@@ -33,6 +33,16 @@ class Array:
    @staticmethod
    def open(store: FilesystemStore | MemoryStore, path: str = "/") -> Array:
        """Open the array stored at `path` in `store`."""
    @staticmethod
    def from_metadata(
        metadata: ArrayMetadataV3,
        store: FilesystemStore | MemoryStore,
        path: str = "/",
    ) -> Array:
        """Use the provided metadata to open a new array at `path` in `store`.

        This does **not** write the metadata to the store.
        """
    @property
    def attrs(self) -> dict[str, JSONValue]:
        """The array's user attributes as a dict."""
@@ -129,6 +139,12 @@ class Array:
        """
    def erase_metadata(self) -> None:
        """Delete the array's metadata from the store."""
    def read_only(self) -> Array:
        """Return a read-only view of this array.

        Reads behave identically, but any write (`store_chunk`, `erase_chunk`,
        `erase_metadata`, ...) raises at runtime.
        """
    @property
    def shape(self) -> list[int]:
        """The array shape."""
@@ -147,6 +163,16 @@ class AsyncArray:

        `store` may be an obstore `ObjectStore` or an icechunk `Session`.
        """
    @staticmethod
    def from_metadata(
        metadata: ArrayMetadataV3,
        store: AsyncStore,
        path: str = "/",
    ) -> AsyncArray:
        """Use the provided metadata to open a new array at `path` in `store`.

        This does **not** write the metadata to the store.
        """
    @property
    def attrs(self) -> dict[str, JSONValue]:
        """The array's user attributes as a dict."""
@@ -243,6 +269,12 @@ class AsyncArray:
        """
    async def erase_metadata(self) -> None:
        """Delete the array's metadata from the store."""
    def read_only(self) -> AsyncArray:
        """Return a read-only view of this array.

        Reads behave identically, but any write (`store_chunk`, `erase_chunk`,
        `erase_metadata`, ...) raises at runtime.
        """
    @property
    def shape(self) -> list[int]:
        """The array shape."""
30 changes: 28 additions & 2 deletions src/array/async.rs
@@ -8,9 +8,10 @@ use crate::array::PyChunkIndices;
use crate::array_bytes::PyArrayBytes;
use crate::codec::PyCodecOptions;
use crate::decoded_array::DecodedArray;
-use crate::error::ZarristaError;
+use crate::error::{ZarristaError, ZarristaResult};
use crate::metadata::PyArrayMetadata;
use crate::node::PyNodePath;
-use crate::storage::PyAsyncStorage;
+use crate::storage::{AsyncReadOnlyStorageAdapter, PyAsyncStorage};
use pyo3::prelude::*;
use pyo3_async_runtimes::tokio::future_into_py;
use pyo3_bytes::PyBytes;
@@ -56,6 +57,24 @@ impl PyAsyncArray {
        )
    }

    /// Use the provided metadata to open a new array at `path` in `store`.
    ///
    /// This does **not** write to the store; use `store_metadata` to write metadata to storage.
    #[staticmethod]
    #[pyo3(
        signature = (metadata, store, path = PyNodePath::root()),
        text_signature = "(metadata, store, path='/')"
    )]
    fn from_metadata(
        metadata: PyArrayMetadata,
        store: PyAsyncStorage,
        path: PyNodePath,
    ) -> ZarristaResult<Self> {
        let inner =
            Array::new_with_metadata(store.into_inner(), path.as_str(), metadata.into_inner())?;
        Ok(Self::new(Arc::new(inner)))
    }

    /// Open the array stored at `path` in `store`.
    #[staticmethod]
    #[pyo3(
@@ -112,6 +131,13 @@ impl PyAsyncArray {
        })
    }

    /// Return a read-only view of this array; writes raise at runtime.
    fn read_only(&self) -> Self {
        let read_list_storage = self.inner.storage().readable_listable();
        let storage = Arc::new(AsyncReadOnlyStorageAdapter::new(read_list_storage));
        Self::new(Arc::new(self.inner.with_storage(storage)))
    }

    fn erase_metadata<'py>(&self, py: Python<'py>) -> PyResult<Bound<'py, PyAny>> {
        let inner = self.inner.clone();
        future_into_py(py, async move {
27 changes: 26 additions & 1 deletion src/array/sync.rs
@@ -9,8 +9,9 @@ use crate::array_bytes::PyArrayBytes;
use crate::codec::PyCodecOptions;
use crate::decoded_array::DecodedArray;
use crate::error::ZarristaResult;
use crate::metadata::PyArrayMetadata;
use crate::node::PyNodePath;
-use crate::storage::PySyncStorage;
+use crate::storage::{PySyncStorage, ReadOnlyStorageAdapter};
use pyo3::prelude::*;
use pyo3_bytes::PyBytes;
use zarrs::array::Array;
@@ -50,6 +51,24 @@ impl PyArray {
        )
    }

    /// Use the provided metadata to open a new array at `path` in `store`.
    ///
    /// This does **not** write to the store; use `store_metadata` to write metadata to storage.
    #[staticmethod]
    #[pyo3(
        signature = (metadata, store, path = PyNodePath::root()),
        text_signature = "(metadata, store, path='/')"
    )]
    fn from_metadata(
        metadata: PyArrayMetadata,
        store: PySyncStorage,
        path: PyNodePath,
    ) -> ZarristaResult<Self> {
        let inner =
            Array::new_with_metadata(store.into_inner(), path.as_str(), metadata.into_inner())?;
        Ok(Self::new(Arc::new(inner)))
    }

    /// Open the array stored at `path` in `store`.
    #[staticmethod]
    #[pyo3(
@@ -85,6 +104,12 @@ impl PyArray {
        Ok(())
    }

    fn read_only(&self) -> Self {
        let read_list_storage = self.inner.storage().readable_listable();
        let storage = Arc::new(ReadOnlyStorageAdapter::new(read_list_storage));
        Self::new(Arc::new(self.inner.with_storage(storage)))
    }

    /// Read a region of the array, using numpy-style basic indexing.
    ///
    /// Returns one of the decoded result classes (`Tensor`, `VariableArray`,