Skip to content

chunk boundaries for zarr.create(17, chunks=4) #977

Description

@sshleifer

Suppose I created

import zarr
arr = zarr.create(17, chunks=4)

Would it be safe to have 1 thread write to arr[12:16] while another wrote to arr[16:]?

I.e is the last element in its own chunk of size 1?

Why I care

I have each worker holding a strangely shaped 1D array, and am trying to verify the following code for determining the maximum/best chunk size (I assume (possibly incorrectly) that uneven chunk sizes is not supported and that I always want to take the max allowable chunk size even if it's wierd).

from math import gcd
from typing import list
from functools import reduce

def find_gcd(list):
    x = reduce(gcd, list)
    return x

def chunk_size(array_shapes: List[int]):
    if len(array_shapes) <= 2:
        return array_shapes[0]
    return find_gcd(array_shapes[:-1])

def test_chunk_size():
    assert chunk_size([7, 7, 7, 6]) == 7
    assert chunk_size([7, 7, 7, 7]) == 7
    assert chunk_size([6, 6, 6, 4]) == 6
    assert chunk_size([6, 6, 1, 1]) == 1

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions