
Update AWS access instructions for ZARR version 3.0.#100

Merged
adriaat merged 7 commits into main from fix_docs
Jun 16, 2025

Conversation

@simonpf
Contributor

@simonpf simonpf commented Jun 9, 2025

This PR updates the instructions for accessing CCIC data from AWS to ensure compatibility with Zarr version 3.0 and above. The previous approach no longer worked due to changes in the Zarr API.

@adriaat
Contributor

adriaat commented Jun 11, 2025

Two notes:

  • import ccic can't be removed. It's necessary to register the log_bins codec
  • ds = xr.open_zarr('s3://chalmerscloudiceclimatology/record/gridsat/2020/ccic_gridsat_202001010000.zarr') already addresses the changes introduced with Zarr 3. As far as I know, there is no plan that consolidated=True will stop being the default.

I see two options:

  1. Keep the (more verbose) approach you suggest (and keep the import ccic line)
  2. Reformat to indicate ds = xr.open_zarr('s3://chalmerscloudiceclimatology/record/gridsat/2020/ccic_gridsat_202001010000.zarr'), without instantiating an s3fs filesystem.
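A minimal sketch of option 2, passing the s3:// key directly and letting xarray/zarr build the store. The helper name is hypothetical, introduced only for illustration:

```python
def ccic_gridsat_url(year: int, timestamp: str) -> str:
    """Build the s3:// key for a CCIC GridSat granule (layout as in the docs)."""
    return (
        "s3://chalmerscloudiceclimatology/record/gridsat/"
        f"{year}/ccic_gridsat_{timestamp}.zarr"
    )

# With xarray, zarr >= 3, and s3fs installed, the key can be passed directly:
#   import xarray as xr
#   ds = xr.open_zarr(ccic_gridsat_url(2020, "202001010000"))
```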

# Previous approach (broken with Zarr >= 3):
ds = xr.open_zarr(s3.get_mapper('chalmerscloudiceclimatology/record/gridsat/2020/ccic_gridsat_202001010000.zarr'))
# Approach proposed in this PR:
aws_file_path = "chalmerscloudiceclimatology/record/gridsat/2021/ccic_gridsat_202101010000.zarr"
store = zarr.storage.FsspecStore(s3, path=aws_file_path)
ds = xr.open_zarr(store, consolidated=True)
Contributor

@adriaat adriaat Jun 11, 2025


ds = xr.open_zarr('s3://chalmerscloudiceclimatology/record/gridsat/2021/ccic_gridsat_202101010000.zarr') will simply work with a Zarr 3 installation and xarray (xarray will try to guess the store from the S3 key)

@adriaat
Contributor

adriaat commented Jun 11, 2025

Also note that this PR fails the GitHub test_and_install action. I am looking into it.

Update:

Zarr 3 requires at least Python 3.11. We use 3.10 in the environment YAML files.

Python 3.11 is not compatible with the PyTorch packages we specify in the YAML files.

I opened issue #101 for this.

Related: PR #102
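The version constraint above can be made explicit with a small guard. This is a sketch, not part of the PR; only the Zarr 3 requirement of Python >= 3.11 is taken from the discussion:

```python
import sys

def supports_zarr3(version_info=sys.version_info) -> bool:
    """Zarr 3 requires Python >= 3.11; the environment YAML files pin 3.10."""
    return tuple(version_info[:2]) >= (3, 11)
```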

@adriaat adriaat mentioned this pull request Jun 13, 2025
@simonpf
Contributor Author

simonpf commented Jun 14, 2025

Thanks a lot for digging into this. Your suggestion is a lot cleaner.

To make it work I still had to

  • Add s3fs to the dependencies
  • Set the storage options to 'anon'
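Putting both fixes together, the working call looks roughly like this. A sketch assuming the public bucket allows anonymous reads; the helper function is hypothetical:

```python
def ccic_open_kwargs() -> dict:
    """Extra keyword arguments xr.open_zarr needs for the public CCIC bucket."""
    # s3fs must be installed for the s3:// protocol to resolve, and
    # anonymous access must be requested explicitly.
    return {"storage_options": {"anon": True}}

# Usage (requires xarray, zarr >= 3, and s3fs):
#   import xarray as xr
#   ds = xr.open_zarr(
#       "s3://chalmerscloudiceclimatology/record/gridsat/2020/"
#       "ccic_gridsat_202001010000.zarr",
#       **ccic_open_kwargs(),
#   )
```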

@adriaat
Contributor

adriaat commented Jun 16, 2025

Ah, yes, of course, s3fs should be a dependency if you read from S3 buckets. I assumed the user would already have it on their end; when I changed setup.py I had in mind the more local use we make at Chalmers, where we have the data offline.
Good catch that anon needs to be passed in the storage options. Perhaps something in my ~/.aws directory was used when I tested it.

For future reference: I think something has changed in how mamba interacts with our environment YAML files. If mamba env create -f ccic_cpu.yml is used, resolving the dependencies is painfully slow (when debugging locally, the terminal even becomes unresponsive); this explains why the GH action install_and_test takes about 50 minutes. It used to take (less than) a handful of minutes.

Instead, if conda env create -f ccic_cpu.yml is used, the dependencies are resolved quickly and the GH action completes in (less than) a handful of minutes. I tried to find the cause; I think mamba tries to look for a CUDA driver (and install it) even when we use the env file for CPU-only.

@adriaat adriaat merged commit 224c8fc into main Jun 16, 2025
1 check passed