
Export ensemble members to cf-compliant netcdf #2247

Open
s6sebusc wants to merge 9 commits into ecmwf:develop from s6sebusc:export_members

Conversation

@s6sebusc

Contributor

Description

If more than one member is present, a new coordinate "mem" is added for the members.

Issue Number

Closes #2246

@s6sebusc

Contributor Author

I know that the coordinates are originally stored in the config, but I wasn't sure how best to add another coordinate that is only sometimes present. Another option would be to add the ensemble coordinate in every case and treat deterministic runs as single-member ensembles. That might be convenient for some applications but annoying for others.
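The two options discussed above can be sketched with xarray roughly as follows. This is only an illustration under assumptions, not the code in this PR: the function name `label_members`, the `keep_singleton` flag, and the attribute values are hypothetical, and it assumes the data always arrives with a `mem` dimension. `realization` is the CF standard name for ensemble members; whether the PR uses exactly these attributes is not stated here.

```python
import numpy as np
import xarray as xr


def label_members(da: xr.DataArray, dim: str = "mem",
                  keep_singleton: bool = False) -> xr.DataArray:
    """Attach a member coordinate, or drop the dimension for deterministic runs.

    Hypothetical helper: names and attributes are assumptions for illustration.
    """
    n_members = da.sizes[dim]
    if n_members == 1 and not keep_singleton:
        # Option 1: deterministic run, no member coordinate in the output.
        return da.squeeze(dim, drop=True)
    # Option 2 (or a real ensemble): label members 0..n-1 with CF-style attributes.
    attrs = {"standard_name": "realization", "long_name": "ensemble member"}
    return da.assign_coords({dim: (dim, np.arange(n_members), attrs)})


ensemble = xr.DataArray(np.zeros((3, 4)), dims=("ncells", "mem"))
deterministic = xr.DataArray(np.zeros((3, 1)), dims=("ncells", "mem"))

print(label_members(ensemble).dims)
print(label_members(deterministic).dims)
print(label_members(deterministic, keep_singleton=True).dims)
```

With `keep_singleton=True`, downstream code can always index by `mem`; with the default, existing consumers of deterministic output see the same dimensions as before.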

@github-actions github-actions Bot added the eval anything related to the model evaluation pipeline label Apr 22, 2026
@s6sebusc s6sebusc marked this pull request as ready for review May 18, 2026 11:25
@s6sebusc

Contributor Author

Had to touch the coordinate ordering thing as well. Moved that part to Regridder.regrid_ds in the hope that it is more reusable there. It shouldn't really change anything, but I can also move it back to the specific gauss_to_regular thing if that's better.
I didn't know at which place in the coordinate order mem should go, so I put it last (any other future coordinates will also be appended after that in the current implementation). Let me know if there is some preferred place for it.
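The ordering rule described above (known dimensions first, anything else appended afterwards) could look roughly like the sketch below. The helper `order_dims` and the `PREFERRED` tuple are hypothetical and not taken from `Regridder.regrid_ds` or the project config.

```python
def order_dims(dims, preferred):
    """Return dims with the preferred ones first, the rest appended in their original order."""
    known = [d for d in preferred if d in dims]
    extra = [d for d in dims if d not in preferred]
    return tuple(known + extra)


# Hypothetical preferred order; the real one lives in the project config.
PREFERRED = ("pressure", "ncells", "valid_time")

# "mem" is not in the preferred list, so it ends up last.
print(order_dims(("mem", "valid_time", "ncells", "pressure"), PREFERRED))
# → ('pressure', 'ncells', 'valid_time', 'mem')
```

With xarray, the resulting tuple could then be passed to `Dataset.transpose(*order)`.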

@enssow

enssow commented May 22, 2026

Contributor

Hi Sebastian, thank you for working on this. I have some suggestions contained within the PR I opened :)

Tested with q76cdspy, which only has the tp_imerg_0 variable for stream IMERG_ANEMOI and ens members. If exported using --format netcdf, it returns the ensemble members as the coordinate mem with CF-compliant attributes.

Tested with ryjwbxus, which has all variables for stream ERA5 (no ens members). If exported with --format netcdf or --format verif, it returns the same netcdf as before.

It would be useful to test with a run_id that has ensemble members for ERA5 to check the knock-on effect on verif_parser, but I will open a separate issue for this when the time comes.

@iluise

iluise commented May 22, 2026

Contributor

Thanks for testing! You can use, for example, ege1pq8v on Santis (4 ensemble members). Do you have access?
If not, @wael-mika has an IMERG run with 16 ensemble members on Jupiter. Wael, can you share the run_id?

s6sebusc added 2 commits May 26, 2026 10:07
Adds the new "member" coordinate to the config instead of hacking it in somewhere else. Thanks Sorcha!
@enssow

enssow commented May 28, 2026

Contributor

Debugging with ege1pq8v: the output seems to have 270000 ipoints, which does not match 40320 * 6 = 241920. This is the result from the data worker, before any reshaping etc. This may be a bug? Needs further investigation.

<xarray.DataArray (ipoint: 270000, channel: 72, mem: 4)> Size: 311MB
array([[[-1.98282078e-01,  4.26849890e+00, -9.56105709e-01,
          3.12603760e+00],
        [ 2.86099744e+00, -1.01562309e+00, -4.21182060e+00,
          8.33157837e-01],
        [ 2.65874084e+02,  2.68199127e+02,  2.59316284e+02,
          2.59673981e+02],
        ...,
        [ 2.73779297e+04,  2.80807520e+04,  2.59722852e+04,
          2.68680391e+04],
        [ 1.22252734e+04,  1.36557715e+04,  1.14337314e+04,
          1.33076836e+04],
        [ 5.82391992e+03,  6.91643164e+03,  5.73539258e+03,
          6.62066895e+03]],

       [[-1.70663610e-01,  4.22538757e+00, -1.05041265e+00,
          3.08292580e+00],
        [ 2.73565650e+00, -1.06934071e+00, -4.24763250e+00,
          8.46587241e-01],
        [ 2.65874084e+02,  2.68258728e+02,  2.59435516e+02,
          2.59793213e+02],
...
        [ 2.34366133e+04,  2.41807773e+04,  2.36846680e+04,
          2.82943555e+04],
        [ 1.19868574e+04,  1.14909512e+04,  1.08233848e+04,
          1.02511855e+04],
        [ 5.30080176e+03,  6.39130176e+03,  4.31090234e+03,
          3.63487305e+03]],

       [[-2.16469839e-01,  1.43390167e+00, -8.92369461e+00,
         -2.80182743e+00],
        [ 9.41454601e+00, -2.36751604e+00, -1.00670395e+01,
          9.51003432e-02],
        [ 2.18777206e+02,  2.06377014e+02,  2.43100647e+02,
          2.09715515e+02],
        ...,
        [ 2.34641758e+04,  2.41532148e+04,  2.36571055e+04,
          2.83150254e+04],
        [ 1.19773203e+04,  1.14909512e+04,  1.08043115e+04,
          1.02702598e+04],
        [ 5.30885010e+03,  6.41142139e+03,  4.29480615e+03,
          3.61877710e+03]]], shape=(270000, 72, 4), dtype=float32)
Coordinates:
  * ipoint         (ipoint) int64 2MB 0 1 2 3 4 ... 269996 269997 269998 269999
    valid_time     (ipoint) datetime64[ns] 2MB 2023-10-02T18:00:00 ... 2023-1...
    lat            (ipoint) float32 1MB 89.78 89.78 89.78 ... -89.78 -89.78
    lon            (ipoint) float32 1MB 0.0 20.0 120.0 ... -60.0 -40.0 -20.0
  * channel        (channel) <U6 2kB '10u' '10v' '2d' ... 'z_850' 'z_925'
    forecast_step  int64 8B 4
Dimensions without coordinates: mem
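The numbers in this comment can be checked with a few lines of arithmetic. The sketch below only uses values printed above (40320 grid points, 270000 ipoints, 72 channels, 4 members, float32). It does not touch the data worker, so it says nothing about where the extra points come from.

```python
N_CELLS = 40320        # grid points per field, as quoted in the comment
IPOINT = 270000        # length of the ipoint dimension in the repr
CHANNEL, MEM = 72, 4   # remaining dimension lengths in the repr
FLOAT32_BYTES = 4

print(N_CELLS * 6)  # 241920

# 270000 is not a whole multiple of the grid size.
whole, leftover = divmod(IPOINT, N_CELLS)
print(whole, leftover)  # 6 28080

# The reported "Size: 311MB" is consistent with the printed shape.
print(IPOINT * CHANNEL * MEM * FLOAT32_BYTES)  # 311040000
```

So the array repr is internally consistent, but the ipoint count leaves 28080 points beyond six full 40320-point fields.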

@enssow

enssow commented Jun 12, 2026

Contributor

Hi @s6sebusc, sorry for the long review process. I've opened a PR against your branch here: s6sebusc#2

Testing is now complete (there was a small issue I found with processing multiple streams that has also affected the PR).

Tested on JUPITER with run_id bgra397w, using the command: uv run export --run-id bgra397w --output-dir ../test_ens --format netcdf --samples 0 --fsteps 2 3

Examining the data:

<xarray.Dataset> Size: 78MB
Dimensions:                  (pressure: 3, mem: 16, ncells: 40320, valid_time: 2)
Coordinates:
  * pressure                 (pressure) int64 24B 50 500 850
  * mem                      (mem) int64 128B 0 1 2 3 4 5 ... 10 11 12 13 14 15
  * ncells                   (ncells) int64 323kB 0 1 2 3 ... 40317 40318 40319
  * valid_time               (valid_time) datetime64[ns] 16B 2014-07-01T12:00...
    latitude                 (ncells, valid_time) float32 323kB ...
    longitude                (ncells, valid_time) float32 323kB ...
    forecast_period          (valid_time) timedelta64[ns] 16B ...
    forecast_reference_time  datetime64[ns] 8B ...
Data variables:
    q                        (pressure, mem, ncells, valid_time) float32 15MB ...
    t                        (pressure, mem, ncells, valid_time) float32 15MB ...
    u                        (pressure, mem, ncells, valid_time) float32 15MB ...
    v                        (pressure, mem, ncells, valid_time) float32 15MB ...
    z                        (pressure, mem, ncells, valid_time) float32 15MB ...
Attributes:
    title:        WeatherGenerator Output for bgra397w
    institution:  WeatherGenerator Project
    source:       WeatherGenerator v0.0
    history:      Created using the export_inference.py script on 2026-06-12T...
    Conventions:  CF-1.12
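As a rough sanity check on the summary above, the reported sizes can be reproduced from the dimension lengths. This is plain arithmetic on the printed numbers (float32 data, five variables), not part of the export code.

```python
from math import prod

# Dimension lengths as printed in the Dataset summary.
dims = {"pressure": 3, "mem": 16, "ncells": 40320, "valid_time": 2}
FLOAT32_BYTES = 4
N_VARIABLES = 5  # q, t, u, v, z

per_variable = prod(dims.values()) * FLOAT32_BYTES
print(per_variable)                # 15482880, shown as 15MB
print(N_VARIABLES * per_variable)  # 77414400

# The same variable without the member dimension would be 16 times smaller.
print(per_variable // dims["mem"])  # 967680
```

The five variables account for about 77.4MB of the reported 78MB, and the remainder is coordinate data such as latitude, longitude and ncells.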

@enssow

enssow commented Jun 12, 2026

Contributor

(There will be a separate PR for verif, as adding the ensembles changes a few more things.)

@enssow enssow mentioned this pull request Jun 12, 2026
7 tasks

Development

Successfully merging this pull request may close these issues.

Export ensemble members to netcdf

3 participants