[2502][evaluation] score bug fixes by moritzhauschulz · Pull Request #2503 · ecmwf/WeatherGenerator

moritzhauschulz · 2026-06-14T09:25:49Z

Description

Enables the previously dead probabilistic metrics (spread, ssr, crps, rank_histogram) in the evaluation package, corrects the spread-skill ratio to the standard ensemble-mean definition, and makes the spatial score-map path robust for metrics that collapse the ensemble dimension. This unblocks GenCast-style spread-skill diagnostics over lead time and ensemble-spread maps.

I am not very familiar with the eval pipeline, so it would be good if someone more knowledgeable could review this.

`scores/score.py`

- **Enable probabilistic-metric dispatch.** Replace the dead `assert self.ens_dim …` / `return None` (undefined `self.ens_dim`, unconditional return) with a real check: warn and skip when the ensemble dim `self._ens_dim` is absent from the predictions (e.g. deterministic runs), otherwise dispatch to the metric function. This activates `spread`, `ssr`, `crps`, and `rank_histogram`.
- **Correct the spread-skill ratio.** `calc_ssr` now divides the ensemble spread by the RMSE of the ensemble mean (the "skill", GenCast / WeatherBench2 convention), i.e. `calc_spread(p) / calc_rmse(p.mean("ens"), gt)`, instead of the full-ensemble per-member RMSE. SSR is now a single value per variable/level/lead-time with the standard calibration interpretation (under-/over-dispersion), consistent with the already ensemble-reduced spread numerator.
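As an illustration of the ensemble-mean definition described above, here is a minimal NumPy sketch. It is not the code in `scores/score.py`: the helper names (`sketch_spread`, `sketch_rmse`, `sketch_ssr`), the `(n_members, n_points)` array layout, and the choice of `ddof=1` for the ensemble variance are all assumptions made for the example, and the package's `calc_spread` / `calc_rmse` may reduce over different dimensions.

```python
import numpy as np


def sketch_spread(members: np.ndarray) -> float:
    """Square root of the ensemble variance averaged over all points.

    `members` has shape (n_members, n_points). Using ddof=1 is an
    assumption of this sketch, not necessarily what calc_spread does.
    """
    return float(np.sqrt(members.var(axis=0, ddof=1).mean()))


def sketch_rmse(pred: np.ndarray, target: np.ndarray) -> float:
    """Root-mean-square error between one field and the target."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))


def sketch_ssr(members: np.ndarray, target: np.ndarray) -> float:
    """Spread divided by the RMSE of the ensemble mean (the "skill")."""
    ens_mean = members.mean(axis=0)
    return sketch_spread(members) / sketch_rmse(ens_mean, target)


# Synthetic data: target and members scatter around a shared signal.
rng = np.random.default_rng(0)
signal = rng.normal(size=5000)
target = signal + rng.normal(size=5000)

# Members drawn with the same noise level as the target (calibrated).
calibrated = signal + rng.normal(size=(8, 5000))
# Members drawn with half the noise level (under-dispersive).
under = signal + 0.5 * rng.normal(size=(8, 5000))

ssr_calibrated = sketch_ssr(calibrated, target)
ssr_under = sketch_ssr(under, target)
print(f"calibrated: {ssr_calibrated:.2f}, under-dispersive: {ssr_under:.2f}")
```

Values well below 1 indicate under-dispersion and values above 1 indicate over-dispersion. With a finite ensemble the calibrated case sits slightly below 1, because the ensemble-mean error still contains a sampling contribution from the limited number of members.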

`plotting/plot_orchestration.py` (`_plot_score_maps_per_stream`)

- **Fix `CoordinateValidationError` crash.** Guard the ensemble-label assignment on `"ens" in plot_metrics.dims` (the concatenated result) rather than `preds.dims`. When every selected metric reduces the ensemble dim (e.g. `metrics: ["ssr"]`), `plot_metrics` has no `ens` dim and the previous unconditional `plot_metrics["ens"] = preds.ens` raised.
- **Avoid redundant per-member maps.** Track which metrics retain a per-member `ens` dim in their own result (`ens_metrics`). Iterate ensemble members only for those; ensemble-reduced metrics (`spread`, `crps`, `ssr`) get a single map instead of one identical map per member.
- **Collapse broadcast ensemble axis.** When `xr.concat` broadcasts a reduced metric across `ens`, select a single member (`isel(ens=0, drop=True)`) so the plotted field is 2-D.
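The three plotting changes above can be sketched with a small xarray example. This is a hedged illustration, not the body of `_plot_score_maps_per_stream`: the helpers `restore_ens_labels` and `maps_for_metric`, the `metric` dimension name, and the stand-in metrics `abs_err` (per-member) and `spread` (ensemble-reduced) are hypothetical. The broadcast of the reduced metric is written out explicitly with `expand_dims` to mimic the situation described for `xr.concat`.

```python
import numpy as np
import xarray as xr


def restore_ens_labels(plot_metrics: xr.DataArray, preds: xr.DataArray) -> xr.DataArray:
    """Attach member labels only if the concatenated result has an ens dim."""
    if "ens" in plot_metrics.dims:
        return plot_metrics.assign_coords(ens=preds.ens.values)
    return plot_metrics


def maps_for_metric(plot_metrics: xr.DataArray, metric: str, metric_has_ens: bool) -> list:
    """Return the 2-D fields to plot for one metric."""
    field = plot_metrics.sel(metric=metric)
    if "ens" not in field.dims:
        return [field]  # already a single map
    if metric_has_ens:
        # genuine per-member metric: one map per ensemble member
        return [field.isel(ens=i) for i in range(field.sizes["ens"])]
    # reduced metric broadcast across ens: all slices are identical
    return [field.isel(ens=0, drop=True)]


members = [0, 1, 2]
rng = np.random.default_rng(0)
preds = xr.DataArray(
    rng.normal(size=(3, 4, 5)), dims=("ens", "lat", "lon"), coords={"ens": members}
)
gt = xr.DataArray(rng.normal(size=(4, 5)), dims=("lat", "lon"))

abs_err = np.abs(preds - gt)   # keeps the ens dim
spread = preds.std("ens")      # ensemble-reduced, dims (lat, lon)

# Mixed selection: the reduced metric is broadcast across ens.
full = xr.concat([abs_err, spread.expand_dims(ens=members)], dim="metric")
full = restore_ens_labels(full.assign_coords(metric=["abs_err", "spread"]), preds)

# Reduced-only selection: no ens dim exists, so the guard must skip it.
reduced = xr.concat([spread], dim="metric").assign_coords(metric=["spread"])
reduced = restore_ens_labels(reduced, preds)

err_maps = maps_for_metric(full, "abs_err", metric_has_ens=True)
spread_maps = maps_for_metric(full, "spread", metric_has_ens=False)
reduced_maps = maps_for_metric(reduced, "spread", metric_has_ens=False)
```

In this sketch the per-member metric yields one map per member, while the reduced metric yields a single 2-D map in both the mixed and the reduced-only selection.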

Issue Number

Closes #2502

Checklist before asking for review

I have performed a self-review of my code
My changes comply with basic sanity checks:
- I have fixed formatting issues with ./scripts/actions.sh lint
- I have run unit tests with ./scripts/actions.sh unit-test
- I have documented my code and I have updated the docstrings.
- I have added unit tests, if relevant
[] I have tried my changes with data and code:
- I have run the integration tests with ./scripts/actions.sh integration-test
- (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
- (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
I have informed and aligned with people impacted by my change:
- for config changes: the MatterMost channels and/or a design doc
- for changes of dependencies: the MatterMost software development channel

Use the robust ens-detection variant in `_plot_score_maps_per_stream` so the block is identical (apart from the branch-specific tag string) to mh/full-pipeline-diffusion-adjusted-scores, minimising future merge conflicts:
- restore ens labels via `assign_coords(ens=preds.ens.values)` (positional) instead of `plot_metrics["ens"] = preds.ens` (index-aligned)
- gate has-ens detection on the ens *dimension* (all_ens) rather than the coordinate
- compute per-metric ens iteration via metric_has_ens / ens_values

Behaviour is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

clessig · 2026-06-15T05:51:54Z

@jpolz : could you have a look?

score bug fixes

ff845a9

github-project-automation Bot added this to WeatherGen-dev Jun 14, 2026

moritzhauschulz mentioned this pull request Jun 14, 2026

Mh/full pipeline diffusion adjusted scores #2491

Merged

3 tasks

github-actions Bot added the bug (Something isn't working) and eval (anything related to the model evaluation pipeline) labels Jun 14, 2026

moritzhauschulz mentioned this pull request Jun 15, 2026

[2507][Evaluation] Implement Adjusted SSR #2508

Open

4 tasks


moritzhauschulz wants to merge 2 commits into ecmwf:develop from moritzhauschulz:mh/develop-score-bug-fixes
