[2507][Evaluation] Implement Adjusted SSR by moritzhauschulz · Pull Request #2508 · ecmwf/WeatherGenerator

moritzhauschulz · 2026-06-15T17:48:42Z

Description

Adds two probabilistic metrics following GenCast (Price et al., 2024, App. A),
leaving the existing spread/ssr untouched.

Changes

- spread_adj: unbiased ensemble spread, sqrt(mean Var_ens(ddof=1)) (GenCast Eq. A.6).
- ssr_adj: adjusted spread-skill ratio, sqrt((M+1)/M) · spread_adj / RMSE(ens_mean) (GenCast Eq. A.9), where M is the ensemble size. The sqrt((M+1)/M) factor removes the finite-ensemble bias, so a perfectly calibrated ensemble gives SSR = 1.
- Plotting: ssr_adj line plots draw a horizontal reference line at the optimal value of 1.
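
For reviewers who want to sanity-check the formulas above, here is a minimal NumPy sketch of the two metrics. It is not the code in this PR: the function names, the (M, N) array layout (M members, N flattened points), and the plain mean over all points are assumptions made for illustration only.

```python
import numpy as np


def spread_adj(ens: np.ndarray) -> float:
    """Unbiased spread: sqrt of the mean ddof=1 ensemble variance.

    ens has shape (M, N): M ensemble members, N flattened points (assumed layout).
    """
    var_ens = ens.var(axis=0, ddof=1)  # per-point variance across members
    return float(np.sqrt(var_ens.mean()))


def rmse_ens_mean(ens: np.ndarray, target: np.ndarray) -> float:
    """RMSE of the ensemble mean against the target."""
    err = ens.mean(axis=0) - target
    return float(np.sqrt((err**2).mean()))


def ssr_adj(ens: np.ndarray, target: np.ndarray) -> float:
    """Adjusted spread-skill ratio: sqrt((M+1)/M) * spread_adj / RMSE(ens_mean)."""
    m = ens.shape[0]
    return float(np.sqrt((m + 1) / m) * spread_adj(ens) / rmse_ens_mean(ens, target))


# Synthetic, perfectly calibrated case: members and target are drawn
# independently from the same distribution, so ssr_adj should be close to 1.
rng = np.random.default_rng(0)
members = rng.standard_normal((4, 200_000))
truth = rng.standard_normal(200_000)
print(ssr_adj(members, truth))
```

The check works because, for M members and a target drawn from the same distribution with variance s², the expected squared error of the ensemble mean is s²(M+1)/M, which is exactly what the sqrt((M+1)/M) factor compensates for.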

Notes

- Legacy spread (biased, ddof=0) and ssr are unchanged. GenCast defines no uncorrected SSR, so these remain only as non-standard diagnostics.
- To use the new metrics, add them under evaluation.metrics in the eval config (e.g. ssr_adj, spread_adj).
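
To illustrate how the legacy and adjusted spreads relate, the sketch below compares the ddof=0 and ddof=1 variants on a tiny ensemble. The helper names are hypothetical, and it assumes both variants aggregate as sqrt(mean variance) over points, which may differ from the repository's actual reduction.

```python
import numpy as np


def spread_legacy(ens: np.ndarray) -> float:
    """Biased spread (ddof=0); hypothetical stand-in for the legacy metric."""
    return float(np.sqrt(ens.var(axis=0, ddof=0).mean()))


def spread_unbiased(ens: np.ndarray) -> float:
    """Unbiased spread (ddof=1), as described for spread_adj."""
    return float(np.sqrt(ens.var(axis=0, ddof=1).mean()))


# Two members (M = 2), two points.
ens = np.array([[1.0, 4.0], [3.0, 8.0]])
m = ens.shape[0]

legacy = spread_legacy(ens)
unbiased = spread_unbiased(ens)

# Under the assumed aggregation the two differ by a constant factor
# sqrt(M / (M - 1)), so the gap matters most for small ensembles.
print(legacy, unbiased, unbiased / legacy)
```

With this aggregation, the legacy spread underestimates the unbiased one by sqrt((M-1)/M), so a legacy SSR is systematically below 1 even for a calibrated ensemble.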

Should be reviewed by someone from the evaluation team

On Santis, run with uv run evaluate --config ./config/evaluate/eval_config_test.yml -run-ids qvim6zb3 using the attached eval config.

Issue Number

Closes #2507

Note this depends on #2503, which should be merged first.

Checklist before asking for review

I have performed a self-review of my code
My changes comply with basic sanity checks:
- I have fixed formatting issues with ./scripts/actions.sh lint
- I have run unit tests with ./scripts/actions.sh unit-test
- I have documented my code and I have updated the docstrings.
- I have added unit tests, if relevant
I have tried my changes with data and code:
- I have run the integration tests with ./scripts/actions.sh integration-test
- (bigger changes) I have run a full training and I have written in the comment the run_id(s): launch-slurm.py --time 60
- (bigger changes and experiments) I have shared a HedgeDoc in the GitHub issue with all the configurations and runs for these experiments
I have informed and aligned with people impacted by my change:
- for config changes: the MatterMost channels and/or a design doc
- for changes of dependencies: the MatterMost software development channel

Use the robust ens-detection variant in _plot_score_maps_per_stream so the block is identical (apart from the branch-specific tag string) to mh/full-pipeline-diffusion-adjusted-scores, minimising future merge conflicts:
- restore ens labels via assign_coords(ens=preds.ens.values) (positional) instead of plot_metrics["ens"] = preds.ens (index-aligned)
- gate has-ens detection on the ens *dimension* (all_ens) rather than the coordinate
- compute per-metric ens iteration via metric_has_ens / ens_values

Behaviour is unchanged.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

moritzhauschulz and others added 2 commits June 14, 2026 11:20

score bug fixes

ff845a9

github-project-automation Bot added this to WeatherGen-dev Jun 15, 2026

implemented spread adj and ssr adj

75de10b

moritzhauschulz changed the title ~~[2507][Evaluation] Implement Correct SSR~~ [2507][Evaluation] Implement Adjusted SSR Jun 15, 2026

github-actions Bot added the eval label (anything related to the model evaluation pipeline) Jun 15, 2026

moritzhauschulz wants to merge 3 commits into ecmwf:develop from moritzhauschulz:mh/develop-score-bug-fixes-adjusted-ssr
