Flexible NaN-handling #34

Description

@frazane

Ensemble-based scoring rules need flexible handling of missing values in ensemble members. Currently, ensemble-based metrics such as the CRPS return NaN if one or more ensemble members are NaN. Users may have an ensemble with a few NaNs (e.g. lagged ensembles, where some members are missing at some timestamps) and still want a valid score.

Proposed solution

We add a nan_policy argument to the ensemble metrics:

  • propagate (default): return NaN if any ensemble member is NaN
  • omit: ignore NaN members and compute the score from the remaining ones
  • raise: raise an error if any ensemble member is NaN
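A minimal sketch of how the three policies could map onto a per-member mask for a univariate ensemble (members on the last axis). The function name member_mask and the use of ValueError for raise are assumptions for illustration; this is not the signature of the helpers in core/utils.py.

```python
import numpy as np

NAN_POLICIES = ("propagate", "omit", "raise")


def member_mask(ens, nan_policy="propagate"):
    """Hypothetical helper: boolean mask over the member axis, True where a
    member should contribute to the score."""
    if nan_policy not in NAN_POLICIES:
        raise ValueError(f"nan_policy must be one of {NAN_POLICIES}, got {nan_policy!r}")
    ens = np.asarray(ens, dtype=float)
    is_nan = np.isnan(ens)
    if nan_policy == "raise" and is_nan.any():
        # Error type is an assumption; the issue only says "raise an error".
        raise ValueError("ensemble contains NaN members")
    if nan_policy == "omit":
        return ~is_nan  # NaN members are excluded from the score
    # propagate (or raise with no NaNs): every member counts,
    # so a NaN member makes the score NaN downstream.
    return np.ones(ens.shape, dtype=bool)
```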

NaN observations always propagate, regardless of the policy. The omit policy flags every invalid member (any NaN member and, in the multivariate case, any member with a NaN variable or weight), assigns it a zero ensemble weight, and computes the score with the existing weighted estimators (#117). The shared helpers (apply_nan_policy_ens_uv, apply_nan_policy_ens_mv, nan_policy_check) live in core/utils.py.
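To illustrate why zero weights are enough for omit, here is a sketch using the energy form of the ensemble CRPS with member weights w normalized to sum to one: CRPS = sum_m w_m |x_m - y| - 1/2 sum_m sum_k w_m w_k |x_m - x_k|. Whether the weighted estimators from #117 use exactly this form is an assumption, and crps_weighted and crps_omit are hypothetical names. NaN members are also replaced by a finite placeholder, because 0 * NaN is still NaN.

```python
import numpy as np


def crps_weighted(obs, ens, ens_w):
    """Energy-form ensemble CRPS with member weights (normalized to sum to one)."""
    w = ens_w / ens_w.sum()
    term1 = np.sum(w * np.abs(ens - obs))
    term2 = 0.5 * np.sum(np.outer(w, w) * np.abs(ens[:, None] - ens[None, :]))
    return term1 - term2


def crps_omit(obs, ens):
    """Sketch of nan_policy='omit': NaN members get zero weight,
    then the weighted estimator runs unchanged."""
    ens = np.asarray(ens, dtype=float)
    if np.isnan(obs):
        return np.nan  # NaN observations always propagate
    invalid = np.isnan(ens)
    if invalid.all():
        return np.nan  # no valid members left to score
    w = np.where(invalid, 0.0, 1.0)  # zero weight for invalid members
    x = np.where(invalid, 0.0, ens)  # finite placeholder so 0 * NaN cannot leak in
    return crps_weighted(obs, x, w)
```

With the ensemble [0.5, NaN, 1.5, 2.0] and observation 1.0, crps_omit gives 1/3, the same value as scoring [0.5, 1.5, 2.0] directly.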

Status

Done

  • CRPS: crps_ensemble, twcrps_ensemble, owcrps_ensemble, vrcrps_ensemble (NaN-handling for ensemble CRPS, #118, merged).
  • Energy Score: es_ensemble, twes_ensemble, owes_ensemble, vres_ensemble (NaN-handling for Energy Score, #123). This is the first multivariate metric. The change also fixed two pre-existing bugs in the weighted energy numba kernels: the ow = ow[0] scalar indexing, and normalization by np.mean(fw) instead of np.sum(fw * ens_w).
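The normalization fix is easiest to see in the outcome-weighted energy score. With a weight function w, member weights om_m summing to one and W = sum_m om_m w(x_m), a weighted version is owES = w(y)/W sum_m om_m w(x_m) ||x_m - y|| - w(y)/(2 W^2) sum_m sum_k om_m om_k w(x_m) w(x_k) ||x_m - x_k||. With uniform om_m = 1/M, W reduces to np.mean(fw); once omitted members carry zero weight, in this formulation only np.sum(fw * ens_w) gives the right normalizer. The sketch below uses plain NumPy rather than the numba kernels, the function names are hypothetical, and it applies the multivariate omit rule (a NaN in any variable or in the member weight invalidates the member).

```python
import numpy as np


def owes_weighted(obs, ens, fw, fy, ens_w):
    """Outcome-weighted energy score with member weights.

    obs: (D,) observation; ens: (M, D) members; fw: (M,) weight-function
    values w(x_m); fy: scalar w(y); ens_w: (M,) non-negative member weights.
    """
    ens_w = ens_w / ens_w.sum()  # member weights sum to one
    wbar = np.sum(fw * ens_w)    # equals np.mean(fw) only for uniform weights
    a = ens_w * fw               # combined weight per member
    err = np.linalg.norm(ens - obs, axis=-1)                             # (M,)
    spread = np.linalg.norm(ens[:, None, :] - ens[None, :, :], axis=-1)  # (M, M)
    term1 = fy * np.sum(a * err) / wbar
    term2 = 0.5 * fy * np.sum(np.outer(a, a) * spread) / wbar**2
    return term1 - term2


def owes_omit(obs, ens, w_func, ens_w=None):
    """Sketch of nan_policy='omit' for a multivariate ensemble."""
    obs = np.asarray(obs, dtype=float)
    ens = np.asarray(ens, dtype=float)
    if np.isnan(obs).any():
        return np.nan  # NaN observations always propagate
    ens_w = np.ones(ens.shape[0]) if ens_w is None else np.asarray(ens_w, dtype=float)
    # A NaN in any variable, or a NaN member weight, invalidates the member.
    invalid = np.isnan(ens).any(axis=-1) | np.isnan(ens_w)
    if invalid.all():
        return np.nan
    ens_clean = np.where(invalid[:, None], 0.0, ens)  # finite placeholder rows
    w_clean = np.where(invalid, 0.0, ens_w)
    fw = np.array([w_func(x) for x in ens_clean])     # placeholders get weight 0 via w_clean
    return owes_weighted(obs, ens_clean, fw, w_func(obs), w_clean)
```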

Remaining

  • Variogram Score: vs_ensemble, twvs_ensemble, owvs_ensemble, vrvs_ensemble. Weighted estimators already exist, so this follows the Energy Score pattern directly.
  • Gaussian Kernel Score: univariate and multivariate gks*_ensemble. Weighted estimators exist, but the weighted multivariate numba kernels still contain the ow = ow[0] bug that was fixed for the Energy Score (core/kernels/_gufuncs_w.py:250, :277). That bug must be fixed before the omit path can run on numba.
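Following the same pattern, a weighted variogram score of order p replaces the ensemble mean of |x_mi - x_mj|^p with a weighted mean over members, VS_p = sum_ij (|y_i - y_j|^p - sum_m om_m |x_mi - x_mj|^p)^2, so omit again reduces to zero weights. This sketch assumes unit weights for every variable pair and an illustrative p = 0.5; the function names are hypothetical and the signature is not that of vs_ensemble.

```python
import numpy as np


def vs_weighted(obs, ens, ens_w, p=0.5):
    """Variogram score of order p with member weights and unit pair weights.

    obs: (D,); ens: (M, D); ens_w: (M,) non-negative member weights.
    """
    w = ens_w / ens_w.sum()
    obs_vario = np.abs(obs[:, None] - obs[None, :]) ** p        # (D, D)
    ens_vario = np.abs(ens[:, :, None] - ens[:, None, :]) ** p  # (M, D, D)
    fct_vario = np.einsum("m,mij->ij", w, ens_vario)            # weighted member average
    return np.sum((obs_vario - fct_vario) ** 2)


def vs_omit(obs, ens, p=0.5):
    """Sketch of nan_policy='omit': members with any NaN variable get zero weight."""
    obs = np.asarray(obs, dtype=float)
    ens = np.asarray(ens, dtype=float)
    if np.isnan(obs).any():
        return np.nan  # NaN observations always propagate
    invalid = np.isnan(ens).any(axis=-1)
    if invalid.all():
        return np.nan
    w = np.where(invalid, 0.0, 1.0)
    x = np.where(invalid[:, None], 0.0, ens)  # finite placeholder rows
    return vs_weighted(obs, x, w, p)
```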

Not planned

  • Dawid–Sebastiani (dssuv_ensemble, dssmv_ensemble): computed from the ensemble mean and covariance, so there is no weighted estimator to route through.
  • Log Score (logs_ensemble, clogs_ensemble): based on kernel density estimates, for which omit semantics are not well defined.

Note: omit is not implemented for estimators that cannot be expressed through member weighting (CRPS int/akr/akr_circperm, Energy akr/akr_circperm); these raise NotImplementedError.
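A guard along these lines could reject omit for the estimators listed in the note. The lookup table and the function name check_omit_supported are hypothetical; only the estimator names and the NotImplementedError come from this issue.

```python
# Hypothetical lookup: estimators that cannot be expressed through member weighting.
OMIT_UNSUPPORTED = {
    "crps": {"int", "akr", "akr_circperm"},
    "energy": {"akr", "akr_circperm"},
}


def check_omit_supported(metric: str, estimator: str, nan_policy: str) -> None:
    """Raise NotImplementedError if omit is requested for an unsupported estimator."""
    if nan_policy == "omit" and estimator in OMIT_UNSUPPORTED.get(metric, set()):
        raise NotImplementedError(
            f"nan_policy='omit' is not implemented for the {estimator!r} {metric} estimator"
        )
```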

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)

Projects

No projects

Relationships

None yet

Development

No branches or pull requests
