fix: apply version filtering and fix supplementary dataset selection in regression analysis by lewisjared · Pull Request #554 · Climate-REF/climate-ref

lewisjared · 2026-02-20T12:13:03Z

Description

Fixes two issues in the dataset selection logic that caused spurious entries in regression analysis results:

Version filtering in dataset queries (datasets/base.py): Added proper version filtering to query_datasets and query_facets methods. Previously, version constraints were silently ignored, causing older dataset versions to leak into results. The version filter supports both exact matching and ordering comparisons (e.g. v20190731 vs v20200101).
Supplementary dataset selection (constraints.py): Fixed AddSupplementaryDataset to pick a single best-matching dataset per group instead of accumulating multiple candidates across score ties. Previously, when multiple supplementary datasets tied on matching score, all were selected, leading to spurious experiment entries (e.g. 1pctCO2, esm-1pct-brch-1000PgC) appearing alongside expected historical and SSP data.
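As a rough illustration of the version filter described above, the snippet below compares version strings of the form `vYYYYMMDD` using pandas. It is a sketch only: the actual `query_datasets` and `query_facets` code in datasets/base.py is not shown in this thread, and `version_key`, `filter_by_version`, the operator table and the sample catalog are all hypothetical.

```python
import operator

import pandas as pd

# Hypothetical operator table; the real filter may expose a different interface.
_OPS = {
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def version_key(version: str) -> int:
    """Convert a version such as 'v20190731' to the integer 20190731."""
    return int(version.lstrip("v"))


def filter_by_version(df: pd.DataFrame, op: str, version: str) -> pd.DataFrame:
    """Keep rows whose 'version' column satisfies `op` against `version`."""
    keys = df["version"].map(version_key)
    mask = _OPS[op](keys, version_key(version))
    return df[mask]


# Made-up catalog with an older and a newer version of the same dataset.
catalog = pd.DataFrame(
    {
        "dataset": ["tas-old", "tas-new"],
        "version": ["v20190731", "v20200101"],
    }
)
newer = filter_by_version(catalog, ">=", "v20200101")
exact = filter_by_version(catalog, "==", "v20190731")
```

Converting to an integer makes the ordering explicit; plain string comparison would give the same order only while every version keeps the same `vYYYYMMDD` width.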

These fixes together eliminate unexpected experiment/version combinations from regression outputs.
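The "single best match per group" behaviour can be sketched as follows. This is not the code in constraints.py; the function name `pick_best_supplementary`, the scoring rule (count of matching optional facets) and the tie-break on newest version are assumptions made for illustration, and the candidate table is made up.

```python
import pandas as pd


def pick_best_supplementary(candidates, target, matching_facets, optional_facets):
    """Return exactly one candidate row for `target`, or None if nothing matches."""
    # Required facets must all match the main dataset.
    required = pd.Series(True, index=candidates.index)
    for facet in matching_facets:
        required &= candidates[facet] == target[facet]
    matched = candidates[required]
    if matched.empty:
        return None

    # Score each candidate by how many optional facets also match.
    score = pd.Series(0, index=matched.index)
    for facet in optional_facets:
        score += (matched[facet] == target[facet]).astype(int)

    # Assumption: ties on score are broken by the newest version string,
    # so one row is returned even when several candidates tie.
    ranked = matched.assign(_score=score).sort_values(
        ["_score", "version"], ascending=False
    )
    return ranked.iloc[0].drop("_score")


# Made-up areacella candidates for a single model and grid.
candidates = pd.DataFrame(
    {
        "source_id": ["MODEL-A"] * 3,
        "grid_label": ["gr"] * 3,
        "experiment_id": ["historical", "1pctCO2", "esm-1pct-brch-1000PgC"],
        "member_id": ["r1i1p1f1", "r1i1p1f1", "r2i1p1f1"],
        "version": ["v20200101", "v20190731", "v20210101"],
    }
)
target = {
    "source_id": "MODEL-A",
    "grid_label": "gr",
    "experiment_id": "ssp126",
    "member_id": "r1i1p1f1",
}
best = pick_best_supplementary(
    candidates, target, ["source_id", "grid_label"], ["experiment_id", "member_id"]
)
```

Here the first two candidates tie on score (only `member_id` matches), which is the situation that previously let several experiments through; the sketch keeps only one of them.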

Checklist

Please confirm that this pull request has done the following:

Tests added
Documentation added (where applicable)
Changelog item added to changelog/

Closes #545

lewisjared · 2026-02-20T12:17:51Z

@bouweandela what do you think about 1cb40c0 ?

lewisjared · 2026-02-20T12:18:48Z

Closes #545 and #543

codecov · 2026-02-20T12:21:13Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

Flag	Coverage Δ
core	`92.51% <100.00%> (+0.04%)`	⬆️
providers	`89.65% <ø> (ø)`


Files with missing lines	Coverage Δ
...imate-ref-core/src/climate_ref_core/constraints.py	`96.75% <100.00%> (+0.04%)`	⬆️
...kages/climate-ref/src/climate_ref/datasets/base.py	`98.47% <100.00%> (+0.04%)`	⬆️
...kages/climate-ref/src/climate_ref/solve_helpers.py	`98.92% <100.00%> (+0.02%)`	⬆️

... and 1 file with indirect coverage changes


lewisjared · 2026-02-25T09:55:24Z

Need to check ps, which is another potential supplementary variable. volcello might be time-dependent.

bouweandela · 2026-02-25T14:50:01Z

For reference, here is an overview of the supplementary variables used by ESMValCore preprocessor functions: https://docs.esmvaltool.org/projects/ESMValCore/en/latest/recipe/preprocessor.html#supplementary-variables-ancillary-variables-and-cell-measures. Not all preprocessor functions are used by the REF, so not all of them may be needed.

Note that I have also used the AddSupplementary constraint to add the historical experiment to the scenarios here:

climate-ref/packages/climate-ref-esmvaltool/src/climate_ref_esmvaltool/diagnostics/climate_at_global_warming_levels.py

Lines 72 to 76 in cf05fe9

    
            AddSupplementaryDataset(
                supplementary_facets={"experiment_id": "historical"},
                matching_facets=matching_facets,
                optional_matching_facets=tuple(),
            ),

Those variables will certainly be time-dependent.

bouweandela

On second thought, I wonder how bad the extra areacella files for the global warming levels diagnostic are. Some will get picked up by esmvaltool and others may not, but in the end it shouldn't affect the result.

bouweandela · 2026-02-25T16:40:41Z

...smvaltool/tests/unit/test_solve_regression/test_solve_regression_global_mean_timeseries_.yml

  cmip6:
  - CMIP6.CMIP.EC-Earth-Consortium.EC-Earth3-Veg-LR.1pctCO2.r1i1p1f1.fx.areacella.gr.v20220428
  - CMIP6.ScenarioMIP.EC-Earth-Consortium.EC-Earth3-Veg-LR.ssp126.r1i1p1f1.Amon.tas.gr.v20201201
-cmip6_ssp126_gr_r1i1p1f1_EC-Earth3-Veg_Amon_tas:


I wonder why this execution disappeared
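For reading the dataset identifiers in the excerpt above, a small helper can split a CMIP6 instance ID into its facets. The facet order follows the CMIP6 data reference syntax; the helper name `parse_instance_id` is hypothetical and is not part of climate-ref.

```python
# Facet order of a CMIP6 instance ID, per the CMIP6 data reference syntax.
CMIP6_FACETS = (
    "mip_era",
    "activity_id",
    "institution_id",
    "source_id",
    "experiment_id",
    "member_id",
    "table_id",
    "variable_id",
    "grid_label",
    "version",
)


def parse_instance_id(instance_id: str) -> dict[str, str]:
    """Split a dotted CMIP6 instance ID into a facet dictionary."""
    parts = instance_id.split(".")
    if len(parts) != len(CMIP6_FACETS):
        raise ValueError(f"Expected {len(CMIP6_FACETS)} facets, got {len(parts)}")
    return dict(zip(CMIP6_FACETS, parts))


# The two identifiers from the regression output excerpt.
areacella = parse_instance_id(
    "CMIP6.CMIP.EC-Earth-Consortium.EC-Earth3-Veg-LR.1pctCO2.r1i1p1f1.fx.areacella.gr.v20220428"
)
tas = parse_instance_id(
    "CMIP6.ScenarioMIP.EC-Earth-Consortium.EC-Earth3-Veg-LR.ssp126.r1i1p1f1.Amon.tas.gr.v20201201"
)
```

Parsed this way, it is easy to see that the areacella entry comes from the 1pctCO2 experiment while the tas entry comes from ssp126, with the same source, member and grid.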

bouweandela · 2026-02-25T16:41:31Z

packages/climate-ref-core/src/climate_ref_core/constraints.py

-            for i in range(len(datasets)):
-                dataset = datasets.iloc[i]
-                # Restrict the supplementary datasets to those that match the main dataset.
+            for _, main_group_df in datasets.groupby(matching_facets):


I tried to understand what is happening here, but this is getting rather complicated. Will have another look tomorrow.
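To make the change in the diff above easier to follow, the toy example below contrasts the two loop styles on a made-up table. It only illustrates the iteration pattern (one pass per row versus one pass per unique combination of matching facets); the column values are hypothetical and the body of the real loop is not reproduced.

```python
import pandas as pd

# Made-up main datasets; two rows share the same matching facets.
datasets = pd.DataFrame(
    {
        "source_id": ["A", "A", "B"],
        "grid_label": ["gn", "gn", "gn"],
        "variable_id": ["tas", "pr", "tas"],
    }
)
matching_facets = ["source_id", "grid_label"]

# Old style: visit every row, so ("A", "gn") is handled twice.
row_visits = [
    tuple(datasets.iloc[i][matching_facets]) for i in range(len(datasets))
]

# New style: visit each unique facet combination once, with all of its rows.
group_visits = [key for key, _ in datasets.groupby(matching_facets)]
group_sizes = {
    key: len(main_group_df)
    for key, main_group_df in datasets.groupby(matching_facets)
}
```

With the grouped form, a supplementary dataset only needs to be chosen once per group rather than once per main dataset row.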

lewisjared · 2026-02-25T22:26:17Z

> On second thought, I wonder how bad the extra areacella files for the global warming levels diagnostic are. Some will get picked up by esmvaltool and others may not, but in the end it shouldn't affect the result.

I agree with this. If we are too restrictive, all of the providers would need to follow the same logic to find supplementary files, and given there is no strict guidance on that, a looser approach might be better.

I'm going to split this into two so we can take the version fixes now.

lewisjared added 5 commits February 20, 2026 23:11

fix: apply version filtering to the regression analysis

1260185

chore: rerun regression output

03c3eb8

Closes #545

fix: Only select a single areacella

1cb40c0

docs: add changelog for PR #553

324d7d2

docs: rename changelog to match PR #554

2b1f1eb

lewisjared requested a review from bouweandela February 21, 2026 04:52

bouweandela reviewed Feb 25, 2026

lewisjared wants to merge 5 commits into main from fix/regression-versions
