fix: apply version filtering and fix supplementary dataset selection in regression analysis #554
lewisjared wants to merge 5 commits into main
@bouweandela what do you think about 1cb40c0?
Codecov Report: ✅ All modified and coverable lines are covered by tests.
... and 1 file with indirect coverage changes
Need to check ps, which is another potential supplementary variable. volcello might be time-dependent.
For reference, here is an overview of the supplementary variables used by ESMValCore preprocessor functions: https://docs.esmvaltool.org/projects/ESMValCore/en/latest/recipe/preprocessor.html#supplementary-variables-ancillary-variables-and-cell-measures. Not all preprocessor functions are used by the REF, so not all of them may be needed. Note that I have also used the |
bouweandela left a comment
On second thought, I wonder how bad the extra areacella files for the global warming levels diagnostic are. Some will get picked up by esmvaltool and others may not, but in the end it shouldn't affect the result.
cmip6:
- CMIP6.CMIP.EC-Earth-Consortium.EC-Earth3-Veg-LR.1pctCO2.r1i1p1f1.fx.areacella.gr.v20220428
- CMIP6.ScenarioMIP.EC-Earth-Consortium.EC-Earth3-Veg-LR.ssp126.r1i1p1f1.Amon.tas.gr.v20201201
cmip6_ssp126_gr_r1i1p1f1_EC-Earth3-Veg_Amon_tas:
I wonder why this execution disappeared
for i in range(len(datasets)):
    dataset = datasets.iloc[i]
# Restrict the supplementary datasets to those that match the main dataset.
for _, main_group_df in datasets.groupby(matching_facets):
I tried to understand what is happening here, but this is getting rather complicated. Will have another look tomorrow.
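To make the intent of the grouped loop easier to follow, here is a minimal sketch of "one best-matching supplementary dataset per group". It is not the code from this PR: plain dictionaries stand in for DataFrame rows, `select_supplementary` and `optional_facets` are hypothetical names, the `historical` version string is made up, and breaking score ties by the latest version is an assumption.

```python
from collections import defaultdict


def select_supplementary(main_rows, supplementary_rows, matching_facets, optional_facets):
    """Pick at most one supplementary row per group of main rows (illustrative only)."""
    # Group the main datasets by the facets that must match exactly.
    groups = defaultdict(list)
    for row in main_rows:
        groups[tuple(row[f] for f in matching_facets)].append(row)

    selected = []
    for key, group in groups.items():
        # Candidates must agree with the group on every required facet.
        candidates = [
            s for s in supplementary_rows
            if tuple(s[f] for f in matching_facets) == key
        ]
        if not candidates:
            continue

        def rank(candidate):
            # Score: number of optional facets shared with any main row in the group.
            score = sum(
                any(candidate.get(f) == row.get(f) for row in group)
                for f in optional_facets
            )
            # Assumed tie-break: prefer the latest version string.
            return (score, candidate["version"])

        # max() returns exactly one row, so tied candidates never accumulate.
        selected.append(max(candidates, key=rank))
    return selected


main = [
    {"source_id": "EC-Earth3-Veg-LR", "grid_label": "gr", "experiment_id": "ssp126",
     "member_id": "r1i1p1f1", "variable_id": "tas", "version": "v20201201"},
    {"source_id": "EC-Earth3-Veg-LR", "grid_label": "gr", "experiment_id": "historical",
     "member_id": "r1i1p1f1", "variable_id": "tas", "version": "v20200101"},  # made-up version
]
supplementary = [
    {"source_id": "EC-Earth3-Veg-LR", "grid_label": "gr", "experiment_id": "1pctCO2",
     "member_id": "r1i1p1f1", "variable_id": "areacella", "version": "v20220428"},
    {"source_id": "EC-Earth3-Veg-LR", "grid_label": "gr", "experiment_id": "historical",
     "member_id": "r1i1p1f1", "variable_id": "areacella", "version": "v20200101"},  # made-up version
]

chosen = select_supplementary(
    main, supplementary,
    matching_facets=("source_id", "grid_label"),
    optional_facets=("experiment_id", "member_id"),
)
```

With this data the `historical` areacella row wins because it shares both optional facets with the main rows, so the `1pctCO2` row is not carried along.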
Description
Fixes two issues in the dataset selection logic that caused spurious entries in regression analysis results:
1. Version filtering in dataset queries (`datasets/base.py`): Added proper `version` filtering to the `query_datasets` and `query_facets` methods. Previously, version constraints were silently ignored, causing older dataset versions to leak into results. The version filter supports both exact matching and ordering comparisons (e.g. `v20190731` vs `v20200101`).
2. Supplementary dataset selection (`constraints.py`): Fixed `AddSupplementaryDataset` to pick a single best-matching dataset per group instead of accumulating multiple candidates across score ties. Previously, when multiple supplementary datasets tied on matching score, all were selected, leading to spurious experiment entries (e.g. `1pctCO2`, `esm-1pct-brch-1000PgC`) appearing alongside expected `historical` and SSP data.

These fixes together eliminate unexpected experiment/version combinations from regression outputs.
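As a rough illustration of the version filtering described above, the sketch below compares `vYYYYMMDD` version strings either exactly or by ordering. It does not reproduce the `query_datasets` or `query_facets` implementation: `filter_by_version`, `version_key`, and the operator-string argument are hypothetical, and the rows are plain dictionaries.

```python
import operator

# Hypothetical mapping from a comparison string to the matching operator.
_OPS = {
    "==": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
    "<=": operator.le,
    "<": operator.lt,
}


def version_key(version):
    """Convert a version such as 'v20190731' to the integer 20190731."""
    return int(version.lstrip("v"))


def filter_by_version(rows, version, op="=="):
    """Keep rows whose 'version' satisfies `row_version <op> version`."""
    compare = _OPS[op]
    target = version_key(version)
    return [row for row in rows if compare(version_key(row["version"]), target)]


rows = [
    {"variable_id": "tas", "version": "v20190731"},
    {"variable_id": "tas", "version": "v20200101"},
]

exact = filter_by_version(rows, "v20200101")            # exact match
newer = filter_by_version(rows, "v20190801", op=">=")   # ordering comparison
older = filter_by_version(rows, "v20200101", op="<")
```

Converting to an integer keeps the ordering explicit; comparing the raw strings would also work for fixed-width `vYYYYMMDD` values, but would silently misorder versions of differing length.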
Checklist
Please confirm that this pull request has done the following:
changelog/