
fix: accessor API with lazy registration and BaseModelRun infrastructure #256

Merged
pgierz merged 24 commits into prep-release from fix/gh250 on Mar 30, 2026

pgierz commented on Mar 27, 2026

Summary

  • Replace auto-import accessor registration with explicit pycmor.enable_xarray_accessor() (lazy, idempotent)
  • Add _build_rule() helper for interactive Rule construction with native pipeline backend
  • Add StdLibAccessor with tab-completable ds.pycmor.stdlib.<step>() access
  • Add .process() method for running full pipelines interactively on datasets
  • Add BaseModelRun ABC in pycmor.tutorial for standardized test fixtures
  • Add StdLibAccessor export to pycmor.xarray
  • Add comprehensive test suite (test_accessor_api.py)
  • Update existing accessor tests for new lazy registration pattern
  • Fix CMIP7 compound_name matching, error propagation, DefaultPipeline duplicate step
  • Fix pre-existing test failures (pyfesom2 imports, CMIP7 configs, dimension_mapping)
  • Cherry-pick PR #194 (OpenIFS time dimension rename) by @mzapponi
  • Pin sphinx<9 for RTD compatibility

Context

Foundation for the accessor API spec (ACCESSOR-API-SPEC.md). Enables ds.pycmor.process(cmor_variable="tas") and ds.pycmor.stdlib.convert_units(cmor_variable="tas") workflows for interactive CMORization. The three future/feat/* PRs (#253, #254, #255) build on top of this.
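
One way a namespace like `ds.pycmor.stdlib.<step>()` can be made tab-completable is sketched below. This is an assumption-laden illustration, not the `StdLibAccessor` source: `StepNamespace` and `scale` are hypothetical, and a plain list stands in for a dataset.

```python
# Hypothetical sketch: exposing named steps as tab-completable attributes.
class StepNamespace:
    """Expose a dict of step functions as attributes bound to one dataset."""

    def __init__(self, data, steps):
        self._data = data
        self._steps = dict(steps)

    def __dir__(self):
        # Interactive shells consult dir() for tab completion.
        return sorted(set(super().__dir__()) | set(self._steps))

    def __getattr__(self, name):
        # Called only when normal lookup fails. Private names are never steps,
        # which also avoids recursion if _steps is not set yet.
        if name.startswith("_") or name not in self._steps:
            raise AttributeError(name)
        step = self._steps[name]
        return lambda **kwargs: step(self._data, **kwargs)


def scale(data, factor=1.0):
    """Toy step: multiply every value by ``factor``."""
    return [value * factor for value in data]


ns = StepNamespace([1.0, 2.0], {"scale": scale})
result = ns.scale(factor=10.0)
```

Here `ns.scale(factor=10.0)` plays the role that `ds.pycmor.stdlib.convert_units(cmor_variable="tas")` plays in the workflow described above: the dataset is bound once, and each step takes only keyword arguments.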

Test plan

  • CI passes
  • tests/unit/test_accessor_api.py -- lazy registration, StdLibAccessor, process(), _build_rule()
  • tests/unit/test_xarray_accessors.py -- no regressions
  • tests/unit/test_accessors.py -- no regressions

🤖 Generated with Claude Code

pgierz and others added 24 commits December 12, 2025 09:54
…ss()

- Replace auto-import with enable_xarray_accessor() for lazy registration
- Add _build_rule() helper for interactive Rule construction
- Add StdLibAccessor with tab-completable std_lib steps via ds.pycmor.stdlib
- Add .process() method for running full pipelines interactively
- Add BaseModelRun ABC in pycmor.tutorial for test infrastructure
- Update existing tests to use enable_xarray_accessor()
- Add comprehensive test suite in test_accessor_api.py

# Conflicts:
#	src/pycmor/core/cmorizer.py

- Add required compound_name field to all CMIP7 test config rules
  (validator requires it for cmor_version=CMIP7)
- Add setuptools to Dockerfile.test (pyfesom2 imports pkg_resources)

The vendored all_var_info.json does not populate cmip7_compound_name or
cmip6_compound_name on DRVs, so variable_id falls back to the short
name (e.g., "tas"). The matching logic compared the full compound name
"Amon.tas" against the plain "tas" when only one side had a dot,
which always failed.

Fix: always extract the short name from compound_name for comparison,
regardless of whether the DRV also has dots. Also add a fallback match
against drv.name directly.

Add CMIP7 DRV fixtures (dr_cmip7_tas, dr_cmip7_thetao) for testing.
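
The comparison described in the fix above can be sketched roughly as follows. This is not the pycmor source: `FakeDRV`, `short_name`, and `matches` are hypothetical, and the sketch assumes the two-part `table.variable` form from the example ("Amon.tas"), where the short name is the part after the dot.

```python
# Hypothetical sketch of the short-name comparison (not the pycmor source).
from dataclasses import dataclass


@dataclass
class FakeDRV:
    """Stand-in for a data request variable; field names are assumptions."""

    variable_id: str
    name: str


def short_name(identifier):
    """Return the part after the last dot, e.g. 'Amon.tas' -> 'tas'."""
    return identifier.rsplit(".", 1)[-1]


def matches(compound_name, drv):
    """Compare short names on both sides, then fall back to drv.name."""
    wanted = short_name(compound_name)
    return wanted == short_name(drv.variable_id) or wanted == drv.name
```

Because both sides are reduced to a short name before comparing, it no longer matters whether only one of them carries a dotted prefix.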
Pipeline._run_prefect() now uses return_state=True and checks for
failures, re-raising the original exception. Previously, Prefect
swallowed exceptions via on_failure callbacks that only logged.

CMORizer._parallel_process_prefect() also checks both the flow-level
state and individual rule future states for failures.

This ensures integration tests correctly fail when pipeline steps
raise exceptions.
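
The general idea (inspect every result handle and re-raise, rather than only logging) can be shown without Prefect. The sketch below is an analogy built on the standard library's `concurrent.futures`, not the Prefect-based code in this PR; `process_rule` and `run_all` are hypothetical names.

```python
# Analogy using concurrent.futures instead of Prefect: inspect each result
# handle and re-raise the original exception rather than only logging it.
from concurrent.futures import ThreadPoolExecutor


def process_rule(name):
    """Toy rule processor; the rule named 'bad' fails."""
    if name == "bad":
        raise ValueError(f"step failed for rule {name!r}")
    return f"{name}: ok"


def run_all(rule_names):
    """Run every rule, then surface the first failure to the caller."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(process_rule, name) for name in rule_names]
    # Leaving the with-block waits for all futures to finish.
    for future in futures:
        error = future.exception()
        if error is not None:
            raise error  # propagate instead of swallowing
    return [future.result() for future in futures]
```

A test that calls `run_all(["a", "bad"])` now fails with the original `ValueError`, which is the behavior the integration tests rely on.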
DefaultPipeline had both handle_unit_conversion (correct pipeline step
taking data+rule) and units.convert (low-level function taking
da+from_unit+to_unit). The latter was called with (data, rule) args,
causing ParameterBindError: missing required argument 'to_unit'.

handle_unit_conversion already calls convert() internally, so the
duplicate step was both wrong and redundant.
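
The mismatch between the two calling conventions can be reproduced with plain signature binding. The functions below are empty hypothetical stand-ins that only mirror the argument lists named above, and `binding_error` is a helper invented for this sketch; Prefect reports the problem as ParameterBindError, while the standard library raises TypeError.

```python
# Hypothetical stand-ins showing why a (data, rule) call cannot bind to a
# (da, from_unit, to_unit) signature. Not the pycmor functions themselves.
import inspect


def convert(da, from_unit, to_unit):
    """Low-level style: explicit units."""
    ...


def handle_unit_conversion(data, rule):
    """Pipeline-step style: takes the data and the rule."""
    ...


def binding_error(func, *args):
    """Return None if the call would bind, else the binding error message."""
    try:
        inspect.signature(func).bind(*args)
    except TypeError as exc:
        return str(exc)
    return None


step_error = binding_error(handle_unit_conversion, "data", "rule")
low_level_error = binding_error(convert, "data", "rule")
```

With two positional arguments, `rule` lands in the `from_unit` slot and nothing is left for `to_unit`, so only the pipeline-step signature binds.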
- dimension_mapping.py: use getattr(rule, "dimension_mapping") instead
  of rule._pycmor_cfg("dimension_mapping", default={}) -- dimension_mapping
  is a rule attribute, not a config option, and everett rejects non-string
  defaults
- CMIP7 test configs: add activity_id="CMIP" to rules that need it for
  global attribute generation
- cmorizer.py: fix parallel error checking to handle both PrefectFuture
  and State objects from different Prefect versions

…_run

- dimension_mapping.py: check isinstance(user_mapping, dict) to handle
  Mock objects in tests (getattr on Mock returns Mock, not None)
- base_model_run.py: convert doctest example to code-block to prevent
  pytest from trying to execute it
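
The Mock pitfall behind the isinstance check is easy to reproduce with the standard library alone. In this sketch `get_user_mapping` and `PlainRule` are hypothetical, and passing `None` as the getattr default is an assumption rather than something taken from dimension_mapping.py.

```python
# Why an isinstance check is needed when rules may be Mock objects in tests.
from unittest.mock import Mock


def get_user_mapping(rule):
    """Return rule.dimension_mapping if it is a real dict, else {}."""
    user_mapping = getattr(rule, "dimension_mapping", None)
    # A Mock auto-creates attributes, so a None check alone would not catch it.
    return user_mapping if isinstance(user_mapping, dict) else {}


class PlainRule:
    """Toy rule with a real mapping."""

    dimension_mapping = {"lat": "latitude"}


mock_attr = getattr(Mock(), "dimension_mapping", None)
```

`mock_attr` is itself a Mock rather than None, which is why the helper falls back to an empty dict for anything that is not a real dict.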
Cherry-picked from PR #194 by @mzapponi (adapted for src/pycmor/ paths):
- gather_inputs.py: if rule has time_dimname and dataset uses that
  dimension instead of "time", rename it automatically on load
- pipeline.py: defensive getattr for _cluster attribute

Co-authored-by: Martina Zapponi <mzapponi@users.noreply.github.com>
@pgierz merged commit 049a206 into prep-release on Mar 30, 2026
26 checks passed