Skip to content

Reconcile coupled-model state on the first compiled step under Reactant#405

Open
xkykai wants to merge 5 commits into
mainfrom
xk/fix-reactant-set
Open

Reconcile coupled-model state on the first compiled step under Reactant#405
xkykai wants to merge 5 commits into
mainfrom
xk/fix-reactant-set

Conversation

@xkykai

@xkykai xkykai commented Jul 1, 2026

Copy link
Copy Markdown
Collaborator

Closes #403

Verification

Minimal coupled MWE shown in #403 (1×1 grid, constant PrescribedAtmosphere, default SlabLand, no
radiation), skin temperature set! to 320 K after construction, 16 compiled steps:

CPU eager:               ⟨T⟩ = 286.88632 K
Reactant, no refresh:    ⟨T⟩ = 235.06009 K   (Δ vs CPU = -5.183e+01 K)   # before this fix
Reactant, no refresh:    ⟨T⟩ = 286.88632 K   (Δ vs CPU = +5.684e-14 K)   # with this fix — bit-exact

🤖 Generated with Claude Code

xkykai and others added 2 commits July 1, 2026 15:41
`EarthSystemModel.time_step!` opens with `maybe_prepare_first_time_step!`, whose generic
definition reconciles the auxiliary/flux state with the prognostic fields on the first
step only, guarded by `if model.clock.iteration == 0`. On `ReactantState` the clock
iteration is a traced scalar, so a plain `if` cannot branch on it inside a compiled
`@trace` loop and the guard silently never fires. Prognostic state `set!` after
construction (as calibration / optimization / differentiable loops do) is therefore never
reconciled into the flux state, and the first compiled step consumes stale
construction-time fluxes. The compiled trajectory then drifts from the eager CPU one
(~0.5 K/window on the ERA5 differentiable stack; larger when seeded far from equilibrium).

Override `maybe_prepare_first_time_step!(::ReactantESM, ...)` to run the refresh through
Reactant's traced conditional, so it still fires only on the first step (no cost on
subsequent steps). `update_state!` is referenced unqualified because `@trace` lifts the
branch body into a closure where module-qualified names do not resolve.

Verified bit-exact vs eager CPU on a 1x1 SlabLand MWE: without the fix the compiled
16-step run drifts (Δ = -51.8 K when seeded at 320 K); with it Δ = 5.7e-14 K.

Residual of #390: #394 fixed the static regridder-index half; this fixes the dynamic
flux-refresh half.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@codecov

codecov Bot commented Jul 2, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 0% with 5 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
ext/NumericalEarthReactantExt.jl 0.00% 5 Missing ⚠️

📢 Thoughts on this report? Let us know!

@dkytezab

dkytezab commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

Should we shelve first_time_step! then in Reactant? It's going to be confusing to have two paradigms: where Oceananigans maybe_prepare_first_time_step! is a no op on Reactant, the brunt being passed over to first_time_step! (call to update_state! and reconcile_state!) but here we have NumericalEarth maybe_prepare_first_time_step! mirroring the latter rather than its namesake.

@Pangoraw

Pangoraw commented Jul 2, 2026

Copy link
Copy Markdown

Should we shelve first_time_step! then in Reactant?

I feel like doing the explicit first time step in Reactant might be better, otherwise we are implicitly adding a lot of code to the loop that will run just for the first iteration. Making compile times longer.

xkykai-aeolus and others added 3 commits July 2, 2026 11:47
…ng an iteration guard

Mirror Oceananigans' Reactant paradigm: make maybe_prepare_first_time_step! a no-op on
ReactantESM and perform the first-step auxiliary-state reconcile in an explicit
first_time_step! (update_state! + time_step!), so it is compiled once ahead of the
stepping loop rather than traced into every iteration. Addresses review feedback on #405.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…actant test

Clarify why the coupled-model `first_time_step!` under Reactant refreshes only the
auxiliary/flux state (`update_state!`) and deliberately does not re-run the state-
exchanger initialization: the regridder fractional indices are geometry only and are
fixed at construction (#394), so a post-construction `set!` of the prognostic state does
not invalidate them. The first-compiled-step bug (#403) is purely the missing flux/aux
refresh, so re-running the regridder kernel in every `first_time_step!` graph would be
redundant work for no correctness gain.

Add a Reactant correctness test (test/test_reactant.jl) that seeds the `SlabLand` skin
temperature after construction and checks that a compiled `first_time_step!` + `@trace`
stepping run reproduces the eager CPU result, covering the previously untested extension
code path (`maybe_prepare_first_time_step!` no-op + `first_time_step!`).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Coupled model under Reactant @compile is wrong when prognostic state is set! after construction: the first-step update_state! is skipped inside @trace

4 participants