Coupled evaluator with prediction loader #856
Conversation
fme/coupled/inference/evaluator.py
Outdated
| restrict_to_output_names=( | ||
| stepper.ocean.out_names, | ||
| stepper.atmosphere.out_names, | ||
| ), |
There was a problem hiding this comment.
We did not have to do this in ace evaluator because there was no merge operation, we need this here because we can have overlap in atmosphere and ocean in_names (e.g., land_fraction)
There was a problem hiding this comment.
In the fme.ace version we log all variables including input-only forcings. I think this can be useful for examining the forcings and for post-hoc analyses.
See my other comment. I think we can instead add an @property called all_names to CoupledStepperConfig which returns a CoupledNames where:
atmospherehas its out names +atmosphere_forcing_exogenous_names+shared_forcing_exogenous_namesoceanhas its out names +ocean_forcing_exogenous_names
jpdunc23
left a comment
There was a problem hiding this comment.
Looks good! I have some suggestions for handling the names.
fme/coupled/inference/loop.py
Outdated
| deriver: CoupledDeriverABC, | ||
| writer: CoupledPairedDataWriter | NullDataWriter | None = None, | ||
| record_logs: Callable[[InferenceLogs], None] | None = None, | ||
| restrict_to_output_names: tuple[list[str], list[str]] | None = None, |
There was a problem hiding this comment.
Can you please add a new dataclass to replace tuple[list[str], list[str]] in fme/coupled/typing_.py:
@dataclasses.dataclass
class CoupledNames:
ocean: list[str]
atmosphere: list[str]You could also add an See my other comment where I suggest instead adding @property called out_names to CoupledStepperConfig which gets the names from the component steppers and returns a CoupledNames object.all_names to CoupledStepperConfig.
| assert not os.path.exists(tmp_path / "atmosphere/restart.nc") | ||
| assert not os.path.exists(tmp_path / "atmosphere/initial_condition.nc") | ||
| assert not os.path.exists(tmp_path / "ocean/restart.nc") | ||
| assert not os.path.exists(tmp_path / "ocean/initial_condition.nc") |
There was a problem hiding this comment.
Would be good to add a check similar to the following in the fme.ace test:
# if these are off by something like 90% then probably the stepper
# is being used instead of the prediction_data
assert log[f"inference/mean/weighted_rmse/{var}"] == 0.0
assert log[f"inference/mean/weighted_bias/{var}"] == 0.0
fme/coupled/inference/loop.py
Outdated
| initial_condition=CoupledPairedData( | ||
| ocean_data=PairedData.from_batch_data( | ||
| prediction=pred_ic.ocean_data, | ||
| reference=target_ic.ocean_data, | ||
| ), | ||
| atmosphere_data=PairedData.from_batch_data( | ||
| prediction=pred_ic.atmosphere_data, | ||
| reference=target_ic.atmosphere_data, | ||
| ), | ||
| ), |
There was a problem hiding this comment.
Can you use CoupledPairedData.from_coupled_batch_data here?
fme/coupled/inference/loop.py
Outdated
| from fme.coupled.inference.data_writer import CoupledPairedDataWriter | ||
|
|
||
|
|
||
| class CoupledDeriverABC(Protocol): |
There was a problem hiding this comment.
I think using Protocol rather than abc.ABC as in fme.ace is good, but we should call it CoupledDeriverProtocol or simply CoupledDeriver.
There was a problem hiding this comment.
Changed to CoupledDeriver
fme/coupled/inference/evaluator.py
Outdated
| restrict_to_output_names=( | ||
| stepper.ocean.out_names, | ||
| stepper.atmosphere.out_names, | ||
| ), |
There was a problem hiding this comment.
In the fme.ace version we log all variables including input-only forcings. I think this can be useful for examining the forcings and for post-hoc analyses.
See my other comment. I think we can instead add an @property called all_names to CoupledStepperConfig which returns a CoupledNames where:
atmospherehas its out names +atmosphere_forcing_exogenous_names+shared_forcing_exogenous_namesoceanhas its out names +ocean_forcing_exogenous_names
jpdunc23
left a comment
There was a problem hiding this comment.
Couple of nits and a question, but LGTM.
| timestep_size=1, | ||
| timestep_start=0, | ||
| nz=3, | ||
| masked_fill_value: float = float("nan"), |
There was a problem hiding this comment.
Do you know why this is needed for prediction_loader inference but not evaluator inference?
There was a problem hiding this comment.
Okay this is because we didn't write mask_2d so variables like o_prog doesn't have the right mask, I reverted back to using nans as masked fill values and added the proper mask.
| atmosphere_names = ( | ||
| self.atmosphere.stepper.output_names | ||
| + self._atmosphere_forcing_exogenous_names | ||
| + self._shared_forcing_exogenous_names | ||
| ) |
There was a problem hiding this comment.
Shouldn't matter in practice, but agent review flagged that _shared_forcing_exogenous_names is a subset of _atmosphere_forcing_exogenous_names, so probably best to remove _shared_forcing_exogenous_names here.
fme/coupled/inference/loop.py
Outdated
| deriver: CoupledDeriver, | ||
| writer: CoupledPairedDataWriter | NullDataWriter | None = None, | ||
| record_logs: Callable[[InferenceLogs], None] | None = None, | ||
| restrict_to_output_names: CoupledNames | None = None, |
There was a problem hiding this comment.
Nit: Maybe call this all_names since it is no longer restricted to just output names.
prediction_loaderis an already supported feature in ace evaluator, we add this feature for coupled evaluator so that we can easily compare with target data on wandb.Resolves #805