Running the runner in replay mode introduces small differences in the decimals of the rewards being compared.
This is a separate notion from "reward_significant_digit" in config.ini.
In this commit I propose a new parameter in config.ini: replay_reward_rel_tolerance.
This parameter sets a configurable threshold for the relative comparison between the expected cumulated reward and the replayed cumulated reward.
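
As an illustration only, a relative comparison of this kind could look like the sketch below. The function name `rewards_match` and the use of `math.isclose` are assumptions made for the example, not necessarily what the commit implements; only the parameter name `replay_reward_rel_tolerance` comes from the description above.

```python
import math


def rewards_match(expected, replayed, replay_reward_rel_tolerance=1e-7):
    """Return True if the two cumulated rewards agree within a relative tolerance.

    Hypothetical helper: math.isclose scales the tolerance by the larger of
    the two absolute values; abs_tol=0.0 keeps the check purely relative.
    """
    return math.isclose(
        expected,
        replayed,
        rel_tol=replay_reward_rel_tolerance,
        abs_tol=0.0,
    )


# A discrepancy of about 5e-8 (relative) passes at 1e-7 but fails at 1e-9.
assert rewards_match(1000.0, 1000.00005, 1e-7)
assert not rewards_match(1000.0, 1000.00005, 1e-9)
```

A relative check scales with the magnitude of the cumulated reward, which is why it is kept separate from a fixed number of significant digits.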
All the tests currently pass with replay_reward_rel_tolerance = 1e-7.
With 500-timestep generation, they pass with replay_reward_rel_tolerance = 1e-4.
Remember to raise this threshold when increasing max_iter.
This slight difference is negligible compared to the differences between KPIs.