Slight differences in reward significant digits in replay mode #21

Using the runner in replay mode introduces small differences in the decimals of the rewards being compared.
This is a separate notion from the "reward_significant_digit" parameter in config.ini.

In the commit, I suggest introducing a new parameter in config.ini: replay_reward_rel_tolerance

This parameter sets a configurable threshold for the relative comparison between the expected cumulated reward and the replayed cumulated reward.
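
For illustration, a relative comparison of this kind could look like the sketch below. The function name `rewards_match` and its default value are hypothetical, and the runner's actual check may be written differently; the sketch relies only on Python's standard `math.isclose`.

```python
import math


def rewards_match(expected, replayed, rel_tolerance=1e-7):
    """Return True if two cumulated rewards agree within a relative tolerance.

    Hypothetical helper: rel_tolerance plays the role of
    replay_reward_rel_tolerance from config.ini.
    """
    # math.isclose with rel_tol checks:
    #   abs(expected - replayed) <= rel_tol * max(abs(expected), abs(replayed))
    return math.isclose(expected, replayed, rel_tol=rel_tolerance)
```

Because the tolerance is relative, the accepted absolute gap scales with the size of the cumulated reward: a gap of 0.01 on a reward of 1000 is rejected at 1e-7 but accepted at 1e-4.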

All the tests currently pass with replay_reward_rel_tolerance = 1e-7

With 500-timestep generation, they pass with replay_reward_rel_tolerance = 1e-4
Remember to raise this threshold when raising max_iter.
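
As a rough sketch of how the two settings might sit together, the snippet below reads them with Python's standard `configparser`. The section name `DEFAULT`, the helper `load_replay_tolerance`, and the fallback value are assumptions made for the example, not the project's actual layout.

```python
import configparser

# Hypothetical config.ini content; the section name is an assumption.
EXAMPLE_INI = """
[DEFAULT]
max_iter = 500
replay_reward_rel_tolerance = 1e-4
"""


def load_replay_tolerance(ini_text, fallback=1e-7):
    """Read replay_reward_rel_tolerance, using the fallback if it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser["DEFAULT"].getfloat("replay_reward_rel_tolerance", fallback=fallback)


tolerance = load_replay_tolerance(EXAMPLE_INI)
```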

This slight difference is negligible compared to the differences between KPIs.
