
Slight differences in reward significant digits in replay mode #21

@NMegel

Description


Using the runner in replay mode introduces small differences in the last decimals when rewards are compared.
This is a separate notion from the "reward_significant_digit" setting in config.ini.
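To show why the two notions differ, here is a minimal sketch. It assumes that "reward_significant_digit" means rounding both values to N significant digits before an equality check; the helpers round_sig, equal_by_sig_digits and equal_by_rel_tol are hypothetical and not the project's code. Two values that differ by only about 2e-8 in relative terms can fall on opposite sides of a rounding boundary and fail the rounding check, while a relative-tolerance check accepts them.

```python
import math


def round_sig(x: float, digits: int) -> float:
    """Round x to `digits` significant digits (hypothetical helper)."""
    if x == 0:
        return 0.0
    return round(x, digits - 1 - math.floor(math.log10(abs(x))))


def equal_by_sig_digits(a: float, b: float, digits: int) -> bool:
    # Equality after rounding: sensitive to where the rounding boundary falls.
    return round_sig(a, digits) == round_sig(b, digits)


def equal_by_rel_tol(a: float, b: float, rel_tol: float) -> bool:
    # Relative comparison: |a - b| <= rel_tol * max(|a|, |b|).
    return math.isclose(a, b, rel_tol=rel_tol)


expected, replayed = 1.00000049, 1.00000051
print(equal_by_sig_digits(expected, replayed, 7))  # False (1.0 vs 1.000001)
print(equal_by_rel_tol(expected, replayed, 1e-7))  # True
```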

In the commit, I propose introducing a new parameter in config.ini: replay_reward_rel_tolerance
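A minimal sketch of reading the proposed parameter with Python's standard configparser module. The section name "runner", the helper load_replay_tolerance, and the fallback of 1e-7 (the value the tests currently pass with) are assumptions for illustration; the project's actual config.ini layout may differ.

```python
import configparser

# Hypothetical config.ini content; the "[runner]" section name is an assumption.
CONFIG_TEXT = """
[runner]
replay_reward_rel_tolerance = 1e-7
"""


def load_replay_tolerance(config_text: str, default: float = 1e-7) -> float:
    """Read replay_reward_rel_tolerance, falling back to `default` if absent."""
    config = configparser.ConfigParser()
    config.read_string(config_text)
    # fallback is returned when either the section or the option is missing.
    return config.getfloat("runner", "replay_reward_rel_tolerance", fallback=default)


print(load_replay_tolerance(CONFIG_TEXT))  # 1e-07
```

Using a fallback keeps older config.ini files without the new key working unchanged.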

This parameter sets a configurable relative tolerance for comparing the expected cumulative reward with the replayed cumulative reward.
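A minimal sketch of such a relative comparison on cumulative rewards, assuming the per-step rewards of the original run and of the replay are available as sequences. The helper check_replay_reward is hypothetical; it sums each sequence with math.fsum (an illustrative choice, not necessarily how the runner accumulates rewards) and compares the totals with math.isclose.

```python
import math
from typing import Sequence


def check_replay_reward(
    expected_rewards: Sequence[float],
    replayed_rewards: Sequence[float],
    rel_tol: float,
) -> None:
    """Raise AssertionError if cumulative rewards differ by more than rel_tol.

    Hypothetical helper illustrating replay_reward_rel_tolerance.
    """
    expected_total = math.fsum(expected_rewards)
    replayed_total = math.fsum(replayed_rewards)
    # math.isclose checks |a - b| <= rel_tol * max(|a|, |b|); abs_tol defaults
    # to 0.0, so if one total is exactly 0.0 the other must be 0.0 as well.
    if not math.isclose(expected_total, replayed_total, rel_tol=rel_tol):
        raise AssertionError(
            f"cumulative reward mismatch: expected {expected_total}, "
            f"replayed {replayed_total} (rel_tol={rel_tol})"
        )


# Totals 60.0 vs 60.000001: relative difference of about 1.7e-8, within 1e-7.
check_replay_reward([10.0, 20.0, 30.0], [10.0, 20.0, 30.000001], rel_tol=1e-7)
```

A relative tolerance scales with the magnitude of the cumulative reward, so the same setting works for short and long scenarios with very different reward totals; if totals can be near zero, an additional abs_tol would be needed.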

All tests currently pass with replay_reward_rel_tolerance = 1e-7.

When generating 500 timesteps, the tests pass with replay_reward_rel_tolerance = 1e-4.
Be careful to raise this threshold when increasing max_iter.

This slight difference is negligible compared to the differences between KPIs.
