Slight differences in reward significant digits in replay mode #21

Open
NMegel opened this issue Mar 1, 2021 · 0 comments

Comments

NMegel (Collaborator) commented Mar 1, 2021

Running the runner in replay mode introduces small differences in the decimal digits when comparing rewards.
This is a separate notion from the "reward_significant_digit" setting in config.ini.
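A minimal sketch of why the two notions differ, assuming (the issue does not say how it is applied) that reward_significant_digit works by rounding both rewards to a fixed number of significant digits. The helper round_sig is hypothetical. Two values that straddle a rounding boundary compare as different after rounding, even when their relative difference is tiny:

```python
import math


def round_sig(x: float, digits: int) -> float:
    """Hypothetical helper: round x to `digits` significant digits."""
    if x == 0:
        return 0.0
    # Number of decimal places that keeps `digits` significant digits.
    decimals = digits - 1 - math.floor(math.log10(abs(x)))
    return round(x, decimals)


expected = 1.2349999
replayed = 1.2350001

# Rounding to 3 significant digits lands on opposite sides of a boundary...
print(round_sig(expected, 3), round_sig(replayed, 3))  # 1.23 1.24
# ...while a relative-tolerance check treats the values as equal.
print(math.isclose(expected, replayed, rel_tol=1e-6))  # True
```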

In the commit, I propose introducing a new parameter in config.ini: replay_reward_rel_tolerance.
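A sketch of how the new parameter might be read, assuming the runner loads config.ini with Python's standard configparser. The section name [runner], the function load_replay_tolerance and the fallback default are hypothetical; only the parameter name comes from this issue:

```python
import configparser

# Hypothetical config.ini content; the "[runner]" section name is an assumption.
CONFIG_TEXT = """
[runner]
replay_reward_rel_tolerance = 1e-7
"""


def load_replay_tolerance(config_text: str, default: float = 1e-7) -> float:
    """Read replay_reward_rel_tolerance, falling back to `default` if absent."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # fallback also covers a missing section, not only a missing option.
    return parser.getfloat("runner", "replay_reward_rel_tolerance", fallback=default)


print(load_replay_tolerance(CONFIG_TEXT))  # 1e-07
```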

This parameter sets a configurable relative tolerance for comparing the expected cumulative reward with the replayed cumulative reward.
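One way to express that relative comparison, sketched with the standard math.isclose. The function name replay_rewards_match is hypothetical, and the issue does not say whether the tolerance is relative to the expected value only or to the larger of the two; math.isclose uses the larger, which makes the check symmetric:

```python
import math


def replay_rewards_match(expected: float, replayed: float, rel_tol: float) -> bool:
    """True if the replayed cumulative reward is within rel_tol of the expected one.

    math.isclose checks |expected - replayed| <= rel_tol * max(|expected|, |replayed|),
    so the allowed gap scales with the magnitude of the cumulative reward.
    """
    return math.isclose(expected, replayed, rel_tol=rel_tol)


print(replay_rewards_match(1000.0, 1000.00005, rel_tol=1e-7))  # True
print(replay_rewards_match(1000.0, 1000.001, rel_tol=1e-7))  # False
```

Because math.isclose defaults to abs_tol=0, an expected cumulative reward of exactly 0 would only match an exactly-zero replay; whether that case can occur in this project is not stated in the issue.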

All tests currently pass with replay_reward_rel_tolerance = 1e-7.

When generating 500 timesteps, the tests pass with replay_reward_rel_tolerance = 1e-4.
Be careful to raise this threshold when increasing max_iter.
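The issue does not identify where the drift comes from. Purely as an illustration, assume part of the pipeline accumulates rewards at lower precision (float32, emulated here with the standard struct module) while the reference sum uses float64. The functions below are hypothetical; they show how the relative gap between two such cumulative sums can be measured for different episode lengths, which is the kind of quantity the tolerance has to cover as max_iter grows:

```python
import random
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def cumulative_reward_float32(rewards: list[float]) -> float:
    """Sum rewards step by step, keeping the running total in float32."""
    total = 0.0
    for r in rewards:
        total = to_float32(total + to_float32(r))
    return total


def relative_gap(n_steps: int, seed: int = 0) -> float:
    """Relative gap between a float32 running sum and a float64 sum of random rewards."""
    rng = random.Random(seed)
    rewards = [rng.uniform(0.0, 1.0) for _ in range(n_steps)]
    reference = sum(rewards)
    approx = cumulative_reward_float32(rewards)
    return abs(reference - approx) / abs(reference)


for n in (10, 500, 50_000):
    print(n, relative_gap(n))
```

Rounding error in a running sum tends to accumulate with the number of terms, so a tolerance tuned for short runs may be too tight for long ones; the actual values depend on the real source of the discrepancy.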

This slight difference is negligible compared to the differences between KPIs.
