Slight differences in rewards significative digits in replay mode #21

NMegel · 2021-03-01T14:27:51Z

Using the runner in replay mode introduces small differences in the decimal digits of the rewards being compared.
This is a separate notion from the "reward_significant_digit" setting in config.ini.

In the commit, I suggest introducing a new parameter in config.ini: replay_reward_rel_tolerance

This parameter sets a configurable threshold for the relative comparison between the expected cumulative reward and the replayed cumulative reward.
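To illustrate what a relative-tolerance comparison of this kind could look like, here is a minimal sketch. It is not the code from the commit: the section name `[DEFAULT]`, the helper names `load_rel_tolerance` and `rewards_match`, and the use of `math.isclose` are all assumptions made for the example.

```python
import configparser
import math

# Hypothetical config.ini fragment; the section name is an assumption.
CONFIG_TEXT = """
[DEFAULT]
replay_reward_rel_tolerance = 1e-7
"""


def load_rel_tolerance(config_text: str) -> float:
    """Read replay_reward_rel_tolerance from config text (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parser["DEFAULT"].getfloat("replay_reward_rel_tolerance")


def rewards_match(expected: float, replayed: float, rel_tol: float) -> bool:
    """Return True if the two cumulative rewards agree within rel_tol.

    math.isclose checks |expected - replayed| <= rel_tol * max(|expected|, |replayed|).
    """
    return math.isclose(expected, replayed, rel_tol=rel_tol)


rel_tol = load_rel_tolerance(CONFIG_TEXT)
tight_ok = rewards_match(1000.0, 1000.00005, rel_tol)   # relative gap of about 5e-8
tight_fail = rewards_match(1000.0, 1000.05, rel_tol)    # relative gap of about 5e-5
loose_ok = rewards_match(1000.0, 1000.05, 1e-4)         # same gap, looser tolerance
```

The reward values here are made up purely to show the behaviour: a gap that fails at 1e-7 can still pass at 1e-4, which is the kind of adjustment the parameter is meant to allow.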

All the tests currently pass with replay_reward_rel_tolerance = 1e-7

With 500 generated timesteps, they pass with replay_reward_rel_tolerance = 1e-4
Take care to raise this threshold when increasing max_iter.

This slight difference is negligible compared to the differences between KPIs.

NMegel mentioned this issue Mar 1, 2021

Save Agent Replay logs #20

Closed
