Using the runner in replay mode introduces small differences in the decimals of the rewards being compared.
This is a separate notion from "reward_significant_digit" in config.ini.
In this commit I suggest introducing a new parameter in config.ini: replay_reward_rel_tolerance
This parameter sets a configurable threshold for the relative comparison between the expected cumulated reward and the replayed cumulated reward.
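A minimal sketch of how such a relative comparison could work, for illustration only. The `[replay]` section name, the `rewards_match` helper, and the inline config text are assumptions, not the code from the commit. The comparison relies on the standard library's `math.isclose`.

```python
import configparser
import math

# Hypothetical config.ini content; the section name "replay" is an assumption.
CONFIG_TEXT = """
[replay]
replay_reward_rel_tolerance = 1e-7
"""


def rewards_match(expected, replayed, rel_tol):
    """Return True when both cumulated rewards agree within rel_tol.

    abs_tol is left at 0.0 so the check is purely relative.
    """
    return math.isclose(expected, replayed, rel_tol=rel_tol, abs_tol=0.0)


parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
rel_tol = parser.getfloat("replay", "replay_reward_rel_tolerance")

# A difference in the 8th significant digit is accepted at 1e-7 ...
close_enough = rewards_match(1000.0, 1000.00001, rel_tol)
# ... while a difference in the 5th significant digit is rejected.
too_far = rewards_match(1000.0, 1000.01, rel_tol)
```

Because the tolerance is relative, the accepted absolute gap scales with the magnitude of the cumulated reward, which is why a longer run may need a looser value.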
All the tests currently pass with replay_reward_rel_tolerance = 1e-7
With 500-timestep generation, they pass with replay_reward_rel_tolerance = 1e-4
Remember to raise this threshold when raising max_iter.
This slight difference is negligible compared to the differences between KPIs.