Discount Factor Possibly Applied Twice #13

Silent-Zebra · 2024-10-16T19:23:23Z

Thanks to @cool-RR for pointing this out.

get_gae_advantages already includes discount factors, and its output is later multiplied by cum_discount, which is another discount factor. Thus, the discount factor is counted twice. I may have been confused by the bottom of page 5 of the Loaded DiCE paper (Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Estimators for Reinforcement Learning), where the formula includes both a cumulative discount and an advantage. I took that advantage to be the GAE, which already includes discounting, and so didn't follow the common practice of omitting the cumulative discount.
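
To make the extra factor concrete, here is a minimal pure-Python sketch. It is not the repository's code: gae_advantages and apply_cum_discount are hypothetical stand-ins for get_gae_advantages and the cum_discount multiplication, and the rewards, zero value estimates, and lambda = 1 are made-up inputs chosen so the numbers are easy to follow.

```python
def gae_advantages(rewards, values, gamma, lam):
    """Textbook GAE. `values` has len(rewards) + 1 entries (last one is the bootstrap value).

    Hypothetical stand-in for get_gae_advantages, not the repository implementation.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual at step t; gamma already enters here.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Discounted (by gamma * lam) sum of future residuals, relative to step t.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def apply_cum_discount(advantages, gamma):
    """Multiply the advantage at step t by gamma**t (assumed form of cum_discount)."""
    return [gamma ** t * a for t, a in enumerate(advantages)]


gamma, lam = 0.96, 1.0
rewards = [1.0, 1.0, 1.0, 1.0]   # illustrative data only
values = [0.0] * 5               # zero baseline keeps the arithmetic simple

adv = gae_advantages(rewards, values, gamma, lam)
adv_scaled = apply_cum_discount(adv, gamma)

for t, (a, s) in enumerate(zip(adv, adv_scaled)):
    print(f"t={t}  GAE only: {a:.4f}  GAE * gamma**t: {s:.4f}")
```

In this sketch the first column corresponds to the common practice of using the GAE advantage directly, and the second to additionally scaling each timestep by gamma**t. The two agree at t=0 and diverge more the later the timestep.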

The implication is that my discount factor may be applied twice compared to common practice, so for a given discount factor (e.g. 0.96) there may be more discounting than you would otherwise expect. I expect that this doesn't materially change overall results, though it might affect learning dynamics, and it might make discount factors confusing or inconsistent to compare with other codebases.

I'm leaving things as is, even though the fix is very quick (e.g. just remove cum_discount in https://github.com/Silent-Zebra/POLA/blob/master/jax_files/POLA_dice_jax.py#L85), because I don't have time to rerun experiments now. I also don't expect results to change materially, and even if they do, I could likely use a different discount factor to get similar results.
