Discount Factor Possibly Applied Twice #13

Open
Silent-Zebra opened this issue Oct 16, 2024 · 0 comments

Thanks to @cool-RR for pointing this out.

get_gae_advantages already applies the discount factor when computing the advantages, and its output is later multiplied by cum_discount, which is a second discount term. As a result, the discount factor is counted twice. I may have been misled by the formula at the bottom of page 5 of the Loaded DiCE paper (Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Estimators for Reinforcement Learning), which includes both a cumulative discount and an advantage. I took that advantage to be the GAE, which already includes discounting, and I didn't follow the common practice of omitting the cumulative discount.
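To make the interaction concrete, here is a minimal plain-Python sketch of a GAE computation followed by a multiplication with a cumulative discount. It is not the code from POLA_dice_jax.py: the function `gae_advantages`, its signature, and the toy rewards and values are assumptions chosen only to show where the two discount terms meet.

```python
def gae_advantages(rewards, values, gamma, lam):
    """Illustrative GAE (hypothetical helper, not the repository's function).

    `values` must hold len(rewards) + 1 entries; the last one is the
    bootstrap value for the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the discounted tail sum.
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running  # discount applied here
        advantages[t] = running
    return advantages


gamma, lam = 0.96, 1.0
rewards = [1.0, 1.0, 1.0, 1.0]  # toy data, assumed for illustration
values = [0.0] * 5              # zero baseline keeps the arithmetic readable

adv = gae_advantages(rewards, values, gamma, lam)

# Second discount term: gamma**t, measured from the start of the episode.
cum_discount = [gamma ** t for t in range(len(rewards))]
weighted = [c * a for c, a in zip(cum_discount, adv)]

for t, (a, w) in enumerate(zip(adv, weighted)):
    print(f"t={t}  gae={a:.4f}  gae*cum_discount={w:.4f}")
```

In this sketch the step-0 term is unchanged, while every later step is scaled down by an additional gamma**t on top of the discounting already inside the advantage.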

The implication is that my discount factor may be applied twice relative to common practice, so for a given discount factor, e.g. 0.96, there may be more discounting than you would otherwise expect. I expect that this doesn't materially change overall results, though it might affect learning dynamics, and it might be confusing or inconsistent when comparing discount factors with other codebases.
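As a rough way to see how much extra discounting this could mean, the sketch below compares the weight placed on a reward at step k when it is credited to the action at step t. It assumes GAE with lambda = 1 and a zero value baseline, so the advantage reduces to a discounted sum of future rewards. The helper `reward_weight` is hypothetical and exists only for this comparison.

```python
def reward_weight(gamma, t, k, with_cum_discount):
    """Weight on reward r_k in the term for the action at step t (k >= t).

    Assumes lambda = 1 and zero values, so the advantage at t is
    sum over k >= t of gamma**(k - t) * r_k. Hypothetical helper.
    """
    if k < t:
        raise ValueError("a reward before step t is not credited to step t")
    weight = gamma ** (k - t)      # discount inside the advantage
    if with_cum_discount:
        weight *= gamma ** t       # extra cumulative discount
    return weight


gamma = 0.96
t, k = 10, 15
common = reward_weight(gamma, t, k, with_cum_discount=False)
doubled = reward_weight(gamma, t, k, with_cum_discount=True)

print(f"without cum_discount: {common:.4f}")
print(f"with cum_discount:    {doubled:.4f}")
print(f"ratio (extra factor): {doubled / common:.4f}")
```

Under these assumptions the extra factor depends only on t, so terms late in a long episode are shrunk the most, which is one way the choice could affect learning dynamics.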

I'm leaving things as they are, even though the fix is very quick (e.g. just remove cum_discount in https://github.com/Silent-Zebra/POLA/blob/master/jax_files/POLA_dice_jax.py#L85), because I don't have time to rerun experiments now. I also don't expect results to materially change (and even if they do, I could likely use a different discount factor to get similar results).
