Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

About the TD(lambda) computation #109

Open
opocaj92 opened this issue Jun 10, 2021 · 2 comments
Open

About the TD(lambda) computation #109

opocaj92 opened this issue Jun 10, 2021 · 2 comments

Comments

@opocaj92
Copy link

I was having a look at the build_td_lambda_targets function into utils/rl_utils.py, and I was wondering if line 8:
ret[:, -1] = target_qs[:, -1] * (1 - th.sum(terminated, dim=1))
is really correct.
This line should initialize the TD(lambda) targets for a given episode's last step (and the episode could possibly not been finished yet), but shouldn't it be something like (as it is in the for loop after line 8):
ret[:, -1] = target_qs[:, -1] * (1 - terminated[:, -1])
so that, for each batch sample and each agent, the last time step is really multiplied by its own "boolean flag" terminated rather than a sum of these (that does not seem to make sense to me here). Am I interpreting your code wrong?

@hijkzzz
Copy link

hijkzzz commented Aug 6, 2021

For PyMARL, every episodes are terminated before training, so there is no problem as described above.
We also recommend our finetuned qmix: https://github.com/hijkzzz/pymarl2.

@opocaj92
Copy link
Author

Yes, I agree. Not 100% sure if this IN PRACTICE is going to make any difference in the computed results when using SMAC, still I think that CONCEPTUALLY it is wrong (if my understanding of the code is right of course). This can also lead to some computational errors when using other environments than SMAC that may have a different handling of epidoses ending.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants