
[DRAFT][DON'T MERGE] Enable learning from multiple priors #5439

Draft · wants to merge 11 commits into base: main
Conversation

@ervteng (Contributor) commented Jun 23, 2021

Proposed change(s)

This PR adds a feature inspired by the approach used by DeepMind in this paper. By specifying multiple trained policies as "priors" in the YAML file (i.e., pointing to the directory of each run id), the trainer loads those policies and uses them as regularization priors for the learning policy. This has proven effective in Dodgeball, where training with two priors (shooting and flag-getting) produces a skilled policy in roughly 1/3 the time. See the ELO plot below.

[Image: ELO plot]

This PR also contains a version of WallJump that can be broken down into subtasks.
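As a sketch, the prior specification described above might look like the following in the trainer config. The key name `priors` and the exact paths are assumptions for illustration; the actual option names in this draft PR may differ.

```yaml
behaviors:
  Dodgeball:
    trainer_type: poca
    # Hypothetical key -- each entry points at the results directory of a
    # previously trained run id whose policy will be loaded as a frozen prior.
    priors:
      - ./results/shooting_run/Dodgeball
      - ./results/flag_run/Dodgeball
```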

TODO

Ideally, in order for this to be a feature, we'd want to add these components:

  • Add to PPO and SAC (currently only in POCA)
  • Allow checkpoints with different network architectures (e.g. different num_layers) to be handled properly (I don't see how we could change the obs space and action space, though).
  • Solve the entropy issue (entropy seems to increase when more than one prior is specified, probably b/c it's trying to learn a multimodal policy).
  • Documentation and tests
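A minimal sketch of the kind of regularization term described above, assuming discrete action distributions. The function name, the KL direction, and the `beta` coefficient are assumptions for illustration, not the actual implementation in this PR.

```python
import math

def prior_regularization_loss(policy_probs, prior_probs_list, beta=0.1):
    """Average KL(prior || policy) over all frozen priors, scaled by beta.

    policy_probs: action probabilities of the learning policy.
    prior_probs_list: one probability distribution per loaded prior policy.
    The returned value would be added to the usual RL loss so the learning
    policy is pulled toward the priors' behavior.
    """
    total = 0.0
    for prior in prior_probs_list:
        # KL(prior || policy); terms with zero prior mass contribute nothing.
        kl = sum(p * math.log(p / q) for p, q in zip(prior, policy_probs) if p > 0)
        total += kl
    return beta * total / len(prior_probs_list)
```

Averaging the KL toward several dissimilar priors pulls the policy toward a mixture of their behaviors, which is consistent with the entropy increase noted in the TODO when more than one prior is specified.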

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

@miguelalonsojr miguelalonsojr self-requested a review January 18, 2022 22:53
@miguelalonsojr miguelalonsojr self-assigned this Jan 18, 2022
@miguelalonsojr miguelalonsojr requested review from jrupert-unity and removed request for miguelalonsojr and jrupert-unity January 21, 2022 18:35
@CLAassistant commented

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Ervin Teng seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
