[DRAFT][DON'T MERGE] Enable learning from multiple priors #5439
Proposed change(s)
This PR adds a feature inspired by the approach used by DeepMind in this paper. By specifying multiple policies as "priors" in the YAML file (i.e. giving the directory of each run-id), the trainer will load those policies and use them as regularization priors for the learning policy. This has proven effective in Dodgeball, where training with two priors (shooting and flag-getting) produces a skilled policy in roughly 1/3 the time. See the Elo plot below.
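For reviewers unfamiliar with the approach, here is a minimal sketch of the idea in isolation (not the code in this PR): each prior is a frozen, previously trained policy loaded from its run-id directory, and a weighted KL term pulling the learning policy toward each prior is added to the usual PPO loss. Names like `prior_regularization_loss`, `prior_logits_list`, and `prior_strength` are illustrative only.

```python
import torch
import torch.nn.functional as F


def prior_regularization_loss(learner_logits, prior_logits_list, prior_strength=0.1):
    """Sum of KL(prior || learner) over all frozen prior policies.

    learner_logits:    [batch, num_actions] logits from the learning policy.
    prior_logits_list: list of [batch, num_actions] logits, one per prior policy.
    prior_strength:    weight of the regularization term (hypothetical name).
    """
    learner_log_probs = F.log_softmax(learner_logits, dim=-1)
    total_kl = 0.0
    for prior_logits in prior_logits_list:
        # Priors are frozen: detach so no gradients flow back into them.
        prior_probs = F.softmax(prior_logits.detach(), dim=-1)
        prior_log_probs = F.log_softmax(prior_logits.detach(), dim=-1)
        # KL(prior || learner) pulls the learner toward the prior's action distribution.
        kl = (prior_probs * (prior_log_probs - learner_log_probs)).sum(dim=-1)
        total_kl = total_kl + kl.mean()
    return prior_strength * total_kl


# Sketch of usage inside the trainer's update step:
#   loss = ppo_loss + prior_regularization_loss(policy_logits, [shoot_logits, flag_logits])
```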
This PR also contains a version of WallJump that can be broken down into subtasks.
TODO
Ideally, in order for this to become a full feature, we'd want to add these components:
Types of change(s)
Checklist
Other comments