Commit
Summary:

This diff adds k-step learning and generalized advantage estimation (GAE) to our codebase.

In our current codebase, learning happens either after each episode or after every environment step. However, some algorithms, such as PPO, learn every k steps, where k > 1. This diff enables learning every k environment steps by adding a new parameter, learn_every_k_steps, to the online_learning function. To support this feature, the diff modifies OnPolicyEpisodicReplayBuffer, whose current implementation works only when learning happens after each episode; it is now also compatible with k-step learning. Some necessary changes to PPO and REINFORCE are also made.

Note that in the online_learning function, learn_every_k_steps takes effect only when learn_after_episode is False, and its default value is 1. As a result, this diff does not change the behavior of external code that relies on online_learning.

This diff also adds GAE to PPO. The current PPO implementation uses n-step returns to compute advantages for policy updates, whereas the original PPO algorithm uses GAE, which is the difference between the truncated lambda return and the current value estimate.

Reviewed By: rodrigodesalvobraz

Differential Revision: D53838214

fbshipit-source-id: 216ceb6584a1a4c156fe5f29562f0ddd1c9970eb
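The scheduling rule described in the summary (learn_after_episode takes precedence; otherwise learn every learn_every_k_steps environment steps, defaulting to 1) can be illustrated with the small sketch below. The helper learning_steps is hypothetical and is not the online_learning implementation from this diff. It also assumes that the step counter is global and keeps running across episode boundaries; whether the real code resets it per episode is not stated in the summary.

```python
from typing import List


def learning_steps(
    episode_lengths: List[int],
    learn_after_episode: bool = False,
    learn_every_k_steps: int = 1,
) -> List[int]:
    """Return the 1-based global environment steps at which learning would run.

    Hypothetical illustration of the scheduling rule only; assumes the step
    counter is global (not reset at episode boundaries).
    """
    if learn_every_k_steps < 1:
        raise ValueError("learn_every_k_steps must be >= 1")
    calls: List[int] = []
    total_steps = 0
    for length in episode_lengths:
        for t in range(1, length + 1):
            total_steps += 1
            episode_done = t == length
            if learn_after_episode:
                # learn_every_k_steps is ignored in episodic mode.
                if episode_done:
                    calls.append(total_steps)
            elif total_steps % learn_every_k_steps == 0:
                # k-step mode; k = 1 reproduces per-step learning.
                calls.append(total_steps)
    return calls


if __name__ == "__main__":
    print(learning_steps([3, 4]))                            # [1, 2, 3, 4, 5, 6, 7]
    print(learning_steps([3, 4], learn_after_episode=True))  # [3, 7]
    print(learning_steps([3, 4], learn_every_k_steps=2))     # [2, 4, 6]
```

With the defaults (learn_after_episode=False, learn_every_k_steps=1) the sketch learns after every step, which mirrors the summary's point that existing callers see no behavior change.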
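For reference, the standard GAE recursion computes one-step TD errors delta_t = r_t + gamma * V(s_{t+1}) - V(s_t) and accumulates A_t = delta_t + gamma * lambda * A_{t+1}; adding V(s_t) back gives the truncated lambda return, so A_t is exactly "lambda return minus current value estimate". The sketch below is a plain-Python illustration under our own assumptions, not the PPO code in this diff: the function compute_gae and its arguments (values, dones, last_value, gamma, lam) are hypothetical. Because k-step learning can cut a trajectory mid-episode, the sketch bootstraps the final step from last_value, and it resets the recursion wherever dones marks an episode end.

```python
from typing import List, Tuple


def compute_gae(
    rewards: List[float],
    values: List[float],
    dones: List[bool],
    last_value: float,
    gamma: float = 0.99,
    lam: float = 0.95,
) -> Tuple[List[float], List[float]]:
    """Return (advantages, lambda_returns) for one trajectory segment.

    Hypothetical sketch. values[t] is V(s_t); last_value is V(s_T) for the
    state after the last transition (used only if the segment does not end
    in a terminal state); dones[t] is True if the episode ended after step t.
    """
    n = len(rewards)
    if len(values) != n or len(dones) != n:
        raise ValueError("rewards, values and dones must have equal length")
    advantages = [0.0] * n
    gae = 0.0
    for t in reversed(range(n)):
        next_value = last_value if t == n - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; no bootstrap past a terminal transition.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # GAE recursion, reset at episode boundaries.
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    # Truncated lambda return: G_t^lambda = A_t + V(s_t).
    lambda_returns = [a + v for a, v in zip(advantages, values)]
    return advantages, lambda_returns


if __name__ == "__main__":
    adv, ret = compute_gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0],
                           [False, False, True], last_value=0.0,
                           gamma=1.0, lam=1.0)
    print(adv)  # [3.0, 2.0, 1.0]
```

Setting lam = 0 reduces the advantage to the one-step TD error, while lam = 1 recovers the discounted Monte Carlo return minus V(s_t); intermediate values trade bias against variance.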
commit 611928a
1 parent c189690
18 changed files changed, with 419 additions and 254 deletions.