My re-implementation of the six reinforcement learning algorithms featured in OpenAI's Spinning Up.
I generally tried to implement everything without looking at the reference code, using only the pseudocode on the Spinning Up site and the original papers. The exception is the first algorithm, VPG, where I did borrow from Spinning Up's `ActorCritic` class.
`base` contains packages shared by the algorithm implementations. This includes the abstract base class `Algorithm`, which implements the training loop itself. Each algorithm is responsible for implementing `update` and `act`: `update` contains the logic for updating model parameters according to each specific algorithm's update rule, while `act` selects an action for a given observation.
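The split between the shared loop and the per-algorithm methods might look roughly like the sketch below. Only the names `Algorithm`, `update`, and `act` come from this repository; the constructor arguments, batching scheme, and classic-gym environment API are illustrative assumptions, not the actual code.

```python
from abc import ABC, abstractmethod


class Algorithm(ABC):
    """Sketch of a shared base class; not the repository's actual code."""

    def __init__(self, env, batch_size=4000):
        self.env = env
        self.batch_size = batch_size  # frames collected per update() call

    @abstractmethod
    def act(self, obs):
        """Select an action for the given observation."""

    @abstractmethod
    def update(self, batch):
        """Update model parameters from a batch of experience,
        according to the specific algorithm's update rule."""

    def train(self, total_frames):
        """Shared training loop: gather experience with act(), then
        periodically hand it to update()."""
        obs = self.env.reset()  # classic gym API assumed
        batch = []
        for _ in range(total_frames):
            action = self.act(obs)
            next_obs, reward, done, _ = self.env.step(action)
            batch.append((obs, action, reward, next_obs, done))
            obs = self.env.reset() if done else next_obs
            if len(batch) == self.batch_size:
                self.update(batch)
                batch = []
```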
Algorithm implementations:

| VPG | TRPO | PPO | DDPG | TD3 | SAC |
|-----|------|-----|------|-----|-----|
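As an illustration of how a subclass fills in `act` and `update`, here is a toy REINFORCE-style agent with a Gaussian policy. This is not the repository's VPG (there is no baseline or advantage estimation); everything beyond the `act`/`update` interface is made up for the example.

```python
import numpy as np
import torch
import torch.nn as nn


class ToyVPG(Algorithm):
    """Deliberately minimal policy-gradient agent, for illustration only."""

    def __init__(self, env, lr=3e-4, gamma=0.99, **kwargs):
        super().__init__(env, **kwargs)
        obs_dim = env.observation_space.shape[0]
        act_dim = env.action_space.shape[0]
        self.mu = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim)
        )
        self.log_std = nn.Parameter(-0.5 * torch.ones(act_dim))
        self.gamma = gamma
        self.opt = torch.optim.Adam(
            list(self.mu.parameters()) + [self.log_std], lr=lr
        )

    def act(self, obs):
        with torch.no_grad():
            obs = torch.as_tensor(obs, dtype=torch.float32)
            dist = torch.distributions.Normal(self.mu(obs), self.log_std.exp())
            return dist.sample().numpy()

    def update(self, batch):
        obs, acts, rews, _, dones = zip(*batch)
        # Reward-to-go, computed backwards and reset at episode boundaries.
        rtg, running = [], 0.0
        for r, d in zip(reversed(rews), reversed(dones)):
            running = r + self.gamma * running * (1.0 - d)
            rtg.append(running)
        rtg.reverse()
        obs = torch.as_tensor(np.array(obs), dtype=torch.float32)
        acts = torch.as_tensor(np.array(acts), dtype=torch.float32)
        rtg = torch.as_tensor(rtg, dtype=torch.float32)
        dist = torch.distributions.Normal(self.mu(obs), self.log_std.exp())
        # Baseline-free policy-gradient loss.
        loss = -(dist.log_prob(acts).sum(-1) * rtg).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
```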
Below are benchmarks for each of the six algorithms in six MuJoCo environments. Each agent was trained with 3 random seeds, with each seed exposed to 3 million total frames per environment. These benchmarks can be compared to the Spinning Up benchmarks. Note that this protocol is not fair to the on-policy algorithms: they generally need significantly more experience to reach performance comparable to the off-policy algorithms, so in most cases one could expect them to do better given, say, 10 million frames. A fairer comparison would let every algorithm run to convergence rather than artificially capping total experience; this is defensible because on-policy algorithms are generally less computationally intensive per update and faster in wall-clock time. However, the Spinning Up benchmarks use 3 million frames of experience, so I followed suit.
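A hedged sketch of the protocol just described; the seed and frame counts come from this README, while `run_benchmark` and its arguments are hypothetical.

```python
import random

import gym
import numpy as np
import torch


def run_benchmark(algo_cls, env_id, seeds=(0, 1, 2), total_frames=3_000_000):
    """Train one agent per seed: 3 seeds x 3M frames, as described above."""
    for seed in seeds:
        # Seed every source of randomness so each run is reproducible.
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)
        env = gym.make(env_id)
        env.seed(seed)  # classic gym API; gymnasium takes seed in reset()
        algo_cls(env).train(total_frames)


# e.g. run_benchmark(ToyVPG, "HalfCheetah-v3")
```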
Links to GIFs of agent behavior are also included. There is no attempt to cherry-pick the best-performing random seed or particularly good episodes: each video shows three episodes recorded from the first random seed for each algorithm/environment pair.
|                | VPG | TRPO | PPO | DDPG | TD3 | SAC |
|----------------|-----|------|-----|------|-----|-----|
| Swimmer-v3     |     |      |     |      |     |     |
| HalfCheetah-v3 |     |      |     |      |     |     |
| Hopper-v3      |     |      |     |      |     |     |
| Walker2d-v3    |     |      |     |      |     |     |
| Ant-v3         |     |      |     |      |     |     |
| Humanoid-v1    |     |      |     |      |     |     |
The contents of this repository are based on OpenAI's spinningup repository.
    @article{SpinningUp2018,
        author = {Achiam, Joshua},
        title = {{Spinning Up in Deep Reinforcement Learning}},
        year = {2018}
    }