A highly modularized implementation of popular deep RL algorithms in PyTorch. My guiding principle is to reuse as many components as possible across different algorithms, and to switch easily between classical control tasks like CartPole and Atari games with raw pixel inputs.
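To illustrate that modular pattern, here is a minimal, self-contained sketch: the agent is assembled from swappable factories (task, network, optimizer) collected in one config object, so switching from CartPole to a pixel-based task only means swapping factories. All class and function names below are illustrative only and are not this repo's actual API.

```python
# Illustrative sketch of a config-driven, modular setup (names are hypothetical).
import gym
import torch
import torch.nn as nn

class Config:
    def __init__(self, task_fn, network_fn, optimizer_fn, discount=0.99):
        self.task_fn = task_fn            # factory for the environment
        self.network_fn = network_fn      # factory for the value/policy network
        self.optimizer_fn = optimizer_fn  # factory for the optimizer
        self.discount = discount

def cart_pole_config():
    # Low-dimensional state: a small fully connected Q-network is enough.
    return Config(
        task_fn=lambda: gym.make('CartPole-v0'),
        network_fn=lambda: nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2)),
        optimizer_fn=lambda params: torch.optim.RMSprop(params, lr=1e-3),
    )

config = cart_pole_config()
env = config.task_fn()
q_net = config.network_fn()
optimizer = config.optimizer_fn(q_net.parameters())
```

Swapping in an Atari task would only replace `task_fn` (a pixel environment wrapper) and `network_fn` (a convolutional body), while the agent logic stays unchanged.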
Implemented algorithms:
- (Double/Dueling) Deep Q-Learning (DQN)
- Categorical DQN (C51, Distributional DQN with KL Distance)
- Quantile Regression DQN (Distributional DQN with Wasserstein Distance)
- Synchronous Advantage Actor Critic (A2C)
- Synchronous N-Step Q-Learning
- Deep Deterministic Policy Gradient (DDPG, pixel & low-dim-state)
- (Continuous/Discrete) Synchronous Proximal Policy Optimization (PPO, pixel & low-dim-state)
- The Option-Critic Architecture (OC)
- Action Conditional Video Prediction
The asynchronous algorithms below were removed in the current version but can still be found in v0.1.
- Async Advantage Actor Critic (A3C)
- Async One-Step Q-Learning
- Async One-Step Sarsa
- Async N-Step Q-Learning
- Continuous A3C
- Distributed Deep Deterministic Policy Gradient (Distributed DDPG, aka D3PG)
- Parallelized Proximal Policy Optimization (P3O, similar to DPPO)
Support for PyTorch v0.3.x can be found in v0.2. Note that all the figures were generated with that version. After the upgrade to PyTorch v0.4.0, I have only tested the classical control tasks.
- MacOS 10.12 or Ubuntu 16.04
- PyTorch v0.4.0
- Python 3.6, 3.5 or 2.7 (deprecated)
- Core dependencies: pip install -e .
- Optional: Roboschool, PyBullet
- examples.py: contains examples for all the implemented algorithms
- Dockerfile: contains an example environment (w/ PyBullet, w/o Roboschool, w/o GPU)
Please use this BibTeX entry if you want to cite this repo:
@misc{deeprl,
author = {Zhang, Shangtong},
title = {Modularized Implementation of Deep RL Algorithms in PyTorch},
year = {2018},
publisher = {GitHub},
journal = {GitHub Repository},
howpublished = {\url{https://github.com/ShangtongZhang/DeepRL}},
}
Curves for CartPole are trivial, so I didn't include them here, and no random seed is fixed. The curves are generated in the same manner as OpenAI Baselines (a single run, smoothed over the most recent 100 episodes).
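A sketch of that smoothing is below: each point is replaced by the mean return of the most recent 100 episodes from a single run. The `episode_returns` array here is placeholder data for illustration.

```python
import numpy as np

def smooth(episode_returns, window=100):
    # Replace each point with the mean of the most recent `window` episodes.
    returns = np.asarray(episode_returns, dtype=np.float64)
    smoothed = np.empty_like(returns)
    for i in range(len(returns)):
        smoothed[i] = returns[max(0, i - window + 1): i + 1].mean()
    return smoothed

episode_returns = np.random.uniform(0, 200, size=1000)  # placeholder data
curve = smooth(episode_returns)
```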
This is my synchronous option-critic implementation, not the original one.
Left: one-step prediction. Right: ground truth.
The prediction is sampled after 110K iterations, and I only implemented one-step training.
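For context, a one-step update of this kind can be sketched as follows: the model takes the current frame(s) and the chosen action, predicts the next frame, and is trained on the reconstruction error against the ground-truth next frame. The model, data names, and the use of a pixel-wise MSE loss below are my assumptions for illustration, not necessarily this repo's exact implementation.

```python
import torch
import torch.nn.functional as F

def one_step_update(model, optimizer, frames, actions, next_frames):
    # `model`, `frames`, `actions`, and `next_frames` are placeholders.
    predicted = model(frames, actions)          # predicted next frame
    loss = F.mse_loss(predicted, next_frames)   # pixel-wise reconstruction error
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```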
- Human Level Control through Deep Reinforcement Learning
- Asynchronous Methods for Deep Reinforcement Learning
- Deep Reinforcement Learning with Double Q-learning
- Dueling Network Architectures for Deep Reinforcement Learning
- Playing Atari with Deep Reinforcement Learning
- HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
- Deterministic Policy Gradient Algorithms
- Continuous control with deep reinforcement learning
- High-Dimensional Continuous Control Using Generalized Advantage Estimation
- Hybrid Reward Architecture for Reinforcement Learning
- Trust Region Policy Optimization
- Proximal Policy Optimization Algorithms
- Emergence of Locomotion Behaviours in Rich Environments
- Action-Conditional Video Prediction using Deep Networks in Atari Games
- A Distributional Perspective on Reinforcement Learning
- Distributional Reinforcement Learning with Quantile Regression
- The Option-Critic Architecture
- Some hyper-parameters are from DeepMind Control Suite, OpenAI Baselines and Ilya Kostrikov