# Algorithm catalog

nnabla-rl offers various (deep) reinforcement learning and optimal control algorithms. See the list below for the implemented algorithms!

## Reinforcement learning algorithms

- Online training: Training performed by interacting with the environment. You'll need to prepare an environment that is compatible with the OpenAI gym environment interface.
- Offline(Batch) training: Training performed solely from provided data, without interacting with the environment. You'll need to prepare a dataset wrapped in a ReplayBuffer. (See the training sketch below the table for how the two setups differ.)
- Continuous/Discrete action: If you are familiar with training deep neural nets, the difference between action types is similar to the difference between regression and classification. A continuous action consists of real value(s) (e.g. a robot's motor torque), whereas a discrete action is one of a fixed set of labels (e.g. UP, DOWN, RIGHT, LEFT). The action type is determined by the environment (problem), and the applicable algorithms change depending on that action type. (See the sketch right after this list.)
- Hybrid action: A hybrid-action environment requires a discrete and a continuous action in pairs.
- RNN layer support: Supports training of network models with recurrent layers.
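
For illustration, the snippet below expresses the three action types as OpenAI gym spaces. The sizes and bounds are made-up placeholders; the actual space is defined by your environment.

```python
import numpy as np
from gym import spaces

# Discrete action: one label out of a fixed set (e.g. UP, DOWN, LEFT, RIGHT).
discrete_action = spaces.Discrete(4)

# Continuous action: a real-valued vector (e.g. two motor torques in [-1, 1]).
continuous_action = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)

# Hybrid action: a discrete choice paired with continuous parameters.
hybrid_action = spaces.Tuple((discrete_action, continuous_action))
```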
| Algorithm | Online training | Offline(Batch) training | Continuous action | Discrete action | Hybrid action | RNN layer support |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| A2C | ✔️ | | (We will support continuous action in the future) | ✔️ | | |
| AMP | ✔️ | | ✔️ | | | |
| ATRPO | ✔️ | | ✔️ | (We will support discrete action in the future) | | |
| BCQ | | ✔️ | ✔️ | | | |
| BEAR | | ✔️ | ✔️ | | | |
| Categorical DDQN | ✔️ | ✔️ | | ✔️ | | ✔️ |
| Categorical DQN | ✔️ | ✔️ | | ✔️ | | ✔️ |
| DDPG | ✔️ | ✔️ | ✔️ | | | ✔️ |
| DDQN | ✔️ | ✔️ | | ✔️ | | ✔️ |
| DecisionTransformer | | ✔️ | ✔️ | ✔️ | | |
| DEMME-SAC | ✔️ | ✔️ | ✔️ | | | ✔️ |
| DQN | ✔️ | ✔️ | | ✔️ | | ✔️ |
| DRQN | ✔️ | ✔️ | | ✔️ | | ✔️ |
| GAIL | ✔️ | | ✔️ | (We will support discrete action in the future) | | |
| HER | ✔️ | ✔️ | ✔️ | | | ✔️ |
| HyAR | ✔️ | | | | ✔️ | |
| IQN | ✔️ | ✔️ | | ✔️ | | ✔️* |
| MME-SAC | ✔️ | ✔️ | ✔️ | | | ✔️ |
| M-DQN | ✔️ | ✔️ | | ✔️ | | ✔️ |
| M-IQN | ✔️ | ✔️ | | ✔️ | | ✔️ |
| Option Critic Architecture | ✔️ | | (We will support continuous action in the future) | ✔️ | | |
| PPO | ✔️ | | ✔️ | ✔️ | | |
| QRSAC | ✔️ | ✔️ | ✔️ | | | ✔️ |
| QRDQN | ✔️ | ✔️ | | ✔️ | | |
| QtOpt (ICRA 2018 version) | ✔️ | ✔️ | ✔️ | | | ✔️ |
| Rainbow | ✔️ | ✔️ | | ✔️ | | ✔️ |
| REDQ | ✔️ | ✔️ | ✔️ | | | ✔️ |
| REINFORCE | ✔️ | | ✔️ | ✔️ | | |
| SAC | ✔️ | ✔️ | ✔️ | | | ✔️ |
| SAC (ICML 2018 version) | ✔️ | ✔️ | ✔️ | | | ✔️ |
| SAC-D | ✔️ | ✔️ | ✔️ | | | ✔️ |
| SRSAC | ✔️ | ✔️ | ✔️ | | | ✔️ |
| TD3 | ✔️ | ✔️ | ✔️ | | | ✔️ |
| TRPO | ✔️ | | ✔️ | (We will support discrete action in the future) | | |
| TRPO (ICML 2015 version) | ✔️ | | ✔️ | ✔️ | | |
| XQL | ✔️ | ✔️ | ✔️ | | | |

*May require special treatment to train with RNN layers.
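
The snippet below is a rough sketch of how the online and offline setups differ in practice, following nnabla-rl's Algorithm interface. The environment name, algorithm choices, and iteration counts are placeholders, and configuration options may differ between versions.

```python
import gym

import nnabla_rl.algorithms as A
from nnabla_rl.replay_buffer import ReplayBuffer

# Online training: the algorithm collects data by interacting with a
# gym-compatible environment.
env = gym.make('Pendulum-v1')  # placeholder environment
ddpg = A.DDPG(env)
ddpg.train_online(env, total_iterations=10000)

# Offline (batch) training: the algorithm only sees transitions stored in a
# ReplayBuffer prepared beforehand (no environment interaction).
buffer = ReplayBuffer()
# buffer.append_all(collected_experiences)  # fill with pre-collected transitions
bcq = A.BCQ(env)  # the environment (or its info) is still needed to build the networks
bcq.train_offline(buffer, total_iterations=10000)
```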

## Optimal control algorithms

- Need training: Most of the optimal control algorithms do NOT require training to run the controller. Instead, you will need the dynamics model of the system and the cost function of the task before executing the algorithm. See the documentation of each algorithm for details. (A minimal example is sketched below the table.)
- Continuous/Discrete action: Same as for reinforcement learning. However, most of the optimal control algorithms do not support discrete actions.
| Algorithm | Need training | Continuous action | Discrete action |
|:---|:---:|:---:|:---:|
| DDP | not required | ✔️ | |
| iLQR | not required | ✔️ | |
| LQR | not required | ✔️ | |
| MPPI | may train a dynamics model | ✔️ | |
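
To make the "dynamics model + cost function, no training" point concrete, here is a small, library-agnostic sketch of finite-horizon LQR solved with the backward Riccati recursion in plain NumPy. The system matrices and horizon are made-up placeholders, and this does not use nnabla-rl's own LQR implementation.

```python
import numpy as np

# Placeholder linear dynamics x' = A x + B u and quadratic costs x^T Q x + u^T R u.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])   # e.g. a double integrator with dt = 0.1
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = 0.1 * np.eye(1)
T = 50                       # horizon length

# Backward Riccati recursion: only the known model and cost are needed, no training.
P = Q.copy()
gains = []
for _ in range(T):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # feedback gain
    P = Q + A.T @ P @ (A - B @ K)
    gains.append(K)
gains.reverse()              # gains[t] is the gain applied at timestep t

# Roll out the resulting controller u_t = -K_t x_t from an initial state.
x = np.array([[1.0], [0.0]])
for K in gains:
    u = -K @ x
    x = A @ x + B @ u
```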