# Algorithm catalog

nnabla-rl offers various (deep) reinforcement learning and optimal control algorithms. See the list below for the implemented algorithms!

## Reinforcement learning algorithms

- Online training: Training performed by interacting with the environment. You'll need to prepare an environment that is compatible with the OpenAI gym environment interface.
- Offline(Batch) training: Training performed solely from provided data, without interacting with the environment. You'll need to prepare a dataset wrapped in a ReplayBuffer. (See the training sketch below the table for how the two setups differ.)
- Continuous/Discrete action: If you are familiar with training deep neural nets, the difference between action types is similar to the difference between regression and classification. A continuous action consists of real value(s) (e.g. a robot's motor torque), whereas a discrete action is one of a fixed set of labels (e.g. UP, DOWN, RIGHT, LEFT). The action type is determined by the environment (problem), and the applicable algorithms change depending on that action type. (See the sketch right after this list.)
- Hybrid action: A hybrid-action environment requires a discrete and a continuous action in pairs.
- RNN layer support: Supports training of network models with recurrent layers.
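
For illustration, the snippet below expresses the three action types as OpenAI gym spaces. The sizes and bounds are made-up placeholders; the actual space is defined by your environment.

```python
import numpy as np
from gym import spaces

# Discrete action: one label out of a fixed set (e.g. UP, DOWN, LEFT, RIGHT).
discrete_action = spaces.Discrete(4)

# Continuous action: a real-valued vector (e.g. two motor torques in [-1, 1]).
continuous_action = spaces.Box(low=-1.0, high=1.0, shape=(2,), dtype=np.float32)

# Hybrid action: a discrete choice paired with continuous parameters.
hybrid_action = spaces.Tuple((discrete_action, continuous_action))
```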
| Algorithm | Online training | Offline(Batch) training | Continuous action | Discrete action | Hybrid action | RNN layer support |
|:---|:---:|:---:|:---:|:---:|:---:|:---:|
| A2C | ✔️ | | (We will support continuous action in the future) | ✔️ | | |
| AMP | ✔️ | | ✔️ | | | |
| ATRPO | ✔️ | | ✔️ | (We will support discrete action in the future) | | |
| BCQ | | ✔️ | ✔️ | | | |
| BEAR | | ✔️ | ✔️ | | | |
| Categorical DDQN | ✔️ | ✔️ | | ✔️ | | ✔️ |
| Categorical DQN | ✔️ | ✔️ | | ✔️ | | ✔️ |
| DDPG | ✔️ | ✔️ | ✔️ | | | ✔️ |
| DDQN | ✔️ | ✔️ | | ✔️ | | ✔️ |
| DecisionTransformer | | ✔️ | ✔️ | ✔️ | | |
| DEMME-SAC | ✔️ | ✔️ | ✔️ | | | ✔️ |
| DQN | ✔️ | ✔️ | | ✔️ | | ✔️ |
| DRQN | ✔️ | ✔️ | | ✔️ | | ✔️ |
| GAIL | ✔️ | | ✔️ | (We will support discrete action in the future) | | |
| HER | ✔️ | ✔️ | ✔️ | | | ✔️ |
| HyAR | ✔️ | | | | ✔️ | |
| IQN | ✔️ | ✔️ | | ✔️ | | ✔️* |
| MME-SAC | ✔️ | ✔️ | ✔️ | | | ✔️ |
| M-DQN | ✔️ | ✔️ | | ✔️ | | ✔️ |
| M-IQN | ✔️ | ✔️ | | ✔️ | | ✔️ |
| Option Critic Architecture | ✔️ | | (We will support continuous action in the future) | ✔️ | | |
| PPO | ✔️ | | ✔️ | ✔️ | | |
| QRSAC | ✔️ | ✔️ | ✔️ | | | ✔️ |
| QRDQN | ✔️ | ✔️ | | ✔️ | | |
| QtOpt (ICRA 2018 version) | ✔️ | ✔️ | ✔️ | | | ✔️ |
| Rainbow | ✔️ | ✔️ | | ✔️ | | ✔️ |
| REDQ | ✔️ | ✔️ | ✔️ | | | ✔️ |
| REINFORCE | ✔️ | | ✔️ | ✔️ | | |
| SAC | ✔️ | ✔️ | ✔️ | | | ✔️ |
| SAC (ICML 2018 version) | ✔️ | ✔️ | ✔️ | | | ✔️ |
| SAC-D | ✔️ | ✔️ | ✔️ | | | ✔️ |
| SRSAC | ✔️ | ✔️ | ✔️ | | | ✔️ |
| TD3 | ✔️ | ✔️ | ✔️ | | | ✔️ |
| TRPO | ✔️ | | ✔️ | (We will support discrete action in the future) | | |
| TRPO (ICML 2015 version) | ✔️ | | ✔️ | ✔️ | | |
| XQL | ✔️ | ✔️ | ✔️ | | | |

*May require special treatment to train with RNN layers.
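
The snippet below is a rough sketch of how the online and offline setups differ in practice, following nnabla-rl's Algorithm interface. The environment name, algorithm choices, and iteration counts are placeholders, and configuration options may differ between versions.

```python
import gym

import nnabla_rl.algorithms as A
from nnabla_rl.replay_buffer import ReplayBuffer

# Online training: the algorithm collects data by interacting with a
# gym-compatible environment.
env = gym.make('Pendulum-v1')  # placeholder environment
ddpg = A.DDPG(env)
ddpg.train_online(env, total_iterations=10000)

# Offline (batch) training: the algorithm only sees transitions stored in a
# ReplayBuffer prepared beforehand (no environment interaction).
buffer = ReplayBuffer()
# buffer.append_all(collected_experiences)  # fill with pre-collected transitions
bcq = A.BCQ(env)  # the environment (or its info) is still needed to build the networks
bcq.train_offline(buffer, total_iterations=10000)
```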

## Optimal control algorithms

- Need training: Most of the optimal control algorithms do NOT require training to run the controller. Instead, you will need the dynamics model of the system and the cost function of the task before executing the algorithm. See the documentation of each algorithm for details. (A minimal example is sketched below the table.)
- Continuous/Discrete action: Same as for reinforcement learning. However, most of the optimal control algorithms do not support discrete actions.
| Algorithm | Need training | Continuous action | Discrete action |
|:---|:---:|:---:|:---:|
| DDP | not required | ✔️ | |
| iLQR | not required | ✔️ | |
| LQR | not required | ✔️ | |
| MPPI | may train a dynamics model | ✔️ | |
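
To make the "dynamics model + cost function, no training" point concrete, here is a small, library-agnostic sketch of finite-horizon LQR solved with the backward Riccati recursion in plain NumPy. The system matrices and horizon are made-up placeholders, and this does not use nnabla-rl's own LQR implementation.

```python
import numpy as np

# Placeholder linear dynamics x' = A x + B u and quadratic costs x^T Q x + u^T R u.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])   # e.g. a double integrator with dt = 0.1
B = np.array([[0.0],
              [0.1]])
Q = np.eye(2)
R = 0.1 * np.eye(1)
T = 50                       # horizon length

# Backward Riccati recursion: only the known model and cost are needed, no training.
P = Q.copy()
gains = []
for _ in range(T):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)  # feedback gain
    P = Q + A.T @ P @ (A - B @ K)
    gains.append(K)
gains.reverse()              # gains[t] is the gain applied at timestep t

# Roll out the resulting controller u_t = -K_t x_t from an initial state.
x = np.array([[1.0], [0.0]])
for K in gains:
    u = -K @ x
    x = A @ x + B @ u
```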