This code was written for a university exam. Its goal was to test several exploration vs. exploitation strategies for DDPG agents.
Part 1: Key Concepts in RL
Part 2: Kinds of RL Algorithms
Part 3: Intro to Policy Optimization
Introduction to RL and Deep Q Networks
Deep Deterministic Policy Gradient
numpy
pytorch
gym
The code is set up to work with OpenAI Gym's LunarLanderV2 environment.
In the main.py file you can set the strategy to use and the number of training episodes, and tune other parameters.
The number of steps per episode is limited to 300 to speed up training.
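The 300-step cap can be illustrated with a minimal sketch of an episode loop. This is not the code from main.py: the names `run_episode`, `MAX_STEPS_PER_EPISODE` and `NeverEndingEnv` are hypothetical, and the loop assumes the classic Gym API in which `reset()` returns the state and `step()` returns a 4-tuple.

```python
MAX_STEPS_PER_EPISODE = 300  # the per-episode cap mentioned above


def run_episode(env, policy, max_steps=MAX_STEPS_PER_EPISODE):
    """Run one episode, stopping when the env reports done or after max_steps."""
    state = env.reset()
    total_reward, steps = 0.0, 0
    for _ in range(max_steps):
        action = policy(state)
        # Assumption: classic Gym API, step() -> (state, reward, done, info).
        state, reward, done, _info = env.step(action)
        total_reward += reward
        steps += 1
        if done:
            break
    return total_reward, steps


class NeverEndingEnv:
    """Hypothetical stand-in environment that never terminates on its own."""

    def reset(self):
        return 0.0

    def step(self, action):
        # Constant state, reward of 1.0 per step, never done.
        return 0.0, 1.0, False, {}


total, steps = run_episode(NeverEndingEnv(), policy=lambda state: 0.0)
print(steps, total)
```

With the stand-in environment the loop stops only because of the cap, so the episode lasts exactly `max_steps` steps. With a real Gym environment the loop would also end early whenever `done` is returned.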
A Multilayer Perceptron (MLP) is used as a function approximator.
The actor and critic networks are implemented as described in the original DDPG paper (citations are below).
Layer Normalization is used instead of BatchNorm after each ReLU activation layer.
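The architecture described above could look roughly like the following PyTorch sketch. It is an illustration under assumptions, not the repository's code: the class names `Actor` and `Critic` are hypothetical, the hidden sizes (400, 300) and the point where the action enters the critic follow the DDPG paper, and the dimensions in the usage lines (8 state values, 2 action values) assume the continuous-action variant of LunarLander.

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Maps a state to a deterministic action in [-1, 1]."""

    def __init__(self, state_dim, action_dim, hidden=(400, 300)):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden[0]),
            nn.ReLU(),
            nn.LayerNorm(hidden[0]),  # LayerNorm after the ReLU, as in the README
            nn.Linear(hidden[0], hidden[1]),
            nn.ReLU(),
            nn.LayerNorm(hidden[1]),
            nn.Linear(hidden[1], action_dim),
            nn.Tanh(),  # bounds every action component to [-1, 1]
        )

    def forward(self, state):
        return self.net(state)


class Critic(nn.Module):
    """Estimates Q(state, action); the action joins at the second layer."""

    def __init__(self, state_dim, action_dim, hidden=(400, 300)):
        super().__init__()
        self.fc1 = nn.Linear(state_dim, hidden[0])
        self.ln1 = nn.LayerNorm(hidden[0])
        self.fc2 = nn.Linear(hidden[0] + action_dim, hidden[1])
        self.ln2 = nn.LayerNorm(hidden[1])
        self.out = nn.Linear(hidden[1], 1)

    def forward(self, state, action):
        x = self.ln1(torch.relu(self.fc1(state)))
        x = torch.cat([x, action], dim=-1)
        x = self.ln2(torch.relu(self.fc2(x)))
        return self.out(x)


# Assumed dimensions: 8 state values, 2 continuous action values.
actor, critic = Actor(8, 2), Critic(8, 2)
states = torch.randn(4, 8)
actions = actor(states)
q_values = critic(states, actions)
print(actions.shape, q_values.shape)
```

Unlike batch normalization, layer normalization computes its statistics per sample, so the networks behave the same way for a single state during action selection and for a minibatch during training.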
@misc{lillicrap2019continuouscontroldeepreinforcement,
title={Continuous control with deep reinforcement learning},
author={Timothy P. Lillicrap and Jonathan J. Hunt and Alexander Pritzel and Nicolas Heess and Tom Erez and Yuval Tassa and David Silver and Daan Wierstra},
year={2019},
eprint={1509.02971},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/1509.02971},
}
@misc{mnih2013playingatarideepreinforcement,
title={Playing Atari with Deep Reinforcement Learning},
author={Volodymyr Mnih and Koray Kavukcuoglu and David Silver and Alex Graves and Ioannis Antonoglou and Daan Wierstra and Martin Riedmiller},
year={2013},
eprint={1312.5602},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/1312.5602},
}
@misc{ba2016layernormalization,
title={Layer Normalization},
author={Jimmy Lei Ba and Jamie Ryan Kiros and Geoffrey E. Hinton},
year={2016},
eprint={1607.06450},
archivePrefix={arXiv},
primaryClass={stat.ML},
url={https://arxiv.org/abs/1607.06450},
}