DDQN with PyTorch for OpenAI Gym

Implementation of Double DQN reinforcement learning for OpenAI Gym environments with discrete action spaces. Performance is defined as the sample efficiency of the algorithm, i.e. how high the average reward is after x episodes of interaction with the environment have been used for training.
Related paper: Hasselt, 2010
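
To make the performance measure concrete, here is a minimal sketch of how the average reward after x episodes could be tracked. The helper name `running_average_rewards` and the `window` parameter are assumptions for illustration and are not taken from `DDQN_discrete.py`.

```python
from collections import deque


def running_average_rewards(episode_rewards, window=100):
    """Return, for each episode x, the mean reward over the last `window` episodes."""
    recent = deque(maxlen=window)  # keeps only the most recent episode rewards
    averages = []
    for reward in episode_rewards:
        recent.append(reward)
        averages.append(sum(recent) / len(recent))
    return averages


# Hypothetical total rewards of four training episodes.
curve = running_average_rewards([1, 2, 3, 4], window=2)
print(curve)
```

Plotting such a curve against the episode index gives one way to compare sample efficiency between runs.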

Double DQN

The standard DQN method has been shown to overestimate the true Q-value, because its target takes an argmax over estimated Q-values. When some estimates are too high and others too low, the argmax is more likely to pick an overestimated value, so the errors do not cancel out and the target is biased upward.
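
The effect can be illustrated with a small simulation that is independent of any neural network. In this sketch all true action values are 0 and every estimate is the true value plus zero-mean Gaussian noise. The number of actions, the noise level and the function names are assumptions chosen for illustration. For comparison, the second function selects the action with one set of estimates and evaluates it with an independent set, which is the idea behind double Q-learning.

```python
import random
import statistics


def single_estimate(num_actions, noise_std, rng):
    """Max over noisy estimates: selection and evaluation share the same noise."""
    estimates = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
    return max(estimates)


def double_estimate(num_actions, noise_std, rng):
    """Select the action with one set of estimates, evaluate it with another."""
    first = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
    second = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
    best_action = max(range(num_actions), key=first.__getitem__)
    return second[best_action]


rng = random.Random(0)
trials = 10_000
single_bias = statistics.fmean(
    [single_estimate(4, 1.0, rng) for _ in range(trials)]
)
double_bias = statistics.fmean(
    [double_estimate(4, 1.0, rng) for _ in range(trials)]
)
# The true value of every action is 0, so each mean is the bias of the estimator.
print(f"single estimator bias: {single_bias:.3f}")
print(f"double estimator bias: {double_bias:.3f}")
```

The single estimator ends up clearly positive although every true value is 0, while the double estimator stays close to zero.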

Standard DQN target, with discount factor γ:
y_t = r_t + γ Q(s_{t+1}, argmax_a Q(s_{t+1}, a))

By using two uncorrelated Q-networks, one to select the action and one to evaluate it, we can reduce this overestimation. To save computation time, we perform gradient updates on only one of the Q-networks and periodically copy its parameters to the target Q-network.

The Double DQN target then becomes:
y_t = r_t + γ Q_θ(s_{t+1}, argmax_a Q_target(s_{t+1}, a))

And the loss function is given by:
(y_t - Q_θ(s_t, a_t))^2
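
The target, the loss and the periodic parameter copy described above can be sketched in PyTorch as follows. This is an illustration under stated assumptions, not the code from `DDQN_discrete.py`: the function names, the tiny network, the random batch, the discount factor `gamma`, the `sync_every` interval and the `(1 - dones)` mask for terminal transitions are all hypothetical. The sketch follows the formula as written in this README, where the target network selects the action and the trained network evaluates it. The formulation by van Hasselt et al. (2016) assigns the two roles the other way round, which corresponds to swapping the two networks inside the `no_grad` block.

```python
import copy

import torch
import torch.nn as nn


def double_dqn_loss(q_net, target_net, states, actions, rewards,
                    next_states, dones, gamma=0.99):
    """Mean squared error between the Double DQN target and Q_theta(s_t, a_t).

    Assumed shapes: states and next_states (B, obs_dim), actions (B,) int64,
    rewards and dones (B,) float.
    """
    with torch.no_grad():  # the target is treated as a constant
        # Action selection with the target network.
        best_actions = target_net(next_states).argmax(dim=1, keepdim=True)
        # Evaluation of the selected action with the trained network.
        next_values = q_net(next_states).gather(1, best_actions).squeeze(1)
        # Assumption: no bootstrapping from terminal states.
        targets = rewards + gamma * (1.0 - dones) * next_values
    predictions = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    return nn.functional.mse_loss(predictions, targets)


def sync_target(q_net, target_net):
    """Copy the trained parameters into the target network (hard update)."""
    target_net.load_state_dict(q_net.state_dict())


# Toy setup with random data in place of a replay buffer.
torch.manual_seed(0)
q_net = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
target_net = copy.deepcopy(q_net)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

batch_size = 8
states = torch.randn(batch_size, 4)
next_states = torch.randn(batch_size, 4)
actions = torch.randint(0, 2, (batch_size,))
rewards = torch.randn(batch_size)
dones = torch.zeros(batch_size)

sync_every = 5  # hypothetical update interval, in gradient steps
for step in range(1, 11):
    loss = double_dqn_loss(q_net, target_net, states, actions,
                           rewards, next_states, dones)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % sync_every == 0:
        sync_target(q_net, target_net)
```

Only `q_net` receives gradients. `target_net` changes solely through `sync_target`, which matches the periodic parameter copy described above.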

bwarre471/DDQN-with-PyTorch-for-OpenAI-Gym