EC500 Deep Learning Final Project
The aim of our team was to explore three DQN-based reinforcement learning networks. First, we studied and implemented the Deep Q-Network (DQN). Then, to improve the Q-value estimation, we implemented the Double Deep Q-Network (DDQN). After that, to improve the network structure, we implemented the Dueling Deep Q-Network (Dueling DQN). In the experimental part, we trained each model for more than 10 hours and recorded and plotted the average reward and average Q value of each model. Finally, we compared the results: DDQN performed best, followed by DQN, while the dueling network performed worst.
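To make the difference in Q-value estimation concrete, the sketch below is a minimal NumPy illustration of the one-step targets used by DQN and DDQN (our own simplified example for a single transition, not code from this repository; `q_online_next` and `q_target_next` stand for the Q values predicted at the next state by the online and target networks).

```python
import numpy as np

def dqn_target(reward, q_target_next, done, gamma=0.99):
    # DQN: the target network both selects and evaluates the next action,
    # which tends to overestimate Q values.
    return reward + (1.0 - done) * gamma * np.max(q_target_next)

def ddqn_target(reward, q_online_next, q_target_next, done, gamma=0.99):
    # Double DQN: the online network selects the action, the target
    # network evaluates it, which reduces the overestimation bias.
    best_action = np.argmax(q_online_next)
    return reward + (1.0 - done) * gamma * q_target_next[best_action]

# Example transition with reward 1.0 that did not end the episode.
q_online_next = np.array([0.2, 0.9, 0.4])
q_target_next = np.array([0.3, 0.5, 0.8])
print(dqn_target(1.0, q_target_next, done=0.0))                  # 1 + 0.99 * 0.8
print(ddqn_target(1.0, q_online_next, q_target_next, done=0.0))  # 1 + 0.99 * 0.5
```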
References:
- Mnih et al., "Playing Atari with Deep Reinforcement Learning" (2013)
- Mnih et al., "Human-level control through deep reinforcement learning" (Nature, 2015)
- van Hasselt et al., "Deep Reinforcement Learning with Double Q-learning" (2016)
- Wang et al., "Dueling Network Architectures for Deep Reinforcement Learning" (2016)
OpenAI Car Racing-v0
We experiment with DQN, Double DQN (DDQN), and dueling DQN. Please refer to the presentation for a detailed explanation of the algorithms. Some basic knowledge of reinforcement learning and Q-learning is assumed.
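As a rough illustration of the dueling architecture's structural change, here is a minimal sketch of a dueling head in `tf.keras` (our own illustrative example with arbitrary layer sizes, not the network used in this project): the shared features are split into a state-value stream and an advantage stream and recombined with the mean-advantage aggregation from the Dueling DQN paper.

```python
import tensorflow as tf

class DuelingHead(tf.keras.Model):
    """Illustrative dueling head: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""

    def __init__(self, num_actions, hidden=256):
        super().__init__()
        self.value_hidden = tf.keras.layers.Dense(hidden, activation="relu")
        self.value = tf.keras.layers.Dense(1)            # state-value stream V(s)
        self.adv_hidden = tf.keras.layers.Dense(hidden, activation="relu")
        self.adv = tf.keras.layers.Dense(num_actions)    # advantage stream A(s, a)

    def call(self, features):
        v = self.value(self.value_hidden(features))      # shape (batch, 1)
        a = self.adv(self.adv_hidden(features))          # shape (batch, num_actions)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - tf.reduce_mean(a, axis=1, keepdims=True)

# Example: Q values for a batch of one feature vector and 5 discrete actions.
q_values = DuelingHead(num_actions=5)(tf.zeros([1, 512]))  # shape (1, 5)
```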
- Install the dependencies:

  pip install -r requirements.txt

  Necessary modules: tensorflow, pygame, gym, Box2D, VC++ 14.0, ...
- In the DQN/DDQN/dueling DQN folder, run

  python car_racing.py

- If you'd like to use the trained model, set

  load_model = True

  in car_racing.py (a hypothetical sketch of this switch follows this list).
- On a CPU, it takes about 8 hours to get a well-trained model.
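For context, the snippet below is a hypothetical TensorFlow 1.x-style sketch of how a `load_model` switch could gate restoring saved weights; the variable names and checkpoint path are placeholders, not taken from this repository's code.

```python
import os
import tensorflow as tf

tf.compat.v1.disable_eager_execution()

# Placeholder variable so the Saver has something to save/restore;
# the real agent builds its own network graph instead.
w = tf.compat.v1.get_variable("w", shape=[4, 2])

load_model = True                  # hypothetical switch, mirroring the README
checkpoint_dir = "./checkpoints"   # placeholder path
os.makedirs(checkpoint_dir, exist_ok=True)

saver = tf.compat.v1.train.Saver()
with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    latest = tf.train.latest_checkpoint(checkpoint_dir)
    if load_model and latest is not None:
        saver.restore(sess, latest)          # resume from the saved weights
    # ... training or evaluation would happen here ...
    saver.save(sess, os.path.join(checkpoint_dir, "model.ckpt"))
```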
The DQN, DDQN, and dueling DQN code have similar structures. Take DQN as an example:

- DQN/car_racing.py - main entrance, the executable file
- DQN/dqn/agent.py - DQN model
- DQN/dqn/experience_replay.py - experience replay (a minimal illustrative sketch follows this list)
- data/plot.py - plots the figures
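Since experience replay is shared by all three agents, here is a minimal self-contained sketch of a uniform replay buffer (our own illustrative version, not the code adapted from diegoalejogm's repository):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size buffer of (state, action, reward, next_state, done) transitions."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped automatically

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Uniformly sample a minibatch of stored transitions to break the
        # correlation between consecutive frames during training.
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```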
Average reward achieved by each agent (and by a human player):

DQN | DDQN | Dueling DQN | Human
---|---|---|---
755 | 784.95 | 737.35 | 216.35
The dueling DQN did not perform as well as we expected; some possible reasons are discussed in the presentation.
- Oct. 15, 2018: Project Proposal
- Nov. 19, 2018: Project Progress
- Dec. 12, 2018: Presentation
- Dec. 14, 2018: Final Report
- car_racing.py is our gaming environment, adapted from the gym library.
- We implemented the three DQN networks ourselves, using the same hyperparameters as in the three main reference papers.
- We modified the loss functions of the three DQN networks.
- The experience replay part is adapted from diegoalejogm's GitHub.