Advantage Actor-Critic (A2C) on Atari-2600 games
- Install TensorFlow and OpenAI Gym;
- Modify the hyperparameters, constants and game name in `config.py` (a rough sketch of such a file is shown after this list);
- Run `A2C_train.ipynb` to train an A2C agent with the settings from `config.py`;
- Run `A2C_test.ipynb` to evaluate your model and save some animations.
- Feel free to experiment with different architectures and preprocessing methods by changing `a2c.py` and `preprocessing.py`.
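As a rough illustration only, a `config.py` for this kind of setup might look like the sketch below. All constant names and values here are assumptions for the example, not the repository's actual fields.

```python
# Hypothetical config.py sketch -- constant names/values are illustrative only.
GAME_NAME = "BreakoutNoFrameskip-v4"   # Atari environment id passed to gym.make

# Training hyperparameters
LEARNING_RATE = 7e-4
GAMMA = 0.99            # discount factor
N_STEPS = 5             # steps per A2C rollout before an update
NUM_ENVS = 16           # parallel environments
ENTROPY_COEF = 0.01     # entropy bonus weight
VALUE_LOSS_COEF = 0.5   # critic loss weight
MAX_GRAD_NORM = 0.5     # gradient clipping threshold
TOTAL_FRAMES = 10_000_000

# Preprocessing
FRAME_SIZE = 84         # frames resized to 84x84
FRAME_STACK = 4         # number of stacked frames fed to the network
```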
Unfortunately, I couldn't obtain good results with this implementation, architecture and choice of hyperparameters. More investigation is needed.
- Using raw value estimates as the learning signal leads to high variance. To reduce it, we can use the advantage function instead of the value function.
- The advantage function is defined as A(s, a) = Q(s, a) - V(s). It tells us how much better taking action a in state s is compared to the average action taken at that state.
- The problem with using this advantage directly is that it requires learning two value functions, Q(s, a) and V(s). Instead, the TD error can serve as a good estimator of the advantage: A(s, a) ≈ r + gamma * V(s') - V(s), so only V(s) needs to be learned (see the sketch below).
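To make the last point concrete, here is a minimal sketch of how the TD error can act as the advantage inside the A2C actor and critic losses. It assumes a TensorFlow 2 setup; the function name, arguments and coefficients are illustrative and may differ from what `a2c.py` actually does.

```python
import tensorflow as tf

def a2c_losses(policy_logits, values, next_values, actions, rewards, dones,
               gamma=0.99, value_coef=0.5, entropy_coef=0.01):
    """Illustrative A2C loss using the TD error as the advantage estimate.

    policy_logits: (batch, num_actions) actor outputs for states s
    values:        (batch,) critic estimates V(s)
    next_values:   (batch,) critic estimates V(s')
    actions:       (batch,) int actions taken
    rewards:       (batch,) rewards r
    dones:         (batch,) float, 1.0 if s' is terminal else 0.0
    """
    # TD target r + gamma * V(s'), with V(s') masked out on terminal steps;
    # treated as a constant (no gradient flows through the target).
    td_target = tf.stop_gradient(rewards + gamma * next_values * (1.0 - dones))

    # TD error used as the advantage estimate: A(s, a) ~ r + gamma * V(s') - V(s).
    advantage = td_target - values

    # Critic: regress V(s) toward the TD target.
    critic_loss = tf.reduce_mean(tf.square(advantage))

    # Actor: policy-gradient term weighted by the (constant) advantage,
    # plus an entropy bonus to encourage exploration.
    log_probs = tf.nn.log_softmax(policy_logits)
    probs = tf.nn.softmax(policy_logits)
    action_log_probs = tf.gather(log_probs, actions, batch_dims=1)
    actor_loss = -tf.reduce_mean(tf.stop_gradient(advantage) * action_log_probs)
    entropy = -tf.reduce_mean(tf.reduce_sum(probs * log_probs, axis=-1))

    return actor_loss + value_coef * critic_loss - entropy_coef * entropy
```

The key point the sketch shows is that the advantage is computed from the critic's V(s) and V(s') alone, so no separate Q-network is needed.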