Advantage Actor-Critic (A2C) on Atari-2600 games
- Install TensorFlow and OpenAI Gym;
- Modify the hyperparameters, constants and game name in `config.py` (a rough sketch of such a file is shown after this list);
- Run `A2C_train.ipynb` to train an A2C agent with the settings from `config.py`;
- Run `A2C_test.ipynb` to evaluate your model and save some animations.
- Feel free to experiment with different architectures and preprocessing methods by changing `a2c.py` and `preprocessing.py`.
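As a rough illustration only, a `config.py` for this kind of setup might look like the sketch below. All constant names and values here are assumptions for the example, not the repository's actual fields.

```python
# Hypothetical config.py sketch -- constant names/values are illustrative only.
GAME_NAME = "BreakoutNoFrameskip-v4"   # Atari environment id passed to gym.make

# Training hyperparameters
LEARNING_RATE = 7e-4
GAMMA = 0.99            # discount factor
N_STEPS = 5             # steps per A2C rollout before an update
NUM_ENVS = 16           # parallel environments
ENTROPY_COEF = 0.01     # entropy bonus weight
VALUE_LOSS_COEF = 0.5   # critic loss weight
MAX_GRAD_NORM = 0.5     # gradient clipping threshold
TOTAL_FRAMES = 10_000_000

# Preprocessing
FRAME_SIZE = 84         # frames resized to 84x84
FRAME_STACK = 4         # number of stacked frames fed to the network
```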
Unfortunately, I couldn't obtain good results with this implementation, architecture and choice of hyperparameters. More investigation is needed.
- Using raw value estimates as the learning signal leads to high variance. To reduce it, we can use the advantage function instead of the value function.
- The advantage function is defined as A(s, a) = Q(s, a) - V(s). It tells us how much better taking action a in state s is compared to the average action taken at that state.
- The problem with using this advantage directly is that it requires learning two value functions, Q(s, a) and V(s). Instead, the TD error can serve as a good estimator of the advantage: A(s, a) ≈ r + gamma * V(s') - V(s), so only V(s) needs to be learned (see the sketch below).
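To make the last point concrete, here is a minimal sketch of how the TD error can act as the advantage inside the A2C actor and critic losses. It assumes a TensorFlow 2 setup; the function name, arguments and coefficients are illustrative and may differ from what `a2c.py` actually does.

```python
import tensorflow as tf

def a2c_losses(policy_logits, values, next_values, actions, rewards, dones,
               gamma=0.99, value_coef=0.5, entropy_coef=0.01):
    """Illustrative A2C loss using the TD error as the advantage estimate.

    policy_logits: (batch, num_actions) actor outputs for states s
    values:        (batch,) critic estimates V(s)
    next_values:   (batch,) critic estimates V(s')
    actions:       (batch,) int actions taken
    rewards:       (batch,) rewards r
    dones:         (batch,) float, 1.0 if s' is terminal else 0.0
    """
    # TD target r + gamma * V(s'), with V(s') masked out on terminal steps;
    # treated as a constant (no gradient flows through the target).
    td_target = tf.stop_gradient(rewards + gamma * next_values * (1.0 - dones))

    # TD error used as the advantage estimate: A(s, a) ~ r + gamma * V(s') - V(s).
    advantage = td_target - values

    # Critic: regress V(s) toward the TD target.
    critic_loss = tf.reduce_mean(tf.square(advantage))

    # Actor: policy-gradient term weighted by the (constant) advantage,
    # plus an entropy bonus to encourage exploration.
    log_probs = tf.nn.log_softmax(policy_logits)
    probs = tf.nn.softmax(policy_logits)
    action_log_probs = tf.gather(log_probs, actions, batch_dims=1)
    actor_loss = -tf.reduce_mean(tf.stop_gradient(advantage) * action_log_probs)
    entropy = -tf.reduce_mean(tf.reduce_sum(probs * log_probs, axis=-1))

    return actor_loss + value_coef * critic_loss - entropy_coef * entropy
```

The key point the sketch shows is that the advantage is computed from the critic's V(s) and V(s') alone, so no separate Q-network is needed.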