I hope this repository will become my personal framework for Deep Reinforcement Learning in the future. Of course, this remains to be seen.
- Deep Q Learning on Atari Pong.
- Add `wandb` for experiment tracking.
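For reference, tracking might be wired in roughly like this (a minimal sketch; the project name and the logged metrics are placeholders, not this repository's actual setup):

```python
import wandb

# Hypothetical project name and metrics, for illustration only.
run = wandb.init(project="dqn-pong")

for step in range(3):
    # In a real training loop these would be the actual loss / episode reward.
    wandb.log({"loss": 1.0 / (step + 1), "episode_reward": step}, step=step)

run.finish()
```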
Entry points to experiments are in the `experiments` folder.
To train a Pong agent:

    $ python experiments/pong.py --train

To continue training from a saved model:

    $ python experiments/pong.py --train -l checkpoints/step-<step_num>

To evaluate a model:

    $ python experiments/pong.py --eval -l checkpoints/step-<step_num>
- Use `PongDeterministic-v4`, since it implements frame skipping.
- Make sure to preprocess the image correctly into 84 x 84 and stack 4 frames into 4 channels (see the preprocessing sketch after this list).
- The learning rate must be as small as 0.00025; I once missed a zero.
- Make sure that in `target = rewards + self.gamma * expected_v * (1 - done)`, the `(1 - done)` factor is present to account for rewards at terminal states: we should not add the value of the next state (represented by `expected_v`) if we are already in a terminal state.
- Make sure that in `loss = self.criterion(estimated_q, target.unsqueeze(1))`, the target and the estimated Q-value have the same dimensions (a mismatch might be caused by an improperly shaped `rewards` tensor).
- Use Kaiming initialization, since the CNN uses ReLU activations.
- Make sure the policy model and the target model do not reference the same model object (make a deep copy).
- Make sure we gather across the right dimension (the action dimension, `dim=1`) in `estimated_q = policy_q.gather(1, actions)`. The update sketch after this list puts these pieces together.
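As a reference for the preprocessing bullet above, here is a minimal sketch. It assumes OpenCV for the grayscale conversion and resize; `preprocess` and `FrameStacker` are illustrative names, not this repository's actual helpers:

```python
from collections import deque

import cv2
import numpy as np


def preprocess(frame: np.ndarray) -> np.ndarray:
    """Convert an RGB Atari frame to an 84 x 84 grayscale image."""
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    return cv2.resize(gray, (84, 84), interpolation=cv2.INTER_AREA)


class FrameStacker:
    """Keep the last 4 preprocessed frames and stack them into 4 channels."""

    def __init__(self, num_frames: int = 4):
        self.frames = deque(maxlen=num_frames)

    def reset(self, frame: np.ndarray) -> np.ndarray:
        # At episode start, fill the stack with copies of the first frame.
        processed = preprocess(frame)
        for _ in range(self.frames.maxlen):
            self.frames.append(processed)
        return np.stack(self.frames, axis=0)  # shape (4, 84, 84)

    def step(self, frame: np.ndarray) -> np.ndarray:
        self.frames.append(preprocess(frame))
        return np.stack(self.frames, axis=0)
```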
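And a rough sketch of how the remaining gotchas fit together in a single update step. The network follows the classic DQN layout; `QNet`, `td_update`, and the hyperparameters are illustrative, but the `target`, `loss`, and `gather` lines mirror the snippets quoted above:

```python
import copy

import torch
import torch.nn as nn


class QNet(nn.Module):
    def __init__(self, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_actions),
        )
        # Kaiming initialization, matching the ReLU activations.
        for m in self.modules():
            if isinstance(m, (nn.Conv2d, nn.Linear)):
                nn.init.kaiming_normal_(m.weight, nonlinearity="relu")
                nn.init.zeros_(m.bias)

    def forward(self, x):
        return self.net(x)


policy_net = QNet(num_actions=6)
target_net = copy.deepcopy(policy_net)  # a real copy, not a second reference
optimizer = torch.optim.Adam(policy_net.parameters(), lr=0.00025)
criterion = nn.SmoothL1Loss()


def td_update(states, actions, rewards, next_states, done, gamma=0.99):
    # states: (B, 4, 84, 84); actions: (B, 1) int64; rewards, done: (B,) float
    policy_q = policy_net(states)              # (B, num_actions)
    estimated_q = policy_q.gather(1, actions)  # gather along the action dim

    with torch.no_grad():
        expected_v = target_net(next_states).max(dim=1).values  # (B,)

    # (1 - done) zeroes the bootstrap term for terminal states.
    target = rewards + gamma * expected_v * (1 - done)

    # unsqueeze so both sides are (B, 1) and no silent broadcasting occurs.
    loss = criterion(estimated_q, target.unsqueeze(1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```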