PyTorch implementation of the state-of-the-art distributional reinforcement learning algorithm Fully Parameterized Quantile Function (FQF). The implementation includes several DQN extensions; combined with them, FQF represents the most powerful Rainbow version. It also supports multiple parallel environments to reduce wall-clock time. The FQF baseline in this repository is already a Double FQF version with a target network!
For details on the algorithm, check the article on Medium.
Extensions included:
- Prioritized Experience Replay Buffer (PER)
- Noisy Layer for exploration
- N-step Bootstrapping
- Dueling Version
- Munchausen RL
- Parallelization with multiple environments. Running 4 parallel environments reduced the wall-clock time for the CartPole environment to less than 1/3.
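As a rough illustration of the N-step bootstrapping extension listed above, here is a minimal sketch of how an n-step target can be computed. The function names and the `bootstrap_value` argument are hypothetical and are not taken from this repository; the actual agent works on batched tensors and quantile values rather than single floats.

```python
def n_step_return(rewards, gamma):
    """Discounted sum r_0 + gamma*r_1 + ... + gamma^(n-1)*r_(n-1)."""
    ret = 0.0
    # Accumulate from the last reward backwards so each step applies one discount.
    for r in reversed(rewards):
        ret = r + gamma * ret
    return ret


def n_step_target(rewards, gamma, bootstrap_value, done):
    """n-step return plus the discounted bootstrap value (dropped if the episode ended)."""
    n = len(rewards)
    not_done = 0.0 if done else 1.0
    return n_step_return(rewards, gamma) + (gamma ** n) * bootstrap_value * not_done


# Three rewards of 1.0 (as in CartPole) with gamma = 0.99
g = n_step_return([1.0, 1.0, 1.0], gamma=0.99)
target = n_step_target([1.0, 1.0, 1.0], gamma=0.99, bootstrap_value=10.0, done=False)
```

With `-n_step 1` this reduces to the usual one-step target `r + gamma * bootstrap_value`.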
Trained and tested on:
- Python 3.6
- PyTorch 1.4.0
- Numpy 1.15.2
- gym 0.10.11
With the script version it is possible to train on simple environments like CartPole-v0 and LunarLander-v2 or on Atari games with image inputs!
To run the script version execute in your command line:
python run.py -info fqf_run1
To run the script version on the Atari game Pong:
python run.py -env PongNoFrameskip-v4 -info fqf_pong1
To see the options:
python run.py -h
-agent, choices=["iqn","fqf+per","noisy_fqf","noisy_fqf+per","dueling","dueling+per", "noisy_dueling","noisy_dueling+per"], Specify which type of FQF agent you want to train, default is FQF - baseline!
-env, Name of the Environment, default = CartPole-v0
-frames, Number of frames to train, default = 60000
-eval_every, Evaluate every x frames, default = 1000
-eval_runs, Number of evaluation runs, default = 5
-seed, Random seed to replicate training runs, default = 1
-N, Number of quantiles, default = 32
-ec, --entropy_coeff, Entropy coefficient, default = 0.001
-bs, --batch_size, Batch size for updating the DQN, default = 32
-layer_size, Size of the hidden layer, default=512
-n_step, Multistep IQN, default = 1
-m, --memory_size, Replay memory size, default = 1e5
-munchausen, choices=[0,1], Use Munchausen RL loss for training if set to 1 (True), default = 0
-lr, Learning rate, default = 5e-4
-g, --gamma, Discount factor gamma, default = 0.99
-t, --tau, Soft update parameter tau, default = 1e-3
-eps_frames, Linear annealed frames for Epsilon, default = 5000
-min_eps, Final epsilon greedy value, default = 0.025
-w, --worker, Number of parallel environments. Performance with more than 4 workers can be unstable since the batch size is increased proportionally, default = 0
-info, Name of the training run
-save_model, choices=[0,1] Specify if the trained network shall be saved or not, default is 0 - not saved!
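To make the `-tau`, `-eps_frames` and `-min_eps` options a bit more concrete, here is a minimal sketch of a soft target update and a linear epsilon schedule. Both helpers are hypothetical and written in plain Python on lists of floats; the starting epsilon of 1.0 is an assumption, and the schedule used in `run.py` may differ.

```python
def linear_epsilon(frame, eps_frames=5000, min_eps=0.025, start_eps=1.0):
    """Anneal epsilon linearly from start_eps to min_eps over eps_frames frames.

    start_eps = 1.0 is an assumption for this sketch, not a documented default.
    """
    fraction = min(frame / eps_frames, 1.0)
    return start_eps + fraction * (min_eps - start_eps)


def soft_update(target_params, local_params, tau=1e-3):
    """Polyak averaging: target <- tau * local + (1 - tau) * target."""
    return [tau * l + (1.0 - tau) * t for t, l in zip(target_params, local_params)]


eps_start = linear_epsilon(0)
eps_mid = linear_epsilon(2500)
eps_end = linear_epsilon(10000)
new_target = soft_update([0.0, 2.0], [1.0, 2.0], tau=1e-3)
```

A small `tau` such as 1e-3 means the target network only moves a tiny step toward the online network on every update, which keeps the bootstrap targets stable.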
To monitor training with TensorBoard:
tensorboard --logdir=runs
200000 Frames (~54 min), eps_frames: 20000, eval_every: 5000
800000 Frames (IQN: ~95 min with 3 workers, FQF: ~240 min with 2 workers). The authors of the paper state that FQF is roughly 20% slower than IQN due to the additional fraction proposal network. Note also that IQN uses N=8 and FQF N=32 quantiles!
Hyperparameters:
- frames 800000
- eps_frames 80000
- min_eps 0.025
- lr 2e-4
- tau 1e-3
- m 20000
- gamma 0.99
- layer_size 512
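The fraction proposal network mentioned in the timing note above is what separates FQF from IQN. As far as I understand the FQF paper, it turns N raw scores into N+1 increasing fractions via a softmax followed by a cumulative sum, and an entropy term (weighted by the entropy coefficient) keeps the fractions from collapsing. The following is a minimal sketch of just that arithmetic in plain Python; `propose_fractions` is a hypothetical name, and the real network in this repository operates on batched PyTorch tensors.

```python
import math


def propose_fractions(logits):
    """Map N raw scores to N+1 fractions tau_0..tau_N, N midpoints, and the entropy."""
    # Numerically stable softmax over the N scores.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Cumulative sum gives tau_0 = 0 < tau_1 < ... < tau_N = 1.
    taus = [0.0]
    for p in probs:
        taus.append(taus[-1] + p)

    # Midpoints tau_hat_i = (tau_i + tau_{i+1}) / 2 are where quantile values are evaluated.
    tau_hats = [(a + b) / 2.0 for a, b in zip(taus[:-1], taus[1:])]

    # Entropy of the proposed fraction widths (skip zero widths to avoid log(0)).
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return taus, tau_hats, entropy


# Uniform scores with N = 4 give evenly spaced fractions.
taus, tau_hats, entropy = propose_fractions([0.0, 0.0, 0.0, 0.0])
```

With uniform scores the fractions are evenly spaced; during training the network shifts them toward the regions of the return distribution where more resolution helps.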
I'm open to feedback, bug reports, suggestions for improvement, or anything else. Just leave me a message or contact me.
A big thank you also to Toshiki Watanabe, who helped me with the implementation and from whose repo I took the training routine for the fraction proposal network!
- Sebastian Dittert
Feel free to use this code for your own projects or research. For citation:
@misc{FQF and Extensions,
author = {Dittert, Sebastian},
title = {Fully Parameterized Quantile Function (FQF) and Extensions},
year = {2020},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/BY571/FQF-and-Extensions}},
}