This is a bandit experiment that implements several exploration techniques on the 10-armed testbed described in Reinforcement Learning: An Introduction by Sutton & Barto.
The exploration techniques covered include:
- ε-greedy
- Optimistic Initialization
- Upper Confidence Bound (UCB) Exploration
- Boltzmann (Softmax) Exploration
The experiment also compares these techniques and draws conclusions about which one works better in different settings.
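
As a rough illustration of how the four techniques listed above differ, here is a minimal sketch in Python with NumPy. It is not taken from this experiment's code: the function names (`epsilon_greedy`, `ucb`, `boltzmann`, `run_bandit`), the parameter defaults, and the testbed setup (arm means drawn from a standard normal, unit-variance rewards) are assumptions chosen for the example. Optimistic initialization is treated here as plain greedy selection with a high `initial_q` and a constant step size.

```python
import numpy as np


def epsilon_greedy(q, counts, t, rng, epsilon=0.1):
    """With probability epsilon pick a random arm, otherwise the greedy arm."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q)))
    return int(np.argmax(q))


def ucb(q, counts, t, rng, c=2.0):
    """Pick the arm with the highest upper confidence bound."""
    untried = np.flatnonzero(counts == 0)
    if untried.size > 0:
        # Try every arm once before the bound is well defined.
        return int(untried[0])
    return int(np.argmax(q + c * np.sqrt(np.log(t) / counts)))


def boltzmann(q, counts, t, rng, temperature=0.5):
    """Sample an arm with probability proportional to exp(q / temperature)."""
    logits = (q - np.max(q)) / temperature  # shift by the max for stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(q), p=probs))


def run_bandit(select, steps=1000, n_arms=10, initial_q=0.0,
               step_size=None, seed=0):
    """Run one bandit problem; returns (rewards per step, pull counts).

    step_size=None uses sample averages; a float uses a constant step size.
    """
    rng = np.random.default_rng(seed)
    true_means = rng.normal(0.0, 1.0, n_arms)  # assumed testbed setup
    q = np.full(n_arms, initial_q, dtype=float)
    counts = np.zeros(n_arms, dtype=int)
    rewards = np.empty(steps)
    for t in range(1, steps + 1):
        arm = select(q, counts, t, rng)
        reward = rng.normal(true_means[arm], 1.0)
        counts[arm] += 1
        alpha = 1.0 / counts[arm] if step_size is None else step_size
        q[arm] += alpha * (reward - q[arm])  # incremental value update
        rewards[t - 1] = reward
    return rewards, counts


if __name__ == "__main__":
    def greedy(q, counts, t, rng):
        # Pure greedy selection, used for the optimistic-initialization run.
        return epsilon_greedy(q, counts, t, rng, epsilon=0.0)

    runs = {
        "epsilon-greedy": run_bandit(epsilon_greedy),
        "optimistic init": run_bandit(greedy, initial_q=5.0, step_size=0.1),
        "UCB": run_bandit(ucb),
        "Boltzmann": run_bandit(boltzmann),
    }
    for name, (rewards, _) in runs.items():
        print(f"{name}: mean reward {rewards.mean():.3f}")
```

A single run like this is noisy; a fair comparison would average over many independently seeded problems, which this sketch leaves out for brevity.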