Multi-Armed Bandit Policy Simulator

Overview

This library contains a set of multi-armed bandit policies, along with a graphing tool for comparing their performance across different arm configurations and payoff rates. The following policies are implemented:

Standard A/B Test
Epsilon-Greedy
Annealing Epsilon-Greedy
Softmax
Annealing Softmax
UCB1
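In their standard formulations, Epsilon-Greedy, Softmax and their annealing variants keep a running estimate of each arm's mean reward and choose arms from those estimates. The sketch below shows a running-mean update and Softmax selection as standalone functions. The names `update_mean` and `softmax_select` are hypothetical, not the classes in policies.py, and the library may structure this differently.

```python
import math
import random


def update_mean(count, mean, reward):
    """Return the new (count, mean) for an arm after observing one reward."""
    count += 1
    # Incremental mean avoids storing every reward: m_n = m_{n-1} + (r - m_{n-1}) / n
    mean += (reward - mean) / count
    return count, mean


def softmax_select(values, temperature, rng=random):
    """Pick an arm with probability proportional to exp(value / temperature).

    Hypothetical helper, not the Softmax class in policies.py. Low temperatures
    approach greedy play; high temperatures approach uniform random play.
    An annealing variant would shrink `temperature` as the number of pulls grows.
    """
    best = max(values)
    # Subtracting the max before exp() avoids overflow without changing the ratios.
    weights = [math.exp((v - best) / temperature) for v in values]
    return rng.choices(range(len(values)), weights=weights)[0]
```

Epsilon-Greedy uses the same estimates but explores uniformly at random with probability epsilon instead of weighting arms by their estimated value.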

The following policies have yet to be implemented:

UCB2
Exp3
Thompson Sampling
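For reference, Thompson Sampling for Bernoulli arms keeps a Beta posterior per arm, draws one sample from each posterior, and plays the arm with the largest draw. It is not part of this library; the sketch below, including the name `thompson_select` and the uniform Beta(1, 1) prior, is an illustrative assumption.

```python
import random


def thompson_select(successes, failures, rng=random):
    """Beta-Bernoulli Thompson sampling (hypothetical; not implemented in this library).

    successes[i] and failures[i] count the 1 and 0 rewards seen from arm i.
    With a uniform Beta(1, 1) prior, arm i's posterior is
    Beta(successes[i] + 1, failures[i] + 1).
    """
    # One posterior draw per arm; arms with little data give more varied draws,
    # which is what drives exploration.
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=samples.__getitem__)
```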

Installing Pylab

Pylab ships with matplotlib. On Debian/Ubuntu:

sudo apt-get install python-matplotlib

or, using pip:

pip install matplotlib

Example Usage

from arms import BernoulliArm
from plotter import Plotter
from policies import *

Comparison between Epsilon-Greedy policies with epsilon = 0.0, 0.2, 0.4, 0.6, 0.8 and 1.0:

arms = [BernoulliArm(mu) for mu in [.1, .1, .7, 1.0]]
policies = [EpsilonGreedy(step * 0.2) for step in range(6)]
plotter = Plotter(arms, policies)
plotter.plot_results(num_trials=1000, num_pulls=200, metric='reward')
plotter.plot_results(num_trials=1000, num_pulls=200, metric='cumulative_reward')
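To make the two metrics concrete, here is a standalone sketch of the kind of simulation such a comparison runs. It assumes metric='reward' is the mean reward at each pull, averaged over trials, and metric='cumulative_reward' is the running total of that curve; plotter.py may compute them differently. `simulate_epsilon_greedy` is a hypothetical helper, and it draws Bernoulli rewards directly instead of using BernoulliArm.

```python
import itertools
import random


def simulate_epsilon_greedy(mus, epsilon, num_trials, num_pulls, seed=0):
    """Mean reward per pull, and its running total, averaged over trials.

    Hypothetical stand-in for what Plotter.plot_results computes; it assumes
    metric='reward' is the mean reward at each pull and 'cumulative_reward'
    is the running sum of that curve.
    """
    rng = random.Random(seed)
    n_arms = len(mus)
    avg_reward = [0.0] * num_pulls
    for _ in range(num_trials):
        counts = [0] * n_arms
        values = [0.0] * n_arms
        for t in range(num_pulls):
            # Explore with probability epsilon, otherwise exploit the best estimate.
            if rng.random() < epsilon:
                arm = rng.randrange(n_arms)
            else:
                arm = max(range(n_arms), key=values.__getitem__)
            # Bernoulli arm: pays 1 with probability mus[arm], else 0.
            reward = 1.0 if rng.random() < mus[arm] else 0.0
            counts[arm] += 1
            values[arm] += (reward - values[arm]) / counts[arm]
            avg_reward[t] += reward / num_trials
    return avg_reward, list(itertools.accumulate(avg_reward))


mus = [0.1, 0.1, 0.7, 1.0]
for step in range(6):
    epsilon = step * 0.2
    reward, cumulative = simulate_epsilon_greedy(mus, epsilon, num_trials=200, num_pulls=200)
    print(f"epsilon={epsilon:.1f}  final cumulative reward={cumulative[-1]:.1f}")
```

In this sketch, epsilon = 0.0 is pure greedy play: all estimates start at 0 and ties go to the lowest index, so it keeps pulling arm 0 and never discovers the arm with mu = 1.0.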

Comparison between the annealing policies and UCB1, on the same arms:

policies = [AnnealingEpsilonGreedy(), AnnealingSoftmax(), UCB1()]
plotter = Plotter(arms, policies)
plotter.plot_results(num_trials=1000, num_pulls=200, metric='reward')
plotter.plot_results(num_trials=1000, num_pulls=200, metric='cumulative_reward')
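UCB1 replaces random exploration with an optimism bonus: after pulling every arm once, it plays the arm maximizing mean_i + sqrt(2 ln n / n_i), where n is the total number of pulls and n_i the pulls of arm i (the standard UCB1 rule). A minimal sketch of the selection step, with `ucb1_select` as a hypothetical function rather than the UCB1 class in policies.py:

```python
import math


def ucb1_select(counts, values):
    """Hypothetical UCB1 arm selection (not the UCB1 class in policies.py).

    counts[i] is the number of pulls of arm i, values[i] its mean reward.
    """
    # Pull every arm once before using the confidence bound.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    # Mean reward plus an exploration bonus that shrinks as an arm is pulled more.
    scores = [v + math.sqrt(2.0 * math.log(total) / n) for v, n in zip(values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)
```

UCB1 needs no epsilon or temperature parameter; exploration is driven entirely by the pull counts.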

Repository: shazeline/mab-py
