Skip to content

A finite memory automaton for static and dynamic two-armed Bernoulli bandit problems

Notifications You must be signed in to change notification settings

raoariel/finite-memory-automaton

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 

Repository files navigation

A Finite Memory Automaton for Static and Dynamic Two-Armed Bernoulli Bandit Problems

Abstract

The multi-armed bandit problem is characterized by a repeated decision of selecting an arm amongst a set of arms with unknown payoff probabilities; the payoff probabilities may remain constant or change over time. An agent gains information about the payoff probabilities from past actions and aims to learn an optimal arm selection strategy for maximal payoff. The two-armed Bernoulli bandit (TABB) problem is a special class of multi-armed bandit problems where there are exactly two arms with payoff structures that follow Bernoulli distributions. Existing approaches to the TABB primarily rely on perfect recall of past actions to generate estimates for arm payoff probabilities; it is further assumed that the decision maker knows a priori whether arm payoff probabilities can change. We present a different approach based on finite automata which demonstrates that an agent can learn a low regret strategy without knowing whether arm payoff probabilities are static or dynamic and without having perfect recall of past actions. Roughly speaking, the automaton works by maintaining a relative ranking of arms based on payoff probabilities rather than estimating precise payoff probabilities.

Full paper available upon request. Contact arielrao[at]cmu.edu.

About

A finite memory automaton for static and dynamic two-armed Bernoulli bandit problems

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published