
Added Thompson Sampling Strategy for Gaussian with Unknown Mean + Variance #12

Open · wants to merge 3 commits into base: master
Conversation

aicherc (Member) commented Mar 14, 2017

Also moved strategy.py into a subfolder and split it out a bit.

Note: I found that selecting prior parameters is important.

This resolves #7
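For context, a hedged sketch of the per-step draw-and-argmax this strategy performs for the unknown mean + variance case, assuming each arm keeps a Normal-Inverse-Gamma posterior. The hyperparameter names (`mu`, `kappa`, `alpha`, `beta`) are illustrative and not necessarily the names used in this PR:

```python
import numpy as np

def thompson_draw(posterior_params, rng=None):
    """One Thompson-sampling step: sample a mean from each arm's
    Normal-Inverse-Gamma posterior, pull the arm with the largest draw."""
    rng = rng or np.random.default_rng()
    samples = []
    for p in posterior_params:
        # sigma^2 ~ Inverse-Gamma(alpha, beta): draw via 1 / Gamma(alpha, scale=1/beta)
        sigma2 = 1.0 / rng.gamma(p['alpha'], 1.0 / p['beta'])
        # mu | sigma^2 ~ Normal(mu0, sigma^2 / kappa)
        samples.append(rng.normal(p['mu'], np.sqrt(sigma2 / p['kappa'])))
    # pull the arm whose sampled mean is largest
    return int(np.argmax(samples))
```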

aicherc changed the title from "Add Thompson Sampling for Gaussian with Unknown Variance" to "Added Thompson Sampling Strategy for Gaussian with Unknown Mean + Variance" on Mar 14, 2017
samples = [self._sample(**params) for params in self.posterior_params]
return np.argmax(samples)

def pull_arm(self, arm_index):
Member:
Just realized this isn't private. Can you change that (or I can)? Users shouldn't interact with the bandit directly except through .fit()

aicherc (Member Author):

Sounds good to me. I can add that tonight.

Attributes:
num_arms (int)
posterior_params (list of list): posterior parameters
estimated_arm_means (ndarray): posterior predictive mean
Member:

I don't think it's the posterior predictive mean; isn't it just the posterior mean?

aicherc (Member Author):

I think they are the same in this case, since the posterior mean of the mean is also the posterior predictive mean, and we want to maximize the posterior predictive mean.
The variance/standard deviation is another story: the posterior variance of the mean is smaller than the posterior predictive variance.
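A minimal worked example of this point, using the known-variance case with a conjugate Normal prior and made-up numbers (the unknown-variance case behaves analogously): the posterior over the mean and the posterior predictive share the same center, but the predictive variance is larger by the observation variance.

```python
import numpy as np

sigma2 = 4.0                    # known observation variance (assumed)
mu0, tau0_2 = 0.0, 1.0          # Normal prior on the arm mean
x = np.array([1.0, 2.0, 3.0])   # observed rewards (made up)

n = len(x)
# Posterior over the mean: N(mu_n, tau_n2)
tau_n2 = 1.0 / (1.0 / tau0_2 + n / sigma2)
mu_n = tau_n2 * (mu0 / tau0_2 + x.sum() / sigma2)

# Posterior predictive over the next reward: N(mu_n, tau_n2 + sigma2)
pred_var = tau_n2 + sigma2

# Same mean, strictly larger predictive variance:
#   posterior SD of the mean    = sqrt(tau_n2)
#   posterior predictive SD     = sqrt(tau_n2 + sigma2)
```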

Member:

Ohhh yea, I see. They were the same since Wes & I only considered returning the mean. Hmm, I dunno what's best here.

"""
def __init__(self, bandit, **kwargs):
self.bandit = bandit
self.num_arms = bandit.num_arms
Member:

I'd like to make this into a property to keep the style consistent.


@property
def estimated_arm_sds(self):
return np.array([self.sigma2 + params['sigma2']
Member:

Why is it the sum of these two?

aicherc (Member Author):

I think this was in the first commit and fixed in the second.
Originally, I wanted estimated_arm_sds to be the estimated arm standard deviation (posterior predictive standard deviation). The sum was changed to just params['sigma2'] in the second commit.

Don't forget there's a 'view all changes' option (so you don't have to look at the commits one by one).

aicherc (Member Author) commented Mar 15, 2017

Made some unnecessary minor changes (property + private function).
The question remains whether estimated_arm_sds should return a numpy array with (1) arm reward standard deviation estimates or (2) the arm mean posterior's standard deviations.

Currently it is (2). The goal is to give the user insight into why different arms may be pulled (max of samples from the arm posteriors), hence why it's the standard deviation of the posterior.

The name might be a bit misleading.

kyleclo (Member) commented Mar 15, 2017

Hmm, ya, I dunno what's best between posterior predictive SD or posterior SD. Would it be fine to return both, but name them as such?
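A sketch of that suggestion, exposing both spreads under explicit property names. The class and property names are hypothetical, and this assumes (for simplicity) a known observation variance `self.sigma2` alongside a posterior variance `params['sigma2']` for each arm's mean, matching the shapes in the snippets above:

```python
import numpy as np

class ArmSummaries:
    """Hypothetical sketch: expose both spreads under unambiguous names."""
    def __init__(self, posterior_params, sigma2):
        self.posterior_params = posterior_params  # [{'sigma2': ...}, ...]
        self.sigma2 = sigma2                      # known observation variance

    @property
    def arm_posterior_sds(self):
        # SD of each arm's posterior over its mean
        return np.array([np.sqrt(p['sigma2']) for p in self.posterior_params])

    @property
    def arm_predictive_sds(self):
        # SD of each arm's posterior predictive over rewards
        return np.array([np.sqrt(p['sigma2'] + self.sigma2)
                         for p in self.posterior_params])
```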

Successfully merging this pull request may close these issues.

Thompson Sampling for Unknown Mean + Variance