Added Thompson Sampling Strategy for Gaussian with Unknown Mean + Variance #12
base: master
Conversation
        samples = [self._sample(**params) for params in self.posterior_params]
        return np.argmax(samples)

    def pull_arm(self, arm_index):
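For context, the arm-selection step being reviewed is standard Thompson sampling: draw one mean from each arm's posterior and pull the argmax. Here is a minimal, self-contained sketch (not the PR's implementation) assuming a Normal-Inverse-Gamma posterior per arm; the parameter names `mu0`, `kappa`, `alpha`, `beta` and the helper names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_posterior_mean(mu0, kappa, alpha, beta, rng):
    """Draw one mean from a Normal-Inverse-Gamma posterior:
    sigma2 ~ InvGamma(alpha, beta), mu | sigma2 ~ N(mu0, sigma2 / kappa)."""
    sigma2 = beta / rng.gamma(alpha)  # inverse-gamma draw via 1/Gamma
    return rng.normal(mu0, np.sqrt(sigma2 / kappa))

def select_arm(posterior_params, rng):
    """Thompson sampling: pull the arm whose sampled mean is largest."""
    samples = [sample_posterior_mean(rng=rng, **p) for p in posterior_params]
    return int(np.argmax(samples))

# Two hypothetical arms; arm 1 has a higher posterior location.
params = [
    {"mu0": 0.0, "kappa": 5.0, "alpha": 3.0, "beta": 2.0},
    {"mu0": 1.0, "kappa": 5.0, "alpha": 3.0, "beta": 2.0},
]
arm = select_arm(params, rng)
```

Because selection is by sampling rather than by posterior mean, arms with wide posteriors still get explored occasionally.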
Just realized this isn't private. Can you change that (or I can do it)? Users shouldn't be interacting with the bandit directly except through .fit().
Sounds good to me. I can add that tonight.
    Attributes:
        num_arms (int)
        posterior_params (list of list): posterior parameters
        estimated_arm_means (ndarray): posterior predictive mean
I don't think it's the posterior predictive mean; isn't it just the posterior mean?
I think they are the same in this case, since the posterior mean of the mean is also the posterior predictive mean. And we want to maximize the posterior predictive mean.
The variance/standard deviation is another story. The posterior variance of the mean is smaller than the posterior predictive variance
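The point above can be checked numerically. Under a Normal-Inverse-Gamma posterior (hypothetical parameter names `mu0`, `kappa`, `alpha`, `beta`, not taken from the PR), both the posterior of the mean and the posterior predictive are Student-t distributions centred at `mu0`, so the two means coincide, while the predictive variance is larger by a factor of `kappa + 1`:

```python
# Hypothetical Normal-Inverse-Gamma posterior parameters for one arm
mu0, kappa, alpha, beta = 1.0, 5.0, 3.0, 2.0

# Both mu | data and x_new | data are Student-t centred at mu0,
# so the posterior mean of the mean equals the posterior predictive mean.
posterior_mean = mu0
predictive_mean = mu0

# Their variances differ (finite for alpha > 1):
var_of_mean = beta / (kappa * (alpha - 1))                    # Var(mu | data)
predictive_var = beta * (kappa + 1) / (kappa * (alpha - 1))   # Var(x_new | data)

print(var_of_mean, predictive_var)  # predictive variance is (kappa + 1)x larger
```

This is exactly the asymmetry the thread describes: the means agree, the spreads do not.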
Ohh, yeah, I see. They were the same since Wes and I only considered returning the mean. Hmm, I'm not sure what's best here.
""" | ||
def __init__(self, bandit, **kwargs): | ||
self.bandit = bandit | ||
self.num_arms = bandit.num_arms |
I'd like to make this into a property to keep the style consistent.
    @property
    def estimated_arm_sds(self):
        return np.array([self.sigma2 + params['sigma2']
Why is it the sum of these two?
I think this was in the first commit and fixed in the second. Originally, I wanted estimated_arm_sds to be the estimated arm standard deviation (the posterior predictive standard deviation). The sum was changed to just params['sigma2'] in the second commit.
Don't forget there's a 'view all changes' option (so you don't have to look at the commits one by one).
Made some unnecessary minor changes (property + private function). Currently it is (2). The goal is to give the user insight into why different arms may be pulled (the max of samples from the arm posteriors), hence it's the standard deviation of the posterior. The name might be a bit misleading.
Hmm, yeah, I don't know what's best between the posterior predictive SD and the posterior SD. Would it be fine to return both, but name them as such?
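Returning both under explicit names could look something like the sketch below. This is a hypothetical class, not the PR's code; the property names and the Normal-Inverse-Gamma parameter keys (`kappa`, `alpha`, `beta`) are assumptions made for illustration.

```python
import numpy as np

class GaussianTS:
    """Sketch: expose both SDs under explicit names (hypothetical class)."""

    def __init__(self, posterior_params):
        # Each dict holds Normal-Inverse-Gamma parameters for one arm.
        self.posterior_params = posterior_params

    @property
    def posterior_sds(self):
        # SD of mu | data: sqrt(beta / (kappa * (alpha - 1)))
        return np.array([np.sqrt(p["beta"] / (p["kappa"] * (p["alpha"] - 1)))
                         for p in self.posterior_params])

    @property
    def posterior_predictive_sds(self):
        # SD of x_new | data: sqrt(beta * (kappa + 1) / (kappa * (alpha - 1)))
        return np.array([np.sqrt(p["beta"] * (p["kappa"] + 1)
                                 / (p["kappa"] * (p["alpha"] - 1)))
                         for p in self.posterior_params])
```

Naming the two quantities separately sidesteps the ambiguity of a single `estimated_arm_sds` attribute.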
Also moved strategy.py into a subfolder and split it out a bit. Note: I found that selecting prior parameters is important.
This resolves #7