Pystan example model - comparing two groups #108

Open
anmwinter opened this issue Jun 30, 2017 · 5 comments

@anmwinter

Hello,

I asked on the PyStan group about submitting Jupyter notebook example models that use PyStan, and I was directed here. I am in the process of moving our models to PyStan, so this is a learning process for me.

I created a jupyter notebook here:
https://github.com/bioinfonm/bioinfonm.github.io/blob/master/_posts/pystan_musings_part1_img/pystan_three_centirues_english_grain_data.ipynb

The notebook, raw data, and images are all here:
https://github.com/bioinfonm/bioinfonm.github.io/tree/master/_posts/pystan_musings_part1_img

What is the best way to get this vetted and then hosted here as an example for PyStan?

Thank you for the time and consideration,
Ara

@bob-carpenter
Contributor

The case studies eventually go on the web site repo. Your Stan model has lots of problems you can see in just this fragment:

parameters { //The primary parameters of interest that are to be estimated. 
  real mu1; // mean of y1
  ...
  real<lower=0> sigma1; // standard deviation of y1
  ...
}
model { // Where your priors and likelihood are specified. Uniform, cauchy, and normal 
        // priors might be a good place to start?
  mu1 ~ uniform(0, 30); // uniform prior, maybe try half-normal, exp, or half-cauchy
  ...
  y1 ~ normal(mu1, sigma1);
  ...

The code itself has some problems:

  • if you put a uniform distribution on mu1, then you need to constrain the parameter to have matching lower and upper bounds---Stan models should have a finite log likelihood for all parameter values meeting the declared constraints
  • we recommend much more informative priors
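To see why the bounds matter, here is a minimal Python sketch (not from the original notebook) of the log density that `mu1 ~ uniform(0, 30)` contributes. The helper name `uniform_lpdf` is made up for illustration and mirrors the math only, not Stan's internals. With `real mu1;` declared without bounds, the sampler may propose a value such as 31, where the log density is negative infinity.

```python
import math


def uniform_lpdf(x, lower, upper):
    """Log density of a uniform(lower, upper) distribution at x."""
    if lower <= x <= upper:
        return -math.log(upper - lower)
    return -math.inf  # zero density outside the support


# Inside the support the log density is finite.
inside = uniform_lpdf(12.5, 0.0, 30.0)

# Outside the support it is -inf; an unconstrained `real mu1;` can land here.
outside = uniform_lpdf(31.0, 0.0, 30.0)

# One possible fix in the Stan program: declare bounds that match the prior.
CONSTRAINED_DECLARATION = "real<lower=0, upper=30> mu1;"

print(inside, outside)
```

Declaring the parameter with matching `lower` and `upper` bounds keeps every value Stan can reach inside the support of the prior.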

The doc also has some issues:

  • mu1 isn't the mean of y1, it's a location parameter
  • you don't want to doc the language in a program, such as what the parameters block is
  • you have lingering open-ended questions on the model---these are best left on the outside

@anmwinter
Author

@bob-carpenter Thanks for the feedback! I'll work on correcting this. This is a learning process for me.

@bob-carpenter
Contributor

For the moment, we're trying to keep the case studies to best-practice recommendations for Stan. We're working on establishing a place for more community-oriented sharing of work that we wouldn't need to vet so closely. There are prior recommendations on the stan-dev/stan wiki and in the regression chapter of the manual.

@bob-carpenter
Contributor

bob-carpenter commented Jun 30, 2017

You also don't need blocks with nothing in them and you can vectorize everything. This model should look like this:

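data {
  int<lower=0> N[2];   // group sizes; newer Stan versions write this as: array[2] int<lower=0> N;
  vector[N[1]] y1;     // observations for group 1
  vector[N[2]] y2;     // observations for group 2
}
parameters {
  vector[2] mu;              // location of each group
  vector<lower=0>[2] sigma;  // scale of each group
}
model {
  mu ~ normal(0, 10);        // vectorized: applies to both elements of mu
  sigma ~ cauchy(0, 5);      // half-Cauchy, given the lower=0 bound
  y1 ~ normal(mu[1], sigma[1]);
  y2 ~ normal(mu[2], sigma[2]);
}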

It'd be even easier if we had ragged arrays.
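For reference, here is a sketch of how the data for this model could be assembled on the Python side. It assumes the PyStan 2 interface (`pystan.StanModel` and its `sampling` method). The helper names `build_stan_data` and `fit_two_groups` and the example numbers are made up for illustration and are not taken from the notebook.

```python
def build_stan_data(y1, y2):
    """Pack two groups of possibly different sizes into the dict the model expects."""
    y1 = [float(v) for v in y1]
    y2 = [float(v) for v in y2]
    return {"N": [len(y1), len(y2)], "y1": y1, "y2": y2}


def fit_two_groups(model_code, y1, y2, iterations=2000, chains=4):
    """Compile and sample with PyStan 2 (imported lazily, so build_stan_data works without it)."""
    import pystan  # assumption: PyStan 2.x is installed

    model = pystan.StanModel(model_code=model_code)
    return model.sampling(data=build_stan_data(y1, y2), iter=iterations, chains=chains)


# Made-up measurements, only to show the shape of the data dict.
example = build_stan_data([4.1, 5.3, 6.0], [7.2, 6.8])
print(example["N"])
```

Because each group is passed as its own vector with its own length in `N`, the two groups do not need to be the same size.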

@anmwinter
Copy link
Author

Thanks again @bob-carpenter ! I am working on how to vectorize data. I appreciate the model re-write.

ara
