RStan Getting Started

bgoodri edited this page Nov 11, 2019 · 161 revisions

RStan is the R interface to Stan. For more information on Stan and its modeling language, visit the Stan website at http://mc-stan.org/

Latest Version: 2.19.2   (July 2019)

Almost all installation instructions below are for this version of RStan, which requires R version 3.4.0 or later. If necessary, you can install the latest version of R from here.
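
If you are unsure which version of R you are running, a quick check along the following lines may help. This is only a sketch: the variable names min_r and r_ok are ours, and the threshold is simply the 3.4.0 requirement stated above.

```r
# Minimum R version stated above for RStan 2.19.2.
min_r <- "3.4.0"

# getRversion() returns a version object that can be compared to a string.
r_ok <- getRversion() >= min_r

if (r_ok) {
  message("R ", as.character(getRversion()), " meets the ", min_r, " requirement.")
} else {
  message("R ", as.character(getRversion()), " is older than ", min_r,
          "; please update R before installing RStan.")
}
```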

In addition, we strongly recommend that you use RStudio version 1.2.x or later because it has great support for Stan.

Installation of RStan

To be on the safe side, it is sometimes necessary to remove any existing RStan installation via

remove.packages("rstan")
if (file.exists(".RData")) file.remove(".RData")

Then, restart R.

In most cases, you can simply type (exactly like this)

install.packages("rstan", repos = "https://cloud.r-project.org/", dependencies = TRUE)

However, if you use Linux or if getOption("pkgType") has been set to "source" or if R asks you whether you want to install the latest version of RStan from source, then go to the corresponding page for Windows, Mac, or Linux.
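
To see in advance whether the first two conditions apply to you, something like the following can be run before installing. It is a rough sketch: the variable names are ours, and it cannot predict whether R will offer to install from source.

```r
# "source" here means packages are compiled locally rather than
# downloaded as pre-built binaries.
pkg_type <- getOption("pkgType")
from_source <- identical(pkg_type, "source")

# R.version$os looks like "linux-gnu" on Linux systems.
on_linux <- grepl("linux", R.version$os, ignore.case = TRUE)

needs_platform_page <- from_source || on_linux
if (needs_platform_page) {
  message("Follow the platform-specific installation page instead.")
} else {
  message("The one-line install.packages() call should normally be enough.")
}
```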

Checking the C++ Toolchain

In RStudio (preferably) or otherwise in R, execute once

pkgbuild::has_build_tools(debug = TRUE)

to check your C++ toolchain using the pkgbuild package that gets installed when you install RStan. If this line ultimately returns TRUE, then your C++ toolchain is properly installed and you can jump to the next section.

Otherwise,

  • If you use Windows and RStudio (recommended), a pop-up will appear asking if you want to install Rtools. Click Yes and wait until the installation is finished.
  • If you use Windows but not RStudio, a message will appear in the R console telling you to install Rtools. The further information on downloading and installing Rtools may be helpful, but you do NOT ordinarily need to continue to the section titled "Installing RStan from source".
  • If you use a Mac, a link will appear, but do not click on it. Instead, go here.
  • If you use Linux (including Windows Subsystem for Linux), then go here.

If you follow the above instructions but are not successful, you can get help from the Stan Discourse Forum. Be sure to tell us what your operating system is, whether you use RStudio, and what output you get when you try the above.

Configuration of the C++ Toolchain

This step is optional, but it can result in compiled Stan programs that execute much faster than they otherwise would. Simply paste the following into R once

dotR <- file.path(Sys.getenv("HOME"), ".R")
if (!file.exists(dotR)) dir.create(dotR)
M <- file.path(dotR, ifelse(.Platform$OS.type == "windows", "Makevars.win", "Makevars"))
if (!file.exists(M)) file.create(M)
cat("\nCXX14FLAGS=-O3 -march=native -mtune=native",
    if( grepl("^darwin", R.version$os)) "CXX14FLAGS += -arch x86_64 -ftemplate-depth-256" else 
    if (.Platform$OS.type == "windows") "CXX11FLAGS=-O3 -march=corei7 -mtune=corei7" else
    "CXX14FLAGS += -fPIC",
    file = M, sep = "\n", append = TRUE)

However, be advised that setting the optimization level to O3 may cause problems for packages besides RStan and that, in rare cases, specifying -march=native -mtune=native may cause Stan programs to not work. If you ever need to change anything with your C++ toolchain configuration, you can execute

M <- file.path(Sys.getenv("HOME"), ".R", ifelse(.Platform$OS.type == "windows", "Makevars.win", "Makevars"))
file.edit(M)
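
Because the configuration snippet above uses append = TRUE, pasting it more than once adds the same lines to the file again. If you prefer to inspect the file from the console rather than in an editor, a sketch like the following lists the relevant lines. The names makevars and flag_lines are ours.

```r
M <- file.path(Sys.getenv("HOME"), ".R",
               ifelse(.Platform$OS.type == "windows", "Makevars.win", "Makevars"))

if (file.exists(M)) {
  makevars <- readLines(M, warn = FALSE)
  # Keep only the C++11 / C++14 flag lines written by the snippet above.
  flag_lines <- grep("^CXX1[14]FLAGS", makevars, value = TRUE)
  print(flag_lines)
  if (anyDuplicated(flag_lines) > 0) {
    message("Some flag lines appear more than once; consider cleaning up ", M)
  }
} else {
  flag_lines <- character(0)
  message("No Makevars file found at ", M)
}
```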

How to Use RStan

The rest of this document assumes that you have already installed RStan by following the instructions above.

Loading the package

The package name is rstan (all lowercase), so we start by executing

library("rstan") # observe startup messages

As the startup message says, if you are using rstan locally on a multicore machine and have plenty of RAM to estimate your model in parallel, at this point execute

options(mc.cores = parallel::detectCores())
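
If you would rather keep one core free for other work, or want to guard against detectCores() returning NA (which its documentation says can happen), a more conservative variant might look like this. It is an alternative we are sketching here, not what the startup message recommends, and the names n_detected and n_cores are ours.

```r
n_detected <- parallel::detectCores()  # may be NA if the count is unknown

# Fall back to one core if detection fails; otherwise leave one core free.
n_cores <- if (is.na(n_detected)) 1L else max(1L, n_detected - 1L)

options(mc.cores = n_cores)
getOption("mc.cores")
```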

In addition, you should follow the second startup message that says to execute

rstan_options(auto_write = TRUE)

which allows you to automatically save a bare version of a compiled Stan program to the hard disk so that it does not need to be recompiled (unless you change it).

Finally, if you use Windows, there will be a third startup message saying to execute

Sys.setenv(LOCAL_CPPFLAGS = '-march=native')

which is not necessary if you followed the C++ toolchain configuration advice in the previous section.

Example 1: Eight Schools

This is an example from Section 5.5 of Gelman et al. (2003), which studied coaching effects in eight schools. For simplicity, we call this example "eight schools."

We start by writing a Stan program for the model in a text file. If you are using RStudio version 1.2.x or later, click on File -> New File -> Stan File. Otherwise, open your favorite text editor. Either way, paste in the following and save your work to a file called 8schools.stan in R's working directory (which can be seen by executing getwd()):

// saved as 8schools.stan
data {
  int<lower=0> J;         // number of schools 
  real y[J];              // estimated treatment effects
  real<lower=0> sigma[J]; // standard error of effect estimates 
}
parameters {
  real mu;                // population treatment effect
  real<lower=0> tau;      // standard deviation in treatment effects
  vector[J] eta;          // unscaled deviation from mu by school
}
transformed parameters {
  vector[J] theta = mu + tau * eta;        // school treatment effects
}
model {
  target += normal_lpdf(eta | 0, 1);       // prior log-density
  target += normal_lpdf(y | theta, sigma); // log-likelihood
}

Be sure that your Stan program ends in a blank line without any characters, including spaces and comments.
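
One way to check this from R is to look at the last byte of the file. The helper below is a sketch of our own (ends_with_newline is not part of rstan); it treats a file as acceptable when its final byte is a newline character, and appends one if it is missing.

```r
# Returns TRUE if the file's last byte is a newline ("\n", hex 0a).
ends_with_newline <- function(path) {
  size <- file.size(path)
  if (is.na(size) || size == 0) return(FALSE)
  bytes <- readBin(path, what = "raw", n = size)
  bytes[length(bytes)] == as.raw(0x0a)
}

stan_file <- "8schools.stan"  # the file name used in this example
if (file.exists(stan_file) && !ends_with_newline(stan_file)) {
  # Append a single newline so the program ends in an empty line.
  cat("\n", file = stan_file, append = TRUE)
}
```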

In this Stan program, we let theta be a transformation of mu, eta, and tau instead of declaring theta in the parameters block, which allows the sampler to run more efficiently (see detailed explanation). We can prepare the data (which typically is a named list) in R with:

schools_dat <- list(J = 8, 
                    y = c(28,  8, -3,  7, -1,  1, 18, 12),
                    sigma = c(15, 10, 16, 11,  9, 11, 10, 18))
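
Before fitting, it can be worth confirming that the list agrees with the declarations in the data block of 8schools.stan. The checks below are a sketch in base R that mirrors those declarations by hand; Stan performs its own validation when the data are passed in.

```r
schools_dat <- list(J = 8,
                    y = c(28,  8, -3,  7, -1,  1, 18, 12),
                    sigma = c(15, 10, 16, 11,  9, 11, 10, 18))

stopifnot(
  schools_dat$J >= 0,                          # int<lower=0> J
  schools_dat$J == round(schools_dat$J),       # J must be integer-valued
  length(schools_dat$y) == schools_dat$J,      # real y[J]
  length(schools_dat$sigma) == schools_dat$J,  # real<lower=0> sigma[J]
  all(schools_dat$sigma >= 0)
)
```

If any condition fails, stopifnot reports which one, which is usually easier to read than an error raised during sampling.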

We can then fit the model with the following R command. The file argument must point to the location of 8schools.stan on your file system; if you saved it in R's working directory, the call below works as written.

fit <- stan(file = '8schools.stan', data = schools_dat)
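
As an aside on the parameterization used in 8schools.stan: the program gives eta a standard normal prior and defines theta = mu + tau * eta, which implies that theta is normal with mean mu and standard deviation tau. The simulation below illustrates this in plain R, without Stan. The values chosen for mu, tau, and n are hypothetical and are not estimates from the model.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
mu  <- 5      # hypothetical population mean
tau <- 3      # hypothetical population standard deviation
n   <- 1e5    # number of simulated draws

eta   <- rnorm(n, mean = 0, sd = 1)  # same prior as normal_lpdf(eta | 0, 1)
theta <- mu + tau * eta              # same transformation as in the Stan program

# The sample mean and standard deviation should be close to mu and tau.
c(mean = mean(theta), sd = sd(theta))
```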

The object fit, returned by the function stan, is an S4 object of class stanfit. Methods such as print, plot, and pairs are defined for stanfit objects, so we can use the code below to examine the results in fit. print provides a summary of the model's parameters as well as the log-posterior, which is named lp__. For more methods and details of class stanfit, see the help for class stanfit.

In particular, we can use the extract function on stanfit objects to obtain the samples. extract returns the samples as a list of arrays, one per parameter of interest, or as a single array. In addition, the S3 functions as.array, as.matrix, and as.data.frame are defined for stanfit objects (use help("as.array.stanfit") to see the help page in R).

print(fit)
plot(fit)
pairs(fit, pars = c("mu", "tau", "lp__"))

la <- extract(fit, permuted = TRUE) # return a list of arrays 
mu <- la$mu 

### return an array of three dimensions: iterations, chains, parameters 
a <- extract(fit, permuted = FALSE) 

### use S3 functions on stanfit objects
a2 <- as.array(fit)
m <- as.matrix(fit)
d <- as.data.frame(fit)
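
Once the draws are available as ordinary R vectors, they can be summarized with base R. The helper below is a sketch of our own (posterior_summary is not an rstan function); it reports the mean, standard deviation, and a few quantiles of a vector of draws, and the guarded block applies it only if a stanfit object called fit exists.

```r
# Summarize a numeric vector of posterior draws.
posterior_summary <- function(draws, probs = c(0.025, 0.5, 0.975)) {
  stopifnot(is.numeric(draws), length(draws) > 0)
  c(mean = mean(draws), sd = sd(draws), quantile(draws, probs = probs))
}

# Apply it to the eight schools fit, if one is available in this session.
if (exists("fit") && methods::is(fit, "stanfit")) {
  la <- rstan::extract(fit, permuted = TRUE)
  print(posterior_summary(la$mu))
  print(posterior_summary(la$tau))
}
```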

Example 2: Rats

Rats is another popular example. The OpenBUGS version can be found here; it originally comes from Gelfand et al. (1990). The data record the growth of 30 rats, measured weekly for five weeks. The following table lists the data, where x denotes the days on which the data were collected. We can try this example using the linked data rats.txt and model code rats.stan.

Rat   x=8  x=15  x=22  x=29  x=36    Rat   x=8  x=15  x=22  x=29  x=36
  1   151   199   246   283   320     16   160   207   248   288   324
  2   145   199   249   293   354     17   142   187   234   280   316
  3   147   214   263   312   328     18   156   203   243   283   317
  4   155   200   237   272   297     19   157   212   259   307   336
  5   135   188   230   280   323     20   152   203   246   286   321
  6   159   210   252   298   331     21   154   205   253   298   334
  7   141   189   231   275   305     22   139   190   225   267   302
  8   159   201   248   297   338     23   146   191   229   272   302
  9   177   236   285   350   376     24   157   211   250   285   323
 10   134   182   220   260   296     25   132   185   237   286   331
 11   160   208   261   313   352     26   160   207   257   303   345
 12   143   188   220   273   314     27   169   216   261   295   333
 13   154   200   244   289   325     28   157   205   248   289   316
 14   171   221   270   326   358     29   137   180   219   258   291
 15   163   216   242   281   312     30   153   200   244   286   324

y <- as.matrix(read.table('https://raw.github.com/wiki/stan-dev/rstan/rats.txt', header = TRUE))
x <- c(8, 15, 22, 29, 36)
xbar <- mean(x)
N <- nrow(y)
T <- ncol(y)
rats_fit <- stan('https://raw.githubusercontent.com/stan-dev/example-models/master/bugs_examples/vol1/rats/rats.stan')
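
The call above does not pass a data argument, so it relies on the objects y, x, xbar, N, and T being found in the R session. If you prefer to pass the data explicitly, as in the eight schools example, a sketch might look like this. The helper make_rats_data is ours, and it assumes that the data block of rats.stan uses exactly the names N, T, x, xbar, and y.

```r
# Bundle the rats data into a named list for the data argument of stan().
make_rats_data <- function(y, x) {
  y <- as.matrix(y)
  stopifnot(is.numeric(y), ncol(y) == length(x))  # one column per measurement day
  list(N = nrow(y), T = ncol(y), x = x, xbar = mean(x), y = y)
}

# Usage with the objects defined above (y is the matrix read from rats.txt):
# rats_dat <- make_rats_data(y, x = c(8, 15, 22, 29, 36))
# rats_fit <- stan(file = "rats.stan", data = rats_dat)  # path to your copy of rats.stan
```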

Example 3: Anything

You can run many of the BUGS examples and some others that we have created in Stan by executing

model <- stan_demo()

and choosing an example model from the list that pops up. The first time you call stan_demo(), it will ask you if you want to download these examples. You should choose option 1 to put them in the directory where rstan was installed so that they can be used in the future without redownloading them. The model object above is an instance of class stanfit, so you can call print, plot, pairs, extract, etc. on it afterward.

More Help

More details about RStan can be found in the documentation, including the vignette of package rstan. For example, use help(stan) and help("stanfit-class") to see the help for the function stan and the S4 class stanfit.
See also Stan's modeling language manual for details about Stan's samplers, optimizers, and the Stan modeling language.

In addition, the Stan User's Mailing list can be used to discuss the use of Stan, post examples or ask questions about (R)Stan. When help is needed, it is important to provide enough information such as the following:

  • properly formatted syntax in the Stan modeling language
  • data
  • necessary R code
  • dump of error message using verbose=TRUE and cores=1 when calling the stan or sampling functions
  • information about R by using function sessionInfo() in R
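
The last two items can be collected from within R. The sketch below writes the output of sessionInfo() to a text file that you can attach to your question; the file name rstan_report.txt is our own choice, and the commented call assumes the eight schools objects from Example 1.

```r
# Hypothetical report file in the session's temporary directory.
report_file <- file.path(tempdir(), "rstan_report.txt")

# capture.output() records what sessionInfo() would print to the console.
writeLines(capture.output(sessionInfo()), report_file)
message("Session information written to ", report_file)

# To reproduce an error with full diagnostics, rerun the model as suggested above:
# fit <- stan(file = "8schools.stan", data = schools_dat, verbose = TRUE, cores = 1)
```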

References

  • Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2003). Bayesian Data Analysis, CRC Press, London, 2nd Edition.
  • Stan Development Team. Stan Modeling Language User's Guide and Reference Manual.
  • Gelfand, A. E., Hills, S. E., Racine-Poon, A., and Smith, A. F. M. (1990). "Illustration of Bayesian Inference in Normal Data Models Using Gibbs Sampling", Journal of the American Statistical Association, 85, 972-985.
  • Stan
  • R
  • BUGS
  • OpenBUGS
  • JAGS
  • Rcpp