Reading list

Hope this list is helpful. If I forgot any topics please let me know!

Quick links

R packages from the Stan development team

  • rstan, the R interface to Stan
  • rstanarm provides a traditional R formula interface for fitting common applied regression models with Stan, without having to write the Stan code yourself
  • bayesplot provides plotting functions for use after fitting a model
  • shinystan provides interactive tables and visualizations in a GUI
  • loo provides tools for model comparison and averaging
  • brms is similar to rstanarm with several advantages (more models are implemented, Stan code is simpler to read) and several disadvantages (models not pre-compiled, Stan code is less robust to numerical problems)
  • rstantools provides tools for developing R packages interfacing with Stan
  • projpred is for projection predictive variable selection, which is described in this paper: http://arxiv.org/abs/1508.02502

You can also find many R packages developed by Stan users that fit Stan models for you. Check out the list of packages that depend on the rstan package at cran.r-project.org/package=rstan (scroll down to the Reverse dependencies section).

Workflow

Hamiltonian Monte Carlo (HMC) and related background

Chi Feng's interactive MCMC demos that we used in class:

  • The Markov-chain Monte Carlo Interactive Gallery (website)

I highly recommend my Stan colleague Michael Betancourt's intro to HMC paper. Michael has a lot of very technical papers about HMC but this one is primarily focused on providing intuition (e.g., he has a whole section on the connection between HMC and the physics of planetary motion that I mentioned briefly in class):

  • A Conceptual Introduction to Hamiltonian Monte Carlo (paper)

This next paper is aimed at ecologists, but the HMC explanation is well written and is worth reading regardless of your field of work/study:

  • Faster Estimation of Bayesian Models in Ecology using Hamiltonian Monte Carlo (paper)

This case study from Stan developer Bob Carpenter uses simple simulations to demonstrate how things get strange (and challenging) very quickly as the number of dimensions grows due to the tension between probability density at the mode and volume in the tails:

  • Typical Sets and the Curse of Dimensionality (case study)
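Here's a minimal sketch of the kind of simulation that shows this effect. It is my own illustration, not code from the case study: it draws from a standard normal in increasing numbers of dimensions and measures how far the draws fall from the mode at the origin. The function name `mean_distance_from_mode`, the draw counts, and the seed are all made up for the example.

```python
import math
import random


def mean_distance_from_mode(dim, n_draws=2000, seed=1):
    """Average Euclidean distance from the origin (the mode) for
    draws from a standard normal distribution in `dim` dimensions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        # Squared distance is a sum of `dim` squared standard normals.
        squared = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dim))
        total += math.sqrt(squared)
    return total / n_draws


if __name__ == "__main__":
    for dim in (1, 10, 100, 1000):
        dist = mean_distance_from_mode(dim, n_draws=500)
        print(f"dim={dim:5d}  mean distance={dist:8.2f}  sqrt(dim)={math.sqrt(dim):8.2f}")
```

The density is always highest at the origin, but as the dimension grows the draws sit at a distance of roughly the square root of the dimension, so almost none of them land near the mode.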

Diagnostics and reparameterization

Miscellaneous thoughts on priors

Heteroscedasticity and collinearity

Heteroscedasticity

This is only a problem if your model lacks important structure. The generative modeling perspective provides a simple solution to this problem: build a model that allows for different amounts of variability in different subpopulations:
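As a rough illustration of the generative idea (not a Bayesian fit, and not taken from any of the linked material), the sketch below simulates outcomes where each subpopulation has its own standard deviation, then compares per-group estimates with a single pooled one. The group labels, the sigma values, and the helper names `simulate` and `group_sds` are hypothetical.

```python
import random
import statistics


def simulate(n_per_group, sigmas, mu=0.0, seed=1):
    """Generate outcomes for each subpopulation using its own scale.

    `sigmas` maps a group label to that group's standard deviation.
    """
    rng = random.Random(seed)
    return {
        group: [rng.gauss(mu, sigma) for _ in range(n_per_group)]
        for group, sigma in sigmas.items()
    }


def group_sds(data):
    """Sample standard deviation of the outcome within each group."""
    return {group: statistics.stdev(ys) for group, ys in data.items()}


if __name__ == "__main__":
    true_sigmas = {"a": 0.5, "b": 3.0}  # hypothetical values
    data = simulate(2000, true_sigmas)
    pooled = statistics.stdev([y for ys in data.values() for y in ys])
    print("pooled sd:", round(pooled, 2))
    for group, sd in group_sds(data).items():
        print(f"group {group}: sd={sd:.2f} (true {true_sigmas[group]})")
```

A single pooled scale describes neither group well, while a model with one scale parameter per group matches how the data were generated. In a Stan model the same idea amounts to letting the scale parameter vary by group.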

Collinearity

  • Informative priors on the relevant regression coefficients will help a lot
  • The QR reparameterization helps avoid computational issues when you have highly correlated predictors.
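To make the QR idea concrete, here's a small sketch in plain Python. It is an illustration under my own assumptions, not the Stan or rstanarm implementation: it uses an unscaled thin QR computed with modified Gram-Schmidt, centered predictors with no intercept column, and made-up helper names (`thin_qr`, `back_solve`, `make_correlated_predictors`). The point is that the columns of Q are orthogonal even when the columns of X are nearly collinear, and that coefficients on the Q scale map back to the original scale by solving a triangular system with R.

```python
import math
import random


def thin_qr(X):
    """Thin QR of X (a list of rows) via modified Gram-Schmidt.

    Returns Q (n x k, orthonormal columns) and R (k x k, upper triangular)
    such that X = Q R. Assumes X has full column rank.
    """
    n, k = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(k)]
    q_cols = []
    R = [[0.0] * k for _ in range(k)]
    for j in range(k):
        v = cols[j][:]
        for i, q in enumerate(q_cols):
            # Remove the component of v along each earlier q.
            R[i][j] = sum(a * b for a, b in zip(q, v))
            v = [a - R[i][j] * b for a, b in zip(v, q)]
        R[j][j] = math.sqrt(sum(a * a for a in v))
        q_cols.append([a / R[j][j] for a in v])
    Q = [[q_cols[j][i] for j in range(k)] for i in range(n)]
    return Q, R


def back_solve(R, theta):
    """Solve R beta = theta for upper-triangular R."""
    k = len(theta)
    beta = [0.0] * k
    for i in reversed(range(k)):
        s = sum(R[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (theta[i] - s) / R[i][i]
    return beta


def make_correlated_predictors(n=200, seed=1):
    """Two centered predictors where the second is nearly a copy of the first."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [a + 0.05 * rng.gauss(0.0, 1.0) for a in x1]
    m1, m2 = sum(x1) / n, sum(x2) / n
    return [[a - m1, b - m2] for a, b in zip(x1, x2)]


if __name__ == "__main__":
    X = make_correlated_predictors()
    Q, R = thin_qr(X)
    x_dot = sum(row[0] * row[1] for row in X)
    q_dot = sum(row[0] * row[1] for row in Q)
    print("dot product of X columns:", round(x_dot, 3))
    print("dot product of Q columns:", round(q_dot, 12))

    beta_true = [1.5, -2.0]  # hypothetical coefficients
    y = [sum(x * b for x, b in zip(row, beta_true)) for row in X]
    # With orthonormal Q, least squares on the Q scale is just Q^T y.
    theta = [sum(Q[i][j] * y[i] for i in range(len(y))) for j in range(2)]
    print("recovered beta:", [round(b, 6) for b in back_solve(R, theta)])
```

In a Stan model the sampler would work with the coefficients on the Q scale (called `theta` here) and the coefficients on the original predictors would be recovered afterward, which is the step `back_solve` stands in for.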

Visualization and graphical model checking

This is my paper (with many great coauthors!) that most of the course slides were based on:

  • Visualization in Bayesian Workflow (paper, code)

We also have some vignettes for the bayesplot package that demonstrate many of the important graphical model checks:

Time series & spatial models

Measurement error & missing data

Survival (duration) analysis

Some Stan users have written Python and R libraries to help fit certain survival models using Stan:

Model comparison, predictive performance, variable selection

The loo package has several useful vignettes that Aki Vehtari and I recently updated for version 2.0.0:

Aki Vehtari also has a bunch of tutorials online as well as some blog posts on the topic:

Papers from various authors (published in journals but I'm including links to the free arXiv preprint versions):

  • Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC (arXiv, R package)
  • Understanding predictive information criteria for Bayesian models (arXiv)
  • Projection predictive variable selection using Stan+R (arXiv, R package)
  • Using stacking to average Bayesian predictive distributions (arXiv)
  • Comparison of Bayesian predictive methods for model selection (arXiv)

Mixture models

Gaussian processes

Horseshoe and other hierarchical shrinkage priors

Discrete choice

Conditional logit has different meanings in different fields. What we call conditional logit is implemented in the rstanarm package:

Multinomial logit is a common discrete choice model (which is also called conditional logit in a small number of fields):

Why do Bayesian modeling?

Some blog posts on the topic from various authors:

Automatic differentiation & Stan's math library

Several Stan developers wrote a paper about the custom implementation of autodiff developed for Stan:

  • The Stan Math Library: Reverse-Mode Automatic Differentiation in C++. arXiv 1509.07164

Current limitations of Stan

Here's a wiki page where we list a lot of things we want to add to Stan going forward. Many of these things are already in progress, but this should help give a sense of some of the current limitations: