diff --git a/.nojekyll b/.nojekyll new file mode 100644 index 00000000..8b137891 --- /dev/null +++ b/.nojekyll @@ -0,0 +1 @@ + diff --git a/404.html b/404.html new file mode 100644 index 00000000..f099b779 --- /dev/null +++ b/404.html @@ -0,0 +1,84 @@ + + +
+ + + + +YEAR: 2021 +COPYRIGHT HOLDER: Imperial College of Science, Technology and Medicine ++ +
This vignette outlines errors that might be generated when parsing +mcstate2 DSL code, with more explanation about the error and how they +can be avoided. Don’t read this top to bottom as it’s quite boring! +However, if we get errors that benefit from more explanation about why +they’ve been thrown then we’ll expand on the contents here and arrange +for these to be linked from the thrown error directly.
+The error numbers are arbitrary after the first digit. The first +digit will correspond to different phases of the parsing:
+E1xx
- errors during parsing of individual
+expressionsE2xx
- errors when considering the system as a
+wholeE101
+We found an expression that is neither an assignment (with
+<-
) or a stochastic relationship (with
+~
)
Example:
+a + 1
+E102
+Invalid left hand side of a relationship operator (~
).
+Currently the left hand side must be symbol, though this will be relaxed
+in the future once we support an array syntax. However, you may not use
+things like numbers or function calls on the left hand side.
Example:
+1 <- 2
+f(x) <- g(y)
+E103
+Your distribution call failed to parse. This can fail for many
+reasons, and the details of the failure come from
+mcstate2::mcstate_dsl_parse_distribution
Example reasons for failure include the rhs being:
+compare(x) ~ 1
+compare(x) ~ sqrt(2)
)compare(x) ~ Normal(0, 1, 2)
)The details for the failure will be included in the body of the error +message.
+E201
+Duplicated relationships (with ~
).
Example:
+a ~ Normal(0, 1)
+b ~ Uniform(0, 1)
+a ~ Exponential(1) # <= error here
+Relationships must be unique becaue each represents a parameter, and +a parameter can’t be represented by two different distributions.
+E202
+Duplicated assigments (with <-
). This is similar to
+E201
Example:
+a <- 1
+b <- 2
+a <- 3 # <= error here
+Assignments must be unique within the dsl code because this makes it +straightforward to trace usage through the dependency graph and from +this create a gradient function using automatic differentation.
+This restriction means that you cannot reassign a value either, so +this is an error:
+a <- 1
+b <- 10
+a <- a + b # <= error here
+E203
+A relationship (with ~
) is shadowing a previous
+assignment. So after assigning to a variable you have declared that the
+same symbol refers to a parameter.
Example:
+a <- 1
+a ~ Normal(0, 1) # <= error here
+E204
+An assignment (with <-
) is shadowing a previous
+relationship.
Example:
+a ~ Normal(0, 1)
+a <- 10
+E205
+Variables are used out of order. If you are using odin this is a big +departure - at the moment you must declare your expressions (assignments +and relationships) in order. However, because we forbid multiple +assignment we may relax this in the future, but no existing programs +should be changed.
+E206
+Failed to differentiate the model. This error will only be seen where +it was not possible to differentiate your model but you requested that a +gradient be available. Not all functions supported in the dsl can +currently be differentiated by mcstate2; if you think that yours should +be, please let us know.
+mcstate2
includes a simple probabilistic domain-specific
+language (DSL) that is inspired by stan
and Statistical Rethinking. It is designed
+to make some tasks a bit easier, particularly when defining priors for
+your model. We expect that this DSL is not sufficiently advanced to
+represent most interesting models but it may get more clever and
+flexible in the future. In particular we do not expect the DSL to be
+useful in writing likelihood functions for comparison to data; we expect
+that if your model is simple enough for this you would be better off
+using stan
or some similarly flexible system.
In chapter 4 of Statistical Rethinking, we build a regression model +of height with parameters +, + +and +. +We can define the model for the prior probability of this model in +mcstate by running
+
+prior <- mcstate_dsl({
+ alpha ~ Normal(178, 20)
+ beta ~ Normal(0, 10)
+ sigma ~ Uniform(0, 50)
+})
This will define a new mcstate_model()
object that
+represents the prior, but with all the bits that we might need depending
+on how we want to use it:
We have model parameters
+
+prior$parameters
+#> [1] "alpha" "beta" "sigma"
These are defined in the order that they appear in your definition
+(so alpha
is first and sigma
is last)
We can compute the domain for your model:
+
+prior$domain
+#> [,1] [,2]
+#> alpha -Inf Inf
+#> beta -Inf Inf
+#> sigma 0 50
We can draw samples from the model if we provide an [mcstate_rng] +object
+
+rng <- mcstate_rng$new()
+theta <- prior$direct_sample(rng)
+theta
+#> [1] 167.700794 -5.085348 8.899977
We can compute the (log) density at a point in parameter space
+
+prior$density(theta)
+#> [1] -11.31011
The computed properties for the model are:
+
+prior$properties
+#> $has_gradient
+#> [1] TRUE
+#>
+#> $has_direct_sample
+#> [1] TRUE
+#>
+#> $is_stochastic
+#> [1] FALSE
+#>
+#> $has_parameter_groups
+#> [1] FALSE
+#>
+#> $allow_multiple_parameters
+#> [1] FALSE
+#>
+#> attr(,"class")
+#> [1] "mcstate_model_properties"
This vignette is an introduction to some of the ideas in
+mcstate2
. As the package is not ready for users to use it,
+this is more a place for us to jot down ideas about what it will do.
We’re going to change the package name in the future (dropping the
+2
at least, but possibly also changing it entirely). It’s
+not yet clear to us if this will be a package that users will use or
+something that the rest of our stack will target.
Draw samples from a model using Markov Chain Monte Carlo methods. To +do this you will need:
+mcstate_model()
+object and minimally knows about the names of its parameter vector
+(which is an unstructured real-valued vector) and can compute a log
+probability density. It may also be able to compute a gradient of this
+log density, or sample directly from parameter space (e.g., if it
+represents a prior distribution).mcstate_sampler_random_walk()
, which implements a simple
+Metropolis algorithm random walk.The system is designed to be composable; you can work in a Bayesian +way by defining a model representing a likelihood and another model +representing a prior and then pick a sampler based on the capabilities +of the model, and pick a runner based on the capabilities your +computer.
+The mcstate_model()
interface is designed to be very
+flexible but not user-friendly. We expect to write a higher-level
+interface to help work with this, and describe how to write wrappers for
+models implemented in other packages (so you might write a model in dust or odin and an adaptor would make
+it easy to work with the tools provided by mcstate2
to
+start making inferences with your model).
Before starting the example, it’s worth noting that there are far +better tools out there to model this sort of thing (stan, bugs, jags, R +itself - really anything). The aim of this section is to derive a simple +model that may feel familiar. The strength of the package is when +performing inference with custom models that can’t be expressed in these +high level interfaces.
+
+head(data)
+#> height weight
+#> 1 162.5401 45.92805
+#> 2 159.9566 51.19368
+#> 3 156.1808 44.56841
+#> 4 168.4164 60.36933
+#> 5 158.6978 52.14180
+#> 6 154.7666 44.66696
+plot(height ~ weight, data)
A simple likelihood, following the model formulation in “Statistical +Rethinking” chapter 3; height is modelled as normally distributed +departures from a linear relationship with weight.
+
+likelihood <- mcstate_model(
+ list(
+ parameters = c("a", "b", "sigma"),
+ density = function(x) {
+ a <- x[[1]]
+ b <- x[[2]]
+ sigma <- x[[3]]
+ mu <- a + b * data$weight
+ sum(dnorm(data$height, mu, sigma, log = TRUE))
+ }))
The prior we’ll make much nicer to work with in the future, but here
+we construct the density by hand as a sum of normally distributed priors
+on a
and b
, and a weak uniform prior on
+sigma
. We provide a direct_sample
function
+here so that we can draw samples from the prior distribution
+directly.
+prior <- local({
+ a_mu <- 178
+ a_sd <- 100
+ b_mu <- 0
+ b_sd <- 10
+ sigma_min <- 0
+ sigma_max <- 50
+ mcstate_model(
+ list(
+ parameters = c("a", "b", "sigma"),
+ density = function(x) {
+ a <- x[[1]]
+ b <- x[[2]]
+ sigma <- x[[3]]
+ dnorm(a, a_mu, a_sd, log = TRUE) +
+ dnorm(b, b_mu, b_sd, log = TRUE) +
+ dunif(sigma, sigma_min, sigma_max, log = TRUE)
+ },
+ direct_sample = function(rng) {
+ c(rng$normal(1, a_mu, a_sd),
+ rng$normal(1, b_mu, b_sd),
+ rng$uniform(1, sigma_min, sigma_max))
+ },
+ domain = rbind(c(-Inf, Inf), c(-Inf, Inf), c(0, Inf))
+ ))
+})
The posterior distribution is the combination of these two models
+(indicated with a +
because we’re adding on a log-scale, or
+because we are using prior
and
+posterior
; you can use mcstate_model_combine()
+if you prefer).
+posterior <- likelihood + prior
Constructing a sensible initial variance-covariance matrix is a bit +of a trick, and using an adaptive sampler reduces the pain here. These +values are chosen to be reasonable starting points.
+
+vcv <- rbind(c(4.5, -0.088, 0.076),
+ c(-0.088, 0.0018, -0.0015),
+ c(0.076, -0.0015, 0.0640))
+sampler <- mcstate_sampler_random_walk(vcv = vcv)
Now run the sampler. We’ve started from a good starting point to make +this simple sampler converge quickly:
+
+samples <- mcstate_sample(posterior, sampler, 5000, initial = c(114, 0.9, 3),
+ n_chains = 4)
We don’t yet have tools for working with the samples objects, but we +can see the density over time easily enough:
+
+matplot(t(samples$density), type = "l", lty = 1,
+ xlab = "log posterior density", ylab = "sample", col = "#00000055")
And plots of our estimated parameters:
+ + +This vignette will describe the samplers available in mcstate2.
+ +This example shows HMC outperforming a random walk on a two
+dimensional banana-shaped function. Our model takes two parameters
+alpha
and beta
, and is based on two successive
+simple draws, with the one conditional on the other:
$$ +\beta \sim Normal(1,0) \\ +\alpha \sim Normal(\beta^2, \sigma) +$$
+with + +the standard deviation of the conditional draw.
+We’ll use R’s dnorm
for the likelihood calculations and
+then differentiate the density by hand (using the formula of the
+gaussian density) to derive the gradient by hand. We skip the details
+here as this will be automated in an upcoming version of the
+package.
+banana_model <- function(sd = 0.5) {
+ mcstate_model(list(
+ parameters = c("alpha", "beta"),
+ direct_sample = function(rng) {
+ beta <- rng$random_normal(1)
+ alpha <- rng$normal(1, beta^2, sd)
+ c(alpha, beta)
+ },
+ density = function(x) {
+ alpha <- x[[1]]
+ beta <- x[[2]]
+ dnorm(beta, log = TRUE) + dnorm((alpha - beta^2) / sd, log = TRUE)
+ },
+ gradient = function(x) {
+ alpha <- x[[1]]
+ beta <- x[[2]]
+ c((beta^2 - alpha) / sd^2,
+ -beta + 2 * beta * (alpha - beta^2) / sd^2)
+ },
+ domain = rbind(c(-Inf, Inf), c(-Inf, Inf))))
+}
Let’s create a model with +
+
+m <- banana_model(0.5)
We can plot a greyscale visualisation of its density by computing the +density over a grid. Normally this is not possible of course:
+
+a <- seq(-2, 6, length.out = 1000)
+b <- seq(-2, 2, length.out = 1000)
+z <- outer(a, b, function(a, b) dnorm(b) * dnorm((a - b^2) / 0.5))
+image(a, b, z, xlab = "alpha", ylab = "beta")
In this particular case we can also easily generate samples, so we +know what a good sampler will produce:
+
+rng <- mcstate_rng$new()
+s <- vapply(seq(200), function(x) m$direct_sample(rng), numeric(2))
+image(a, b, z, xlab = "alpha", ylab = "beta")
+points(s[1, ], s[2, ], pch = 19, col = "#00000055")
It is also possible to compute the 95% confidence interval of the +distribution using the relationship between the standard bivariate +normal distribution and the banana shaped distribution as defined above. +We can check than roughly 10 samples (out of 200) are out of this 95% CI +contour.
+
+theta <- seq(0, 2 * pi, length.out = 10000)
+z95 <- local({
+ sd <- 0.5
+ r <- sqrt(qchisq(.95, df = 2))
+ x <- r * cos(theta)
+ y <- r * sin(theta)
+ cbind(x^2 + y * sd, x)
+})
+image(a, b, z, xlab = "alpha", ylab = "beta")
+lines(z95[, 1], z95[, 2])
+points(s[1, ], s[2, ], pch = 19, col = "#00000055")
It is not generally possible to directly sample from a density +(otherwise MCMC and similar methods would not exist!). In these cases we +need to use a sampler based on the density and if available possibly the +gradient of the density.
+We can start with a basic random-walk sampler
+
+sampler_rw <- mcstate_sampler_random_walk(vcv = diag(2) * 0.01)
+res_rw <- mcstate_sample(m, sampler_rw, 2000)
+plot(res_rw$pars, pch = 19, col = "#ff222277")
+lines(z95[, 1], z95[, 2])
As we can see this is not great. We can probably improve the samples +here by finding a better variance covariance matrix (VCV), but a single +VCV will not hold well over the whole surface because it is not very +similar to a multivariate normal (that is, the appropriate VCV will +change depending on the position in parameter space)
+Let’s try the Hamiltonian Monte Carlo (HMC) sampler, which uses the +gradient to move efficiently in parameter space:
+
+sampler_hmc <- mcstate_sampler_hmc(epsilon = 0.1, n_integration_steps = 10)
+res_hmc <- mcstate_sample(m, sampler_hmc, 2000)
+plot(res_hmc$pars, pch = 19, col = "#ff222277")
+lines(z95[, 1], z95[, 2])
Clearly better!
+This vignette outlines some ways of working with samples generated
+from running an MCMC with mcstate_sample
. We’ll flesh this
+out as we develop nicer ways of working with samples, and mostly exists
+so that we can point to it from other documentation, including from
+other packages (e.g., from odin2
).
All samplers and runners produce samples with the same basic
+structure. This structure is documented (here) so you are free to use
+this structure if you want to just dive in and manipulate things. Please
+treat output as read-only; extract data all you want, but make a copy
+and don’t change any value within the samples structure if you are going
+to pass it back into an mcstate2
function, as we assume
+that they have not been modified.
Below, samples
is a poorly mixed result of samples with
+2 parameters, 2000 samples, and 4 chains. It was the result of running
+mcstate_sample()
on the banana model from
+vignette("samplers")
, and has class
+mcstate_samples
.
+samples
+#>
+#> ── <mcstate_samples: 2 parameters x 2000 samples x 4 chains> ───────────────────
+#> ℹ Parameters: 'alpha' and 'beta'
+#> ℹ Conversion to other types is possible:
+#> → ! posterior::as_draws_array() [package installed]
+#> → ! posterior::as_draws_df() [package installed]
+#> → ! coda::as.mcmc.list() [package installed]
+#> ℹ See `?mcstate_sample()` and `vignette("samples")` for more information
Each mcstate_samples
object contains an element
+pars
which contains sampled parameters
+dim(samples$pars)
+#> [1] 2 2000 4
posterior
’s draw
+objects
+We implement methods for posterior::as_draws_array
and
+posterior::as_draws_df
, which you can use to convert to
+formats that you might be familiar with from use with other statistical
+packages. This only preserves the parameters ($pars
) from
+above.
+samples_df <- posterior::as_draws_df(samples)
With this, you can access summary methods already implemented +elsewhere:
+
+posterior::summarise_draws(samples_df)
+#> # A tibble: 2 × 10
+#> variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
+#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
+#> 1 alpha 2.27 0.645 3.06 1.39 -0.612 7.73 1.85 5.79 43.5
+#> 2 beta 0.608 0.364 1.39 1.02 -2.05 2.78 2.10 5.32 11.7
With these objects you should be able to use any of the plotting
+functions in bayesplot
, for
+example, MCMC
+visual diagnostics, without much trouble.
We also support conversion to draws_array
; let us know
+if you need the other conversion functions (as_draws_list
,
+as_draws_rvars
, as_draws_matrix
).
coda
’s mcmc.list
+objects
+The coda
package has utilities for working with MCMC,
+and many other packages are compatible with its mcmc.list
+object type (e.g., the ggmcmc
+package). We provide a method for coda
’s
+as.mcmc.list
, if that package is available:
+samples_coda <- coda::as.mcmc.list(samples)
+coda::effectiveSize(samples_coda)
+#> alpha beta
+#> 39.92519 68.47306