assignment-06.Rmd

# Assignment 6

2021-10-07

**[Assignment 6](`r paste0(CM_URL, "assignment-06.pdf")`)**

## Setup

```{r setup, warning=FALSE, message=FALSE}
knitr::opts_chunk$set(echo = TRUE, comment = "#>", dpi = 300)

for (f in list.files(here::here("src"), pattern = "R$", full.names = TRUE)) {
  source(f)
}

library(rstan)
library(tidybayes)
library(bayesplot)
library(tidyverse)

theme_set(
  theme_classic() +
    theme(
      plot.title = element_text(hjust = 0.5),
      panel.grid.major = element_line()
    )
)

rstan_options(auto_write = TRUE)
```

## Exercise 1. Generalized linear model: Bioassay with Stan

**Replicate the computations for the bioassay example of section 3.7 (BDA3) using Stan.**

The model is located in ["models/assignment06-bioassay.stan"](`r paste0(MODELS_URL, "models/assignment06-bioassay.stan")`).
I have copied it below:

```stan
data {
  int<lower=0> N;    // number of data points
  vector[N] x;       // dose
  int<lower=0> n[N]; // number of animals
  int<lower=0> y[N]; // number of deaths

  vector[2] mu;                // prior on mean of theta
  matrix<lower=0>[2, 2] sigma; // prior on covariance matrix of theta
}

parameters {
  vector[2] mdl_params;
}

transformed parameters {
  vector[N] theta;
  theta = mdl_params[1] + mdl_params[2] * x;
}

model {
  mdl_params ~ multi_normal(mu, sigma);
  y ~ binomial_logit(n, theta);
}
```


### 1. Write down the model for the bioassay data in Stan syntax.

**Use the Gaussian prior as in Assignment 4 and 5, that is:**

$$
\begin{bmatrix}
\alpha \\ \beta
\end{bmatrix} \sim \text{N} (\mu_0, \Sigma_0)
\quad \text{where} \quad
\mu_0 = \begin{bmatrix} 0 \\ 10 \end{bmatrix}
\quad \text{and} \quad
\Sigma_0 = \begin{bmatrix} 2^2 & 12 \\ 12 & 10^2 \end{bmatrix}
$$

**Hint!**
**You will need Stan functions `multi_normal` and `binomial_logit` for implementing the prior and observation model, respectively.**
**In Stan code, it is easiest to declare a variable (say `theta`) which is a two-element vector so that the first value denotes $\alpha$ and latter one $\beta$.**
**This is because the `multi_normal` function that you need for implementing the prior requires a vector as an input.**

```{r}
bioassay <- read_bioassay_data()
bioassay_mdl_posterior <- stan(
  file = here::here("models", "assignment06-bioassay.stan"),
  data = list(
    N = nrow(bioassay),
    x = bioassay$x,
    n = bioassay$n,
    y = bioassay$y,
    mu = c(0, 10),
    sigma = matrix(c(2^2, 12, 12, 10^2), nrow = 2)
  )
)
```

### 2. Use $\widehat{R}$ for convergence analysis.

**Report the $\widehat{R}$ values both for $\alpha$ and $\beta$ and discuss the convergence of the chains.**
**Briefly explain in your own words how to interpret the obtained $\widehat{R}$ values.**

Bellow is a table summarizing the results of sampling from the model's posterior.
The $\widehat{R}$ is stated in the last column of the table.
As the values are all around 1.0, this suggests that the chains mixed and converged.

```{r}
bioassay_mdl_posterior
```

$\widehat{R}$ is a metric for how well the chains mixed and converged using the within- and between-chain variance.
If the value is larger than 1, it indicates that the chains have likely not mixed, either due to model misspecification or not running the chains for long enough.
It can also be thought of a scaling factor indicating how much greater the estimated variance in the posterior is compared to the real variance caused by the chains not converging.

### 3. Plot the draws for $\alpha$ and $\beta$ (scatter plot) and include this plot in your report

```{r}
post_df <- bioassay_mdl_posterior %>%
  spread_draws(mdl_params[param]) %>%
  mutate(param = ifelse(param == 1, "alpha", "beta"))
```

```{r}
post_df %>%
  ggplot(aes(y = fct_rev(param), x = mdl_params)) +
  stat_halfeye(.width = c(0.50, 0.89)) +
  labs(x = "posterior value", y = "model parameter")
```

```{r}
post_point_est <- as_tibble(
  bayestestR::point_estimate(bioassay_mdl_posterior)
) %>%
  select(param = Parameter, mean = Mean) %>%
  filter(str_detect(param, "mdl_params")) %>%
  mutate(param = c("alpha", "beta"))

post_hdi <- as_tibble(bayestestR::hdi(bioassay_mdl_posterior, ci = 0.89)) %>%
  janitor::clean_names() %>%
  select(param = parameter, ci, ci_low, ci_high) %>%
  filter(str_detect(param, "mdl_params")) %>%
  mutate(param = c("alpha", "beta"))

parameter_post_description <- inner_join(post_point_est, post_hdi, by = "param")

ALPHA_COL <- "#F90039"
BETA_COL <- "#4E477F"

post_df %>%
  pivot_wider(names_from = param, values_from = mdl_params) %>%
  ggplot(aes(x = alpha, y = beta)) +
  geom_density_2d(alpha = 0.5) +
  geom_point(size = 0.4, alpha = 0.5) +
  geom_rect(
    aes(xmin = ci_low, xmax = ci_high),
    data = parameter_post_description %>% slice(1),
    ymin = -Inf,
    ymax = Inf,
    alpha = 0.1,
    fill = ALPHA_COL,
    inherit.aes = FALSE
  ) +
  geom_rect(
    aes(ymin = ci_low, ymax = ci_high),
    data = parameter_post_description %>% slice(2),
    xmin = -Inf,
    xmax = Inf,
    alpha = 0.1,
    fill = BETA_COL,
    inherit.aes = FALSE
  ) +
  geom_vline(
    xintercept = parameter_post_description$mean[[1]],
    color = ALPHA_COL,
    linetype = 2
  ) +
  geom_hline(
    yintercept = parameter_post_description$mean[[2]],
    color = BETA_COL,
    linetype = 2
  ) +
  labs(x = "alpha", y = "beta", title = "Bioassy model posterior")
```

### 4. To develop the course and provide feedback to Stan developers, we collect information on which Stan setup you used and whether you had any problems in setting it up or using it.

**Please report,**

1. **Operating system (Linux, Mac, Windows) or jupyter.cs.aalto.fi?** macOS Big Sur (v11.6)
2. **Programming environment used: R or Python?** R
3. **Interface used: RStan, CmdStanR, PyStan, or CmdStanPy?** RStan
4. **Did you have installation or compilation problems?** No troubles.
5. **Did you try first installing locally, but switched to jupyter.cs.aalto.fi?** No.
6. **In addition of these you can write what other things you found out difficult (or even frustrating) when making this assignment with Stan.** No frustrations this time, but I have had some in the past, but the problem was with how I had installed R and not specific to Stan.

---

```{r}
sessionInfo()
```