-
Notifications
You must be signed in to change notification settings - Fork 8
/
assignment-06.Rmd
202 lines (162 loc) · 6.11 KB
/
assignment-06.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
# Assignment 6
2021-10-07
**[Assignment 6](`r paste0(CM_URL, "assignment-06.pdf")`)**
## Setup
```{r setup, warning=FALSE, message=FALSE}
knitr::opts_chunk$set(echo = TRUE, comment = "#>", dpi = 300)
for (f in list.files(here::here("src"), pattern = "R$", full.names = TRUE)) {
source(f)
}
library(rstan)
library(tidybayes)
library(bayesplot)
library(tidyverse)
theme_set(
theme_classic() +
theme(
plot.title = element_text(hjust = 0.5),
panel.grid.major = element_line()
)
)
rstan_options(auto_write = TRUE)
```
## Exercise 1. Generalized linear model: Bioassay with Stan
**Replicate the computations for the bioassay example of section 3.7 (BDA3) using Stan.**
The model is located in ["models/assignment06-bioassay.stan"](`r paste0(MODELS_URL, "models/assignment06-bioassay.stan")`).
I have copied it below:
```stan
data {
int<lower=0> N; // number of data points
vector[N] x; // dose
int<lower=0> n[N]; // number of animals
int<lower=0> y[N]; // number of deaths
vector[2] mu; // prior on mean of theta
matrix<lower=0>[2, 2] sigma; // prior on covariance matrix of theta
}
parameters {
vector[2] mdl_params;
}
transformed parameters {
vector[N] theta;
theta = mdl_params[1] + mdl_params[2] * x;
}
model {
mdl_params ~ multi_normal(mu, sigma);
y ~ binomial_logit(n, theta);
}
```
### 1. Write down the model for the bioassay data in Stan syntax.
**Use the Gaussian prior as in Assignment 4 and 5, that is:**
$$
\begin{bmatrix}
\alpha \\ \beta
\end{bmatrix} \sim \text{N} (\mu_0, \Sigma_0)
\quad \text{where} \quad
\mu_0 = \begin{bmatrix} 0 \\ 10 \end{bmatrix}
\quad \text{and} \quad
\Sigma_0 = \begin{bmatrix} 2^2 & 12 \\ 12 & 10^2 \end{bmatrix}
$$
**Hint!**
**You will need Stan functions `multi_normal` and `binomial_logit` for implementing the prior and observation model, respectively.**
**In Stan code, it is easiest to declare a variable (say `theta`) which is a two-element vector so that the first value denotes $\alpha$ and latter one $\beta$.**
**This is because the `multi_normal` function that you need for implementing the prior requires a vector as an input.**
```{r}
bioassay <- read_bioassay_data()
bioassay_mdl_posterior <- stan(
file = here::here("models", "assignment06-bioassay.stan"),
data = list(
N = nrow(bioassay),
x = bioassay$x,
n = bioassay$n,
y = bioassay$y,
mu = c(0, 10),
sigma = matrix(c(2^2, 12, 12, 10^2), nrow = 2)
)
)
```
### 2. Use $\widehat{R}$ for convergence analysis.
**Report the $\widehat{R}$ values both for $\alpha$ and $\beta$ and discuss the convergence of the chains.**
**Briefly explain in your own words how to interpret the obtained $\widehat{R}$ values.**
Bellow is a table summarizing the results of sampling from the model's posterior.
The $\widehat{R}$ is stated in the last column of the table.
As the values are all around 1.0, this suggests that the chains mixed and converged.
```{r}
bioassay_mdl_posterior
```
$\widehat{R}$ is a metric for how well the chains mixed and converged using the within- and between-chain variance.
If the value is larger than 1, it indicates that the chains have likely not mixed, either due to model misspecification or not running the chains for long enough.
It can also be thought of a scaling factor indicating how much greater the estimated variance in the posterior is compared to the real variance caused by the chains not converging.
### 3. Plot the draws for $\alpha$ and $\beta$ (scatter plot) and include this plot in your report
```{r}
post_df <- bioassay_mdl_posterior %>%
spread_draws(mdl_params[param]) %>%
mutate(param = ifelse(param == 1, "alpha", "beta"))
```
```{r}
post_df %>%
ggplot(aes(y = fct_rev(param), x = mdl_params)) +
stat_halfeye(.width = c(0.50, 0.89)) +
labs(x = "posterior value", y = "model parameter")
```
```{r}
post_point_est <- as_tibble(
bayestestR::point_estimate(bioassay_mdl_posterior)
) %>%
select(param = Parameter, mean = Mean) %>%
filter(str_detect(param, "mdl_params")) %>%
mutate(param = c("alpha", "beta"))
post_hdi <- as_tibble(bayestestR::hdi(bioassay_mdl_posterior, ci = 0.89)) %>%
janitor::clean_names() %>%
select(param = parameter, ci, ci_low, ci_high) %>%
filter(str_detect(param, "mdl_params")) %>%
mutate(param = c("alpha", "beta"))
parameter_post_description <- inner_join(post_point_est, post_hdi, by = "param")
ALPHA_COL <- "#F90039"
BETA_COL <- "#4E477F"
post_df %>%
pivot_wider(names_from = param, values_from = mdl_params) %>%
ggplot(aes(x = alpha, y = beta)) +
geom_density_2d(alpha = 0.5) +
geom_point(size = 0.4, alpha = 0.5) +
geom_rect(
aes(xmin = ci_low, xmax = ci_high),
data = parameter_post_description %>% slice(1),
ymin = -Inf,
ymax = Inf,
alpha = 0.1,
fill = ALPHA_COL,
inherit.aes = FALSE
) +
geom_rect(
aes(ymin = ci_low, ymax = ci_high),
data = parameter_post_description %>% slice(2),
xmin = -Inf,
xmax = Inf,
alpha = 0.1,
fill = BETA_COL,
inherit.aes = FALSE
) +
geom_vline(
xintercept = parameter_post_description$mean[[1]],
color = ALPHA_COL,
linetype = 2
) +
geom_hline(
yintercept = parameter_post_description$mean[[2]],
color = BETA_COL,
linetype = 2
) +
labs(x = "alpha", y = "beta", title = "Bioassy model posterior")
```
### 4. To develop the course and provide feedback to Stan developers, we collect information on which Stan setup you used and whether you had any problems in setting it up or using it.
**Please report,**
1. **Operating system (Linux, Mac, Windows) or jupyter.cs.aalto.fi?** macOS Big Sur (v11.6)
2. **Programming environment used: R or Python?** R
3. **Interface used: RStan, CmdStanR, PyStan, or CmdStanPy?** RStan
4. **Did you have installation or compilation problems?** No troubles.
5. **Did you try first installing locally, but switched to jupyter.cs.aalto.fi?** No.
6. **In addition of these you can write what other things you found out difficult (or even frustrating) when making this assignment with Stan.** No frustrations this time, but I have had some in the past, but the problem was with how I had installed R and not specific to Stan.
---
```{r}
sessionInfo()
```