---
title: "Ch. 2: Linear Model 2: Multiple predictors"
author: "Felix Schönbrodt"
execute:
cache: true
project-type: website
---
*Reading/working time: ~50 min.*
```{r setup}
#| output: false
# Preparation: Install and load all necessary packages
#install.packages(c("ggplot2", "Rfast", "tidyr"))
library(Rfast)
library(ggplot2)
library(tidyr)
```
In the [first chapter on linear models](LM1.qmd), we had the simplest possible linear model: A continuous outcome variable is predicted by a single dichotomous predictor. In this chapter, we build up increasingly complex models by (a) adding a single continuous predictor and (b) modeling an interaction.
# Adding a continuous predictor
We can improve our causal inference by including the pre-treatment baseline scores of depression into the model: This way, participants are "their own control group", and we can explain a lot of (formerly) unexplained error variance by controlling for this between-person variance.
::: {.callout-note}
As a side-note: Many statistical models have been proposed for analyzing a pre-post control-treatment design. There is growing consensus that a model with the baseline (pre) measurement as covariate is the most appropriate model (see, for example, Senn, [2006](http://misc.biblio.s3.amazonaws.com/Senn06Change+from+baseline+and+analysis+of+covariance.pdf) or Wan, [2021](https://doi.org/10.1186/s12874-021-01323-9)).
:::
The raw effect size stays the same -- the treatment is still expected to lead to a 6-point decrease in depression scores on average. However, we expect that the residual (i.e., unexplained) variance in the model decreases when we control for pre-treatment differences between participants. How much the residual variance is reduced depends on the pre-post-correlation.
::: {.callout-note}
If you have absolutely no idea about the pre-post-correlation, a typical default value is $r=.5$. But such generic defaults are always the very last resort -- when you run a scientific study, we hope that you bring at least some domain knowledge 🤓
:::
## Get some real data as starting point
Instead of guessing the necessary quantities -- in the current case, the pre-post-correlation -- let's look at real data. The "Beat the blues" (`BtheB`) data set from the `HSAUR` R package contains pre-treatment baseline values (`bdi.pre`), along with multiple post-treatment values. Here we focus on the first post-treatment assessment, 2 months after the treatment (`bdi.2m`).
```{r load_data}
# load the data
data("BtheB", package = "HSAUR")
# pre-post-correlation
cor(BtheB$bdi.pre, BtheB$bdi.2m, use="p")
```
In our pilot data set, the pre-post-correlation is around $r=.6$. We will use this value in our simulations.
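The assumed mean (23) and error variance (117) used below are carried over from [Chapter 1](LM1.qmd); as a quick check, they can likewise be compared against the pilot data:
```{r pilot_check}
# plausibility check of the assumed mean and variance of the BDI scores
mean(BtheB$bdi.pre)
var(BtheB$bdi.pre)
```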
## Update the sim() function
Now we add the continuous predictor into our `sim()` function. In order to simulate a correlated variable, we need to slightly change the workflow in our simulation.
We use the `rmvnorm` function from the `Rfast` package to create correlated, normally distributed variables. It takes three parameters:
* `n`, the number of random observations
* `mu`, vector of mean values (one for each random variable)
* `sigma`, the variance-covariance-matrix of the random variables
For `mu`, we use the value 23 for both the pre and the post measurement. You can think of this as simulating only the control group: The mean value is not supposed to change systematically from pre to post (although each individual person can go somewhat up or down).
The variance-covariance-matrix defines two properties at once: the variance of each variable and the covariance to all other variables. In the two-variable case, the general matrix looks like:
$$
\begin{bmatrix}
var_x & cov_{xy}\\
cov_{xy} & var_y
\end{bmatrix}
$$
Note that the two off-diagonal entries have the same value, as $cov_{xy} = cov_{yx}$.
As we need to enter the covariance into `sigma`, we need to convert the correlation into a covariance. Here's the formula:
$$cor_{xy} = \frac{cov_{xy}}{\sigma_x\sigma_y}$$
(Note: The denominator contains the standard deviations, not the variances.)
Solving for the covariance yields:
$$cov_{xy} = cor_{xy}\sigma_x\sigma_y = 0.6 * \sqrt{117} * \sqrt{117} = 70.2$$
Hence, the specific variance-covariance-matrix `sigma` is in our case (generously rounded):
$$
\begin{bmatrix}
117 & 70\\
70 & 117
\end{bmatrix}
$$
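As a quick check, we can reproduce this conversion in R, using the correlation and variances assumed above:
```{r cor_to_cov}
# convert the assumed pre-post-correlation into a covariance
r <- 0.6
var_pre <- var_post <- 117
r * sqrt(var_pre) * sqrt(var_post)
```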
Putting it all together:
```{r rmvnorm}
set.seed(0xBEEF)
mu <- c(23, 23) # the mean values of both variables
sigma <- matrix(
c(117 , 70,
70, 117), nrow=2, byrow=TRUE)
df <- rmvnorm(n=10000, mu=mu, sigma=sigma) |> data.frame()
names(df) <- c("BDI_pre", "BDI_post0")
# Check: in a large sample the correlation should be close to .6
cor(df)
```
::: {.callout-note}
A very nice and user-friendly alternative for simulating correlated variables is the [`rnorm_multi`](https://debruine.github.io/faux/articles/rnorm_multi.html) function from the `faux` package.
:::
We now have the correlated $BDI_{pre}$ and $BDI_{post}$ scores. Finally, we have to impose the treatment effect onto the post variable: In the control group, the mean value stays constant at 23 (which we have already simulated); in the treatment group, the BDI is 6 points lower:
```{r treatment_effect}
# add the treatment predictor variable
df$treatment <- rep(c(0, 1), times=nrow(df)/2)
# add the treatment effect to the BDI_post0 variable
df$BDI_post <- df$BDI_post0 - 6*df$treatment
```
We do not need to add an explicit intercept, as this is already encoded in the mean value of the simulated variables.
(Alternatively, we could have simulated them centered on zero and then explicitly added the intercept in the model equation; see the sketch below.)
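A minimal sketch of that alternative approach, reusing the `sigma` defined above (the object names `df0` and `b0` are just for illustration):
```{r centered_sim}
# simulate zero-centered scores, then add the intercept explicitly
b0 <- 23
df0 <- rmvnorm(n=10000, mu=c(0, 0), sigma=sigma) |> data.frame()
names(df0) <- c("BDI_pre", "BDI_post0")
df0$BDI_pre <- b0 + df0$BDI_pre
df0$BDI_post0 <- b0 + df0$BDI_post0
```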
Let's make a plausibility check by plotting the simulated variables. We first have to convert them into long format for a nicer plot:
```{r plot}
# reduce to 400 cases for plotting; define factors
df2 <- df[1:400, ]
df2$id <- 1:nrow(df2)
df2$BDI_post0 <- NULL
df_long <- pivot_longer(df2, cols=c(BDI_pre, BDI_post), names_to="time")
df_long$time <- factor(df_long$time, levels=c("BDI_pre", "BDI_post"))
df_long$treatment <- factor(df_long$treatment, levels=c(0, 1), labels=c("Control", "Treatment"))
ggplot(df_long, aes(x=time, y=value, group=id)) + geom_point() + geom_line() + facet_wrap(~treatment)
ggplot(df_long, aes(x=time, y=value, group=time)) + geom_boxplot() + facet_wrap(~treatment)
```
Looks good -- plausible BDI values, constant means across time in the control group, and comparable variances. The treatment effect of -6 is visible.
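We can also verify the effect numerically: the post-treatment group means should differ by about 6 points, while the pre-treatment means should be (nearly) identical:
```{r group_means}
# compare pre- and post-treatment means between groups
aggregate(cbind(BDI_pre, BDI_post) ~ treatment, data=df, FUN=mean)
```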
Let's put that together in the new `sim` function:
```{r}
#| echo: true
#| results: hide
sim2 <- function(n=100, treatment_effect=-6, pre_post_cor = 0.6, err_var = 117, print=FALSE) {
  library(Rfast)

  # simulate correlated BDI pre and post scores
  mu <- c(23, 23)  # the mean values of both variables
  sigma <- matrix(
    c(err_var, pre_post_cor*sqrt(err_var)*sqrt(err_var),
      pre_post_cor*sqrt(err_var)*sqrt(err_var), err_var),
    nrow=2, byrow=TRUE)
  df <- rmvnorm(n, mu, sigma) |> data.frame()
  names(df) <- c("BDI_pre", "BDI_post0")

  # add the treatment predictor variable
  df$treatment <- c(rep(0, n/2), rep(1, n/2))

  # center the BDI_pre value for better interpretability
  df$BDI_pre.c <- df$BDI_pre - mean(df$BDI_pre)

  # add the treatment effect to the BDI_post0 variable.
  # We do not need to add an intercept, as this is already encoded
  # in the mean value of the simulated variables.
  df$BDI_post <- df$BDI_post0 + df$treatment*treatment_effect

  # fit the model and extract the p-value of the treatment effect
  res <- lm(BDI_post ~ BDI_pre.c + treatment, data=df)
  p_value <- summary(res)$coefficients["treatment", "Pr(>|t|)"]

  if (print==TRUE) print(summary(res))
  else return(p_value)
}
```
Let's test the plausibility of the new `sim2()` function by simulating a very large sample -- this should give estimates close to the true values. We also vary the assumed pre-post-correlation. When it is 0, the results should be identical to the simpler model from [Chapter 1](LM1.qmd) (except for one *df* that we lose due to the additional predictor). When the pre-post-correlation is > 0, the error variance should be reduced (see `Residual standard error` at the bottom of each `lm` output).
```{r}
set.seed(0xBEEF)
# without pre_post_cor, the result should be the same
sim2(n=100000, pre_post_cor=0, print=TRUE)
# with pre_post_cor, the residual error should be reduced
sim2(n=100000, pre_post_cor=0.6, print=TRUE)
```
Indeed, with `pre_post_cor = 0` the parameter estimates are identical, and with non-zero pre-post-correlation, the error term gets reduced.
::: {.callout-note collapse="true"}
# An additional plausibility check (advanced)
We assume independence of both predictor variables (`BDI_pre` and `treatment`). This is plausible, because the treatment was randomized and therefore independent from the baseline. In this case, the variance that each predictor explains in the dependent variable is additive. The pre-measurement explains $r^2 = .6^2 = 36\%$ of the variance in the post-measurement. As this variance is unrelated to the treatment factor, it reduces the variance of the error term (which was at 117) by 36%:
$\sigma^2_{err} = 117 * (1-0.36) = 74.88$
The square root of this unexplained error variance is the `Residual standard error` from the `lm` output: $\sqrt{74.88} = 8.65$.
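We can verify this arithmetic directly:
```{r resid_check}
sqrt(117 * (1 - 0.6^2))  # expected residual standard error
```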
:::
## Do the power analysis
The next step is the same as always: Use the `replicate` function to repeatedly call the `sim2()` function for many iterations, and increase the simulated sample size until the desired power level is achieved.
```{r}
#| echo: true
#| results: hide
#| code-fold: true
set.seed(0xBEEF)

# define all predictor and simulation variables
iterations <- 2000
ns <- seq(90, 180, by=10)  # ns are already adjusted to cover the relevant range
result <- data.frame()

for (n in ns) {  # loop through elements of the vector "ns"
  p_values <- replicate(iterations, sim2(n=n, pre_post_cor = 0.6))
  result <- rbind(result, data.frame(
    n = n,
    power = sum(p_values < .005)/iterations
  ))
  # show the result after each run (not shown here in the tutorial)
  print(result)
}
```
```{r}
#| echo: false
print(result)
```
In the original analysis, we needed $n=180$ (90 in each group) for 80% power. Including the baseline covariate (which explains $r^2 = .6^2 = 36\%$ of the variance in post scores) reduces that number to around $n=115$.
Let's check the plausibility of our power simulation. Borm et al. ([2007](http://www.math.chalmers.se/Stat/Grundutb/GU/MSA620/S18/Ancova.pdf), p. 1237) propose a simple method for arriving at a planned sample size when switching from a simple t-test (comparing post-treatment groups) to a model that controls for the baseline:
> "We propose a simple method for the sample size calculation when ANCOVA is used: multiply the number of subjects required for the t-test by (1-r^2) and add one extra subject per group."
When we enter our assumed pre-post-correlation into that formula, we arrive at $n \approx 117$ -- very close to our simulated value:
$$180 * (1 - .6^2) + 2 = 117.2$$
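The same computation in R:
```{r borm_check}
# Borm et al. (2007) approximation: t-test n times (1 - r^2), plus one per group
180 * (1 - 0.6^2) + 2
```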
# Modeling an interaction
We now include the interaction between the baseline score $\text{BDI}_{pre}$ and the $\text{treatment}$ factor. Such an interaction would suggest that the treatment is more effective at a certain baseline severity. There is some evidence that depression treatment works better at higher baseline severity, compared to lower baseline severity[^1].
[^1]: Although this might be due to a floor effect of the measurement instrument: If a questionnaire item is already at the lowest possible value at mild depression, there is no more room for improvement (Hieronymus et al., [2019](https://doi.org/10.1016/S2215-0366(19)30216-0)).
To this end, we add the interaction term to the equation. We use the centered baseline value $\text{BDI}_{pre.c}$ for better interpretability:
$$\text{BDI}_{post} = b_0 + b_1*\text{BDI}_{pre.c} + b_2*\text{treatment} + b_3*\text{BDI}_{pre.c}*\text{treatment} + e$$
When we rearrange the equation by factoring out the `treatment` term, the interaction is easier to interpret:
$$\text{BDI}_{post} = b_0 + b_1*\text{BDI}_{pre.c} + (b_2 + b_3*\text{BDI}_{pre.c})*\text{treatment} + e$$
Hence, the size of the treatment effect is $b_2 + b_3*\text{BDI}_{pre}$. Remember that we centered $BDI_{pre}$ for better interpretability -- this is the place where it gets relevant: When we enter the value 0 for $BDI_{pre.c}$ (which corresponds to a typical pre-treatment depression score of around 23), the treatment effect equals the main effect $b_2$. This conditional treatment effect gets larger or smaller, depending on the value of $BDI_{pre.c}$ and the regression weight $b_3$. The meaning of $b_3$ is: How much does the treatment effect increase for every 1 unit increase in $BDI_{pre.c}$?
What are plausible values for $b_3$? First, we need to decide on the sign. We expect that the treatment effect gets more *negative* (as we expect a decrease in BDI scores) with increasing baseline severity, so the coefficient must be negative. For the moment, let's assume that the treatment effect gets 0.4 more negative for every unit increase in $BDI_{pre.c}$:
$$\text{BDI}_{post} = 23 + 0.6*\text{BDI}_{pre.c} - 6*\text{treatment} - 0.4*\text{BDI}_{pre.c}*\text{treatment} + e$$
We can plot the predicted values of that equation as two separate regression lines -- simply enter either 0 or 1 for `treatment`:
$$\text{Control group}: \widehat{\text{BDI}_{post}} = 23 + 0.6*\text{BDI}_{pre.c} - 6*0 - 0.4*\text{BDI}_{pre.c}*0 = 23 + 0.6*\text{BDI}_{pre.c}$$
$$\text{Treatment group}: \widehat{\text{BDI}_{post}} = 23 + 0.6*\text{BDI}_{pre.c} - 6*1 - 0.4*\text{BDI}_{pre.c}*1 = 17 + 0.2*\text{BDI}_{pre.c}$$
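To make these numbers concrete, here is a small helper function (purely illustrative; the name `cond_effect` is ours) that computes the conditional treatment effect $b_2 + b_3*\text{BDI}_{pre.c}$, using the parameter values assumed above:
```{r cond_effect}
# conditional treatment effect at a given centered baseline score
cond_effect <- function(BDI_pre.c, b2 = -6, b3 = -0.4) {
  b2 + b3 * BDI_pre.c
}
# at the lowest possible, the average, and the highest observed baseline value:
cond_effect(c(-23, 0, 26))
```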
In the following plot, the limits of the x-axis are chosen to reflect the possible range of the $BDI_{pre}$ score: It can go from 0 to 63, which corresponds to -23 to 40 on the centered scale. The actual maximum in our pilot data was 49 (i.e., 26 on the centered scale), marked with a dashed line.
```{r interaction_plot}
#| code-fold: true
par(mar = c(7, 4, 4, 2) + 0.1)
plot(NA, xlim=c(-23, 40), ylim=c(0, 63), xlab="", ylab="BDI_post", main="An implausible interaction effect")
abline(a=23, b=0.6, col="red")
abline(a=17, b=0.2, col="blue")
abline(v=26, lty="dotted")
legend("topleft", legend=c("Control group", "Treatment group"),
col=c("red", "blue"), lty=1, lwd=2, cex=.8)
# secondary axis with non-centered BDI scores
title(xlab = "BDI_pre.c (upper axis), BDI_pre (lower axis)", line = 2)
axis(1, at = seq(-23, 40, by=5), labels=seq(0, 63, by=5), line = 3.5)
# treatment effect at average baseline value
segments(x0=0, y0=23, x1=0, y1=17, col="darkgreen", lwd=3)
# treatment effect at the smallest possible value (a person with BDI_pre = 0)
segments(x0=-23, y0=9.2, x1=-23, y1=12.4, col="darkgreen", lwd=3)
# treatment effect at the highest observed value (a person with BDI_pre = 49)
segments(x0=26, y0=38.6, x1=26, y1=22.2, col="darkgreen", lwd=3)
```
With that strength of the interaction effect, the predicted treatment effect for a person with the highest observed $\text{BDI}_{pre}$ score of 49 would be $-16.4$, nearly three times as large as the treatment effect at average baseline severity (see the long green line in the plot). While this seems quite high, it might still be plausible.
However, the interaction creates an implausible situation at the lower end of baseline severity. The red and the blue line cross at a $BDI_{pre}$ value of around 8 (the conditional treatment effect is zero at $\text{BDI}_{pre.c} = -b_2/b_3 = -15$, which corresponds to a raw score of $23 - 15 = 8$): For persons with such minimal depression, it makes zero difference whether they are in the control or the treatment group. The effect even *reverses* for patients with BDI values below 8: For them, the treatment leads to worse outcomes than being in the control group.
As this is implausible, we want to calibrate the interaction effect in a way that the treatment a) makes no difference for the lowest $\text{BDI}_{pre}$ values and b) reduces the BDI by 6 points for average depression scores.
(Alternatively, we have to give up the assumption of a *linear* interaction effect and create a non-linear interaction model. But you don't want to go that route ...).
By setting the interaction effect to -0.2, we achieve a plausible interaction plot:
```{r interaction_plot2}
#| code-fold: true
par(mar = c(7, 4, 4, 2) + 0.1)
plot(NA, xlim=c(-23, 40), ylim=c(0, 63), xlab="", ylab="BDI_post", main="A plausible interaction effect")
abline(a=23, b=0.6, col="red")
abline(a=17, b=0.4, col="blue")
abline(v=26, lty="dotted")
legend("topleft", legend=c("Control group", "Treatment group"),
col=c("red", "blue"), lty=1, lwd=2, cex=.8)
# secondary axis with non-centered BDI scores
title(xlab = "BDI_pre.c (upper axis), BDI_pre (lower axis)", line = 2)
axis(1, at = seq(-23, 40, by=5), labels=seq(0, 63, by=5), line = 3.5)
# treatment effect at average baseline value
segments(x0=0, y0=23, x1=0, y1=17, col="darkgreen", lwd=3)
# treatment effect at the smallest possible value (a person with BDI_pre = 0)
segments(x0=-23, y0=9.2, x1=-23, y1=7.8, col="darkgreen", lwd=3)
# treatment effect at the highest observed value (a person with BDI_pre = 49)
segments(x0=26, y0=38.6, x1=26, y1=27.4, col="darkgreen", lwd=3)
```
Now the treatment makes no difference for persons who are not depressed anyway; and for highly depressed persons the treatment effect is -11.2, roughly double the size of a typical treatment effect.
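Reusing the illustrative `cond_effect` helper from above with the new slope confirms these numbers:
```{r cond_effect2}
cond_effect(c(-23, 0, 26), b3 = -0.2)
```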
We calibrated the interaction effect size by looking at a boundary case. We'd consider this interaction effect to be an *upper limit* (larger interactions make no sense), but the true interaction effect could, of course, be smaller.
## Thinking visually
Arriving at plausible guesses for interaction effects can be tricky. We strongly recommend **visualizing** your assumed interaction effect in a plot -- first drawn by hand: Do you expect a disordinal (i.e., cross-over) interaction, or an ordinal one where the effect is amplified (but not reversed) by the moderating variable?
Look at a reasonable maximum and minimum of your variables -- how large would you expect the effect to be there?
Evaluate the effect size in subgroups: For example, imagine a group of really severely depressed patients. And then assume that the therapy works exceptionally well for them -- what would be a realistic outcome for them? Would you expect them all to be at BDI<10? Probably not. Or would a very good outcome simply be that they move from a "severe depression" to a "moderate depression"? This gives you an estimate of the upper limit of your effect size. After defining upper limits, you should ask: What effect size would be plausible, given your background knowledge of typical effects in your field?
Only when you have drawn a reasonable plot by hand (and validated it with colleagues) should you start to work out the parameter values that you need to enter into the regression equations in order to arrive at the desired interaction plot.
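One way to do that last step is to solve a small linear system: pick two (centered) baseline values, state the treatment effect you expect at each, and derive $b_2$ and $b_3$ from these two conditions. A minimal sketch (the helper name is ours):
```{r solve_interaction}
# derive b2 and b3 from two desired conditional treatment effects
solve_interaction <- function(x1, effect1, x2, effect2) {
  b3 <- (effect2 - effect1) / (x2 - x1)
  b2 <- effect1 - b3 * x1
  c(b2 = b2, b3 = b3)
}
# e.g., no treatment effect at the lowest possible baseline (-23),
# and -6 at the average baseline (0):
solve_interaction(x1 = -23, effect1 = 0, x2 = 0, effect2 = -6)
```
For our boundary case, this yields $b_3 = -6/23 \approx -0.26$, in the same ballpark as the $-0.2$ we settled on above.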
## Do the power analysis
Now that we have our desired parameter values, we update our `sim()` function by:
* imposing the interaction effect onto our dependent variable
* adding the interaction effect to our `lm` analysis model
* extracting two p-values (we now want to compute the power both for the treatment main effect and for the interaction term)
```{r sim}
#| echo: true
#| results: hide
# CHANGE: Add interaction_effect
sim3 <- function(n=100, treatment_effect=-6, pre_post_cor = 0.6,
                 interaction_effect = -.2, err_var = 117, print=FALSE) {
  library(Rfast)
  mu <- c(23, 23)  # the mean values of both variables
  sigma <- matrix(
    c(err_var, pre_post_cor*sqrt(err_var)*sqrt(err_var),
      pre_post_cor*sqrt(err_var)*sqrt(err_var), err_var),
    nrow=2, byrow=TRUE)
  df <- rmvnorm(n, mu, sigma) |> data.frame()
  names(df) <- c("BDI_pre", "BDI_post0")

  # add the treatment predictor variable
  df$treatment <- c(rep(0, n/2), rep(1, n/2))

  # center the BDI_pre value for better interpretability
  df$BDI_pre.c <- df$BDI_pre - mean(df$BDI_pre)

  # CHANGE: add the treatment main effect and the interaction to the BDI_post variable
  df$BDI_post <- df$BDI_post0 + treatment_effect*df$treatment +
    interaction_effect*df$treatment*df$BDI_pre.c

  # CHANGE: add the interaction effect to the analysis model
  res <- lm(BDI_post ~ BDI_pre.c * treatment, data=df)

  # CHANGE: extract both focal p-values
  p_values <- summary(res)$coefficients[c("treatment", "BDI_pre.c:treatment"), "Pr(>|t|)"]

  if (print==TRUE) print(summary(res))
  else return(p_values)
}
```
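As with `sim2()`, let's check the plausibility of `sim3()` with a very large sample; the parameter estimates should be close to the true values (0.6 for the baseline, -6 for the treatment, -0.2 for the interaction):
```{r sim3_check}
set.seed(0xBEEF)
sim3(n=100000, print=TRUE)
```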
```{r power_analysis_LM2}
#| echo: true
#| results: hide
set.seed(0xBEEF)

# define all predictor and simulation variables
iterations <- 1000
ns <- seq(100, 400, by=20)
result <- data.frame()

for (n in ns) {  # loop through elements of the vector "ns"
  p_values <- replicate(iterations, sim3(n=n, pre_post_cor = 0.6, interaction_effect = -.2))
  # CHANGE: Analyze both sets of p-values separately.
  # The p-values for the main effect are stored in the first row,
  # the p-values for the interaction effect in the second row.
  result <- rbind(result, data.frame(
    n = n,
    power_treatment = sum(p_values[1, ] < .005)/iterations,
    power_interaction = sum(p_values[2, ] < .005)/iterations
  ))
  # show the result after each run (not shown here in the tutorial)
  print(result)
}
```
```{r}
#| echo: false
print(result)
```
While the power for detecting the main effect quickly approaches 100%, even 400 participants are far from enough to detect the interaction effect reliably -- in particular when you recall that we set the assumed interaction effect to the largest plausible value.
# References
Borm, G. F., Fransen, J., & Lemmens, W. A. J. G. (2007). A simple sample size formula for analysis of covariance in randomized clinical trials. Journal of Clinical Epidemiology, 60(12), 1234–1238.
Hieronymus, F., Lisinski, A., Nilsson, S., & Eriksson, E. (2019). Influence of baseline severity on the effects of SSRIs in depression: An item-based, patient-level post-hoc analysis. The Lancet Psychiatry, 6(9), 745–752. [https://doi.org/10.1016/S2215-0366(19)30216-0](https://doi.org/10.1016/S2215-0366(19)30216-0)
Senn, S. (2006). Change from baseline and analysis of covariance revisited. Statistics in Medicine, 25(24), 4334–4344. [https://doi.org/10.1002/sim.2682](https://doi.org/10.1002/sim.2682)
Wan, F. (2021). Statistical analysis of two arm randomized pre-post designs with one post-treatment measurement. BMC Medical Research Methodology, 21(1), 150. [https://doi.org/10.1186/s12874-021-01323-9](https://doi.org/10.1186/s12874-021-01323-9)