# Mediation
## Traditional
The approach of [@baron1986moderator] is considered outdated because of its step 1, but the original idea is still worth seeing.
It relies on three regressions:
- Step 1: $X \to Y$
- Step 2: $X \to M$
- Step 3: $X + M \to Y$
where
- $X$ = independent variable
- $Y$ = dependent variable
- $M$ = mediating variable
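Step 1 is the part considered outdated. A small simulation shows why. The coefficients below are hypothetical: $X$ raises $M$, and $M$ raises $Y$, but $X$ also lowers $Y$ directly by about the same amount. Write $b_1$ for the total effect from step 1, $b_2$ and $b_3$ for the $X \to M$ and $M \to Y$ paths, and $b_4$ for the direct effect. Under this setup $b_1$ is close to zero even though the indirect effect $b_2 \times b_3$ clearly is not.

```{r}
# Hypothetical data-generating process (not from the source): X raises M,
# M raises Y, but X also lowers Y directly by roughly the same amount.
set.seed(1)
n <- 1000
X <- rnorm(n)
M <- 0.5 * X + rnorm(n)            # X -> M path: 0.5
Y <- 0.8 * M - 0.4 * X + rnorm(n)  # M -> Y path: 0.8, direct X -> Y: -0.4
sim <- data.frame(X, M, Y)

b1   <- coef(lm(Y ~ X, sim))[["X"]]  # step 1: total effect, true value 0.5 * 0.8 - 0.4 = 0
b2   <- coef(lm(M ~ X, sim))[["X"]]  # step 2
fit3 <- lm(Y ~ X + M, sim)           # step 3
b3   <- coef(fit3)[["M"]]
b4   <- coef(fit3)[["X"]]

c(total = b1, indirect = b2 * b3, direct = b4)
```

An analyst who stopped at a non-significant step 1 would miss an indirect effect of about 0.4 here.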
1. Originally, [@baron1986moderator] required the path $X \to Y$ in step 1 to be significant. However, $X$ can have an indirect effect on $Y$ without a significant total effect of $X$ on $Y$ (e.g., when the effect is absorbed into $M$, or when two mediators $M_1, M_2$ have counteracting effects that cancel each other out).
Mathematically,
$$
Y = b_0 + b_1 X + \epsilon
$$
$b_1$ does **not** need to be **significant**.
2. We examine the effect of $X$ on $M$. This step requires a significant effect of $X$ on $M$ to continue with the analysis.
Mathematically,
$$
M = b_0 + b_2 X + \epsilon
$$
where $b_2$ needs to be **significant**.
3. In this step, we want the effect of $M$ on $Y$ to "absorb" most of the direct effect of $X$ on $Y$ (or at least to make that effect smaller).
Mathematically,
$$
Y = b_0 + b_4 X + b_3 M + \epsilon
$$
$b_4$ needs to be either smaller than $b_1$ or insignificant.
| Effect of $X$ on $Y$ after adding $M$ | Then $M$ mediates between $X$ and $Y$ |
|---------------------------------------|---------------------------------|
| Completely disappears ($b_4$ insignificant) | Fully (i.e., full mediation) |
| Partially disappears ($b_4$ smaller than $b_1$ in step 1) | Partially (i.e., partial mediation) |
4. Examine the mediation effect (i.e., whether it is significant):
- First approach: Sobel's test [@sobel1982asymptotic]
- Second approach: bootstrapping [@preacher2004spss] (preferable)
More details can be found [here](https://cran.ism.ac.jp/web/packages/mediation/vignettes/mediation-old.pdf)
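Both approaches in step 4 can be sketched by hand. The version below uses hypothetical simulated data: the coefficients, the sample size, and the names `sim`, `fit_m`, `fit_y`, and `boot_ab` are made up for illustration. The Sobel statistic divides $b_2 b_3$ by its first-order delta-method standard error, $\sqrt{b_3^2 s_{b_2}^2 + b_2^2 s_{b_3}^2}$. The percentile bootstrap resamples rows, refits both regressions, and takes quantiles of $b_2 b_3$. In practice, `mediate(..., boot = TRUE)` from the `mediation` package handles the bootstrap, as in the example that follows.

```{r}
# Hypothetical simulated data (not the Virginia data used in Example 1)
set.seed(2)
n <- 200
X <- rnorm(n)
M <- 0.4 * X + rnorm(n)
Y <- 0.5 * M + 0.1 * X + rnorm(n)
sim <- data.frame(X, M, Y)

fit_m <- lm(M ~ X, sim)      # step 2: b2 = effect of X on M
fit_y <- lm(Y ~ X + M, sim)  # step 3: b3 = effect of M on Y given X

b2    <- coef(fit_m)[["X"]]
b3    <- coef(fit_y)[["M"]]
se_b2 <- sqrt(vcov(fit_m)["X", "X"])
se_b3 <- sqrt(vcov(fit_y)["M", "M"])

# Sobel test of H0: b2 * b3 = 0 (first-order delta-method standard error)
sobel_se <- sqrt(b3^2 * se_b2^2 + b2^2 * se_b3^2)
sobel_z  <- (b2 * b3) / sobel_se
sobel_p  <- 2 * pnorm(-abs(sobel_z))

# Percentile bootstrap: resample rows, refit both regressions, keep b2 * b3
boot_ab <- replicate(1000, {
  d <- sim[sample.int(n, replace = TRUE), ]
  coef(lm(M ~ X, d))[["X"]] * coef(lm(Y ~ X + M, d))[["M"]]
})
boot_ci <- quantile(boot_ab, c(0.025, 0.975))

c(indirect = b2 * b3, z = sobel_z, p = sobel_p)
boot_ci
```

The bootstrap is usually preferred because the sampling distribution of $b_2 \times b_3$ tends to be skewed. The normal approximation behind the Sobel test ignores this skew.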
### Example 1 {#example-1-mediation-traditional}
From the [University of Virginia Library](https://data.library.virginia.edu/introduction-to-mediation-analysis/)
```{r, message=FALSE}
myData <-
read.csv('http://static.lib.virginia.edu/statlab/materials/data/mediationData.csv')
# Step 1 (no longer necessary)
model.0 <- lm(Y ~ X, myData)
# Step 2
model.M <- lm(M ~ X, myData)
# Step 3
model.Y <- lm(Y ~ X + M, myData)
# Step 4 (bootstrapping)
library(mediation)
results <- mediate(
model.M,
model.Y,
treat = 'X',
mediator = 'M',
boot = TRUE,
sims = 500
)
summary(results)
```
- Total Effect = 0.3961 = $b_1$ (step 1) = total effect of $X$ on $Y$ without $M$
- Direct Effect = ADE (Average Direct Effect) = 0.0396 = $b_4$ (step 3) = direct effect of $X$ on $Y$ after accounting for the indirect effect through $M$
- ACME = Average Causal Mediation Effect = $b_1 - b_4$ = 0.3961 - 0.0396 = 0.3565 = $b_2 \times b_3$ = 0.56102 $\times$ 0.6355 = 0.3565
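The identity in the last bullet, $b_1 - b_4 = b_2 \times b_3$, is not specific to this dataset. Suppose all three regressions are OLS fits on the same observations, with the same covariates in every equation. Then the difference and product of coefficients coincide exactly, because the step 3 residual is orthogonal to $X$. The quick check below sketches this on hypothetical simulated data, deliberately generated from a nonlinear process. Note that this is an algebraic identity of the estimates, not evidence of a causal effect.

```{r}
# Hypothetical simulated data from a deliberately nonlinear process
set.seed(3)
n <- 300
X <- runif(n)
M <- exp(X) + rnorm(n, sd = 0.3)
Y <- sin(3 * M) + X^2 + rnorm(n, sd = 0.3)
sim <- data.frame(X, M, Y)

b1   <- coef(lm(Y ~ X, sim))[["X"]]  # step 1: total effect
b2   <- coef(lm(M ~ X, sim))[["X"]]  # step 2
fit3 <- lm(Y ~ X + M, sim)           # step 3
b3   <- coef(fit3)[["M"]]
b4   <- coef(fit3)[["X"]]

# Difference method and product method agree up to floating-point error
all.equal(b1 - b4, b2 * b3)  # TRUE
```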
The `mediation` package implements the approach of [@imai2010general] [@imai2010identification]. More details on the package can be found [here](https://cran.r-project.org/web/packages/mediation/vignettes/mediation.pdf).
The package supports two types of inference:
1. Model-based inference:
- Assumptions:
- Treatment is randomized (matching methods can be used to approximate this).
- Sequential ignorability: conditional on the observed covariates, there are no other confounders of the (1) treatment-mediator, (2) treatment-outcome, or (3) mediator-outcome relationships. This is typically hard to argue for with observational data. The assumption is needed to identify the ACME (i.e., average causal mediation effect).
2. Design-based inference
Notation (consistent with the package documentation):
- $M_i(t)$ = potential value of the mediator under treatment status $t$
- $T_i$ = treatment status $(0,1)$
- $Y_i(t,m)$ = potential outcome, where $t$ = treatment status and $m$ = value of the mediator
- $X_i$ = vector of observed pre-treatment confounders
- Treatment effect (per unit $i$) = $\tau_i = Y_i(1,M_i(1)) - Y_i (0,M_i(0))$, which decomposes into two effects:
- Causal mediation effects: $\delta_i (t) \equiv Y_i (t,M_i(1)) - Y_i(t,M_i(0))$
- Direct effects: $\zeta_i (t) \equiv Y_i (1, M_i(t)) - Y_i(0, M_i(t))$
- These sum to the total treatment effect: $\tau_i = \delta_i (t) + \zeta_i (1-t)$ for $t = 0, 1$
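Here is a small worked example of this decomposition. It assumes a hypothetical linear structural model with a treatment-mediator interaction: $M_i(t) = 1 + 0.5t$ and $Y_i(t, m) = 0.3t + 0.8m + 0.4tm$. It uses $\zeta_i(t) = Y_i(1, M_i(t)) - Y_i(0, M_i(t))$. Because of the interaction, $\delta_i(1) \neq \delta_i(0)$, yet both $\delta_i(1) + \zeta_i(0)$ and $\delta_i(0) + \zeta_i(1)$ recover $\tau_i$.

```{r}
# Hypothetical unit-level structural model with a treatment-mediator interaction
alpha <- 1; a <- 0.5                 # mediator: M(t) = alpha + a * t
cp <- 0.3; b <- 0.8; d <- 0.4        # outcome: Y(t, m) = cp * t + b * m + d * t * m

M_pot <- function(t) alpha + a * t
Y_pot <- function(t, m) cp * t + b * m + d * t * m

delta <- function(t) Y_pot(t, M_pot(1)) - Y_pot(t, M_pot(0))  # causal mediation effect
zeta  <- function(t) Y_pot(1, M_pot(t)) - Y_pot(0, M_pot(t))  # direct effect
tau   <- Y_pot(1, M_pot(1)) - Y_pot(0, M_pot(0))              # total treatment effect

c(delta0 = delta(0), delta1 = delta(1), zeta0 = zeta(0), zeta1 = zeta(1), tau = tau)
# 0.4, 0.6, 0.7, 0.9 and 1.3

c(all.equal(tau, delta(1) + zeta(0)), all.equal(tau, delta(0) + zeta(1)))  # TRUE TRUE
```

The gap between $\delta_i(1) = 0.6$ and $\delta_i(0) = 0.4$ is exactly what a treatment-mediator interaction produces. The product-of-coefficients approach cannot represent it.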
More on sequential ignorability
$$
\{ Y_i (t', m) , M_i (t) \} \perp T_i |X_i = x
$$
$$
Y_i(t',m) \perp M_i(t) | T_i = t, X_i = x
$$
for $t, t' = 0, 1$ and all $x$, together with the positivity conditions
- $0 < P(T_i = t | X_i = x)$
- $0 < P(M_i = m | T_i = t , X_i =x)$
The first condition is the standard strong ignorability condition: treatment assignment is as good as random conditional on the observed pre-treatment confounders.
The second condition is stronger: the mediator must also be as good as random given the observed treatment and pre-treatment confounders. It holds only when there are no unobserved pre-treatment confounders and no post-treatment confounders of the mediator-outcome relationship (e.g., other mediators that are affected by the treatment and correlated with $M_i$).
To my understanding, at the time of writing there is **no way to test the sequential ignorability assumption**. Hence, researchers can only use sensitivity analysis to argue for the robustness of their results.
## Model-based causal mediation analysis
I only cover model-based causal mediation analysis here because I have not encountered design-based analysis in practice. I may add notes on it in the future if I need to use it.
Fit two models:
- Mediator model: conditional distribution of the mediator $M_i | T_i, X_i$
- Outcome model: conditional distribution of $Y_i | T_i, M_i, X_i$
`mediation` can accommodate almost all types of models for both the mediator and the outcome, except censored mediator models.
The key improvement is that estimation of the ACME does not rely on the product or difference of coefficients (see \@ref(example-1-mediation-traditional)). That approach requires very strict assumptions: (1) linear regression models for the mediator and the outcome, and (2) additive effects of $T_i$ and $M_i$ with no interaction.
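To see what estimating the ACME without coefficient products looks like, here is a minimal plug-in sketch on hypothetical simulated data. The variables `trt`, `med`, `out`, and `x` are made up for illustration. The sketch predicts each unit's mediator under control and under treatment. It then predicts the outcome with treatment held at $t$ and only the mediator switched. This is a simplification, not the package's algorithm. `mediate` also draws model parameters (quasi-Bayesian or bootstrap) and simulates mediator values, which matters when the outcome model is nonlinear in the mediator. With linear models and no interaction, the plug-in ACME reduces exactly to the product of coefficients.

```{r}
# Hypothetical simulated data: binary treatment, one covariate, linear models
set.seed(4)
n   <- 500
x   <- rnorm(n)
trt <- rbinom(n, 1, 0.5)
med <- 0.2 + 0.6 * trt + 0.3 * x + rnorm(n)
out <- 0.1 + 0.4 * trt + 0.7 * med + 0.2 * x + rnorm(n)
sim <- data.frame(x, trt, med, out)

m_fit <- lm(med ~ trt + x, sim)        # mediator model
y_fit <- lm(out ~ trt + med + x, sim)  # outcome model

# Predicted mediator for every unit under control (0) and treatment (1)
m0 <- predict(m_fit, newdata = transform(sim, trt = 0))
m1 <- predict(m_fit, newdata = transform(sim, trt = 1))

# ACME(t): hold treatment at t, switch only the mediator from M(0) to M(1)
acme <- function(t) {
  y_m1 <- predict(y_fit, newdata = transform(sim, trt = t, med = m1))
  y_m0 <- predict(y_fit, newdata = transform(sim, trt = t, med = m0))
  mean(y_m1 - y_m0)
}

c(acme0 = acme(0), acme1 = acme(1),
  product = coef(m_fit)[["trt"]] * coef(y_fit)[["med"]])
```

Suppose the outcome model includes `trt:med`, or is a probit as in the examples below. Then `acme(0)` and `acme(1)` no longer coincide with a single coefficient product. This is the situation that motivates the simulation-based estimator in `mediate`.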
```{r}
library(mediation)
set.seed(2014)
data("framing", package = "mediation")
med.fit <-
lm(emo ~ treat + age + educ + gender + income, data = framing)
out.fit <-
glm(
cong_mesg ~ emo + treat + age + educ + gender + income,
data = framing,
family = binomial("probit")
)
# Quasi-Bayesian Monte Carlo
med.out <-
mediate(
med.fit,
out.fit,
treat = "treat",
mediator = "emo",
robustSE = TRUE,
sims = 1000 # should be 10000 in practice
)
summary(med.out)
```
Nonparametric bootstrap version
```{r}
med.out <-
mediate(
med.fit,
out.fit,
boot = TRUE,
treat = "treat",
mediator = "emo",
sims = 1000, # should be 10000 in practice
boot.ci.type = "bca" # bias-corrected and accelerated intervals
)
summary(med.out)
```
If theory suggests a treatment-mediator interaction, add it to the outcome model:
```{r}
med.fit <-
lm(emo ~ treat + age + educ + gender + income, data = framing)
out.fit <-
glm(
cong_mesg ~ emo * treat + age + educ + gender + income,
data = framing,
family = binomial("probit")
)
med.out <-
mediate(
med.fit,
out.fit,
treat = "treat",
mediator = "emo",
robustSE = TRUE,
sims = 100
)
summary(med.out)
test.TMint(med.out, conf.level = .95) # test treatment-mediator interaction effect
```
```{r}
plot(med.out)
```
`mediation` can be used in conjunction with imputation packages.
It can also handle **mediated moderation**, **non-binary treatment variables**, and **multilevel data**.
Sensitivity Analysis for sequential ignorability
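- It assesses how robust the results are to unobserved pre-treatment confounders.
- $\rho$ = correlation between the error terms of the mediator and outcome models; sequential ignorability implies $\rho = 0$.
- $\rho$ cannot be estimated from the observed data. The analysis therefore recomputes the ACME (or ADE) over a range of assumed $\rho$ values and reports the value at which the effect becomes zero. If a small $|\rho|$ is enough to make the ACME vanish, the result is sensitive to violations of sequential ignorability (i.e., to unobserved pre-treatment confounders).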
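Why $\rho$ cannot simply be estimated: a minimal simulation, assuming a hypothetical unobserved confounder `U` of the mediator and the outcome (all coefficients are made up). The correlation between the true structural errors is clearly positive, and the naive product-of-coefficients ACME is biased. The correlation between the fitted OLS residuals, however, is zero by construction. The outcome residuals are orthogonal to everything in the outcome model, including the mediator residuals. The data therefore carry no information about $\rho$, which is why it is treated as a sensitivity parameter.

```{r}
# Hypothetical simulation: U confounds the mediator-outcome relationship
set.seed(5)
n   <- 5000
trt <- rbinom(n, 1, 0.5)                 # randomized treatment
U   <- rnorm(n)                          # never observed by the analyst
med <- 0.5 * trt + 0.8 * U + rnorm(n)
out <- 0.3 * trt + 0.5 * med + 0.8 * U + rnorm(n)

m_fit <- lm(med ~ trt)                   # mediator model (U omitted)
y_fit <- lm(out ~ trt + med)             # outcome model (U omitted)

true_acme  <- 0.5 * 0.5                  # a * b in the simulation
naive_acme <- coef(m_fit)[["trt"]] * coef(y_fit)[["med"]]

# Correlation of the true structural errors (requires knowing the true model)
rho_true  <- cor(med - 0.5 * trt, out - 0.3 * trt - 0.5 * med)
# Correlation of the fitted residuals: zero by construction of OLS
rho_resid <- cor(resid(m_fit), resid(y_fit))

c(true = true_acme, naive = naive_acme, rho_true = rho_true, rho_resid = rho_resid)
```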
```{r}
med.fit <-
lm(emo ~ treat + age + educ + gender + income, data = framing)
out.fit <-
glm(
cong_mesg ~ emo + treat + age + educ + gender + income,
data = framing,
family = binomial("probit")
)
med.out <-
mediate(
med.fit,
out.fit,
treat = "treat",
mediator = "emo",
robustSE = TRUE,
sims = 100
)
sens.out <-
medsens(med.out,
rho.by = 0.1, # \rho varies from -0.9 to 0.9 by 0.1
effect.type = "indirect", # sensitivity on ACME
# effect.type = "direct", # sensitivity on ADE
# effect.type = "both", # sensitivity on ACME and ADE
sims = 100)
summary(sens.out)
```
```{r}
plot(sens.out, sens.par = "rho", main = "Anxiety", ylim = c(-0.2, 0.2))
```
The ACME confidence interval contains 0 when $\rho \in (0.3,0.4)$.
Alternatively, the sensitivity parameter can be expressed in terms of $R^2$ (the share of variance explained by an unobserved confounder). In this case, we need to specify in `plot` whether the confounder affects the mediator and the outcome in the same direction (`sign.prod = "positive"`) or in opposite directions (`sign.prod = "negative"`).
```{r, message=FALSE}
plot(sens.out, sens.par = "R2", r.type = "total", sign.prod = "positive")
```
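Why a sign is needed: a minimal sketch under an assumed single-confounder structure. The names `lam_M`, `lam_Y`, `eps_M`, `eps_Y` and their values are hypothetical. Suppose an unobserved $U$ enters the mediator error as $\lambda_M U$ and the outcome error as $\lambda_Y U$, with otherwise independent noise. Then $\rho = \operatorname{sign}(\lambda_M \lambda_Y)\sqrt{R^{2*}_M R^{2*}_Y}$, where $R^{2*}_M$ and $R^{2*}_Y$ are the shares of each error's variance explained by $U$. The $R^2$ values fix only the magnitude of $\rho$, and `sign.prod` supplies the sign. With `r.type = "total"`, as in the plot above, the shares are taken relative to the total variance of the mediator and outcome rather than the error variances. That rescales them but does not change the role of the sign.

```{r}
# Hypothetical single-confounder structure for the two error terms
set.seed(6)
n     <- 1e5
U     <- rnorm(n)
lam_M <- 0.6                     # effect of U on the mediator error
lam_Y <- -0.9                    # effect of U on the outcome error (opposite sign)
eps_M <- lam_M * U + rnorm(n)
eps_Y <- lam_Y * U + rnorm(n)

rho      <- cor(eps_M, eps_Y)
R2star_M <- summary(lm(eps_M ~ U))$r.squared  # share of mediator error variance due to U
R2star_Y <- summary(lm(eps_Y ~ U))$r.squared  # share of outcome error variance due to U

rho_implied <- sign(lam_M * lam_Y) * sqrt(R2star_M * R2star_Y)
c(rho = rho, implied = rho_implied)  # both roughly -0.34
```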