-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy path3ipw2.Rmd
210 lines (160 loc) · 5.68 KB
/
3ipw2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
# IPTW using ML
Similar to G-computation, we will try to use machine learning methods, particularly Superlearner in estimating IPW estimates
```{r ipw2setup01ic, include=FALSE}
require(knitr)
require(glmnet)
require(kableExtra)
require(dplyr)
require(xgboost)
require(Publish)
require(tableone)
require(survey)
require(WeightIt)
require(SuperLearner)
options(knitr.kable.NA = '')
cachex=TRUE
```
```{r ipw2reg2ps, cache=cachex, echo = TRUE}
# Read the data saved at the last chapter
ObsData <- readRDS(file = "data/rhcAnalytic.RDS")
baselinevars <- names(dplyr::select(ObsData, !c(A,Y)))
ps.formula <- as.formula(paste("A ~",
paste(baselinevars,
collapse = "+")))
```
## IPTW Steps from SL
**Modelling Steps**:
We will still follow the same steps
| | |
|-|-|
|Step 1| exposure modelling: $PS = Prob(A=1|L)$|
|Step 2| Convert $PS$ to $IPW$ = $\frac{A}{PS} + \frac{1-A}{1-PS}$|
|Step 3| Assess balance in weighted sample and overlap ($PS$ and $L$)|
|Step 4| outcome modelling: $Prob(Y=1|A=1)$ to obtain treatment effect estimate|
## Step 1: exposure modelling
This is the exposure model that we decided on:
```{r ipw2ps1, cache=cachex, echo = TRUE}
ps.formula
```
```{block, type='rmdcomment'}
Fit SuperLearner (SL) to estimate propensity scores.
```
We again use the same candidate learners:
- linear model
- LASSO
- gradient boosting
```{r ipw2ps321xxx, cache=TRUE, echo = TRUE}
require(SuperLearner)
ObsData.noYA <- dplyr::select(ObsData, !c(Y,A))
PS.fit.SL <- SuperLearner(Y=ObsData$A,
X=ObsData.noYA,
cvControl = list(V = 3),
SL.library=c("SL.glm", "SL.glmnet", "SL.xgboost"),
method="method.NNLS",
family="binomial")
```
Here, `method.AUC` is also possible to use instead of `method.NNLS` for binary response. We could use `cvControl = list(V = 3, stratifyCV = TRUE)` to make the splits be stratified by the binary response.
Obtain the propesnity score (PS) values from the fit
```{r ipw2psx2, cache=TRUE, echo = TRUE}
all.pred <- predict(PS.fit.SL, type = "response")
ObsData$PS.SL <- all.pred$pred
```
Check summaries:
```{r ipw2psx2b, cache=TRUE, echo = TRUE}
summary(ObsData$PS.SL)
tapply(ObsData$PS.SL, ObsData$A, summary)
plot(density(ObsData$PS.SL[ObsData$A==0]),
col = "red", main = "")
lines(density(ObsData$PS.SL[ObsData$A==1]),
col = "blue", lty = 2)
legend("topright", c("No RHC","RHC"),
col = c("red", "blue"), lty=1:2)
```
## Step 2: Convert PS to IPW
- Convert PS from SL to IPW using the formula (again, ATE formula).
```{r ipw2psx2c, cache=TRUE, echo = TRUE}
ObsData$IPW.SL <- ObsData$A/ObsData$PS.SL + (1-ObsData$A)/(1-ObsData$PS.SL)
summary(ObsData$IPW.SL)
```
Output from pre-packged software packages to do the same (very similar estimates):
```{r ipw2psx2c2, cache=TRUE, echo = TRUE}
require(WeightIt)
W.out <- weightit(ps.formula,
data = ObsData,
estimand = "ATE",
method = "super",
SL.library = c("SL.glm",
"SL.glmnet",
"SL.xgboost"))
summary(W.out$weights)
```
```{r, cache=TRUE, echo = TRUE}
saveRDS(W.out, file = "data/ipwslps.RDS")
```
Alternatively, you can use the previously estimated PS
```{r ipw2psx2c2clone, cache=TRUE, echo = TRUE}
W.out2 <- weightit(ps.formula,
data = ObsData,
estimand = "ATE",
ps = ObsData$PS.SL)
summary(W.out2$weights)
```
## Step 3: Balance checking
- We first check balance numerically for SMD = 0.1 as threshold for balance.
```{r ipw2balp, cache=TRUE, echo = TRUE, fig.height=10, fig.width=5}
bal.tab(W.out, un = TRUE,
thresholds = c(m = .1))
```
- And also via plot
```{r ipw2balp2, cache=TRUE, echo = TRUE, fig.height=10, fig.width=5}
require(cobalt)
love.plot(W.out, binary = "std",
thresholds = c(m = .1),
abs = TRUE,
var.order = "unadjusted",
line = TRUE)
```
```{block, type='rmdcomment'}
Some covariates have SMD > 0.1 (sign of imbalance). This phenomenon is common when we use strong ML methods to obtain PS [@alam2019should].
```
## Step 4: outcome modelling
Estimate the effect of treatment on outcomes
### Crude
```{r ipw2psx4, cache=TRUE, echo = TRUE}
out.formula <- as.formula(Y ~ A)
out.fit <- glm(out.formula,
data = ObsData,
weights = IPW.SL)
publish(out.fit)
```
### Adjusted
```{block, type='rmdcomment'}
Adjusting for all covariates to deal with potential residual confounding (as was indicated by imbalance). Alternatively, could adjust for selected covariates believed to be the reasons for potential imbalance [@nguyen2017double].
```
Estimate the effect of treatment on outcomes (after adjustment)
```{r ipw2psx4adj, cache=TRUE, echo = TRUE, results='hide'}
out.formula2 <- as.formula(paste("Y~ A +",
paste(baselinevars,
collapse = "+")))
out.fit2 <- glm(out.formula2,
data = ObsData,
weights = IPW.SL)
res2 <- publish(out.fit2, digits=1)$regressionTable[2,]
```
```{r ipw2psx4adj2, cache=TRUE, echo = TRUE}
res2
```
### Adjusted (from package)
Also check the output when we used the weights from the package
```{r ipw2psx4b, cache=TRUE, echo = TRUE, results='hide'}
out.fit3 <- glm(out.formula2,
data = ObsData,
weights = W.out$weights)
res3 <- publish(out.fit3, digits=1)$regressionTable[2,]
```
```{r ipw2psx4adj3, cache=TRUE, echo = TRUE}
res3
```
```{r, cache=TRUE, echo = TRUE}
saveRDS(out.fit3, file = "data/ipwsl.RDS")
```