3ipw2.Rmd

# IPTW using ML

Similar to G-computation, we will try to use machine learning methods, particularly Superlearner in estimating IPW estimates

```{r ipw2setup01ic, include=FALSE}
require(knitr)
require(glmnet)
require(kableExtra)
require(dplyr)
require(xgboost)
require(Publish)
require(tableone)
require(survey)
require(WeightIt)
require(SuperLearner)
options(knitr.kable.NA = '')
cachex=TRUE
```

```{r ipw2reg2ps, cache=cachex, echo = TRUE}
# Read the data saved at the last chapter
ObsData <- readRDS(file = "data/rhcAnalytic.RDS")
baselinevars <- names(dplyr::select(ObsData, !c(A,Y)))
ps.formula <- as.formula(paste("A ~",
                               paste(baselinevars,
                                     collapse = "+")))
```

## IPTW Steps from SL

**Modelling Steps**:

We will still follow the same steps

|  |  |
|-|-|
|Step 1| exposure modelling: $PS = Prob(A=1|L)$|
|Step 2| Convert $PS$ to $IPW$  = $\frac{A}{PS} + \frac{1-A}{1-PS}$|
|Step 3| Assess balance in weighted sample and overlap ($PS$ and $L$)|
|Step 4| outcome modelling: $Prob(Y=1|A=1)$ to obtain treatment effect estimate|

## Step 1: exposure modelling

This is the exposure model that we decided on:

```{r ipw2ps1, cache=cachex, echo = TRUE}
ps.formula
```

```{block, type='rmdcomment'}
Fit SuperLearner (SL) to estimate propensity scores. 
```


We again use the same candidate learners:

- linear model
- LASSO
- gradient boosting

```{r ipw2ps321xxx, cache=TRUE, echo = TRUE}
require(SuperLearner)
ObsData.noYA <- dplyr::select(ObsData, !c(Y,A))
PS.fit.SL <- SuperLearner(Y=ObsData$A, 
                       X=ObsData.noYA, 
                       cvControl = list(V = 3),
                       SL.library=c("SL.glm", "SL.glmnet", "SL.xgboost"), 
                       method="method.NNLS",
                       family="binomial")
```

Here, `method.AUC` is also possible to use instead of `method.NNLS` for binary response. We could use `cvControl = list(V = 3, stratifyCV = TRUE)` to make the splits be stratified by the binary response.

Obtain the propesnity score (PS) values from the fit

```{r ipw2psx2, cache=TRUE, echo = TRUE}
all.pred <- predict(PS.fit.SL, type = "response")
ObsData$PS.SL <- all.pred$pred
```

Check summaries: 

```{r ipw2psx2b, cache=TRUE, echo = TRUE}
summary(ObsData$PS.SL)
tapply(ObsData$PS.SL, ObsData$A, summary)
plot(density(ObsData$PS.SL[ObsData$A==0]), 
     col = "red", main = "")
lines(density(ObsData$PS.SL[ObsData$A==1]), 
      col = "blue", lty = 2)
legend("topright", c("No RHC","RHC"), 
       col = c("red", "blue"), lty=1:2)
```


## Step 2: Convert PS to IPW 

- Convert PS from SL to IPW using the formula (again, ATE formula). 

```{r ipw2psx2c, cache=TRUE, echo = TRUE}
ObsData$IPW.SL <- ObsData$A/ObsData$PS.SL + (1-ObsData$A)/(1-ObsData$PS.SL)
summary(ObsData$IPW.SL)
```

Output from pre-packged software packages to do the same (very similar estimates):

```{r ipw2psx2c2, cache=TRUE, echo = TRUE}
require(WeightIt)
W.out <- weightit(ps.formula, 
                    data = ObsData, 
                    estimand = "ATE",
                    method = "super",
                    SL.library = c("SL.glm", 
                                   "SL.glmnet", 
                                   "SL.xgboost"))
summary(W.out$weights)
```

```{r, cache=TRUE, echo = TRUE}
saveRDS(W.out, file = "data/ipwslps.RDS")
```

Alternatively, you can use the previously estimated PS

```{r ipw2psx2c2clone, cache=TRUE, echo = TRUE}
W.out2 <- weightit(ps.formula, 
                    data = ObsData, 
                    estimand = "ATE",
                    ps = ObsData$PS.SL)
summary(W.out2$weights)
```

## Step 3: Balance checking

- We first check balance numerically for SMD = 0.1 as threshold for balance.


```{r ipw2balp, cache=TRUE, echo = TRUE, fig.height=10, fig.width=5}
bal.tab(W.out, un = TRUE, 
        thresholds = c(m = .1))
```

- And also via plot

```{r ipw2balp2, cache=TRUE, echo = TRUE, fig.height=10, fig.width=5}
require(cobalt)
love.plot(W.out, binary = "std",
          thresholds = c(m = .1),
          abs = TRUE, 
          var.order = "unadjusted", 
          line = TRUE)
```

```{block, type='rmdcomment'}
Some covariates have SMD > 0.1 (sign of imbalance). This phenomenon is common when we use strong ML methods to obtain PS [@alam2019should].
```

## Step 4: outcome modelling 

Estimate the effect of treatment on outcomes 

### Crude

```{r ipw2psx4, cache=TRUE, echo = TRUE}
out.formula <- as.formula(Y ~ A)
out.fit <- glm(out.formula,
               data = ObsData,
               weights = IPW.SL)
publish(out.fit)
```

### Adjusted

```{block, type='rmdcomment'}
Adjusting for all covariates to deal with potential residual confounding (as was indicated by imbalance). Alternatively, could adjust for selected covariates believed to be the reasons for potential imbalance [@nguyen2017double].
```

Estimate the effect of treatment on outcomes (after adjustment)

```{r ipw2psx4adj, cache=TRUE, echo = TRUE, results='hide'}
out.formula2 <- as.formula(paste("Y~ A +", 
                               paste(baselinevars, 
                                     collapse = "+")))
out.fit2 <- glm(out.formula2,
               data = ObsData,
               weights = IPW.SL)
res2 <- publish(out.fit2, digits=1)$regressionTable[2,]
```

```{r ipw2psx4adj2, cache=TRUE, echo = TRUE}
res2
```

### Adjusted (from package)

Also check the output when we used the weights from the package

```{r ipw2psx4b, cache=TRUE, echo = TRUE, results='hide'}
out.fit3 <- glm(out.formula2,
               data = ObsData,
               weights = W.out$weights)
res3 <- publish(out.fit3, digits=1)$regressionTable[2,]
```

```{r ipw2psx4adj3, cache=TRUE, echo = TRUE}
res3
```

```{r, cache=TRUE, echo = TRUE}
saveRDS(out.fit3, file = "data/ipwsl.RDS") 
```