- dGEM workflow
- Package requirements
- Instructions for installing and running pda package
- A database with clear and consistent variable names
- On Windows: download and install RTools
Below are the instructions for installing and then running the package.
There are several ways to install the pda package.
- In RStudio, create a new project: File -> New Project... -> New Directory -> New Project.
- Execute the following R code:
# Install the latest version of PDA in R:
install.packages("pda")
library(pda)
# Or you can install via GitHub:
install.packages("devtools")
library(devtools)
devtools::install_github("penncil/pda")
library(pda)
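If the installation script may be re-run on a machine where some packages are already present, a small guard can report what is missing before anything is installed. This is a minimal sketch using base R only; `missing_pkgs` is a hypothetical helper introduced here for illustration and is not part of pda.

```r
# Hypothetical helper (not part of pda): return the packages from `pkgs`
# that cannot be loaded from the current R library paths.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used in the walkthrough
needed <- c("pda", "survival", "data.table")
to_install <- missing_pkgs(needed)

if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}
```

The install call is left commented out so that running the snippet only reports what is missing.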
Below are two ways to run the pda examples.
The toy example below analyzes the association of lung status with age and sex using logistic regression. It uses data(lung) from the 'survival' package, with patients randomly assigned to 3 sites: 'site1', 'site2', 'site3'. The example runs in a local directory. In an actual collaboration, our pda-ota (pda over the air) platform can be used to coordinate the project through a cloud-based server.
You can either run the demo:
demo(dGEM)
or work through the steps below one at a time:
Step 0: load related R packages and prepare sample data
# load packages
require(survival)
require(data.table)
require(pda)
# sample data, lung, from "survival" package
data(lung)
# create 3 sites, split the lung data amongst them
sites = c('site1', 'site2', 'site3')
set.seed(42)
lung2 <- lung[,c('time', 'status', 'age', 'sex')]
lung2$sex <- lung2$sex-1
lung2$status <- ifelse(lung2$status == 2, 1, 0)
lung_split<-split(lung2, sample(1:length(sites), nrow(lung), replace=TRUE))
## fit logistic reg using pooled data
fit.pool <- glm(status ~ age + sex, family = 'binomial', data = lung2)
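Before starting the distributed steps, it can help to confirm that the random split produced three non-empty sites and to look at the pooled coefficients that serve as a reference. The following is a minimal sketch that rebuilds the Step 0 data with base R and the survival package only; `site_index` and `site_sizes` are names introduced here for illustration.

```r
library(survival)

# Rebuild the toy data as in Step 0
sites <- c('site1', 'site2', 'site3')
set.seed(42)
lung2 <- survival::lung[, c('time', 'status', 'age', 'sex')]
lung2$sex <- lung2$sex - 1                        # recode sex from 1/2 to 0/1
lung2$status <- ifelse(lung2$status == 2, 1, 0)   # 1 = event, 0 = censored

# Randomly assign each row to one of the three sites
site_index <- sample(seq_along(sites), nrow(lung2), replace = TRUE)
lung_split <- split(lung2, site_index)

# Rows per site; the counts should add up to nrow(lung2)
site_sizes <- sapply(lung_split, nrow)
print(site_sizes)

# Pooled logistic regression, used only as a reference fit
fit.pool <- glm(status ~ age + sex, family = 'binomial', data = lung2)
print(round(coef(summary(fit.pool)), 3))
```

Note that `split()` names the list elements "1", "2", "3", so `lung_split[[1]]` holds the rows used for 'site1' in the steps that follow.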
Step 1: Initialization
# ############################ STEP 1: initialize ###############################
## lead site1: please review and enter "1" to allow putting the control file to the server
control <- list(project_name = 'Lung cancer study',
step = 'initialize',
sites = sites,
heterogeneity = TRUE,
model = 'dGEM',
family = 'binomial',
outcome = "status",
variables = c('age', 'sex'),
variables_site_level = c('volume'),
optim_maxit = 100,
lead_site = 'site1',
upload_date = as.character(Sys.time()) )
##' specify control file (to be shared with others)
pda(site_id = 'site1', control = control, dir = getwd())
##' run dGEM step 1 under site 3
pda(site_id = 'site3', ipdata = lung_split[[3]], dir=getwd())
##' run dGEM step 1 under site 2
pda(site_id = 'site2', ipdata = lung_split[[2]], dir=getwd())
## run dGEM step 1 under site 1
##' control.json is also automatically updated
pda(site_id = 'site1', ipdata = lung_split[[1]], dir=getwd())
Step 2: Calculate hospital effects
#' ############################' STEP 2: derive ###############################
##' run dGEM step 2 under site 3
pda(site_id = 'site3', ipdata = lung_split[[3]], hosdata = c('volume' = 3000), dir=getwd())
##' run dGEM step 2 under site 2
pda(site_id = 'site2', ipdata = lung_split[[2]], hosdata = c('volume' = 5000), dir=getwd())
##' run dGEM step 2 under site 1
##' control.json is also automatically updated
pda(site_id = 'site1', ipdata = lung_split[[1]], hosdata = c('volume' = 10000), dir=getwd())
Step 3: Calculate counterfactual rates
#' ############################' STEP 3: estimate ###############################
##' run dGEM step 3 under site 3
pda(site_id = 'site3', ipdata = lung_split[[3]], dir=getwd())
##' run dGEM step 3 under site 2
pda(site_id = 'site2', ipdata = lung_split[[2]], dir=getwd())
##' run dGEM step 3 under site 1
##' control.json is also automatically updated
pda(site_id = 'site1', ipdata = lung_split[[1]], dir=getwd())
Step 4: Calculate standardized event rates
#' ############################' STEP 4: synthesize ###############################
##' run final step in the lead site or coordinating center
pda(site_id = 'site1', ipdata = lung_split[[1]], dir=getwd())
##' the PDA dGEM is now completed!
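After the final step, the files written to the working directory can be inspected directly. The sketch below assumes that the shared control file is stored as `control.json` inside the directory passed as `dir` (here `getwd()`), as the comments in the walkthrough suggest, and that it keeps the `step` field from the control list. It relies on the jsonlite package for parsing; the names of the other files pda writes are not assumed, so they are simply listed.

```r
# Assumption: pda keeps the shared control file as "control.json" in `dir`
# (the walkthrough passes dir = getwd()).
control_path <- file.path(getwd(), "control.json")

if (file.exists(control_path) && requireNamespace("jsonlite", quietly = TRUE)) {
  ctrl <- jsonlite::fromJSON(control_path)
  # `step` is one of the fields set in the control list in Step 1
  print(ctrl$step)
} else {
  message("control.json not found in ", getwd(), " (or jsonlite is not installed)")
}

# List every JSON file currently in the working directory
json_files <- list.files(getwd(), pattern = "\\.json$")
print(json_files)
```

Running the example in a dedicated project directory, as suggested in the installation instructions, keeps these files separate from unrelated work.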