See our manuscript on bioRxiv.
The repository contains code for simulating correlated binary data as described in Qaqish (2003), and for fitting (i) log-link generalized estimating equations with identity variance and unknown dispersion (RR-GEE) and (ii) a penalized version (RR-PGEE), in which the gradient of the Jeffreys prior is added to the estimating equations. The penalized version ensures that the estimates remain finite when boundary estimates would otherwise occur.
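For orientation, the two sets of estimating equations can be written generically as below. This is a sketch under assumed notation, not taken from the manuscript: the symbols $\mu_{ij}$, $D_i$, $V_i$, $A_i$, $R(\alpha)$, $\phi$, $K$ and $\mathcal{I}(\beta)$ are introduced here for illustration, and the penalty is shown in its generic Firth-type form (gradient of the log of the Jeffreys prior), which may differ in detail from what the code implements.

```latex
\begin{align*}
\log \mu_{ij} &= x_{ij}^\top \beta,
  \qquad \operatorname{Var}(y_{ij}) = \phi\,\mu_{ij}, \\
U(\beta) &= \sum_{i=1}^{K} D_i^\top V_i^{-1}\,(y_i - \mu_i) = 0,
  \qquad D_i = \frac{\partial \mu_i}{\partial \beta^\top},
  \quad V_i = \phi\, A_i^{1/2} R(\alpha)\, A_i^{1/2},
  \quad A_i = \operatorname{diag}(\mu_{i1}, \dots, \mu_{i n_i}), \\
U^{*}(\beta) &= U(\beta)
  + \frac{\partial}{\partial \beta}\,\tfrac{1}{2} \log\det \mathcal{I}(\beta) = 0,
  \qquad \mathcal{I}(\beta) = \sum_{i=1}^{K} D_i^\top V_i^{-1} D_i .
\end{align*}
```

Here $U(\beta)$ corresponds to RR-GEE and $U^{*}(\beta)$ to RR-PGEE, with $\tfrac{1}{2}\log\det\mathcal{I}(\beta)$ being the log of the Jeffreys prior up to an additive constant.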
Our manuscript compares the two log-link GEEs through extensive simulations. The illustrative example here simulates 100 data sets from the base simulation scenario (as referred to in the manuscript). The code in results/Simulation_illustration.Rmd and its associated results/Simulation_illustration.html include the following steps:
- simulating the data by specifying the simulation parameters (number of subjects, number of time points, within-cluster correlation, etc.), using code from the R package "binarySimCLF_1.0.tar", customized to simulate log-link binary data;
- fitting RR-GEE and RR-PGEE using the functions we provide in src/gee_logPosisson_dispersion_fn.R and src/pgee_logPoisson_dispersion.R, respectively;
- obtaining the simulation summaries as discussed in the manuscript in Tables 2, 3, and 4;
- replicating Figure 1 of the manuscript, with the resulting three images saved in figures/.
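To give a feel for the first two steps without the repository code, here is a minimal base-R sketch. It makes several assumptions that do not come from this repository: all parameter values are hypothetical, the data are generated with a simple shared-component mixture (not the Qaqish conditional linear family used by binarySimCLF), and the model is fitted with `glm()` and a quasi-Poisson family, which reproduces the point estimates of a log-link GEE only under an independence working correlation. It does not call the functions in src/.

```r
# Illustrative sketch only: NOT the repository's simulation or fitting code.
set.seed(1)

n_clusters <- 200                    # hypothetical number of subjects (clusters)
n_times    <- 4                      # hypothetical number of time points
rho        <- 0.3                    # hypothetical exchangeable correlation
beta       <- c(log(0.2), log(1.5))  # hypothetical intercept and log risk ratio

# Cluster-level binary covariate, so the risk is constant within a cluster.
x_cluster <- rbinom(n_clusters, 1, 0.5)
p_cluster <- exp(beta[1] + beta[2] * x_cluster)  # log link: risk 0.2 or 0.3
stopifnot(all(p_cluster <= 1))                   # log-link risks must stay <= 1

# Shared-component mixture: each observation copies a cluster-level draw z
# with probability sqrt(rho), otherwise it uses its own independent draw.
# This gives marginal mean p and pairwise correlation rho within a cluster.
simulate_cluster <- function(p, n, rho) {
  z <- rbinom(1, 1, p)          # shared component
  u <- rbinom(n, 1, p)          # independent components
  w <- rbinom(n, 1, sqrt(rho))  # 1 = copy the shared component
  ifelse(w == 1, z, u)
}

dat <- do.call(rbind, lapply(seq_len(n_clusters), function(i) {
  data.frame(id = i, time = seq_len(n_times), x = x_cluster[i],
             y = simulate_cluster(p_cluster[i], n_times, rho))
}))

# Log link with variance proportional to the mean and estimated dispersion.
fit     <- glm(y ~ x, family = quasipoisson(link = "log"), data = dat)
rr_hat  <- exp(coef(fit))[["x"]]   # estimated risk ratio for x
phi_hat <- summary(fit)$dispersion # estimated dispersion parameter
```

The standard errors reported by `glm()` here ignore the within-cluster correlation, so this sketch is only useful for point estimates; cluster-robust (sandwich) standard errors would be needed for inference.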
Note that this minimal example uses fewer data sets than the manuscript (100 rather than 1000) so that the code runs in a couple of minutes. It is meant to replicate only one simulation scenario, i.e. one line of each of the tables mentioned above.
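The effect of using 100 rather than 1000 data sets can be illustrated with a small summary helper. This is a sketch under assumptions: `summarise_sim()` is a hypothetical function written for this example, the summary measures (bias, empirical standard error, RMSE) are common choices and not necessarily those reported in the manuscript's tables, and the estimates below are placeholder random draws, not results from the simulation.

```r
# Hypothetical helper: summarise one parameter's estimates across data sets.
summarise_sim <- function(est, truth) {
  n_rep  <- length(est)
  emp_se <- sd(est)                        # empirical standard error
  c(n_rep      = n_rep,
    bias       = mean(est) - truth,
    emp_se     = emp_se,
    mc_se_bias = emp_se / sqrt(n_rep),     # Monte Carlo SE of the bias
    rmse       = sqrt(mean((est - truth)^2)))
}

set.seed(2)
truth <- log(1.5)  # hypothetical true log risk ratio

# Placeholder estimates (random draws), NOT output of RR-GEE or RR-PGEE.
est_100  <- rnorm(100,  mean = truth, sd = 0.2)
est_1000 <- rnorm(1000, mean = truth, sd = 0.2)

summary_table <- rbind(`100 data sets`  = summarise_sim(est_100,  truth),
                       `1000 data sets` = summarise_sim(est_1000, truth))
print(round(summary_table, 4))
```

Because the Monte Carlo standard error shrinks with the square root of the number of data sets, summaries based on 100 data sets are noticeably noisier than those based on 1000, so small differences from the manuscript's tables are to be expected.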