Integration of individualized and population-level molecular epidemiology data to model COVID-19 outcomes
Ted Ling Hu, Lacy M. Simons, Taylor J. Dean, Estefany Rios Guzman, Matthew T. Caputo, Arghavan Alisoltani, Chao Qi, Michael Malczynski, Timothy Blanke, Lawrence J. Jennings, Michael G. Ison, Chad J. Achenbach, Ramon Lorenzo-Redondo, Egon A. Ozer, & Judd F. Hultquist
This repository contains the scripts needed to generate the figures and analyses reported in Ling Hu and Simons et al. 2024 (Cell Reports Medicine). The scripts may need to be adapted to the local environment. Due to IRB constraints, we are unable to share the clinical data used in these analyses. We do, however, include the GISAID accession IDs used to generate the trees in Figure 2.
- The second confirmed case of SARS-CoV-2 in the United States was discovered in Chicago.
- Since then, Chicago has accumulated over 1.4 million cases and 15,000 deaths.
- Genomic surveillance conducted by Northwestern University reveals that, when accounting for epidemiological, demographic, and clinical (including vaccination) data, viral clades are not significantly associated with clinical severity.
SARS-CoV-2 variants with enhanced transmissibility and immune escape have emerged periodically throughout the COVID-19 pandemic, but the impact of these variants on disease severity has remained unclear. In this single-center, retrospective cohort study, we examined the association between SARS-CoV-2 clade and patient outcome over a two-year period in Chicago, Illinois, USA. Between March 2020 and March 2022, 14,252 residual diagnostic specimens were collected from SARS-CoV-2-positive inpatients and outpatients alongside linked clinical and demographic metadata, of which 2,114 were processed for viral whole genome sequencing. Clade 20G and both the Delta and Omicron variants were initially associated with a decreased risk of hospitalization when controlling for patient demographics and vaccination status, but this decreased severity was not reflected among hospitalized patients. Subsequent controls for epidemiological factors including case counts, sampling, and standard-of-care negated the association between viral clade and hospitalization, highlighting the importance of these variables in disease severity studies.
Python
- Pandas
- Numpy
- statsmodels
- scipy
- collections
- itertools
- datetime
- sklearn
- seaborn
- math
- matplotlib
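Before running the analysis scripts, it can help to confirm that the Python packages listed above are importable. The following is a minimal sketch using only the standard library; the helper name `missing_modules` and the `REQUIRED_MODULES` list are illustrative and are not part of the repository. Note that `collections`, `itertools`, `datetime`, and `math` ship with Python itself, and that `sklearn` is the import name of the scikit-learn package.

```python
from importlib.util import find_spec

# Import names for the Python packages listed above (hypothetical helper list,
# not part of the repository). The standard-library modules are included for
# completeness and should always be found.
REQUIRED_MODULES = [
    "pandas", "numpy", "statsmodels", "scipy", "collections", "itertools",
    "datetime", "sklearn", "seaborn", "math", "matplotlib",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

Any name reported as missing would need to be installed in the local environment before the scripts are run.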
R
- dplyr
- emmeans
- treeio
- ggtree
- ape
- ggtreeExtra
- ggplot2
- RColorBrewer
MAFFT v7.453
MEGAX v10.1.8.69
IQ-Tree v2.0.5
- ModelFinder
Chicago hospitalizations from CDPH
Chicago hospitalizations from IDPH
Chicago cases and deaths from IDPH
Cook County cases and deaths from IDPH
Cook County vaccination from IDPH
Cook County clades from GISAID
Cook County and Chicago cases from IDPH
mafft --auto --thread -1 --keeplength --addfragments Sequences.fasta NC_045512.fasta > Aligned.fasta
iqtree2 -s Aligned.fasta -T AUTO --alrt 1000 # for the Chicago phylogeny, -B 1000 was also used
treetime --confidence --relax 1.0 0.5 --aln Aligned.fasta --tree Aligned.fasta.treefile --dates dates.csv --coalescent skyline --clock-filter 4 --clock-rate 0.0008 --clock-std-dev 0.0004 --branch-length-mode marginal
treetime mugration --tree TreeTime_Out/timetree.nexus --states geo.csv --attribute geo_loc
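The four commands above can be chained from Python. The following is a minimal sketch, not the authors' workflow: the function names `build_commands` and `run_pipeline` are hypothetical, the file names are the ones used in the commands above, and the `--outdir TreeTime_Out` argument is an assumption added here so that the mugration step finds `timetree.nexus` at the path it expects. It assumes `mafft`, `iqtree2`, and `treetime` are on the `PATH`.

```python
import subprocess


def build_commands(sequences="Sequences.fasta", reference="NC_045512.fasta",
                   aligned="Aligned.fasta", dates="dates.csv",
                   states="geo.csv", outdir="TreeTime_Out"):
    """Return (argv, stdout_path) pairs for the four steps, in order.

    stdout_path is a file name when the tool writes its result to stdout
    (MAFFT), and None when the tool writes its own output files.
    """
    return [
        # 1. Align sequences against the reference, keeping reference length.
        (["mafft", "--auto", "--thread", "-1", "--keeplength",
          "--addfragments", sequences, reference], aligned),
        # 2. Maximum-likelihood tree; IQ-TREE writes <alignment>.treefile.
        (["iqtree2", "-s", aligned, "-T", "AUTO", "--alrt", "1000"], None),
        # 3. Time-scaled tree. --outdir is an assumption (see lead-in).
        (["treetime", "--confidence", "--relax", "1.0", "0.5",
          "--aln", aligned, "--tree", f"{aligned}.treefile",
          "--dates", dates, "--coalescent", "skyline",
          "--clock-filter", "4", "--clock-rate", "0.0008",
          "--clock-std-dev", "0.0004",
          "--branch-length-mode", "marginal",
          "--outdir", outdir], None),
        # 4. Discrete-trait ("mugration") reconstruction of location.
        (["treetime", "mugration", "--tree", f"{outdir}/timetree.nexus",
          "--states", states, "--attribute", "geo_loc"], None),
    ]


def run_pipeline(commands):
    """Run each step in order, stopping at the first non-zero exit code."""
    for argv, stdout_path in commands:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            # Redirect stdout to a file, as the shell '>' does for MAFFT.
            with open(stdout_path, "w") as out:
                subprocess.run(argv, check=True, stdout=out)
```

Calling `run_pipeline(build_commands())` in a directory containing the input files would run the steps in sequence. Passing each command as an argument list avoids shell quoting issues, and `check=True` stops the run as soon as one tool fails.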