This analysis explores the relationship between Covid-19 vaccinations and deaths attributed to Covid-19 in the U.S. from 5/31/21 to 11/30/21, with the goal of assessing the effectiveness of Covid-19 vaccination over time.
Vaccination data was sourced from the Centers for Disease Control and Prevention (CDC), and deaths data was sourced from Johns Hopkins University (JHU).
To begin assessing the effectiveness of vaccination over time, the percentage of the population over 18 years of age that is vaccinated (CDC) is compared to Covid-related deaths per 100k (JHU). The data was broken down by U.S. region (West, Midwest, South, Northeast) based on the Census Bureau regional divisions, and a snapshot of the data was taken at the end of each month to create a more accurate representation. Because the JHU dataset reports cumulative deaths, the time series was adjusted to compute the difference in death figures between consecutive snapshot dates.
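The differencing step described above can be sketched as follows. This is a minimal illustration that uses only the Python standard library, not the repository's actual code: the function name `deaths_per_100k`, the toy numbers, and the choice to clamp negative differences at zero are all assumptions made for the example.

```python
from typing import List, Optional


def deaths_per_100k(cumulative: List[int], population: int) -> List[Optional[float]]:
    """Turn cumulative death counts at successive snapshots into per-period rates.

    `cumulative` holds one county's cumulative deaths in chronological order;
    `population` is that county's (fixed) population.
    """
    # The first snapshot has no earlier value to subtract, so it has no rate.
    rates: List[Optional[float]] = [None]
    for previous, current in zip(cumulative, cumulative[1:]):
        # Assumption: if a cumulative count is revised downward, treat the
        # period as having zero new deaths rather than a negative number.
        new_deaths = max(current - previous, 0)
        rates.append(new_deaths * 100_000 / population)
    return rates


# Hypothetical county with three month-end snapshots and 50,000 residents.
rates = deaths_per_100k([10, 12, 12], population=50_000)
```

Here the second snapshot adds 2 deaths among 50,000 residents, which scales to 4 deaths per 100k, and the third adds none.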
CDC Data
https://data.cdc.gov/Vaccinations/COVID-19-Vaccinations-in-the-United-States-County/8xkx-amqh
Johns Hopkins University Data
https://github.com/CSSEGISandData/COVID-19/tree/f57525e860010f6c5c0c103fd97e2e7282b480c8
A Makefile is provided to streamline running the analysis. To use it, clone the vaccines repository to your local drive and run the make targets outlined below.
Note: If you would prefer to skip steps 1-5 (saving time but not directly accessing the authoritative data sources), clone the main_with_data branch.
- Create a data directory:
 
make data
- Download and compress the CDC data.
 
make cdc
- Create a CSV file with sampled CDC data.
 
make vaccines
- Download deaths data from JHU and create CSV files
 
make deaths
- Merge the CDC vaccination dataset with the JHU deaths dataset and clean the result to remove non-U.S. data.
 
make merge
- Create a time series of scatter plots and save .png files to the 'img' directory
 
make scatters
- Generate an animation using the plots generated above
 
make animation
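As a rough illustration of what the merge step does conceptually, the sketch below joins vaccination rows to deaths rows on a county identifier and date, then drops rows outside an allow-list of states. The field names (`fips`, `date`, `state`, `pct_vaccinated_18plus`, `deaths_per_100k`), the toy rows, and the `merge_rows` helper are hypothetical; the repository's files and its own merge logic may differ.

```python
from typing import Dict, List, Set

# Hypothetical field names and toy rows, not real figures.
vaccines = [
    {"fips": "01001", "date": "2021-06-30", "state": "AL", "pct_vaccinated_18plus": 35.0},
    {"fips": "06037", "date": "2021-06-30", "state": "CA", "pct_vaccinated_18plus": 65.0},
    {"fips": "UNK", "date": "2021-06-30", "state": "UNK", "pct_vaccinated_18plus": 0.0},
]
deaths = [
    {"fips": "01001", "date": "2021-06-30", "deaths_per_100k": 4.0},
    {"fips": "06037", "date": "2021-06-30", "deaths_per_100k": 1.5},
    {"fips": "72001", "date": "2021-06-30", "deaths_per_100k": 2.0},
]
KEEP_STATES = {"AL", "CA"}  # truncated allow-list, for illustration only


def merge_rows(vaccines: List[Dict], deaths: List[Dict], keep_states: Set[str]) -> List[Dict]:
    """Inner-join the two row lists on (fips, date) and filter by state."""
    # Index deaths by the join key so each lookup is constant time.
    deaths_by_key = {(row["fips"], row["date"]): row for row in deaths}
    merged = []
    for row in vaccines:
        match = deaths_by_key.get((row["fips"], row["date"]))
        # Skip rows with no counterpart in the deaths data or outside the allow-list.
        if match is None or row["state"] not in keep_states:
            continue
        merged.append({**row, "deaths_per_100k": match["deaths_per_100k"]})
    return merged


merged = merge_rows(vaccines, deaths, KEEP_STATES)
```

County identifiers are kept as strings in this sketch so that leading zeros (as in `01001`) survive the join.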
Additionally, the Makefile allows you to:
- Do all of the above with one command:
 
make all
- Create a single scatter plot for EDA purposes:
 
make test
- Delete the data and img directories to start fresh:
 
make clean
Assuming all is well, the output should look something like this:

The proportion of the population that is vaccinated increases over the time frame, which you can see as the bubbles move from left to right. The quantity of deaths increases as well month-by-month, which makes sense: more deaths accumulate each month while the denominator (the 2019 Census population) stays the same. We also know that the Delta variant hit during this time period, which was much more contagious than previous strains: we should expect deaths to increase.
However, consider the trajectories of the large bubbles (representing highly populous counties) vs. their smaller counterparts. The largest bubbles progress steadily into the 70-80% vaccinated range by the end of the timeframe, and their death rates generally top out at 100 deaths/100K people. Many smaller bubbles struggle to reach the 50% vaccinated mark, and several of their death rates soar into the 300-400 deaths/100K range. Comparing these two groups suggests that vaccination is highly effective--otherwise the more highly-populated areas would likely have similar (or even worse) death rates, given the physical proximity of the population in these urban centers.
Regional differences prove to be fertile ground for continued research. Many of the small blue dots, representing counties in the South, don't reach the 50% vaccinated threshold. However, the large blue bubbles (representing urban areas in the South) follow a more direct left-to-right trajectory--and tellingly, keep their death rates low.
These findings suggest that behavioral differences between rural and urban communities (mask wearing, social distancing, etc.) or differences in political ideology may affect vaccine adoption--but they offer little doubt that the Covid-19 vaccines are effective at minimizing the death toll of the virus.
Dataset
- Compare different regions with similar population sizes
- Compare individual areas with higher vaccination rates against similar areas with lower vaccination rates
- Subset data to explore effectiveness across different age ranges
- Segment date ranges and counties to align with peaks and valleys in Covid spread
 
Code
- Add regression line to better visualize trends in data
 - Explore the use of different chart types (map, bar chart, line graph)
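For the regression-line idea above, one possible starting point is an ordinary least-squares fit of death rate against vaccination percentage. The sketch below is only an illustration written with the standard library: the `fit_line` helper and the sample values are hypothetical, and an unweighted fit is an assumption (a population-weighted fit may suit the bubble plots better).

```python
from typing import Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations in x, and of cross-deviations between x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical values: percent vaccinated vs. deaths per 100k for three counties.
pct_vaccinated = [30.0, 50.0, 70.0]
death_rate = [300.0, 200.0, 100.0]
slope, intercept = fit_line(pct_vaccinated, death_rate)

# Two end points of the fitted line, ready to hand to a plotting library.
line_x = [min(pct_vaccinated), max(pct_vaccinated)]
line_y = [slope * xi + intercept for xi in line_x]
```

With these toy points the fitted slope is negative, which is the pattern a regression line would make easier to see in the scatter plots.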
 
Philip Bogden | Sophia Cofone | Qiwei "Jerry" Hu | Connor Lynch | Philip Mathieu | Bridget Mohler | Matthew Ray | Kayne Ryan | Zheng Yune
