Project Status: In progress
Jupyter Notebook Viewer - Full Project
Jupyter Notebook Viewer - Visuals
This project aims to create data visualizations and zip code prioritizations that help public health officials decide where and how to distribute COVID-19 vaccines and which equity metrics to use when tracking distribution. We are currently focused on California, where we live, but a similar analysis can be run for any of the 50 US states. Please reach out if you think your state or county would be interested in learning more about this project.
COVID-19 Data:
- COVID-19 confirmed cases from Johns Hopkins CSSE (updated daily; scoped to CA)
- COVID-19 total deaths from Johns Hopkins CSSE (updated daily; scoped to CA)
California State Data:
- Hospital data by county from CA.gov (updated daily)
- Population data by county from worldpopulationreview.com
- California GeoJSON file for the Folium choropleth maps, from Code for America
Socioeconomic / Health Metrics:
- California Healthy Places Index (HPI) combined CSV called HPI2_MasterFile from 2019-04-24. Source: healthyplacesindex.org
- Covid Community Vulnerability Index (CCVI) score developed by Surgo Ventures
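The Johns Hopkins CSSE U.S. time series stores one row per county and one column per date. The snippet below is a minimal sketch of scoping that data to California and reshaping it to one row per county per day. It assumes the `FIPS`, `Admin2`, and `Province_State` columns of the JHU `time_series_covid19_confirmed_US.csv` layout as we understand it; the four-row inline sample and its numbers are made up so the code runs offline.

```python
import pandas as pd

# Hypothetical excerpt in the JHU CSSE US time-series layout:
# identifier columns followed by one column per date. Numbers are made up.
raw = pd.DataFrame({
    "FIPS": [6001.0, 6003.0, 80006.0, 36061.0],
    "Admin2": ["Alameda", "Alpine", "Out of CA", "New York"],
    "Province_State": ["California", "California", "California", "New York"],
    "1/1/21": [100, 2, 5, 300],
    "1/2/21": [110, 2, 6, 320],
})

# Keep only California rows.
ca = raw[raw["Province_State"] == "California"]

# Reshape the wide date columns into long format: one row per county per date.
date_cols = [c for c in ca.columns if c not in ("FIPS", "Admin2", "Province_State")]
long = ca.melt(
    id_vars=["FIPS", "Admin2"],
    value_vars=date_cols,
    var_name="date",
    value_name="confirmed",
)
print(long)
```

Non-county rows such as `Out of CA` survive the state filter, so they still have to be removed when the data is narrowed to the 58 counties.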
Methods Used:
- Data Collection
- Data Cleaning
- Feature Engineering
- Exploratory Data Analysis
- Data Visualization
Technologies:
- Google Colab
- Jupyter Notebook
- Python
- Pandas
- NumPy
- Matplotlib
- Seaborn
- Plotly
- Folium
- Copy
- JSON
- We collected multiple datasets for this analysis, importing COVID-19, hospital, Healthy Places Index, and population data.
- We used Pandas to clean the data and filter it down to California's 58 counties. The state of California did not provide hospital data for two counties, Alpine and Sierra, so we set their hospital values to zero.
- For feature engineering, we converted date columns from object to datetime and created new columns such as total COVID-19 cases and deaths per 100,000 residents for each county and deaths as a percentage of total cases.
- For the visuals, we primarily used Seaborn, Plotly, and Folium. The Folium choropleth maps required a GeoJSON file to draw the correct county boundaries.
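The cleaning step filters the data to California's 58 counties and zero-fills the missing hospital values for Alpine and Sierra. A minimal sketch of one way to do that, assuming county FIPS codes are available (California's counties are exactly the odd codes 6001 through 6115); the rows are made up and `hospitalized_covid_patients` is a hypothetical column name, not necessarily the one in the CA.gov file:

```python
import pandas as pd

# Made-up case counts; FIPS 80006 and 90006 stand in for
# non-county rows such as "Out of CA" and "Unassigned".
cases = pd.DataFrame({
    "FIPS": [6001, 6003, 6091, 80006, 90006],
    "county": ["Alameda", "Alpine", "Sierra", "Out of CA", "Unassigned"],
    "confirmed": [110, 2, 4, 6, 9],
})

# Made-up hospital data with no rows for Alpine or Sierra.
# The column name is hypothetical.
hospitals = pd.DataFrame({
    "county": ["Alameda"],
    "hospitalized_covid_patients": [25],
})

# California's 58 county FIPS codes are the odd numbers 6001..6115.
ca_county_fips = set(range(6001, 6116, 2))
assert len(ca_county_fips) == 58
counties = cases[cases["FIPS"].isin(ca_county_fips)]

# Left-join so every county is kept, then zero-fill counties without hospital data.
merged = counties.merge(hospitals, on="county", how="left")
merged["hospitalized_covid_patients"] = (
    merged["hospitalized_covid_patients"].fillna(0).astype(int)
)
print(merged)
```

Zero-filling makes "not reported" indistinguishable from "no patients" in any later per-capita metric, so it is worth keeping in mind when interpreting those two counties.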
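The feature-engineering step converts dates and adds per-capita and fatality columns. The sketch below shows those calculations on two made-up county rows with round population figures; the column names are hypothetical, and the date strings assume the `m/d/yy` style used in the JHU column headers.

```python
import pandas as pd

# Made-up cumulative totals; column names are hypothetical.
df = pd.DataFrame({
    "county": ["Alameda", "Alpine"],
    "date": ["1/2/21", "1/2/21"],
    "confirmed": [110, 2],
    "deaths": [3, 0],
    "population": [1_000_000, 1_000],
})

# Convert the date strings (object dtype) to datetime64.
df["date"] = pd.to_datetime(df["date"], format="%m/%d/%y")

# Cases and deaths per 100,000 residents.
df["cases_per_100k"] = df["confirmed"] * 100_000 / df["population"]
df["deaths_per_100k"] = df["deaths"] * 100_000 / df["population"]

# Deaths as a percentage of confirmed cases; counties with zero cases get NaN
# instead of a division-by-zero result.
df["case_fatality_pct"] = (
    df["deaths"] / df["confirmed"].where(df["confirmed"] > 0) * 100
).round(2)
print(df)
```

Multiplying by 100,000 before dividing keeps the small-county rates readable: Alpine's 2 cases become 200 per 100,000, while Alameda's 110 cases become 11 per 100,000.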
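A choropleth only shades a county when the data's key matches a property in the GeoJSON, so it helps to check for unmatched names before mapping. This sketch assumes each feature stores the county name in `properties["name"]`, which may not match the Code for America file's actual schema; the two-feature GeoJSON (geometry omitted) and the metric table are made up.

```python
import json

import pandas as pd

# Tiny stand-in for a county GeoJSON file; geometry is omitted (null) for brevity.
# The "name" property is an assumption about the real file's schema.
geojson_text = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"name": "Alameda"}, "geometry": null},
  {"type": "Feature", "properties": {"name": "Alpine"}, "geometry": null}
]}
"""
geo = json.loads(geojson_text)
geo_names = {feature["properties"]["name"] for feature in geo["features"]}

# Made-up metric table whose county labels don't all match the GeoJSON.
rates = pd.DataFrame({
    "county": ["Alameda", "Alpine County"],
    "cases_per_100k": [11.0, 200.0],
})

# Normalize names, e.g. strip a trailing " County" suffix.
rates["county"] = rates["county"].str.replace(r"\s+County$", "", regex=True)

# Any name left here would not be shaded by the metric on the map.
unmatched = sorted(set(rates["county"]) - geo_names)
print("unmatched:", unmatched)
```

Once `unmatched` is empty, the table and GeoJSON could be passed to `folium.Choropleth(geo_data=geo, data=rates, columns=["county", "cases_per_100k"], key_on="feature.properties.name")`, where `key_on` is the path that must line up with the normalized names.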
Getting Started:
- Clone this repo (for help, see this tutorial).
- The raw data and the data processing/transformation scripts are kept in this repo. Click here for the notebook.
- Note: If GitHub doesn't load the notebook, please refer to: