
A climate-based classification model predicting which U.S. counties will have a high incidence of Lyme Disease, the second fastest growing contagious disease in the world


tcbonds/lyme-disease-classifier


lyme-disease-classifier

Lyme Disease is the (second) fastest growing contagious disease in the world. For this project, I built a climate-based classification model that predicts which U.S. counties will have a high incidence of Lyme Disease, reaching a ROC AUC of 0.96. This was done by:

  • Engineering a dataset from scratch by merging Centers for Disease Control and Prevention (CDC) data with National Oceanic and Atmospheric Administration (NOAA) climate data parsed from 78,000 CSV files.
  • Using K-Nearest Neighbors, Logistic Regression, Support Vector Machines and Random Forest algorithms optimized with grid search.
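The data-engineering step in the list above could look roughly like the following stdlib-only Python sketch: read every climate CSV, average each variable per county, then inner-join onto county-level incidence and add a binary label. The CSV layout (a `county_fips` column plus numeric columns such as `tavg` and `prcp`), the incidence mapping, the `threshold` cutoff, and the helper names `load_climate` and `merge_with_incidence` are all hypothetical assumptions for illustration, not the project's actual schema or code.

```python
import csv
import glob
import os
import tempfile
from collections import defaultdict
from statistics import mean


def load_climate(csv_dir):
    """Average every climate column per county across all CSV files in csv_dir.

    Hypothetical layout: each file has a `county_fips` column plus numeric
    columns (e.g. `tavg`, `prcp`). Blank cells are skipped, not treated as 0.
    """
    readings = defaultdict(lambda: defaultdict(list))
    for path in sorted(glob.glob(os.path.join(csv_dir, "*.csv"))):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                fips = row.pop("county_fips")  # keep as str to preserve leading zeros
                for var, value in row.items():
                    if value:  # skip empty or missing cells
                        readings[fips][var].append(float(value))
    return {
        fips: {var: mean(vals) for var, vals in by_var.items()}
        for fips, by_var in readings.items()
    }


def merge_with_incidence(climate, incidence, threshold):
    """Inner-join climate features onto incidence and add a binary label.

    `incidence` maps county FIPS -> cases per 100k (hypothetical format);
    `threshold` is a hypothetical cutoff for "high incidence".
    """
    rows = []
    for fips, rate in incidence.items():
        if fips in climate:  # drop counties with no climate coverage
            rows.append({"county_fips": fips, **climate[fips],
                         "high_incidence": int(rate >= threshold)})
    return rows


# Tiny demo with two hypothetical station files.
with tempfile.TemporaryDirectory() as tmp:
    files = {
        "a.csv": "county_fips,tavg,prcp\n09001,10.0,3.0\n09001,12.0,\n",
        "b.csv": "county_fips,tavg,prcp\n06037,18.0,1.0\n",
    }
    for name, body in files.items():
        with open(os.path.join(tmp, name), "w", newline="") as f:
            f.write(body)
    climate = load_climate(tmp)

# Hypothetical cases per 100k; county 99999 has no climate data and is dropped.
incidence = {"09001": 80.0, "06037": 0.5, "99999": 5.0}
rows = merge_with_incidence(climate, incidence, threshold=10.0)
print(rows)
```

Keeping FIPS codes as strings preserves leading zeros (e.g. `09001`), which would be lost if they were parsed as integers before the join.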
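The model-selection step in the list above can be sketched with scikit-learn's `GridSearchCV`, scoring each candidate by ROC AUC since that is the metric the project reports. The synthetic data from `make_classification` and the parameter grids below are placeholders, not the project's data or the grids it actually searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for county-level climate features (hypothetical data).
X, y = make_classification(n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative search spaces only; not the grids used in the original project.
candidates = {
    "knn": (make_pipeline(StandardScaler(), KNeighborsClassifier()),
            {"kneighborsclassifier__n_neighbors": [5, 15, 25]}),
    "logreg": (make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
               {"logisticregression__C": [0.1, 1.0, 10.0]}),
    "svm": (make_pipeline(StandardScaler(), SVC()),
            {"svc__C": [0.1, 1.0, 10.0], "svc__gamma": ["scale", 0.1]}),
    "forest": (RandomForestClassifier(random_state=0),
               {"n_estimators": [100, 200], "max_depth": [None, 8]}),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # 5-fold cross-validated grid search, ranking parameter sets by ROC AUC.
    search = GridSearchCV(estimator, grid, scoring="roc_auc", cv=5)
    search.fit(X_train, y_train)
    # score() reuses the "roc_auc" scorer on the refit best estimator;
    # for SVC it ranks by decision_function, so predict_proba is not needed.
    test_auc = search.score(X_test, y_test)
    results[name] = (search.best_params_, search.best_score_, test_auc)
    print(f"{name}: cv AUC={search.best_score_:.3f}, test AUC={test_auc:.3f}, {search.best_params_}")
```

Scaling sits inside a pipeline for K-Nearest Neighbors, Logistic Regression, and the SVM so that each cross-validation fold is standardized using only its own training split; Random Forest is insensitive to feature scale and is left unscaled.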

Note: This project was completed in 2019, before the emergence of COVID-19. At the time, Lyme Disease was the fastest growing contagious disease; the "(second)" above was added later to account for this.
