Lyme Disease is the (second) fastest-growing contagious disease in the world. For this project, I built a climate-based classification model that achieves a ROC AUC of 0.96 when predicting which US counties will have a high incidence of Lyme Disease. This was done by:
- Engineering a dataset from scratch by merging Centers for Disease Control and Prevention data with National Oceanic and Atmospheric Administration climate data parsed from 78,000 CSV files.
- Using K-Nearest Neighbors, Logistic Regression, Support Vector Machine, and Random Forest algorithms, each optimized with grid search.
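
As a rough illustration of the modeling step, here is a minimal sketch of grid-searching those four model families and comparing them by ROC AUC. It assumes scikit-learn (the summary does not name a library), uses synthetic data from `make_classification` as a stand-in for the county-level climate features, and the hyperparameter grids are made up for the example. It is not the project's actual code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): NOT the CDC/NOAA dataset.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One (estimator, parameter grid) pair per model family.
# The grids are illustrative assumptions, kept tiny so the sketch runs quickly.
candidates = {
    "knn": (KNeighborsClassifier(), {"model__n_neighbors": [5, 15]}),
    "logreg": (LogisticRegression(max_iter=1000), {"model__C": [0.1, 1.0]}),
    "svm": (SVC(), {"model__C": [0.1, 1.0]}),
    "forest": (
        RandomForestClassifier(n_estimators=50, random_state=0),
        {"model__max_depth": [3, None]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Scaling lives inside the pipeline so it is refit on each CV fold.
    pipe = Pipeline([("scale", StandardScaler()), ("model", estimator)])
    search = GridSearchCV(pipe, grid, scoring="roc_auc", cv=3)
    search.fit(X_train, y_train)
    results[name] = {
        "best_params": search.best_params_,
        "cv_auc": search.best_score_,
        # score() reuses the "roc_auc" scorer on the held-out split.
        "test_auc": search.score(X_test, y_test),
    }

for name, res in results.items():
    print(f"{name}: cv_auc={res['cv_auc']:.3f} test_auc={res['test_auc']:.3f}")
```

Putting the scaler inside the pipeline keeps test-fold statistics out of the cross-validation fit, which matters most for the distance-based models (K-Nearest Neighbors and Support Vector Machines).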
Note: This project was completed in 2019, before the emergence of COVID-19. At the time, Lyme Disease was the fastest-growing contagious disease; the "(second)" in the opening sentence was added to account for this.