
juliast224/heart-disease-prediction

 
 


Heart Disease Prediction Project

Project Overview

Heart disease is a leading cause of death globally, and early detection is crucial for effective treatment. This project uses machine learning to predict the likelihood of heart disease in individuals, building a predictive model from patient features such as age, gender, blood pressure, and cholesterol levels.

Business Problem

Cardiovascular diseases (CVD) are the number one cause of death globally, and detecting heart disease early can significantly improve treatment outcomes. This project applies machine learning to predict heart disease risk, supporting earlier detection and better healthcare management.

Dataset

We used the 2015 Behavioral Risk Factor Surveillance System (BRFSS) data from the CDC. The dataset contains responses from 441,456 participants across 330 columns, each corresponding to a question asked in the survey.
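As a rough illustration of turning the raw survey table into a labeled dataset, the sketch below reads a BRFSS extract with pandas and builds a binary target. It assumes the target comes from the BRFSS calculated variable `_MICHD` (1 = reported coronary heart disease or heart attack, 2 = did not). This README does not say which column the project used, so that choice, the `load_brfss` helper, and the sample values are all hypothetical.

```python
import io

import pandas as pd

# Assumption: the label is derived from the BRFSS calculated variable _MICHD
# (1 = reported CHD or myocardial infarction, 2 = did not). Illustrative only.
TARGET_SOURCE = "_MICHD"


def load_brfss(path_or_buffer, columns=None):
    """Hypothetical helper: read a BRFSS extract and return (features, binary target)."""
    df = pd.read_csv(path_or_buffer, usecols=columns)
    # Keep only rows with a definite yes/no value for the target variable.
    df = df[df[TARGET_SOURCE].isin([1, 2])].copy()
    # Recode 1 -> 1 (heart disease) and 2 -> 0 (no heart disease).
    y = (df.pop(TARGET_SOURCE) == 1).astype(int)
    return df, y


# Tiny in-memory stand-in for the real CSV (made-up values).
sample_csv = io.StringIO(
    "_AGEG5YR,SEX,_MICHD\n"
    "10,1,1\n"
    "3,2,2\n"
    "7,2,\n"
    "12,1,2\n"
)
X, y = load_brfss(sample_csv)
print(X.shape, y.tolist())  # (3, 2) [1, 0, 0]
```

With the real data, `load_brfss` would point at the downloaded CSV instead; passing `columns=` (which must include the target column) keeps memory use down given the 330-column width.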

Methodology

  1. Data Preprocessing: Handling missing values, encoding categorical variables, and normalizing data.
  2. Feature Selection: Identifying the most relevant features for predicting heart disease.
  3. Model Building: Training several machine learning algorithms, including Logistic Regression, Decision Trees, Random Forest, Naive Bayes, a Bagging Classifier, XGBoost, and Neural Networks.
  4. Model Evaluation: Models were evaluated based on accuracy, precision, recall, and ROC AUC scores.
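The four steps above can be sketched as a single scikit-learn pipeline. This is an illustration under stated assumptions, not the project's code: it runs on synthetic data from `make_classification` (about 10% positives) with a made-up categorical column, uses median/most-frequent imputation plus one-hot encoding and standard scaling for preprocessing, uses `SelectKBest` with an ANOVA F-test as a stand-in for feature selection, and compares five of the listed models on the four listed metrics. XGBoost and the Keras neural network are left out to keep the sketch to scikit-learn; `xgboost.XGBClassifier` follows the scikit-learn estimator interface and could be added to the `models` dictionary.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the survey table: 10 numeric columns + 1 categorical column.
Xn, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                            weights=[0.9, 0.1], random_state=0)
X = pd.DataFrame(Xn, columns=[f"num_{i}" for i in range(10)])
rng = np.random.default_rng(0)
X["cat"] = rng.choice(["a", "b", "c"], size=len(X))
X.loc[rng.random(len(X)) < 0.05, "num_0"] = np.nan  # inject missing values
numeric = [c for c in X.columns if c.startswith("num_")]

# Step 1: impute + scale numeric columns; impute + one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), ["cat"]),
], sparse_threshold=0)  # always return a dense array (GaussianNB needs one)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "naive_bayes": GaussianNB(),
    "bagging": BaggingClassifier(random_state=0),
}

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

results = {}
for name, model in models.items():
    # Steps 2-3: feature selection, then the classifier, fitted on training data only.
    pipe = Pipeline([("prep", clone(preprocess)),
                     ("select", SelectKBest(f_classif, k=8)),
                     ("model", model)])
    pipe.fit(X_train, y_train)
    pred = pipe.predict(X_test)
    proba = pipe.predict_proba(X_test)[:, 1]
    # Step 4: the four metrics named in the methodology.
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, proba),
    }

print(pd.DataFrame(results).T.round(3))
```

Keeping preprocessing and feature selection inside the pipeline means they are fitted on the training split only, so the test metrics are not inflated by information from the test rows.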

Key Challenges

  • Handling Imbalanced Data: The dataset was imbalanced, with more participants without heart disease than with it. We addressed this with class weighting and SMOTE (Synthetic Minority Over-sampling Technique).
  • Feature Selection: With over 300 candidate features, identifying the most relevant ones was a challenge.
  • Model Selection and Tuning: We tested various models and tuned hyperparameters to improve performance.
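The two imbalance remedies mentioned above work differently: class weighting changes the loss so that errors on the minority class cost more, while SMOTE adds synthetic minority rows by interpolating between a minority sample and one of its nearest minority neighbors. The sketch below shows both on synthetic data. `smote_like` is a hypothetical, minimal re-implementation for illustration; in practice the `SMOTE` class from the imbalanced-learn package (`imblearn.over_sampling.SMOTE`) is the usual choice.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.utils.class_weight import compute_class_weight


def smote_like(X_min, n_new, k=5, random_state=0):
    """Hypothetical helper: create n_new synthetic minority rows, SMOTE style."""
    rng = np.random.default_rng(random_state)
    # Ask for k + 1 neighbours because each point's closest neighbour is itself.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)
    # Choose a random minority row for each new sample ...
    base = rng.integers(0, len(X_min), size=n_new)
    # ... and one of its k real neighbours (column 0 is the row itself).
    neigh = idx[base, rng.integers(1, k + 1, size=n_new)]
    # Place the new point at a random position on the segment between them.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[neigh] - X_min[base])


rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.1).astype(int)  # roughly 10% positives

# Option 1: class weighting; "balanced" uses n_samples / (n_classes * class_count).
classes = np.array([0, 1])
weights = compute_class_weight(class_weight="balanced", classes=classes, y=y)
print(dict(zip(classes.tolist(), weights.round(2).tolist())))

# Option 2: oversample the minority class until both classes have equal counts.
X_min = X[y == 1]
n_new = int((y == 0).sum() - (y == 1).sum())
X_syn = smote_like(X_min, n_new)
X_bal = np.vstack([X, X_syn])
y_bal = np.concatenate([y, np.ones(n_new, dtype=int)])
print(np.bincount(y_bal))
```

Oversampling should be applied to the training split only, so that synthetic rows never leak into the evaluation set. Many scikit-learn classifiers (for example `LogisticRegression`, `DecisionTreeClassifier`, and `RandomForestClassifier`) also accept `class_weight="balanced"` directly.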

Results

The models were evaluated on their ability to predict heart disease, and performance varied across algorithms: some achieved higher precision, others higher recall. The final choice of model therefore depends on the healthcare provider's priorities, trading off identifying as many true heart disease cases as possible (recall) against minimizing false positives (precision).
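This trade-off is usually controlled by the decision threshold applied to a model's predicted probabilities, not only by the choice of model. A minimal sketch, assuming a provider wants at least 90% recall (a hypothetical target) with as few false positives as possible: take the highest threshold on the precision-recall curve that still meets the recall floor. The scores below are simulated, not project results, and `threshold_for_recall` is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def threshold_for_recall(y_true, scores, min_recall):
    """Hypothetical helper: highest threshold whose recall is still >= min_recall."""
    precision, recall, thresholds = precision_recall_curve(y_true, scores)
    # The final (precision=1, recall=0) point has no threshold, so drop it.
    ok = np.flatnonzero(recall[:-1] >= min_recall)
    # Recall never increases as the threshold rises, so the last qualifying
    # index is the strictest threshold (fewest false positives) meeting the floor.
    best = ok[-1]
    return thresholds[best], precision[best], recall[best]


# Simulated scores: positives (about 15%) score higher on average than negatives.
rng = np.random.default_rng(0)
y_true = (rng.random(500) < 0.15).astype(int)
scores = rng.normal(loc=1.5 * y_true, scale=1.0)

t, p, r = threshold_for_recall(y_true, scores, min_recall=0.9)
print(f"threshold={t:.3f} precision={p:.3f} recall={r:.3f}")
```

Raising `min_recall` lowers the threshold and generally admits more false positives; where to set the floor is a clinical decision rather than something the data alone settles.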

Conclusion

This project demonstrates the potential of machine learning in predicting heart disease. The models developed can assist healthcare providers in early detection and intervention, potentially saving lives. Future work could involve integrating more diverse datasets and exploring deep learning techniques for improved accuracy.

How to Run the Project

  1. Data Preparation: Load the BRFSS 2015 dataset.
  2. Preprocessing: Run the preprocessing scripts to clean and prepare the data.
  3. Model Training: Execute the model training scripts for different algorithms.
  4. Evaluation: Use the evaluation scripts to assess the performance of each model.

Dependencies

  • Python 3.x
  • Libraries: pandas, numpy, scikit-learn, xgboost, keras, matplotlib, seaborn

License

This project is licensed under the MIT License - see the LICENSE.md file for details.


Languages

  • Jupyter Notebook 100.0%