Heart Disease Prediction Project

Project Overview

Heart disease is a leading cause of death globally, and early detection is crucial for effective treatment. This project uses machine learning to predict the likelihood of heart disease in individuals. We analyze patient features such as age, gender, blood pressure, and cholesterol levels to build a predictive model.

Business Problem

Cardiovascular diseases (CVD) are the number one cause of death globally, and detecting heart disease early can significantly improve treatment outcomes. A model that predicts heart disease risk from survey responses can help healthcare providers flag at-risk individuals sooner and manage care more effectively.

Dataset

We used the 2015 Behavioral Risk Factor Surveillance System (BRFSS) survey data from the CDC. The dataset covers 441,456 respondents and has 330 columns, each corresponding to a question asked in the survey.
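
With 330 columns, a useful first pass is to check how much of each column is missing before deciding what to keep. The sketch below is illustrative only: the helper `missing_report`, the 50% cutoff, and the column names are hypothetical stand-ins, not actual BRFSS variable names or the project's own code.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Return the missing fraction per column plus a keep/drop flag."""
    report = pd.DataFrame({"missing_frac": df.isna().mean()})
    report["keep"] = report["missing_frac"] <= max_missing
    # Most incomplete columns first, so drop candidates are easy to spot.
    return report.sort_values("missing_frac", ascending=False)


# Toy stand-in for the survey table (hypothetical column names).
toy = pd.DataFrame(
    {
        "age_group": [3, 5, 7, 9],
        "bmi": [22.1, np.nan, 30.4, 27.8],
        "rarely_asked": [np.nan, np.nan, np.nan, 1.0],
    }
)

report = missing_report(toy, max_missing=0.5)
print(report)
```

On the real data, the same function could be applied to the DataFrame returned by `pandas.read_csv`, with the cutoff chosen to suit the analysis.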

Methodology

Data Preprocessing: Handling missing values, encoding categorical variables, and normalizing data.
Feature Selection: Identifying the most relevant features for predicting heart disease.
Model Building: We trained several machine learning models, including Logistic Regression, Decision Tree, Random Forest, Naive Bayes, Bagging Classifier, XGBoost, and Neural Network models.
Model Evaluation: Models were evaluated based on accuracy, precision, recall, and ROC AUC scores.
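
The four steps above can be sketched as a single scikit-learn pipeline. This is an illustrative sketch on synthetic data, not the notebook's actual code: the column names, the use of `SelectKBest` for feature selection, and logistic regression as the model are all assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

rng = np.random.default_rng(0)
n = 2000

# Synthetic stand-in data; column names are hypothetical.
X = pd.DataFrame(
    {
        "age": rng.integers(18, 90, n).astype(float),
        "systolic_bp": rng.normal(125, 15, n),
        "cholesterol": rng.normal(200, 30, n),
        "sex": rng.choice(["F", "M"], n),
        "smoker": rng.choice(["yes", "no"], n),
    }
)
# Outcome loosely driven by age and blood pressure.
logit = 0.05 * (X["age"] - 60) + 0.03 * (X["systolic_bp"] - 125)
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
# Remove some values so the imputer has work to do.
X.loc[rng.random(n) < 0.05, "cholesterol"] = np.nan

numeric = ["age", "systolic_bp", "cholesterol"]
categorical = ["sex", "smoker"]

# Preprocessing: impute and scale numeric columns, one-hot encode categoricals.
numeric_steps = Pipeline(
    [("impute", SimpleImputer(strategy="median")), ("scale", StandardScaler())]
)
preprocess = ColumnTransformer(
    [
        ("num", numeric_steps, numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ]
)

# Feature selection and model building chained after preprocessing.
model = Pipeline(
    [
        ("preprocess", preprocess),
        ("select", SelectKBest(f_classif, k=4)),
        ("clf", LogisticRegression(max_iter=1000)),
    ]
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model.fit(X_train, y_train)

# Model evaluation with the four metrics named above.
pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]
metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "precision": precision_score(y_test, pred, zero_division=0),
    "recall": recall_score(y_test, pred, zero_division=0),
    "roc_auc": roc_auc_score(y_test, proba),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Keeping preprocessing and feature selection inside the pipeline means they are fitted on the training split only, which avoids leaking test-set information into the model.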

Key Challenges

Handling Imbalanced Data: The dataset was imbalanced, with far more respondents without heart disease than with it. We used techniques like class weighting and SMOTE to address this.
Feature Selection: With over 300 features, identifying the most relevant features was a challenge.
Model Selection and Tuning: We tested various models and tuned hyperparameters to improve performance.
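
To illustrate the class-weighting approach mentioned above, the sketch below trains the same logistic regression with and without `class_weight="balanced"` on synthetic data with roughly 10% positive cases. The data, the model, and the 10% rate are assumptions made for illustration. SMOTE, which the project also used, is not shown here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data: about 90% negatives, 10% positives.
X, y = make_classification(
    n_samples=5000,
    n_features=20,
    n_informative=5,
    weights=[0.9, 0.1],
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

results = {}
for label, weight in [("unweighted", None), ("balanced", "balanced")]:
    # "balanced" reweights each class inversely to its frequency.
    clf = LogisticRegression(class_weight=weight, max_iter=1000)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    results[label] = {
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
    }
    print(label, results[label])
```

Balanced weighting typically raises recall on the minority class at the cost of some precision, which is the trade-off to weigh when missed cases are costly.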

Results

The models were evaluated on their ability to predict heart disease. Performance varied across algorithms, with some achieving higher precision and others higher recall. The final model choice depends on the healthcare provider's requirements: higher recall catches more true heart disease cases, while higher precision reduces false positives.
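
The precision/recall trade-off described above is usually controlled through the decision threshold applied to a model's predicted probabilities. A minimal sketch with made-up labels and scores follows; the helper `metrics_at` and all numbers are illustrative, not project results.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Made-up ground truth and predicted probabilities (6 negatives, 4 positives).
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.05, 0.10, 0.20, 0.35, 0.45, 0.60, 0.30, 0.55, 0.70, 0.90])


def metrics_at(threshold):
    """Return (precision, recall) when scores >= threshold count as positive."""
    pred = (y_score >= threshold).astype(int)
    precision = precision_score(y_true, pred, zero_division=0)
    recall = recall_score(y_true, pred, zero_division=0)
    return precision, recall


for t in (0.3, 0.5, 0.7):
    p, r = metrics_at(t)
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, raising the threshold from 0.3 to 0.7 moves precision from 0.57 to 1.00 while recall falls from 1.00 to 0.50.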

Conclusion

This project demonstrates the potential of machine learning in predicting heart disease. The models developed can assist healthcare providers in early detection and intervention, potentially saving lives. Future work could involve integrating more diverse datasets and exploring deep learning techniques for improved accuracy.

How to Run the Project

Data Preparation: Load the BRFSS 2015 dataset.
Preprocessing: Run the preprocessing scripts to clean and prepare the data.
Model Training: Execute the model training scripts for different algorithms.
Evaluation: Use the evaluation scripts to assess the performance of each model.
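
The training and evaluation steps above amount to fitting several candidate models on the same split and comparing their scores. A hedged sketch on synthetic data: the model settings are chosen for illustration only, and XGBoost and the neural network are left out so the example depends on scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the prepared feature matrix and labels.
X, y = make_classification(
    n_samples=3000,
    n_features=15,
    n_informative=5,
    weights=[0.85, 0.15],
    random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "naive_bayes": GaussianNB(),
    "bagging": BaggingClassifier(n_estimators=25, random_state=42),
}

scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    scores[name] = {
        "roc_auc": roc_auc_score(y_test, proba),
        "recall": recall_score(y_test, model.predict(X_test), zero_division=0),
    }

# Print models from best to worst ROC AUC.
for name, s in sorted(scores.items(), key=lambda kv: kv[1]["roc_auc"], reverse=True):
    print(f"{name:20s} roc_auc={s['roc_auc']:.3f} recall={s['recall']:.3f}")
```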

Dependencies

Python 3.x
Libraries: pandas, numpy, scikit-learn, xgboost, keras, matplotlib, seaborn
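
A quick way to confirm the listed libraries are installed before running the project is to check whether each one can be located by Python. This is a small convenience sketch, not part of the project; the helper `missing_packages` is hypothetical.

```python
import importlib.util

# Package name -> import name (scikit-learn is imported as "sklearn").
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "scikit-learn": "sklearn",
    "keras": "keras",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
}


def missing_packages(required):
    """Return the package names whose import module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All listed dependencies are importable.")
```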

License

This project is licensed under the MIT License. See the LICENSE file for details.
