UofT | DSI - Team Project: 🔎 Heart Failure Prediction

🔍 Business Question

Can heart failure be predicted accurately from demographic and baseline (pre-stress-test) data alone, without conducting an exercise stress test? The goal is to explore whether machine learning models can estimate heart disease risk using only the basic data available before stress testing.

🛠️ Why Address This Problem?

Value to Patients

Accessible Screening: Early detection for high-risk patients without stress testing, benefiting remote or resource-limited settings.
Reduced Burden: Stress tests can be physically and emotionally taxing; using baseline data streamlines diagnosis and supports proactive care.

Value to Healthcare Providers

Efficiency: Baseline data models allow quick preliminary screening, saving time for acute cases.
Data-Driven Decisions: Machine learning insights enhance diagnostic accuracy and care quality.

Value to the Healthcare System

Cost Savings: Reducing unnecessary stress tests can save significant resources (e.g., Ontario spends ~C$300M annually on ~500,000 non-invasive cardiac tests).

🎯 Project Overview

This project analyzes a dataset containing clinical and demographic features to predict heart disease events. The goal is to create a machine learning model capable of predicting the likelihood of heart disease using only basic patient data, improving accessibility to heart disease screening and potentially reducing mortality rates.

📊 Dataset

We use the Heart Failure Prediction Dataset, which includes:

11 clinical features (2 demographic and 9 medical measurements)
Binary classification target (Heart Disease: Yes/No)

👥 Team Members

🎥 Project Video

🏗️ Project Folder Structure

|-- data
|   |-- processed     
|   |-- raw           
|-- experiments
|   |-- model_development  
|-- models
|   |-- logistic_regression  
|   |-- neural_networks      
|   |-- xgboost            
|-- reports
|   |-- figures

🏁 Setup Instructions

Clone the repository:

git clone https://github.com/sehroz/heart-failure-prediction.git
cd heart-failure-prediction

Create and activate conda environment:

conda env create -f environment.yml

conda activate heart-ml

Run Jupyter Notebook:

jupyter notebook

💡 Project Context

Cardiovascular diseases (CVDs) are the leading cause of death globally, taking an estimated 17.9 million lives each year, or 31% of all deaths worldwide. Four out of five CVD deaths are due to heart attacks and strokes, and one-third of these deaths occur prematurely in people under 70 years of age. Heart failure is a common event caused by CVDs, and this dataset contains 11 features that can be used to predict possible heart disease.

People with cardiovascular disease or at high cardiovascular risk (due to the presence of one or more risk factors such as hypertension, diabetes, hyperlipidemia, or already established disease) need early detection and management, where a machine learning model can be of great help.

📋 Attribute Information

Age: Age of the patient [years]
Sex: Sex of the patient [M: Male, F: Female]
ChestPainType: Chest pain type [TA: Typical Angina, ATA: Atypical Angina, NAP: Non-Anginal Pain, ASY: Asymptomatic]
RestingBP: Resting blood pressure [mm Hg]
Cholesterol: Serum cholesterol [mg/dl]
FastingBS: Fasting blood sugar [1: if FastingBS > 120 mg/dl, 0: otherwise]
RestingECG: Resting electrocardiogram results [Normal: Normal, ST: ST-T wave abnormality, LVH: Left ventricular hypertrophy]
MaxHR: Maximum heart rate achieved [Numeric]
ExerciseAngina: Exercise-induced angina [Y: Yes, N: No]
Oldpeak: ST depression induced by exercise [Numeric]
ST_Slope: The slope of the peak exercise ST segment [Up: upsloping, Flat: flat, Down: downsloping]
HeartDisease: Target class [1: heart disease, 0: Normal]
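
Because the business question is about predicting without a stress test, it can help to separate the attributes above into two groups. The sketch below is one possible grouping, inferred only from the attribute descriptions (the four exercise-related fields are treated as stress-test measurements). The names `BASELINE_FEATURES`, `STRESS_TEST_FEATURES`, and `baseline_view` are hypothetical and are not taken from the project's code, and the example record is made up.

```python
# Assumption: this baseline / stress-test split is inferred from the
# attribute descriptions; it is not taken from the project's code.
BASELINE_FEATURES = [
    "Age", "Sex", "ChestPainType", "RestingBP",
    "Cholesterol", "FastingBS", "RestingECG",
]
STRESS_TEST_FEATURES = ["MaxHR", "ExerciseAngina", "Oldpeak", "ST_Slope"]
TARGET = "HeartDisease"


def baseline_view(record):
    """Return only the pre-stress-test fields of one patient record (a dict)."""
    return {name: record[name] for name in BASELINE_FEATURES}


# Hypothetical patient record, made up for illustration only.
example = {
    "Age": 54, "Sex": "M", "ChestPainType": "ASY", "RestingBP": 130,
    "Cholesterol": 240, "FastingBS": 0, "RestingECG": "Normal",
    "MaxHR": 140, "ExerciseAngina": "Y", "Oldpeak": 1.5, "ST_Slope": "Flat",
    "HeartDisease": 1,
}

print(baseline_view(example))
```

A model trained on `baseline_view` records sees neither the exercise-related fields nor the target, which is the setting the business question describes.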

📋 Main Findings

Insights

Stress Testing: Baseline data alone gives reasonable predictions, but adding stress-test features improves diagnostic accuracy.
Model Performance: XGBoost outperformed other models, showing robust predictive ability. Neural Networks required more tuning but captured complex relationships.
Minimizing False Negatives: Prioritized recall to reduce the risk of undiagnosed heart disease.
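
To make the recall point concrete: recall is the share of actual heart disease cases the model flags, so every false negative lowers it. One common way to favor recall is to lower the decision threshold applied to predicted probabilities. The sketch below illustrates that idea with made-up numbers. It is an assumption about how recall could be prioritized, not a description of what this project did, and the helper names `recall` and `classify` are hypothetical.

```python
def recall(y_true, y_pred):
    """Fraction of actual positives (label 1) that were predicted positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0


def classify(probabilities, threshold):
    """Turn predicted probabilities into 0/1 labels at a given threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Made-up labels and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.6, 0.35, 0.4, 0.2, 0.45, 0.55, 0.1]

for threshold in (0.5, 0.3):
    y_pred = classify(y_prob, threshold)
    false_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    print(threshold, recall(y_true, y_pred), false_positives)
```

On this toy data, lowering the threshold from 0.5 to 0.3 catches every positive case at the cost of one extra false positive, which is the trade-off a recall-first screening tool accepts.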

Challenges & Solutions

Feature Encoding: Used one-hot encoding for categorical variables like ChestPainType.
New Model Adoption: Successfully implemented XGBoost using documentation and GridSearch, despite limited prior experience.
Explainability: Enhanced interpretability of Neural Networks using SHAP values.
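
As an illustration of the feature-encoding step mentioned above, the following minimal sketch shows what one-hot encoding does to a single `ChestPainType` value. It uses plain Python to keep the idea visible. The helper `one_hot` is hypothetical, and the project may well have used a library routine instead (for example, pandas and scikit-learn both provide one-hot encoders).

```python
# Categories as listed under Attribute Information.
CHEST_PAIN_TYPES = ["TA", "ATA", "NAP", "ASY"]


def one_hot(value, categories, prefix):
    """Map one categorical value to a dict of 0/1 indicator columns."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {f"{prefix}_{c}": int(c == value) for c in categories}


print(one_hot("NAP", CHEST_PAIN_TYPES, "ChestPainType"))
# {'ChestPainType_TA': 0, 'ChestPainType_ATA': 0, 'ChestPainType_NAP': 1, 'ChestPainType_ASY': 0}
```

Each category becomes its own 0/1 column, so the model does not read an artificial ordering into the category codes.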

🧪 Results and Insights

Model Performance:

The XGBoost model demonstrated the highest accuracy (92%) and F1-score, making it the best-performing algorithm among the models tested (Logistic Regression, Neural Networks, and XGBoost).
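
For readers unfamiliar with the two metrics quoted above, the sketch below shows how accuracy and F1-score are derived from confusion-matrix counts. The counts are made up for illustration and are not the project's results, and the function name `accuracy_and_f1` is hypothetical.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Compute accuracy and F1-score from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return accuracy, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, f1


# Made-up counts for illustration only (not the project's results).
accuracy, f1 = accuracy_and_f1(tp=40, fp=10, fn=5, tn=45)
print(f"accuracy={accuracy:.3f}, f1={f1:.3f}")
```

Accuracy counts all correct predictions equally, while F1 ignores true negatives and balances precision against recall, which is why the two are reported together.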

Feature Importance:

The top predictors of heart disease include:

Age: Higher age groups show increased risk.
ST_Slope: A flat or downsloping ST segment strongly correlates with heart disease.
ChestPainType: Asymptomatic (ASY) cases show the highest risk.
ExerciseAngina: Strongly linked to higher disease likelihood.
Oldpeak: ST depression during exercise was a significant indicator.

📊 Data Analysis & Results

🎧 Project Overview Audio

Listen to the project report:

Heart.Failure.Project.Report.1.webm

To learn more about our methodology, results, and insights, view the complete project report.
