This repository contains an exploratory data analysis (EDA) and a set of statistical tests performed on a stroke prediction dataset. The analysis includes univariate descriptive analysis, linear and logistic regression models, ANOVA, and chi-square tests, among others. It examines factors such as age, work type, smoking status, hypertension, and marital status, and their potential association with the likelihood of stroke.
- Data_Documentation.docx: Detailed documentation of the dataset and the methodology followed.
- Graphs & Models:
  - Linear Regression: Analysis of the relationship between age and average glucose levels.
  - Logistic Regression: Binary outcome predictions of stroke occurrence.
  - Univariate Descriptive Analysis: Summary statistics for key variables.
  - ANOVA Test: Comparison of group means by analysis of variance.
  - Chi-Square, Odds Ratio (OR), and Relative Risk (RR) Tests: Relationships between categorical variables.
  - Forecast Model: A model predicting glucose levels based on age.
- Screenshots: Visual representations of the various graphs and statistical tests conducted.
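
The Chi-Square, OR, and RR tests listed above are carried out in Excel. As a rough cross-check outside the workbook, the same measures can be computed from a 2x2 table. The sketch below is illustrative only: the function name `two_by_two_measures` and the counts are hypothetical and are not taken from the dataset or the workbook.

```python
import math


def two_by_two_measures(a, b, c, d):
    """Association measures for a 2x2 table laid out as:

                   stroke   no stroke
      exposed        a          b
      unexposed      c          d
    """
    n = a + b + c + d
    # Odds ratio: odds of stroke among the exposed vs. the unexposed.
    odds_ratio = (a * d) / (b * c)
    # Relative risk: risk of stroke among the exposed vs. the unexposed.
    relative_risk = (a / (a + b)) / (c / (c + d))
    # Pearson chi-square statistic without continuity correction (1 d.f.).
    chi_square = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # Upper-tail p-value for 1 degree of freedom: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return odds_ratio, relative_risk, chi_square, p_value


# Hypothetical counts, for illustration only (e.g. hypertension vs. stroke).
odds_ratio, relative_risk, chi_square, p_value = two_by_two_measures(20, 80, 10, 90)
print(f"OR={odds_ratio:.2f} RR={relative_risk:.2f} "
      f"chi2={chi_square:.3f} p={p_value:.4f}")
```

The function assumes all four cells are non-zero; a table with an empty cell would need a correction or an exact test instead.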
- Download the project files from the repository.
- Open the Excel workbook and navigate through the sheets to review each analysis.
- Refer to Data_Documentation.docx for a detailed explanation of each test and analysis conducted.
- Microsoft Excel for analysis and statistical testing.
- Various statistical methods including regression, ANOVA, and chi-square tests.
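
For readers who want to reproduce the regression and forecast steps outside Excel, a minimal least-squares sketch follows. It assumes a simple one-predictor model (average glucose level as a function of age); the helper names `fit_line` and `predict` and the data points are hypothetical and do not come from the dataset.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Forecast y for a new x using the fitted line."""
    return intercept + slope * x


# Hypothetical (age, average glucose level) pairs, for illustration only.
ages = [30, 40, 50, 60, 70]
glucose = [90.0, 95.0, 100.0, 105.0, 110.0]

slope, intercept = fit_line(ages, glucose)
print(slope, intercept)               # 0.5 75.0
print(predict(slope, intercept, 65))  # 107.5
```

On the same input columns, the slope and intercept should match what Excel's `SLOPE` and `INTERCEPT` worksheet functions return, which makes this a convenient sanity check on the workbook's regression sheet.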
The visual representation of the analysis can be found in the screenshots folder, which includes the following:
- Linear Regression Analysis
- Logistic Regression Model
- Forecast Model
- ANOVA, Chi-Square, and RR Tests
The dataset used in this analysis focuses on stroke prediction based on medical and lifestyle attributes like age, hypertension, work type, smoking status, and more. You can access the dataset on Kaggle: Stroke Prediction Dataset