For this project I used a Kaggle dataset of diabetes patients containing 2,000 records and 9 columns. I first split the data into dependent and independent variables. Initially I fit a Logistic Regression model and obtained 78.2% accuracy on the training set. To improve the model, I used the Logit function from the statsmodels module with all 8 independent variables and got 77.4% accuracy on the test set. I then tried to improve the model further with backward elimination: using 0.5 as the threshold, I eliminated the variable "SkinThickness", which had the highest p-value, and reached 77.8% accuracy on the test set. Finally, I plotted the ROC (Receiver Operating Characteristic) curve and tuned the model to improve the roc_auc_score, ultimately reaching an roc_auc_score of 0.75 with 76% accuracy, which is good enough for this task. I deliberately traded some accuracy for a lower number of false negatives: a false negative here is a patient who has diabetes but whom the model predicts as non-diabetic.
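The core of the pipeline above (fit a logistic regression, check roc_auc_score, then lower the decision threshold to trade accuracy for fewer false negatives) can be sketched as follows. This is a minimal illustration with synthetic stand-in data, since the actual Kaggle CSV and its column names (Pregnancies, Glucose, SkinThickness, etc.) are not included here; the shapes (2,000 rows, 8 features) mirror the dataset, but the numbers it prints are not the project's reported results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, confusion_matrix

rng = np.random.default_rng(0)

# Synthetic stand-in for the Kaggle diabetes data: 2000 rows, 8 features,
# binary outcome. The real project loads these from the CSV instead.
n = 2000
X = rng.normal(size=(n, 8))
y = (X[:, 1] + 0.5 * X[:, 5] + rng.normal(size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)

# Predicted probabilities drive both the ROC AUC and the threshold choice.
proba = model.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y_te, proba)
print(f"roc_auc_score = {auc:.3f}")

# Default threshold 0.5 vs. a lower threshold: lowering it flags more
# patients as diabetic, cutting false negatives at some cost in accuracy.
for thr in (0.5, 0.3):
    pred = (proba >= thr).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    acc = (tp + tn) / len(y_te)
    print(f"threshold={thr}: accuracy={acc:.3f}, false negatives={fn}")
```

The threshold value 0.3 is illustrative only; in practice one would scan the ROC curve (e.g. via `sklearn.metrics.roc_curve`) and pick the threshold that meets the acceptable false-negative rate.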
soorykant/Diabetes-Prediction