soorykant/Diabetes-Prediction

Diabetes-Prediction

In this project, I used a Kaggle dataset of diabetes patients. The dataset has 2000 readings and 9 columns. First, I split it into dependent and independent variables. Initially I used a Logistic Regression model and got 78.2% accuracy on the training set.

To improve the model, I used the Logit function from the statsmodels module with all variables (i.e. 8 variables) and got 77.4% accuracy on the testing dataset. I then tried to improve the model further with the backward elimination method. Using 0.5 as the threshold value, I eliminated the variable "SkinThickness", as it had the highest p-value, and got 77.8% accuracy on the testing dataset.

Next, I plotted the ROC (Receiver Operating Characteristic) curve and tried to improve the roc_auc_score. In the end, I achieved a roc_auc_score of 0.75 with an accuracy of 76%, which I consider good enough. I accepted the lower accuracy in order to minimize the number of False Negatives. A False Negative is a patient who has diabetes but whom the model predicts as non-diabetic.
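The trade-off between accuracy and False Negatives can be illustrated with a minimal sketch. It is not the code from this repository: the labels, the predicted probabilities, the two cutoffs, and the helper names `confusion_counts` and `accuracy` are all made up for illustration, and it assumes the trade-off is made by lowering the probability cutoff at which a patient is classified as diabetic. It uses only the Python standard library; with scikit-learn, the same counts would normally come from `sklearn.metrics.confusion_matrix`.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Return (tn, fp, fn, tp), predicting positive when prob >= threshold."""
    tn = fp = fn = tp = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1:
            fn += 1  # diabetic patient predicted as non-diabetic
        elif predicted == 1:
            fp += 1  # non-diabetic patient predicted as diabetic
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy(tn, fp, fn, tp):
    """Fraction of all predictions that are correct."""
    return (tn + tp) / (tn + fp + fn + tp)


# Hypothetical data: 1 = diabetic, 0 = non-diabetic (not from the Kaggle dataset).
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_prob = [0.10, 0.20, 0.30, 0.42, 0.45, 0.48, 0.40, 0.70, 0.80, 0.90]

# Compare the default 0.5 cutoff with a lower, assumed cutoff of 0.4.
for threshold in (0.5, 0.4):
    tn, fp, fn, tp = confusion_counts(y_true, y_prob, threshold)
    print(f"threshold={threshold}: FN={fn}, FP={fp}, "
          f"accuracy={accuracy(tn, fp, fn, tp):.2f}")
```

On this toy data, lowering the cutoff removes the False Negative but introduces False Positives, so accuracy drops. This is the same kind of compromise described above.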
