For this project I used a Kaggle dataset of diabetes patients containing 2,000 records and 9 columns. I first split the data into dependent and independent variables. Initially I fit a Logistic Regression model and obtained 78.2% accuracy on the training set. To improve the model, I used the Logit function from the statsmodels module with all 8 independent variables and got 77.4% accuracy on the test set. I then tried to improve the model further with backward elimination: using 0.5 as the threshold, I eliminated the variable "SkinThickness", which had the highest p-value, and reached 77.8% accuracy on the test set. Finally, I plotted the ROC (Receiver Operating Characteristic) curve and tuned the model to improve the roc_auc_score, ultimately reaching an roc_auc_score of 0.75 with 76% accuracy, which is good enough for this task. I deliberately traded some accuracy for a lower number of false negatives: a false negative here is a patient who has diabetes but whom the model predicts as non-diabetic.
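The core of the pipeline above (fit a logistic regression, check roc_auc_score, then lower the decision threshold to trade accuracy for fewer false negatives) can be sketched as follows. This is a minimal illustration with synthetic stand-in data, since the actual Kaggle CSV and its column names (Pregnancies, Glucose, SkinThickness, etc.) are not included here; the shapes (2,000 rows, 8 features) mirror the dataset, but the numbers it prints are not the project's reported results.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, confusion_matrix

rng = np.random.default_rng(0)

# Synthetic stand-in for the Kaggle diabetes data: 2000 rows, 8 features,
# binary outcome. The real project loads these from the CSV instead.
n = 2000
X = rng.normal(size=(n, 8))
y = (X[:, 1] + 0.5 * X[:, 5] + rng.normal(size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)

# Predicted probabilities drive both the ROC AUC and the threshold choice.
proba = model.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y_te, proba)
print(f"roc_auc_score = {auc:.3f}")

# Default threshold 0.5 vs. a lower threshold: lowering it flags more
# patients as diabetic, cutting false negatives at some cost in accuracy.
for thr in (0.5, 0.3):
    pred = (proba >= thr).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
    acc = (tp + tn) / len(y_te)
    print(f"threshold={thr}: accuracy={acc:.3f}, false negatives={fn}")
```

The threshold value 0.3 is illustrative only; in practice one would scan the ROC curve (e.g. via `sklearn.metrics.roc_curve`) and pick the threshold that meets the acceptable false-negative rate.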
soorykant/Diabetes-Prediction