Diabetes classification

The dataset contains 8 medical features (X):

Pregnancies, Glucose, BloodPressure, SkinThickness, Insulin, BMI, DiabetesPedigreeFunction, Age

Outcome is the binary target label (y):

0 : Diabetes False

1 : Diabetes True

Dimensions of the diabetes data: (768, 9), i.e. 768 rows and 9 columns (the 8 features plus Outcome)

The dataset is small but well labeled, and it contains no null values, so it is well suited to a supervised machine learning formulation.
This is a binary classification problem: the target (y), i.e. df['Outcome'], has 2 classes, and the medical measurements are used as the features (X).
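
As a rough illustration of the X / y split described above, here is a minimal sketch that uses only the Python standard library (the notebook itself refers to a pandas `df`, so this is not the notebook's code). The helper name `split_features_target` and the inline `SAMPLE_CSV` rows are assumptions made for the example; the rows are illustrative placeholders in the same column layout, not a statement about the contents of diabetes.csv.

```python
import csv
import io

# Column names as listed in this README.
FEATURES = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
]
TARGET = "Outcome"

# Illustrative placeholder rows (assumption), same layout as diabetes.csv.
SAMPLE_CSV = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
"""


def split_features_target(csv_file):
    """Return (X, y): X is a list of 8-float rows, y is a list of 0/1 labels."""
    X, y = [], []
    for row in csv.DictReader(csv_file):
        X.append([float(row[name]) for name in FEATURES])
        y.append(int(row[TARGET]))
    return X, y


X, y = split_features_target(io.StringIO(SAMPLE_CSV))
print(len(X), len(X[0]), y)
```

To run the same helper on the real file, it should be enough to pass an open file handle, for example `open("diabetes.csv", newline="")`, in place of the `io.StringIO` object.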

Machine-Learning Models

I used 8 different machine learning classifiers for diabetes classification.

Results are shown below.

Logistic regression and neural networks seem to provide the best performance based on 10-fold cross-validation of the dataset. Of the two, logistic regression also achieves the higher F1-score, which is a better metric for model evaluation.
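
For readers unfamiliar with 10-fold cross-validation, the sketch below shows one way the 768 rows could be divided into 10 train/test splits. It is a plain, unstratified split written with the standard library only; the function name `kfold_indices` and the `seed` parameter are assumptions for illustration, and the notebook may well use a library routine (possibly stratified) instead.

```python
import random


def kfold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every n_folds-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test_idx in enumerate(folds):
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, test_idx


# 768 is the number of rows reported for the dataset.
splits = list(kfold_indices(768, n_folds=10))
for train_idx, test_idx in splits:
    # A model would be fitted on train_idx and scored on test_idx here.
    assert len(train_idx) + len(test_idx) == 768
```

Each row lands in exactly one test fold, so averaging the 10 test scores uses every sample once for evaluation.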
From the confusion matrices, the decision tree is the most successful at detecting diabetes.
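
The two metrics mentioned above, the F1-score and the confusion matrix, are closely related. The following sketch computes the four confusion-matrix counts for a binary problem and derives precision, recall (the share of diabetic cases that are detected) and F1 from them. The function names and the toy labels are assumptions for illustration only; they are not results from this project.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels where 1 means diabetes."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def precision_recall_f1(tn, fp, fn, tp):
    """Derive precision, recall and F1 from the confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy labels (assumption), not taken from the dataset.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

counts = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(*counts)
print(counts, precision, recall, f1)
```

A model can have the highest recall (most diabetic cases detected) without having the highest F1-score, because F1 also penalizes false positives through precision.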
Feature selection suggests that Glucose is the most crucial factor for successful prediction of diabetes.
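
This README does not say which feature selection method was used. As one simple possibility, the sketch below ranks features by the absolute Pearson correlation between each feature column and the binary target. This is a swapped-in univariate technique chosen for illustration, not necessarily the notebook's method; the names `pearson`, `rank_features` and the `feature_a` / `feature_b` / `feature_c` toy columns are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0
    return cov / (spread_x * spread_y)


def rank_features(columns, target):
    """Return (name, |correlation|) pairs, strongest association first."""
    scores = {name: abs(pearson(values, target)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy columns (assumption), not values from diabetes.csv.
target = [0, 0, 1, 1, 0, 1]
columns = {
    "feature_a": [1, 2, 8, 9, 1, 7],  # tracks the target closely
    "feature_b": [5, 5, 5, 5, 5, 5],  # constant, carries no information
    "feature_c": [3, 1, 2, 3, 1, 2],  # weakly related
}

ranking = rank_features(columns, target)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Applied to the real data, the same ranking would take the 8 feature columns and df['Outcome'] as inputs; a correlation-based ranking only captures linear, one-feature-at-a-time relationships.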
