I explored and analysed the Pima Indians Diabetes Dataset and applied machine learning techniques to it. I then discussed the results using the knowledge I have acquired as an experienced Registered Dietitian.
The Pima Indians Diabetes Dataset, originally from the National Institute of Diabetes and Digestive and Kidney Diseases, contains information on 768 women from a population near Phoenix, Arizona, USA. The outcome tested was diabetes: 268 women tested positive and 500 tested negative. The dataset therefore has one target (dependent) variable and 8 attributes (TYNECKI, 2018): pregnancies, OGTT (Oral Glucose Tolerance Test), blood pressure, skin thickness, insulin, BMI (Body Mass Index), age, and diabetes pedigree function. The Pima population has been studied by the National Institute of Diabetes and Digestive and Kidney Diseases at 2-year intervals since 1965. Because epidemiological evidence indicates that type 2 diabetes mellitus (T2DM) results from the interaction of genetic and environmental factors, the dataset includes attributes that could be expected to relate to the onset of diabetes and its future complications.
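The class balance matters when reading any accuracy figure for this dataset. The short sketch below is not part of the original notebook. It takes only the 768 records and the 500 negative outcomes stated above, derives the positive count as the remainder, and computes the accuracy of a classifier that always predicts "negative". The variable and function names are illustrative.

```python
# Counts taken from the dataset description: 768 records, 500 negative outcomes.
total_records = 768
negative = 500
positive = total_records - negative  # remainder of the 768 records


def class_share(count, total):
    """Return the share of `count` in `total` as a percentage."""
    return 100.0 * count / total


positive_share = class_share(positive, total_records)
negative_share = class_share(negative, total_records)

# A model that always predicts "negative" is correct for every negative
# record, so its accuracy equals the negative share.
baseline_accuracy = negative_share

print(f"positive: {positive} ({positive_share:.1f}%)")
print(f"negative: {negative} ({negative_share:.1f}%)")
print(f"majority-class baseline accuracy: {baseline_accuracy:.1f}%")
```

Any trained model should be judged against this roughly 65% baseline rather than against 50%.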
📌 Python v 3.7
📌 Libraries used:
- pandas
- numpy
- seaborn
- matplotlib.pyplot
- sklearn
- statsmodels
📌 The project was written in a Jupyter notebook. 🔔 If you have any problem viewing the project, please check here