This repository contains a Jupyter Notebook for predicting diabetes using machine learning techniques. The project demonstrates data preprocessing, model training, evaluation, and prediction steps.
The goal of this project is to build a predictive model that can accurately determine whether an individual has diabetes based on certain medical attributes. This is done using various machine learning algorithms.
Diabetes Prediction-checkpoint.ipynb
: The main Jupyter Notebook containing the code for data analysis, model training, and evaluation.data/
: Directory containing the dataset used for training and testing the model.models/
: Directory where trained models are saved.results/
: Directory for storing results and evaluation metrics.
The dataset used in this project is the Pima Indians Diabetes Database. It includes the following features:
- Pregnancies
- Glucose
- Blood Pressure
- Skin Thickness
- Insulin
- BMI
- Diabetes Pedigree Function
- Age
- Outcome (0 or 1 indicating the presence or absence of diabetes)
To run this project, you need to have the following software installed:
- Python 3.x
- Jupyter Notebook
- Required Python libraries (listed in
requirements.txt
)
You can install the necessary libraries using the following command:
Usage Clone the repository to your local machine:
Copy git clone https://github.com/yourusername/diabetes-prediction.git Navigate to the project directory:
Copy cd diabetes-prediction Launch Jupyter Notebook:
Copy jupyter notebook Open the Diabetes Prediction-checkpoint.ipynb notebook and run the cells to execute the code.
pip install -r requirements.txt
Model Training The notebook covers the following steps:
Data Preprocessing: Handling missing values, feature scaling, and splitting the dataset into training and testing sets. Model Selection: Training various machine learning models such as Logistic Regression, Decision Trees, and Random Forest. Model Evaluation: Evaluating the models using metrics like accuracy, precision, recall, and F1-score. Prediction: Making predictions on new data using the trained model. Results The results of the model training and evaluation are stored in the results/ directory. This includes performance metrics and visualizations.
Contributing Contributions are welcome! If you have any ideas or improvements, feel free to open an issue or submit a pull request.
License This project is licensed under the MIT License. See the LICENSE file for more details.
Acknowledgments Kaggle for providing the dataset. The open-source community for providing the tools and libraries used in this project.