GitHub - iair/deployment_demo_model_training: This a demo made to deploy a simple machine learning model as example to teach student to use open source as a tool to do it

# Deployment Demo: Model Training

This repository demonstrates the deployment of a machine learning model training pipeline. The project includes data preprocessing, model training, evaluation, and saving the trained model for later use.

## Table of Contents
- [Project Structure](#project-structure)
- [Requirements](#requirements)
- [Setup](#setup)
- [Usage](#usage)
- [Details](#details)
  - [Data](#data)
  - [Training](#training)
- [Contributing](#contributing)
- [License](#license)

---

## Project Structure

```plaintext
deployment_demo_model_training/
├── data/                     # Input data folder (ignored by Git)
├── models/                   # Trained models folder (ignored by Git)
├── src/                      # Source code
│   ├── data/                 # Data-related modules
│   │   ├── data_loader.py    # Data loading
│   │   ├── data_splitter.py  # Splits data into train/test sets
│   │   ├── data_processor.py # Data preprocessing and transformation
│   ├── model/                # Model-related modules
│   │   ├── trainer.py        # Model training
│   │   ├── evaluator.py      # Model evaluation
│   │   ├── saver.py          # Saves the trained model
│   ├── main.py               # Orchestrates the training pipeline
├── .gitignore                # Specifies files and folders to ignore by Git
├── pyproject.toml            # Poetry configuration
├── README.md                 # Project documentation (this file)

Requirements

Python 3.10 or higher
Poetry for dependency management

Dependencies

All dependencies are listed in pyproject.toml. Key libraries include:

pandas: Data manipulation
xgboost: Model training
scikit-learn: Data splitting and evaluation
joblib: Saving and loading models

Setup

Clone the Repository:

git clone https://github.com/iair/deployment_demo_model_training.git
cd deployment_demo_model_training

Install Dependencies: Use Poetry to install all required dependencies.
```
poetry install
```
Prepare the Environment:
- Ensure the data/ folder contains your dataset.
- The models/ folder will store trained models.

Usage

Run the training pipeline using the following command:

poetry run python src/main.py

Output

The trained model will be saved in the models/ folder with the current date appended to its name.

Details

Data

Input Data:
- Place the dataset in the data/ folder.
- The dataset should include a target column for supervised learning.
Preprocessing:
- Missing values are handled and numerical features are scaled.
- Specify columns for imputation in data_processor.py.

Training

Model:
- The pipeline uses XGBoost for training.
- Default parameters are defined in trainer.py but can be modified as needed.
Evaluation:
- Model performance is evaluated on a test set.
- Metrics like accuracy are printed to the console.
Saving:
- Trained models are saved with a timestamp for version control.

Contributing

Contributions are welcome! Please follow these steps:

Fork the repository.
Create a new branch (git checkout -b feature/YourFeature).
Commit your changes (git commit -m "Add your message").
Push to the branch (git push origin feature/YourFeature).
Open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Requirements

Dependencies

Setup

Usage

Output

Details

Data

Training

Contributing

License

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
data		data
models		models
src		src
.DS_Store		.DS_Store
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

License

iair/deployment_demo_model_training

Folders and files

Latest commit

History

Repository files navigation

Requirements

Dependencies

Setup

Usage

Output

Details

Data

Training

Contributing

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages