Skip to content

This a demo made to deploy a simple machine learning model as example to teach student to use open source as a tool to do it

License

Notifications You must be signed in to change notification settings

iair/deployment_demo_model_training

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

# Deployment Demo: Model Training

This repository demonstrates the deployment of a machine learning model training pipeline. The project includes data preprocessing, model training, evaluation, and saving the trained model for later use.

## Table of Contents
- [Project Structure](#project-structure)
- [Requirements](#requirements)
- [Setup](#setup)
- [Usage](#usage)
- [Details](#details)
  - [Data](#data)
  - [Training](#training)
- [Contributing](#contributing)
- [License](#license)

---

## Project Structure

```plaintext
deployment_demo_model_training/
├── data/                     # Input data folder (ignored by Git)
├── models/                   # Trained models folder (ignored by Git)
├── src/                      # Source code
│   ├── data/                 # Data-related modules
│   │   ├── data_loader.py    # Data loading
│   │   ├── data_splitter.py  # Splits data into train/test sets
│   │   ├── data_processor.py # Data preprocessing and transformation
│   ├── model/                # Model-related modules
│   │   ├── trainer.py        # Model training
│   │   ├── evaluator.py      # Model evaluation
│   │   ├── saver.py          # Saves the trained model
│   ├── main.py               # Orchestrates the training pipeline
├── .gitignore                # Specifies files and folders to ignore by Git
├── pyproject.toml            # Poetry configuration
├── README.md                 # Project documentation (this file)

Requirements

  • Python 3.10 or higher
  • Poetry for dependency management

Dependencies

All dependencies are listed in pyproject.toml. Key libraries include:

  • pandas: Data manipulation
  • xgboost: Model training
  • scikit-learn: Data splitting and evaluation
  • joblib: Saving and loading models

Setup

  1. Clone the Repository:

    git clone https://github.com/iair/deployment_demo_model_training.git
    cd deployment_demo_model_training
  2. Install Dependencies: Use Poetry to install all required dependencies.

    poetry install
  3. Prepare the Environment:

    • Ensure the data/ folder contains your dataset.
    • The models/ folder will store trained models.

Usage

Run the training pipeline using the following command:

poetry run python src/main.py

Output

  • The trained model will be saved in the models/ folder with the current date appended to its name.

Details

Data

  • Input Data:

    • Place the dataset in the data/ folder.
    • The dataset should include a target column for supervised learning.
  • Preprocessing:

    • Missing values are handled and numerical features are scaled.
    • Specify columns for imputation in data_processor.py.

Training

  • Model:

    • The pipeline uses XGBoost for training.
    • Default parameters are defined in trainer.py but can be modified as needed.
  • Evaluation:

    • Model performance is evaluated on a test set.
    • Metrics like accuracy are printed to the console.
  • Saving:

    • Trained models are saved with a timestamp for version control.

Contributing

Contributions are welcome! Please follow these steps:

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature/YourFeature).
  3. Commit your changes (git commit -m "Add your message").
  4. Push to the branch (git push origin feature/YourFeature).
  5. Open a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

About

This a demo made to deploy a simple machine learning model as example to teach student to use open source as a tool to do it

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages