chuvalniy/mlops-practices

Target classification with MLOps practices (CI/CD, Docker, cloud services, etc.)

Overview

A machine learning application that classifies people's bad habits based on medical indicators, built with extensive use of MLOps practices.

Installation

Prerequisites

The project follows a microservice architecture, so make sure you have Docker installed on your machine.

Clone repository & install dependencies

git clone https://github.com/chuvalniy/mlops-practices.git
cd mlops-practices
pip install -r requirements.txt

Create & update credentials

Create a .env file in your project root directory and copy the variables from the .env-example file. By default, .env-example is configured to run the project locally, so there is no need to update the credentials.
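
For reference, this is roughly how a service picks those variables up at startup. A minimal sketch using python-dotenv; the python-dotenv dependency is an assumption and may differ from how this codebase actually loads configuration, while AWS_S3_BUCKET comes from .env-example.

import os

from dotenv import load_dotenv  # python-dotenv; assumed to be installed

# Read key=value pairs from .env into the process environment.
load_dotenv()

# AWS_S3_BUCKET is one of the variables from .env-example.
bucket = os.environ["AWS_S3_BUCKET"]
print(f"Configured bucket: {bucket}")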

The next step is to create credentials for the S3 storage. Go to your user's home directory (e.g. C:\Users\MyUser) and create a folder called .aws. In this directory, create a file called credentials and put the following into it.

[default]
aws_access_key_id=minioadmin
aws_secret_access_key=minioadmin
aws_bucket_name=arts

[admin]
aws_access_key_id=minioadmin
aws_secret_access_key=minioadmin

Caution: aws_bucket_name should have the same value as the AWS_S3_BUCKET variable in your .env file.

After all these steps, you should have the following file path: C:\Users\MyUser\.aws\credentials.

These are the default credentials for running the project locally without any changes to the .env file.

Run Docker

Navigate to the project root directory and start the Docker containers.

docker-compose up -d --build

Create an S3 Bucket in MinIO

To let MLflow store model artifacts in S3, we need to create a bucket in the S3 storage.

Navigate to the MinIO console; by default it is available at http://localhost:9001/.

In the console, open the Buckets tab, click Create new bucket, and name the bucket arts. The name should be the same as your AWS_S3_BUCKET variable in the .env file.
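
Alternatively, the bucket can be created programmatically. A minimal sketch using boto3, assuming the default local MinIO API endpoint (port 9000) and the default credentials from the steps above:

import boto3

# Point boto3 at the local MinIO endpoint instead of real AWS.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
)

# Create the bucket expected by the AWS_S3_BUCKET variable.
s3.create_bucket(Bucket="arts")
print([b["Name"] for b in s3.list_buckets()["Buckets"]])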

Attention (Windows)

This step is only necessary if you intend to use your experiments to deploy an ML service later. The project should work without it, but the step below may resolve some issues.

If you want to serve MLflow models locally on your machine, you also have to set MLFLOW_S3_ENDPOINT_URL in your PowerShell session so that MLflow can connect to the MinIO S3 storage.

$env:MLFLOW_S3_ENDPOINT_URL = "http://localhost:9000"
mlflow models serve -m <model_uri>
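
Once the server is up (by default on http://localhost:5000), predictions can be requested over HTTP. A sketch of a client call; the /invocations endpoint and the dataframe_split payload format are standard for MLflow 2.x, but the column names below are placeholders for whatever features the model was actually trained on.

import requests

# Placeholder feature columns; replace with the model's real schema.
payload = {
    "dataframe_split": {
        "columns": ["age", "height", "weight"],
        "data": [[45, 175, 82]],
    }
}

# MLflow model servers expose predictions at /invocations.
response = requests.post(
    "http://localhost:5000/invocations",
    json=payload,
    timeout=10,
)
print(response.json())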

How to use

If you installed everything correctly, then this step will be simple.

Execute pipeline

Execute this in the project's root directory to pull the data tracked by DVC.

dvc pull

Run the machine learning training pipeline.

dvc repro
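
After the pipeline finishes, the trained model is logged to MLflow and can be inspected from Python. A sketch under two assumptions: the tracking server listens on http://localhost:5000, and the run logs its model under the artifact path model; adjust both to match your setup.

import mlflow

# Assumed local tracking server address; check docker-compose for the real one.
mlflow.set_tracking_uri("http://localhost:5000")

# Fetch the most recent run; pass experiment_names if the runs live
# in a named experiment rather than the default one.
runs = mlflow.search_runs(order_by=["attributes.start_time DESC"], max_results=1)
run_id = runs.iloc[0]["run_id"]

# "model" is a conventional artifact path; train.py may use another.
model = mlflow.pyfunc.load_model(f"runs:/{run_id}/model")
print(model.metadata)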

[Optional] Change model & tune hyperparameters

You can choose your own hyperparameters or change the model (a Random Forest by default) by modifying the train.py file.

from sklearn.ensemble import RandomForestClassifier

# Define parameters and model. RANDOM_STATE is defined elsewhere in train.py.
params = {
    "max_depth": 3,
    "n_estimators": 100,
    "random_state": RANDOM_STATE,
}
model = RandomForestClassifier(**params)
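
For example, swapping in gradient boosting would look roughly like this. A sketch with made-up hyperparameter values; any scikit-learn classifier with the same fit/predict interface should drop in the same way.

from sklearn.ensemble import GradientBoostingClassifier

# Hypothetical parameters; tune them for your data.
params = {
    "n_estimators": 200,
    "learning_rate": 0.05,
    "max_depth": 3,
    "random_state": RANDOM_STATE,
}
model = GradientBoostingClassifier(**params)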

Documentation

In general, the code is covered with docstrings and comments describing what each component does, but some aspects cannot be fully captured there. Below is a description of the architecture, the tech stack, and the data source.

Architecture

If you want to examine the app architecture, see this link.

Stack

A more detailed description of each library used to build this application can be found here.

Data

The training data can be found on Kaggle. If you are interested in the exploratory data analysis, it is available at this link as two Jupyter notebooks.

Testing

Almost every function is covered by unit tests built with the pytest and Click libraries.
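
As an illustration of the pattern, Click ships a CliRunner for invoking commands in-process. A sketch for a hypothetical train command with an --output option; both names are placeholders, not the project's actual CLI.

from click.testing import CliRunner

from train import train  # hypothetical Click command from train.py


def test_train_exits_cleanly(tmp_path):
    """Invoke the CLI in-process and check it exits without errors."""
    runner = CliRunner()
    # --output is a placeholder option shown for illustration.
    result = runner.invoke(train, ["--output", str(tmp_path)])
    assert result.exit_code == 0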

Execute the following command in your project directory to run the tests.

pytest -v

License

// add