This project is based on MLE Basic Example by MarkKorvin. The requirements page mentioned it can be used as a template, so I kept the main structure from that template even though I do not agree with some parts (I will explain why in later sections).
Before going into how to run training and inference, we first need to clone this repository to the local environment. In this example I am working on Windows, but I will stick to universal commands that work the same way on Linux and macOS.
git clone https://github.com/mtech00/MLE_Epam.git
After cloning, you will have the following directories and files:
- inference:
  - Dockerfile
  - inference.csv
  - inference.py
  - requirements.txt
- training:
  - Dockerfile
  - train.csv
  - train.py
  - requirements.txt
- .gitignore
- readme.md
- data_loader.py
The work requirements say the data for training and inference must already be prepared, but they also mention a data loader script. After cloning this repo you normally will not need that script, since train.csv and inference.csv are already included; use it only if something is wrong with the datasets. In that case, just run data_loader.py (as shown below); it will automatically fetch the Iris dataset, split it into train and inference subsets, and write the resulting files into the training and inference directories.
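A minimal way to regenerate the datasets, assuming a local Python environment with the script's dependencies (e.g. scikit-learn and pandas) installed:

```
# Run from the repository root; recreates training/train.csv and inference/inference.csv
python data_loader.py
```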
Since this is a containerized project, the only prerequisite is a Docker Engine/daemon that is up and running. That is all you need; everything else is handled by the Dockerfiles and Docker Engine.
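One quick way to confirm the daemon is reachable before starting:

```
# Prints client and server details; fails with an error if the daemon is not running
docker info
```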
The overall workflow is:

- Go to the training folder
- Build training_image
- Run training_container
- Copy the trained model to the inference folder
- Build inference_image
- Run inference_container
- Copy the results.csv file back to the local machine

Your results.csv file will be ready to analyze afterward; the exact commands are listed right after this overview and then explained step by step below.
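For reference, here is the whole sequence in one place (the same commands that are described step by step in the rest of this section):

```
cd MLE_Epam
cd training
docker build -t training_image .
docker run --name training_container training_image
docker cp training_container:/app/model.pt ../inference/model.pt
cd ..
cd inference
docker build -t inference_image .
docker run --name inference_container inference_image
docker cp inference_container:/app/results.csv ../inference/results.csv
```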
- Go to the repo folder:
cd MLE_Epam
- Go to the training folder:
cd training
- Build the training Docker image:
docker build -t training_image .
Building will take about a minute, mostly because of PyTorch, even though I used the CPU-only build. (This model is very small, so downloading a GPU-enabled build would be overkill. The catch is that pointing requirements.txt at the CPU wheel index is platform dependent: x86 and ARM need different builds, so if you are on an ARM-based CPU, make sure the corresponding Torch build is used. Ideally the build would detect the CPU architecture automatically instead of hard-coding the CPU index; in the end I went with the universal Torch distribution so that Apple Silicon Mac users are covered as well.)
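For illustration, this is roughly what a CPU-only Torch pin in requirements.txt can look like (the exact contents and version of the repo's file may differ; treat this as an assumption):

```
# Pull torch from the CPU-only wheel index instead of the default PyPI index
--extra-index-url https://download.pytorch.org/whl/cpu
torch==2.2.2
```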
- Run the training Docker container:
docker run --name training_container training_image
MarkKorvin’s example captures the container ID in addition to giving the container a name, but that approach is unnecessary and overly complicated; even if a name conflict occurs, it can easily be resolved by removing or renaming the old container (see below). The container will run and then stop automatically. If there is an error, the script will print information about it; if it runs successfully, you will see output for every epoch, the dataset metrics, and the path where the model (model.pt) is saved inside the container.
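If a previous run left an old container with the same name behind, the conflict can be cleared with standard Docker commands:

```
# Remove the old container before re-running
docker rm training_container
# ...or keep it around under a different name
docker rename training_container training_container_old
```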
- Copy the trained model to the inference folder:
docker cp training_container:/app/model.pt ../inference/model.pt
MarkKorvin’s example used docker cp, and I am not sure it is the best option; a shared volume (or another Docker mechanism that avoids manual copy commands) would also work, but I followed the project requirements. docker cp lets us reach files inside a container and copy them to the local directory, and it works with relative paths.
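As a sketch of the shared-volume alternative, assuming train.py and inference.py were adapted to write and read the model under /app/artifacts (which is not how the current scripts behave):

```
# Create a named volume and mount it into both containers
docker volume create model_store
docker run --name training_container -v model_store:/app/artifacts training_image
docker run --name inference_container -v model_store:/app/artifacts inference_image
```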
- Go back to the parent directory:
cd ..
- Go to the inference folder:
cd inference
- Build the inference Docker image:
docker build -t inference_image .
Again, this build will take about a minute. For the inference container we could probably use more lightweight libraries, such as ONNX Runtime. I pinned every dependency to a specific version in requirements.txt for reliability: choosing a known-good combination is safer. Without pinned versions pip would pull the latest releases, which might introduce compatibility issues we cannot predict.
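To illustrate the difference between unpinned and pinned entries (the package name and version here are placeholders, not the repo's actual requirements.txt):

```
# Unpinned entry: pip resolves to whatever release is newest at build time
#   pandas
# Pinned entry: reproducible, known-good version
pandas==2.1.4
```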
- Run the inference Docker container:
docker run --name inference_container inference_image
The script handles errors such as a missing model file, and it prints information about the inference dataset and the path where it writes results.csv inside the container.
- Copy the results file to the local machine:
docker cp inference_container:/app/results.csv ../inference/results.csv
Now you have your results.csv file in your local environment.
Without the headache of code dependencies, environment setup, or even a local Python installation, you can train the neural network and run inference purely through Docker. The final results.csv file contains your inference results.