This repo demonstrates how to take the first step towards MLOps by setting up and deploying a simple ML CI/CD pipeline using Google Cloud's AI Platform, Kubeflow, and Docker.
- Johan Hammarstedt, jhammarstedt
- Matej Sestak, Sestys
The following topics will be covered:
- Building each task as a Docker container and running them with Cloud Build
- Preprocessing step: Loading data from a GCS bucket, editing it, and storing a new file
- Training: Creating a PyTorch model and building a custom prediction routine (GCP mainly supports TensorFlow, but you can add custom models; see the sketch after this list)
- Deployment: Deploying your custom model to the AI Platform with version control
- Creating a Kubeflow pipeline and connecting the above tasks
- Performing CI by building GitHub triggers in Cloud Build that rebuild a container upon a push to the repository
- Performing CD by using Cloud Functions to trigger the pipeline when new data is uploaded to your bucket
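Since AI Platform's built-in prediction runtimes mainly target TensorFlow, the PyTorch model is served through a custom prediction routine. As a rough illustration of what such a routine looks like (the class name, file layout, and model file name below are assumptions, not necessarily this repo's exact code), AI Platform calls `from_path` once to load the model and `predict` for every online prediction request:

```python
# predictor.py -- illustrative sketch of an AI Platform custom prediction routine.
# Class and file names here are placeholders, not this repo's exact code.
import os
import torch


class PyTorchPredictor:
    def __init__(self, model):
        self._model = model

    def predict(self, instances, **kwargs):
        """Called by AI Platform for each online prediction request."""
        inputs = torch.tensor(instances, dtype=torch.float32)
        with torch.no_grad():
            outputs = self._model(inputs)
        return outputs.tolist()

    @classmethod
    def from_path(cls, model_dir):
        """Called once when the model version starts; loads the saved model."""
        model = torch.load(os.path.join(model_dir, "model.pt"), map_location="cpu")
        model.eval()
        return cls(model)
```

A routine like this gets bundled into the source distribution built in the packaging step below and referenced when the model version is created on AI Platform.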
There's a short video demo of the project available here.
Note that it was created for a DevOps course at KTH with a 3-minute limit and is therefore very brief and compressed to fit those requirements.
Here we will go through the process of running the pipeline step by step. (Note: at the moment some project names/repos etc. are hard-coded and you might want to change them; this will eventually be updated here.)
- Create a GCP project, open the Cloud Shell (make sure you're in the project), and clone the repository:

  $ git clone https://github.com/jhammarstedt/gcloud_MLOPS_demo.git
- Create a Kubeflow pipeline
- Run the following script in the Google Cloud Shell (you might want to change the SA_NAME); this gives us the roles we need to run the pipeline:

  $ ./scripts/set_auth.sh
- Create a project bucket and a data bucket (used for CD later); here we named them {PROJECT_NAME}_bucket and {PROJECT_NAME}-data-bucket.
  - In the general project bucket, add the following subfolders: models, packages, data
- Locally, create a package from the models directory in the containers/train folder by running:

  $ python containers/train/models/setup.py sdist

  This creates a package with PyTorch and the model structure; just drag and drop it into the packages subfolder.
- Create a Docker container for each step (each of the folders in the containers directory represents a different step). Do this by running the following from the Cloud Shell, inside gcloud_MLOPS_demo/containers:

  $ ./build_containers.sh

  This will run build_single_container.sh in each directory.
  - If you wish to build just one container, enter the directory you want to build and run:

    $ bash ../build_single_container.sh {directory name}
  - Each subfolder (which will be a container) includes:
    - A cloudbuild.yaml file (created in build_single_repo.sh) which lets Cloud Build create a Docker container by running the included Dockerfile.
    - The Dockerfile, which mainly runs the task script (e.g. deploy.sh).
    - A task script that tells the Docker container what to do (e.g. preprocess/train/deploy the trained model to the AI Platform).
  - To test a container manually, run:

    $ docker run -t gcr.io/{YOUR_PROJECT}/{IMAGE}:latest --project {YOUR_PROJECT} --bucket {YOUR_BUCKET} local

    E.g. to run the container that deploys the model to the AI Platform, I would run:

    $ docker run -t gcr.io/ml-pipeline-309409/ml-demo-deploy-toai
- Create a pipeline in Python using the Kubeflow API (currently a notebook in AI Platform)
- Now we can either run the pipeline manually from the dashboard of the Kubeflow pipeline created earlier, or run it as a script (see the sketch below).
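For reference, here is a minimal sketch of what such a pipeline script can look like, assuming the kfp v1 SDK (`dsl.ContainerOp`). The preprocess/train image names and the host URL are placeholders; only the deploy image above comes from this repo.

```python
# pipeline.py -- minimal sketch of the Kubeflow pipeline (kfp v1 SDK).
# Project, bucket, host, and most image names are placeholders.
import kfp
from kfp import dsl

PROJECT = "my-gcp-project"    # replace with your project id
BUCKET = "my-project_bucket"  # replace with your project bucket


@dsl.pipeline(name="ml-demo", description="preprocess -> train -> deploy to AI Platform")
def ml_pipeline():
    preprocess = dsl.ContainerOp(
        name="preprocess",
        image=f"gcr.io/{PROJECT}/ml-demo-preprocess:latest",
        arguments=["--project", PROJECT, "--bucket", BUCKET],
    )
    train = dsl.ContainerOp(
        name="train",
        image=f"gcr.io/{PROJECT}/ml-demo-train:latest",
        arguments=["--project", PROJECT, "--bucket", BUCKET],
    )
    train.after(preprocess)
    deploy = dsl.ContainerOp(
        name="deploy",
        image=f"gcr.io/{PROJECT}/ml-demo-deploy-toai:latest",
        arguments=["--project", PROJECT, "--bucket", BUCKET],
    )
    deploy.after(train)


if __name__ == "__main__":
    # Running it as a script instead of from the dashboard:
    client = kfp.Client(host="https://<YOUR_PIPELINE_HOST>")
    client.create_run_from_pipeline_func(ml_pipeline, arguments={})
```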
To set up CI and rebuild at every push:
- Connect Cloud Build to GitHub, either in the trigger UI or by running:

  $ ./scripts/setup_trigger.sh
- Push the newly created cloudbuild files from GCP to origin, otherwise the trigger won't find them
- The trigger will run every time a push to master touches any of the containers and will rebuild the affected Docker image
CD is mainly needed when we want to retrain/fine-tune the model given new data, not every time we update a component. So we will have a Cloud Function that triggers a training pipeline run when we upload new data to Cloud Storage.
- Get the pipeline host URL from the pipeline settings and ideally save it as a PIPELINE_HOST environment variable.
- In the pipeline folder, run the deploy script (a sketch of the deployed function is shown after these steps):

  $ ./deploy_cloudfunction $PIPELINE_HOST
- Now, whenever a file is added to or deleted from the data bucket, the pipeline will be rerun.
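For reference, a minimal sketch of what the deployed function can look like, assuming the kfp v1 SDK, a PIPELINE_HOST environment variable set on the function, and an already-uploaded pipeline named `ml-demo` (the entry point, experiment, and pipeline names are placeholders, not necessarily what this repo's deploy script uses):

```python
# main.py -- sketch of the Cloud Function body; fired by a Cloud Storage
# object-change event on the data bucket. All names here are placeholders.
import os
import kfp


def trigger_pipeline(event, context):
    """Background function: start a new pipeline run for the changed file."""
    print(f"Change detected: gs://{event['bucket']}/{event['name']}")

    client = kfp.Client(host=os.environ["PIPELINE_HOST"])
    experiment = client.create_experiment("retrain-on-new-data")

    # Re-run the pipeline that was already uploaded to Kubeflow Pipelines.
    pipeline_id = client.get_pipeline_id("ml-demo")
    client.run_pipeline(
        experiment_id=experiment.id,
        job_name=f"retrain-{event['name']}",
        pipeline_id=pipeline_id,
    )
```

The repo's deploy_cloudfunction script is what actually wires the function to the bucket trigger; the code above only illustrates the idea.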