This repository contains a use case tailored to an MLOps framework using Kubeflow Pipelines hosted on GCP as the cloud provider and GitHub Actions as the CI/CD flow orchestrator.

ferneutron/mlops

MLOps

This repository showcases a use case under an MLOps paradigm, using GCP as the cloud provider and GitHub Actions as the CI/CD flow manager.

A detailed explanation of this repository is available in the article I published in Towards Data Science, "Part 1: Let's Build an Operational MLOps Framework from Scratch".

1. Getting started

This repository contains the resources needed to carry out the following flow:

[Figure: workflow]

To adapt this repository to your use case, I recommend following the guide in the article "Part 1: Let's Build an Operational MLOps Framework from Scratch". Alternatively, the next section explains which files you need to modify to use this content.

2. Usage

First, create a Workload Identity Provider in GCP, which enables the connection between GitHub and GCP.
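
As a rough illustration of this step, the sketch below prints (it does not run) the `gcloud` commands that would create a Workload Identity Pool and an OIDC provider trusting GitHub's token issuer. The project, pool, provider, and repository names are hypothetical placeholders, not values from this repository, and the attribute mapping and condition are one possible configuration rather than the one the article uses.

```shell
# Dry-run sketch: prints the gcloud commands instead of executing them.
# Every value below is a hypothetical placeholder (assumption).
PROJECT_ID="my-project"
POOL_ID="github-pool"
PROVIDER_ID="github-provider"
GITHUB_REPO="my-org/my-repo"

print_wif_commands() {
  # 1) The pool groups the external (GitHub) identities.
  echo "gcloud iam workload-identity-pools create ${POOL_ID}" \
       "--project=${PROJECT_ID} --location=global"
  # 2) The OIDC provider trusts tokens issued by GitHub Actions and
  #    restricts access to a single repository via an attribute condition.
  echo "gcloud iam workload-identity-pools providers create-oidc ${PROVIDER_ID}" \
       "--project=${PROJECT_ID} --location=global" \
       "--workload-identity-pool=${POOL_ID}" \
       "--issuer-uri=https://token.actions.githubusercontent.com" \
       "--attribute-mapping=google.subject=assertion.sub,attribute.repository=assertion.repository" \
       "--attribute-condition=assertion.repository=='${GITHUB_REPO}'"
}

print_wif_commands
```

Review the printed commands, then run them by hand (or pipe them to a shell) once the placeholders are replaced with your own values.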

You will also need to create a service account with the following roles:

"roles/artifactregistry.writer"
"roles/bigquery.readSessionUser"
"roles/cloudbuild.builds.builder"
"roles/cloudbuild.tokenAccessor"
"roles/cloudbuild.workerPoolUser"
"roles/logging.logWriter"
"roles/iam.serviceAccountUser"
"roles/aiplatform.user"
"roles/developerconnect.user"
"roles/storage.objectCreator"
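
The roles listed above can be granted in a loop rather than one by one. The following is a minimal dry-run sketch that only prints the `gcloud projects add-iam-policy-binding` commands; the project ID and service account email are hypothetical placeholders.

```shell
# Dry-run sketch: prints one IAM binding command per role.
# PROJECT_ID and SA_EMAIL are hypothetical placeholders (assumption).
PROJECT_ID="my-project"
SA_EMAIL="mlops-sa@${PROJECT_ID}.iam.gserviceaccount.com"

# Roles copied from the list in this README.
ROLES="
roles/artifactregistry.writer
roles/bigquery.readSessionUser
roles/cloudbuild.builds.builder
roles/cloudbuild.tokenAccessor
roles/cloudbuild.workerPoolUser
roles/logging.logWriter
roles/iam.serviceAccountUser
roles/aiplatform.user
roles/developerconnect.user
roles/storage.objectCreator
"

print_role_bindings() {
  # Unquoted $ROLES is split on whitespace, yielding one role per iteration.
  for role in $ROLES; do
    echo "gcloud projects add-iam-policy-binding ${PROJECT_ID}" \
         "--member=serviceAccount:${SA_EMAIL} --role=${role}"
  done
}

print_role_bindings
```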

Then, associate the service account you created with the Workload Identity Pool.
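
One common way to make this association is to grant `roles/iam.workloadIdentityUser` on the service account to the identities in the pool. The sketch below prints such a command under the assumption that the provider maps `attribute.repository` to the GitHub repository name; the project number, pool, service account, and repository are hypothetical placeholders.

```shell
# Dry-run sketch: prints the binding command instead of executing it.
# All values are hypothetical placeholders (assumption).
PROJECT_ID="my-project"
PROJECT_NUMBER="123456789"
POOL_ID="github-pool"
GITHUB_REPO="my-org/my-repo"
SA_EMAIL="mlops-sa@${PROJECT_ID}.iam.gserviceaccount.com"

print_sa_binding() {
  # principalSet selects every identity in the pool whose
  # attribute.repository equals the given GitHub repository.
  member="principalSet://iam.googleapis.com/projects/${PROJECT_NUMBER}/locations/global/workloadIdentityPools/${POOL_ID}/attribute.repository/${GITHUB_REPO}"
  echo "gcloud iam service-accounts add-iam-policy-binding ${SA_EMAIL}" \
       "--project=${PROJECT_ID}" \
       "--role=roles/iam.workloadIdentityUser" \
       "--member=${member}"
}

print_sa_binding
```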

Finally, in the .github/workflows/cicd.yaml file, adjust the variables PROJECT_ID, WORKLOAD_IDENTITY_PROVIDER and SERVICE_ACCOUNT used by the GCP Auth step of the cd job. After the change, the steps should look like this:

- name: 'GCP Auth'
  uses: 'google-github-actions/[email protected]'
  with:
    project_id: ${{ vars.PROJECT_ID }}
    workload_identity_provider: ${{ vars.WORKLOAD_IDENTITY_PROVIDER }}
    service_account: ${{ vars.SERVICE_ACCOUNT }}

- name: Register
  run: |
    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -F tags=latest \
    -F [email protected] \
    ${{ vars.PIPELINE_REPOSITORY }}
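
Because the steps above read their values from the `vars` context, one option is to store them as GitHub Actions repository variables. The sketch below prints the GitHub CLI commands that would do so; it assumes the `gh` CLI is your tool of choice, and every value shown is a hypothetical placeholder to be replaced with your own.

```shell
# Dry-run sketch: prints "gh variable set" commands instead of running them.
# All values are hypothetical placeholders (assumption).
print_var_cmd() {
  # $1 = variable name, $2 = variable value
  echo "gh variable set $1 --body \"$2\""
}

print_all_vars() {
  print_var_cmd PROJECT_ID "my-project"
  print_var_cmd WORKLOAD_IDENTITY_PROVIDER \
    "projects/123456789/locations/global/workloadIdentityPools/github-pool/providers/github-provider"
  print_var_cmd SERVICE_ACCOUNT "mlops-sa@my-project.iam.gserviceaccount.com"
  # URL of the repository the Register step posts the pipeline to.
  print_var_cmd PIPELINE_REPOSITORY "https://us-central1-kfp.pkg.dev/my-project/my-pipelines"
}

print_all_vars
```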

And that is pretty much it!

What is next?

This repository will be updated as I add other MLOps resources (continuous training, model monitoring, data validation, etc.), and I will update this README and the code accordingly.

Happy coding 🤓!
