This directory contains an ML project based on the default Databricks MLOps Stack, defining a production-grade ML pipeline for automated retraining and batch inference of an ML model on tabular data.
See the Project overview for details on the ML pipeline and code structure in this repo.
The table below links to detailed docs explaining how to use this repo for different use cases.
If you're a data scientist just getting started with this repo for a brand new ML project, we recommend starting with the Project overview and ML quickstart.
When you're satisfied with initial ML experimentation (e.g. validated that a model with reasonable performance can be trained on your dataset) and ready to deploy production training/inference pipelines, ask your ops team to follow the MLOps setup guide to configure CI/CD and deploy production ML pipelines.
After that, follow the ML pull request guide and ML resource config guide to propose, test, and deploy changes to production ML code (e.g. update model parameters) or pipeline resources (e.g. use a larger instance type for model training) via pull request.
Role | Goal | Docs |
---|---|---|
First-time users of this repo | Understand the ML pipeline and code structure in this repo | Project overview |
Data Scientist | Get started writing ML code for a brand new project | ML quickstart |
Data Scientist | Update production ML code (e.g. model training logic) for an existing project | ML pull request guide |
Data Scientist | Modify production ML resources, e.g. model training or inference jobs | ML resource config guide |
MLOps / DevOps | Set up CI/CD for the current ML project | MLOps setup guide |
It's possible to use the repo as a monorepo that contains multiple projects. All projects share the same workspaces and service principals.
For example, assuming there's an existing repo with root directory name `monorepo_root_dir` and project name `project1`:
- Create another project from cookiecutter with project name `project2` and root directory name `project2`.
- Copy the internal directory `project2/project2` to the root directory of the existing repo, `monorepo_root_dir/project2`.
- Copy yaml files from `project2/.github/workflows/` to `monorepo_root_dir/.github/workflows/` and make sure there are no name conflicts.
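
The two copy steps above can be sketched as a small helper script. This is only an illustrative sketch, not part of the stack itself: the function name `merge_project_into_monorepo` is hypothetical, and it assumes the new project has already been generated from cookiecutter and that workflow files end in `.yml` or `.yaml`. It refuses to overwrite anything, so a name conflict surfaces as an error instead of a silent replacement.

```python
import shutil
from pathlib import Path
from typing import List


def merge_project_into_monorepo(
    new_project_root: Path, monorepo_root: Path, project_name: str
) -> List[Path]:
    """Copy a freshly generated project into an existing monorepo.

    Hypothetical helper for illustration. Returns the workflow files
    that were copied into the monorepo.
    """
    new_project_root = Path(new_project_root)
    monorepo_root = Path(monorepo_root)

    # Inner project directory, e.g. project2/project2 -> monorepo_root_dir/project2.
    src_project = new_project_root / project_name
    dst_project = monorepo_root / project_name
    if dst_project.exists():
        raise FileExistsError(f"{dst_project} already exists in the monorepo")

    # Workflow YAML files, e.g. project2/.github/workflows/ -> monorepo_root_dir/.github/workflows/.
    src_workflows = new_project_root / ".github" / "workflows"
    dst_workflows = monorepo_root / ".github" / "workflows"
    workflow_files = sorted(
        p for p in src_workflows.iterdir() if p.suffix in {".yml", ".yaml"}
    )

    # Check for name conflicts before copying anything.
    conflicts = [p.name for p in workflow_files if (dst_workflows / p.name).exists()]
    if conflicts:
        raise FileExistsError(f"Workflow name conflicts: {conflicts}")

    shutil.copytree(src_project, dst_project)
    dst_workflows.mkdir(parents=True, exist_ok=True)
    copied = []
    for workflow in workflow_files:
        target = dst_workflows / workflow.name
        shutil.copy2(workflow, target)
        copied.append(target)
    return copied
```

With the directory names from the example, a call might look like `merge_project_into_monorepo(Path("project2"), Path("monorepo_root_dir"), "project2")`. Checking for conflicts before the first copy keeps the monorepo unchanged when the merge cannot complete cleanly.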