Add piece on MLflow Go #149

35 changes: 35 additions & 0 deletions website/blog/2025-02-28-mlflow-go.md
@@ -0,0 +1,35 @@
---
title: MLflow Go
tags: [mlflow]
slug: mlflow-go
authors: [florian-verdonck]
thumbnail: /img/blog/mlflow-go.jpeg
---

The [MLflow project](https://mlflow.org) is a cornerstone of the machine learning community. It is highly flexible and makes machine learning workflows more reproducible, manageable and collaborative.

A key part of MLflow is experiment tracking, the system for logging and querying experiments. Developers can log parameters, metrics, inputs, and artifacts to keep track of different experiment runs, store them in a database, and then compare results.

Experiment tracking in MLflow is written in Python and has performance limitations when dealing with huge volumes of data. At G-Research, 10 TB of data over two months is not uncommon. In these cases, our researchers can become frustrated with just how long it takes the system to execute operations.

To address this situation, we took MLflow apart into its component parts and soon saw where the system could be improved: its backend. Writing to the database through Python code just wasn’t going to cut it for the sheer number of operations we had to perform. We started to imagine swapping out the experiment tracking for another backend. This train of thought evolved into the project called [FastTrackML](https://fasttrackml.io/).

## FastTrackML

FastTrackML was a [huge success for G-Research](https://www.gresearch.com/news/fasttrackml-the-fastest-ml-experiment-tracker-yet/), as its implementation in Go was significantly faster and able to deal with the high volumes of data. Go was chosen for this project because its concurrency model is based on lightweight goroutines and channels, which make it extremely efficient at handling multiple simultaneous tasks (e.g., logging experiment data, handling user requests). Go is a compiled language that produces native machine code, resulting in faster execution than Python, which is interpreted. Go is also statically typed, which tends to make its memory usage more efficient than that of dynamically typed Python.
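
To make the goroutine-and-channel model concrete, here is a minimal sketch of fanning work out to a small pool of goroutines over a channel. It is an illustration only and is not taken from FastTrackML: the `Metric` type and the `logAll` function are hypothetical names, and the in-memory counter stands in for whatever a real backend would write to its database.

```go
package main

import (
	"fmt"
	"sync"
)

// Metric is a hypothetical record type used only for this illustration.
type Metric struct {
	RunID string
	Key   string
	Value float64
}

// logAll sends metrics over a channel to a fixed number of worker
// goroutines and returns how many metrics were handled for each run.
func logAll(metrics []Metric, workers int) map[string]int {
	jobs := make(chan Metric)
	counts := make(map[string]int)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Each worker receives from the channel until it is closed.
			for m := range jobs {
				// Assumption: a real backend would write to a database here.
				mu.Lock()
				counts[m.RunID]++
				mu.Unlock()
			}
		}()
	}

	for _, m := range metrics {
		jobs <- m
	}
	close(jobs) // signal the workers that no more metrics are coming
	wg.Wait()   // wait for all workers to drain the channel
	return counts
}

func main() {
	metrics := []Metric{
		{RunID: "run-a", Key: "loss", Value: 0.9},
		{RunID: "run-a", Key: "loss", Value: 0.7},
		{RunID: "run-b", Key: "loss", Value: 0.8},
	}
	counts := logAll(metrics, 4)
	fmt.Println(counts["run-a"], counts["run-b"])
}
```

The mutex guards the shared map because several workers update it at once; the channel handles the distribution of work without any explicit locking.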

The success of FastTrackML, to a certain degree, unlocked the usage of MLflow within G-Research. It allowed quant researchers to process their datasets and experiment tracking in reasonable timeframes.

## Back to MLflow

The [Open Source team at G-Research](https://www.gresearch.com/teams/open-source-software/) then looked at incorporating the lessons learned from FastTrackML into MLflow itself. The mlflow-go project is a first step in that direction. It is a Python package that is meant to be a drop-in replacement for MLflow. Once the mlflow-go package is proven technology, we hope it can be absorbed by MLflow.

Getting started with `mlflow-go` is as simple as installing MLflow, running `pip install mlflow-go-backend`, and then swapping `mlflow` for `mlflow-go` when running it from the command line. This spins up a Go web server with the same API as the [MLflow REST API](https://mlflow.org/docs/latest/rest-api.html).
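
Because the server speaks the MLflow REST API, any HTTP client can talk to it. The sketch below, written under stated assumptions, creates an experiment through the `experiments/create` endpoint of that API. The `createExperiment` and `newStubServer` helpers are hypothetical names introduced for this example, and the stub server only mimics a single response so that the code runs without a real tracking server; to try it against a running server, pass that server's address instead of the stub's URL.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// createExperiment posts to the MLflow REST endpoint for creating an
// experiment and returns the experiment ID found in the response.
func createExperiment(baseURL, name string) (string, error) {
	body, err := json.Marshal(map[string]string{"name": name})
	if err != nil {
		return "", err
	}
	resp, err := http.Post(
		baseURL+"/api/2.0/mlflow/experiments/create",
		"application/json",
		bytes.NewReader(body),
	)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status: %s", resp.Status)
	}
	var out struct {
		ExperimentID string `json:"experiment_id"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	return out.ExperimentID, nil
}

// newStubServer is a stand-in for a tracking server (an assumption made
// so that this sketch runs offline); it answers only the create call.
func newStubServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost || r.URL.Path != "/api/2.0/mlflow/experiments/create" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"experiment_id": "1"}`)
	}))
}

func main() {
	srv := newStubServer()
	defer srv.Close()

	id, err := createExperiment(srv.URL, "demo")
	if err != nil {
		panic(err)
	}
	fmt.Println("experiment_id:", id)
}
```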

At the time of writing, users are required to pass a database connection string as the `--backend-store-uri` argument, because the Go implementation is currently database-only. As the intention is to use mlflow-go for its performance, it makes sense to use an actual production-worthy database and not the filesystem.

## Closing thoughts

Although the backend we’ve built in mlflow-go is more performant for certain big data applications, that doesn’t necessarily mean it will replace the existing Python implementation. As the MLflow maintainers within Databricks are not specifically Go developers, mlflow-go has to develop its own user base and an established community of active contributors before it can become widely adopted.

We found that G-Research really liked MLflow but needed it to be more performant, notably for large datasets. We anticipate that other organizations, especially those with massive data-processing needs, have run into similar issues. We are keen to learn whether our efforts have benefited others, so please let us know on [GitHub](https://github.com/mlflow/mlflow-go-backend) or [Slack](https://mlflow.org/community/#slack) how you’re using MLflow and whether mlflow-go is faster for you.
6 changes: 6 additions & 0 deletions website/blog/authors.yml
@@ -121,3 +121,9 @@ daniel-lok:
  title: Software Engineer at Databricks
  url: https://www.linkedin.com/in/daniel-yk-lok/
  image_url: /img/authors/daniel_lok.png

florian-verdonck:
  name: Florian Verdonck
  title: Open Source Developer at G-Research
  url: https://www.linkedin.com/in/florian-verdonck/
  image_url: /img/authors/florian_verdonck.jpeg
Binary file added website/static/img/authors/florian_verdonck.jpeg
Binary file added website/static/img/blog/mlflow-go.jpeg