Push MLflow experiment tracking to a reconciler to avoid race conditions #135

ewilliams-cloudera · 2025-02-21T18:37:20Z

RAG Studio tracks session metrics (#124) and data source metrics (#125) in their own experiments. However, CML's MLflow tracking store relies on global environmental state, which makes it impossible for us to run multiple MLflow experiments in parallel in CDSW.

So, to preserve RAG Studio's ability to serve asynchronous requests, we're pushing MLflow experiment tracking operations to a reconciler that processes one experiment run at a time on the side, without blocking the server.
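As a rough illustration of the idea (not the code in this PR), the sketch below assumes a file-backed queue: request handlers drop each run as a JSON file into a data directory, and a single reconciler drains that directory sequentially. The names `enqueue_run`, `reconcile_once`, and the `log_run` callback are hypothetical; in practice `log_run` would wrap the actual MLflow calls.

```python
import json
import uuid
from pathlib import Path
from typing import Callable


def enqueue_run(data_dir: Path, run: dict) -> Path:
    """Hypothetical request-side helper: persist the run and return immediately."""
    data_dir.mkdir(parents=True, exist_ok=True)
    final = data_dir / f"{uuid.uuid4().hex}.json"
    tmp = final.with_suffix(".tmp")
    tmp.write_text(json.dumps(run))
    # Rename within the same directory so the reconciler never reads a partial file.
    tmp.rename(final)
    return final


def reconcile_once(data_dir: Path, log_run: Callable[[dict], None]) -> int:
    """Hypothetical reconciler pass: log pending runs one at a time.

    `log_run` is an assumed hook standing in for the MLflow tracking calls.
    Returns the number of runs processed.
    """
    processed = 0
    for path in sorted(data_dir.glob("*.json")):
        run = json.loads(path.read_text())
        log_run(run)
        # Delete only after logging succeeds, so a failed run is retried next pass.
        path.unlink()
        processed += 1
    return processed
```

Under these assumptions, the server would start one background thread or task that calls `reconcile_once` in a loop with a short sleep, so only a single experiment run touches the global MLflow state at any moment.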

ewilliams-cloudera and others added 15 commits February 19, 2025 16:44

wip on moving chat mlflow

a45301a

centralize datasources metrics to mlflow

0e92048

Resolve circular import

3bf99b8

fix tests

4dd6643

more cleanup of mlflow

2638691

wip on creating a reconciler

21ec428

getting close on the reconciler

10b325a

small changes

0551535

remove unnecessary check

41572b5

fix mypy

20a7fd8

add back tracing

7ac2043

Load JSON into MlflowRunData

9a6ed39

Resolve circular imports

c18ede5

Don't pass Request classes into services

7abe527

Create data directory before running reconciler

6972ced

mliu-cloudera changed the title ~~Mob/mlflow refactor~~ Push MLflow experiment tracking to a reconciler to avoid race conditions Feb 21, 2025

ewilliams-cloudera added 4 commits February 21, 2025 12:29

remove tracing and fix max score type

02ae01b

remove unused

b3a60ba

fix pytests

b420df5

remove unused

2898dc4

mliu-cloudera approved these changes Feb 21, 2025

Create the reconciler data dir right before starting app

54f87a9

ewilliams-cloudera marked this pull request as ready for review February 21, 2025 21:38

ewilliams-cloudera merged commit e80b155 into main Feb 21, 2025
3 checks passed

ewilliams-cloudera deleted the mob/mlflow-refactor branch February 21, 2025 21:38
