Push MLflow experiment tracking to a reconciler to avoid race conditions #135

Merged
merged 20 commits into from
Feb 21, 2025

@ewilliams-cloudera (Collaborator) commented Feb 21, 2025

(tl;dr of this Slack message)

RAG Studio tracks session metrics (#124) and data source metrics (#125) in separate experiments. However, CML's MLflow tracking store relies on global environment state, which makes it impossible for us to run multiple MLflow experiments in parallel in CDSW.

So that RAG Studio can keep serving asynchronous requests, we're moving MLflow experiment tracking operations to a reconciler that processes one experiment run at a time in the background, without blocking the server.
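A minimal sketch of the reconciler pattern described above, assuming a single in-process worker thread draining a FIFO queue. This is not the PR's implementation: `TrackingReconciler`, `TrackingJob`, and `set_active_experiment` are hypothetical names, and a module-level dict stands in for MLflow's global active-experiment state instead of calling MLflow itself.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for a tracking store whose "active experiment" is
# process-global state (the PR describes CML's MLflow store this way).
_active_experiment: Dict[str, str] = {"name": ""}


def set_active_experiment(name: str) -> None:
    """Hypothetical global setter; concurrent callers would clobber each other."""
    _active_experiment["name"] = name


@dataclass
class TrackingJob:
    """One unit of tracking work: metrics to log under a named experiment."""
    experiment: str
    metrics: Dict[str, float]


class TrackingReconciler:
    """Drains queued tracking jobs on a single background thread, one at a time."""

    def __init__(self) -> None:
        self._jobs: "queue.Queue[Optional[TrackingJob]]" = queue.Queue()
        # Records (experiment, metric, value) as seen at logging time.
        self.logged: List[Tuple[str, str, float]] = []
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, job: TrackingJob) -> None:
        """Called from request handlers; enqueues and returns without blocking."""
        self._jobs.put(job)

    def _run(self) -> None:
        while True:
            job = self._jobs.get()
            try:
                if job is None:  # sentinel from close()
                    return
                # Only this thread touches the global state, so the
                # set-experiment and log steps for one job cannot interleave
                # with another job's.
                set_active_experiment(job.experiment)
                for key, value in job.metrics.items():
                    self.logged.append((_active_experiment["name"], key, value))
            finally:
                self._jobs.task_done()

    def close(self) -> None:
        """Stop the worker after it finishes every job queued before this call."""
        self._jobs.put(None)
        self._worker.join()
```

Request handlers call `submit()` and return immediately; because only the worker thread ever sets the active experiment, one request's set-then-log sequence can't be interrupted by another's. The trade-off is that tracking lags slightly behind the request and runs serially, one experiment run at a time.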

@mliu-cloudera changed the title from Mob/mlflow refactor to Push MLflow experiment tracking to a reconciler to avoid race conditions Feb 21, 2025
@ewilliams-cloudera marked this pull request as ready for review February 21, 2025 21:38
@ewilliams-cloudera merged commit e80b155 into main Feb 21, 2025
3 checks passed
@ewilliams-cloudera deleted the mob/mlflow-refactor branch February 21, 2025 21:38
2 participants