Sync your ML data seamlessly with productivity tools you love




Overview

What is MLSync?

MLSync is a Python library that acts as a bridge between your ML workflow and your project planning and management tools.

Why MLSync?

Developing ML projects is a lot of fun, but such projects are also hard to plan and manage. While the ML community has built several tools that help developers track and visualize their ML workflow data, that data remains disconnected from the tools used for project management. MLSync is designed to bridge this gap.

How Does it Work?

There are four main aspects of MLSync:

  1. MLSync interfaces with modern ML experiment tracking tools such as MLflow and imports the raw data.
  2. Raw data from ML experiment tracking tools is converted to MLSync's internal (user-defined) data format and stored in a database.
  3. MLSync engine processes this raw data and generates consolidated insights for your project.
  4. The insights are then converted to suitable formats and sent to your project planning and management tools such as Notion.

The figure above describes the architectural vision of MLSync. Not all of this functionality is available yet; please refer to the Roadmap for the current status. If you would like to contribute to MLSync, please refer to the Contributing section.
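
The four stages listed above can be sketched in a few lines of Python. This is only an illustration of the data flow, not MLSync's actual code: the `Run` class, the function names, and the sample rows are all hypothetical, and an in-memory dict stands in for the database.

```python
from dataclasses import dataclass, field
from statistics import mean


@dataclass
class Run:
    """One experiment run in a hypothetical internal format."""
    run_id: str
    name: str
    metrics: dict = field(default_factory=dict)


def import_raw(tracker_rows):
    """Stage 1: stand-in for pulling raw rows from a tracker such as MLflow."""
    return list(tracker_rows)


def to_internal(raw_rows):
    """Stage 2: convert raw rows to the internal format, keyed by run id."""
    return {
        row["id"]: Run(row["id"], row["name"], dict(row["metrics"]))
        for row in raw_rows
    }


def summarize(database, metric):
    """Stage 3: derive a consolidated insight across all stored runs."""
    values = [r.metrics[metric] for r in database.values() if metric in r.metrics]
    best = max(database.values(), key=lambda r: r.metrics.get(metric, float("-inf")))
    return {"metric": metric, "mean": mean(values), "best_run": best.name}


def to_report_rows(database, metric):
    """Stage 4: flatten runs into rows a planning tool could display."""
    return [{"Name": r.name, metric: r.metrics.get(metric)} for r in database.values()]


# Made-up sample data, for illustration only.
raw = import_raw([
    {"id": "a1", "name": "Run 1", "metrics": {"accuracy": 0.90}},
    {"id": "b2", "name": "Run 2", "metrics": {"accuracy": 0.94}},
])
db = to_internal(raw)
insight = summarize(db, "accuracy")
rows = to_report_rows(db, "accuracy")
```

Keeping each stage as a separate function mirrors the separation described in the list: the importer and the exporter can be swapped without touching the processing step.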

Installation

pip install mlsync

Example

In this example, we will sync your machine learning experiments to Notion in a few simple steps!

Configuration Setup

Let us first set up the run environment.

  1. To begin, clone this repository: git clone https://github.com/paletteml/mlsync.git
  2. Change to the mlsync/examples/mlflow-notion directory: cd mlsync/examples/mlflow-notion/
  3. Rename the .env.example file in this directory to .env: mv .env.example .env. This file is intended to store your personal API keys.

Note that the directory contains YAML files for configurations (config.yaml) and report formatting (format.yaml). We will leave the configurations as is for now.
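
If you want to confirm that your .env file is readable before continuing, a small check along these lines may help. It is a minimal sketch that assumes the file holds plain KEY=VALUE lines (such as the NOTION_TOKEN entry added later in this example); `read_env_file` and `require_token` are hypothetical helpers, not part of MLSync, and MLSync's own loading logic may differ.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_token(env, name="NOTION_TOKEN"):
    """Return the token or fail early with a readable message."""
    token = env.get(name, "")
    if not token:
        raise KeyError(f"{name} is missing or empty in the .env file")
    return token
```

Calling `require_token(read_env_file(".env"))` from the example directory either returns the token or raises an error naming the missing key.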

ML Training Setup

Now let us set up our ML training environment. For this example, we will use MLflow for experiment tracking and PyTorch as our ML framework. Since MLflow supports all major ML frameworks, this example can be easily adapted to other frameworks.

  1. If not already installed, install PyTorch by following the guide here (only needed for the provided example).
  2. Install the mlflow package using pip install mlflow. More about installation here.
  3. Run the example training script using python mlflow_pytorch.py --run-name <Run 1>. This will create a new MLflow run.
  4. Launch the MLflow UI using mlflow ui &. Copy the MLflow URI (seen in the command line as [INFO] Listening at: <URL>). Let it run in the background.
  5. Update the uri field under mlflow in the configuration file in your folder (config.yaml) with the MLflow URI you just copied.
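
If you capture the output of mlflow ui in a file or a variable, the URI in step 4 can be pulled out programmatically. The sketch below relies only on the "Listening at: <URL>" text mentioned above; the helper name `find_tracking_uri` is hypothetical, and the timestamp and process-id parts of the sample line are an assumption about what your terminal prints.

```python
import re

# Matches the URL that follows "Listening at:" in the server's startup log.
LISTEN_PATTERN = re.compile(r"Listening at:\s*(https?://\S+)")


def find_tracking_uri(log_text):
    """Return the first URL that follows 'Listening at:', or None if absent."""
    match = LISTEN_PATTERN.search(log_text)
    return match.group(1) if match else None


# Illustrative log line; the exact prefix on your machine may differ.
sample = "[2022-06-01 10:00:00 -0700] [4242] [INFO] Listening at: http://127.0.0.1:5000 (4242)"
uri = find_tracking_uri(sample)
```

The value of `uri` is what goes into the uri field under mlflow in config.yaml.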

Notion Setup

Let us now link Notion to MLSync. This is required only the first time you run MLSync.

  1. Create a new integration to Notion.
    1. Visit notion.so/my-integrations
    2. Click the + New Integration button
    3. Name it MLSync.
    4. Ensure the Read, Update, and Insert Content capabilities are selected.
    5. Copy your "Internal Integration Token" from your Notion integration page into the .env file in your path.
      • NOTION_TOKEN=secret_0000000000000000000000000000000000000000000
  2. Create a new page in Notion. This will serve as the root page for your MLflow runs.
    1. Click the Share button in the top-right corner of the page.
    2. Click the Invite button and then choose the MLSync integration.
    3. Back in the Share dialog, click the Copy link button.
    4. Paste the URL into the page_id field in the configuration file (config.yaml) under notion.
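
According to the last step, the copied URL can be pasted as is. If you are curious which part of the link identifies the page, the sketch below extracts it under the assumption that a Notion page link ends in a 32-character hexadecimal id, optionally followed by a query string. The helper `notion_page_id` and the sample link are hypothetical and not part of MLSync.

```python
import re
from urllib.parse import urlparse


def notion_page_id(link):
    """Return the trailing 32-character hex id of a Notion page link, or None."""
    path = urlparse(link).path  # drops any ?query or #fragment
    match = re.search(r"([0-9a-fA-F]{32})$", path.rstrip("/"))
    return match.group(1).lower() if match else None


# Made-up link, for illustration only.
page_id = notion_page_id(
    "https://www.notion.so/My-Runs-0123456789abcdef0123456789abcdef?pvs=4"
)
```

A result of None is a hint that the copied link is not a page link, for example a link to the workspace root.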

Syncing

You are all set! Let us now sync your MLflow runs to Notion.

mlsync --config config.yaml --format format.yaml

That's it! You can now view your MLflow runs in Notion. As long as mlsync is running, all your future experiments and runs should appear in the selected Notion page.
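
To start the same command from a Python script, for instance at the end of a training job, something like the following sketch may be useful. It uses only the flags that appear in this README (--config, --format, and --notion-token) and assumes that --notion-token takes the token as its value. The wrapper functions themselves are hypothetical.

```python
import shutil
import subprocess


def build_mlsync_command(config="config.yaml", report_format="format.yaml", notion_token=None):
    """Assemble the argument list for the mlsync command shown above."""
    command = ["mlsync", "--config", config, "--format", report_format]
    if notion_token:
        # Assumption: the flag is followed by the token value.
        command += ["--notion-token", notion_token]
    return command


def run_mlsync(**kwargs):
    """Start mlsync as a child process; raise if the executable is not on PATH."""
    if shutil.which("mlsync") is None:
        raise FileNotFoundError("mlsync not found; try `pip install mlsync`")
    return subprocess.Popen(build_mlsync_command(**kwargs))
```

`run_mlsync()` returns a `subprocess.Popen` handle, so the caller can later stop the sync with `terminate()`.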

Troubleshooting

  1. If you get an error saying that NOTION_TOKEN was not found, pass the token explicitly to mlsync with the --notion-token flag.
  2. If you see a timeout error, ensure that mlflow ui is still running in the background.
  3. If the MNIST dataset download fails, you can download the data manually from here.
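
For the timeout case, a quick way to tell whether the MLflow UI is reachable at the configured URI is a plain HTTP request. The sketch below uses only the Python standard library; `tracking_server_is_up` is a hypothetical helper, and the URI you pass in is whatever you put in config.yaml.

```python
import urllib.error
import urllib.request


def tracking_server_is_up(uri, timeout=3.0):
    """Return True if an HTTP request to `uri` gets any response in time."""
    try:
        with urllib.request.urlopen(uri, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except (urllib.error.URLError, OSError):
        return False  # refused, unreachable, or timed out
```

If this returns False for your URI, restart mlflow ui and check that the port matches the one in the configuration file.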

If you run into any other errors, please raise an issue or reach out.

Advanced

  1. You can override the Notion page ID, token, and other configurations either by modifying the config.yaml file or by passing arguments to the mlsync command. Run mlsync --help to see the available arguments.
  2. Custom Report Formats: mlsync lets you customize the report further by supplying your own format.yaml file. Read documentation here to learn more.
  3. Custom Refresh Rates: You can control the refresh rate of the report by setting the refresh_rate field in the configuration file.
  4. Restarting mlsync: You can restart mlsync any time without losing earlier runs.
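
The refresh and restart behavior in items 3 and 4 can be pictured as a polling loop that only pushes runs it has not seen before. This README does not describe how MLSync implements this, so the sketch below is purely illustrative: every name is hypothetical, and treating refresh_rate as a number of seconds is an assumption.

```python
import time


def sync_once(fetch_runs, push_run, synced_ids):
    """Push runs whose ids are not yet in `synced_ids`; return how many were new."""
    new_count = 0
    for run in fetch_runs():
        if run["id"] not in synced_ids:
            push_run(run)
            synced_ids.add(run["id"])
            new_count += 1
    return new_count


def sync_forever(fetch_runs, push_run, refresh_rate, synced_ids=None, max_cycles=None):
    """Call sync_once every `refresh_rate` seconds (assumed unit), optionally for a fixed number of cycles."""
    synced_ids = set() if synced_ids is None else synced_ids
    cycle = 0
    while max_cycles is None or cycle < max_cycles:
        sync_once(fetch_runs, push_run, synced_ids)
        cycle += 1
        time.sleep(refresh_rate)
    return synced_ids
```

Because already-synced ids are skipped, passing a previously saved `synced_ids` set after a restart would continue where the last session stopped without duplicating earlier runs.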

Enjoy! If you have any further questions, please contact us.

Roadmap

We want to support different training environments and different productivity tools.

  1. Productivity Tools
    1. Notion: Supported
    2. Trello: Planned
    3. Confluence: In progress
    4. Jira: Planned
  2. Monitoring Frameworks
    1. MLflow: Supported
    2. TensorBoard: Planned
    3. ClearML: Planned
  3. Programmatic API
    1. Planned

Do you have other tools/frameworks you would like to see supported? Let us know!

Contributing

We welcome contributions from the community. Please feel free to open an issue or pull request. Or, if you are interested in working closely with us, please contact us directly. We will be happy to talk to you!