
Design Document


Introduction

Welcome to the CyclOps design document! In this document, we describe the design of the components and APIs of CyclOps. This is a living document and will be updated as the design and interfaces change.

Packages

Query

Data

Requirements

  • Filter: Slice across attributes, i.e. retrieve all rows that satisfy filtering conditions on one or more columns (see the sketch after this list).
  • Support for different domains of data, including text, image/video and tabular data.
  • Batching, especially how well it integrates with filtering (environmental scan).
  • Splitting
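
A minimal sketch of the filtering requirement, assuming the data lives in a pandas DataFrame; the column names are illustrative only:

```python
# Illustrative filtering: retrieve all rows matching conditions on one
# or more columns, assuming a pandas DataFrame as the data container.
import pandas as pd

df = pd.DataFrame({"age": [34, 71, 58], "sex": ["F", "M", "F"]})

# Slice across attributes with a boolean mask over multiple columns.
filtered = df[(df["age"] >= 50) & (df["sex"] == "F")]
print(filtered)
```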

Potential Features

  • Scalable, i.e. able to handle large datasets.
  • Splitting.
  • Slicing.
  • Batching. For instance, a PyTorch DataLoader object should be able to make use of the Dataset object for batching (see the sketch after this list).
  • Streaming.
  • Caching pre-processing operations.
  • Inverting some pre-processing operations.
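
A minimal sketch of the batching feature, assuming a map-style torch.utils.data.Dataset; TabularDataset is a hypothetical stand-in, not a CyclOps class:

```python
# Illustrative only: a map-style Dataset plugs into a DataLoader, which
# handles the batching. TabularDataset is a hypothetical name.
import torch
from torch.utils.data import Dataset, DataLoader


class TabularDataset(Dataset):
    """Wrap feature and target tensors as a map-style dataset."""

    def __init__(self, features: torch.Tensor, targets: torch.Tensor):
        self.features = features
        self.targets = targets

    def __len__(self) -> int:
        return len(self.features)

    def __getitem__(self, idx: int):
        return self.features[idx], self.targets[idx]


dataset = TabularDataset(torch.randn(100, 8), torch.randint(0, 2, (100,)))
loader = DataLoader(dataset, batch_size=32, shuffle=True)
for X_batch, y_batch in loader:
    ...  # batches are produced by the DataLoader, not the dataset itself
```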

Ideas

  • MONAI for images + metadata
  • Move the process API functionality into a pre-processing subpackage inside cyclops.dataset, with transforms appropriate for each modality.
  • Another idea is to have a cyclops.data package with the transforms and a main class that handles the different data modalities, and a separate cyclops.dataset package with processing functions specific to datasets such as THP, MIMIC, etc.

Evaluate

Overview

As part of the CyclOps framework, the evaluation package will provide data and model analysis tools that allow users (machine learning engineers and data scientists) to rigorously evaluate their machine learning models on different data slices. The goal of the package is to provide an evaluation tool for health use cases with functionality similar to the TensorFlow Model Analysis library, including the following features:

  • A suite of common evaluation metrics for machine learning models, including drift detection metrics and fairness/bias indicators (and the ability to define custom metrics).
  • The functionality for computing (and comparing) metrics on different slices of data (including the capacity for full-pass computation on large datasets and computing confidence intervals).
  • Tools for comparing one or more candidate models against one or more baseline models.
  • Tools for visualizing metrics for a more in-depth analysis of model performance.

Workflow

The user defines the following:

  • Data source (or data loading function).
  • [Optional] Data processing function to prepare the data for evaluation (e.g. normalizing numerical features).
  • [Optional] (PyTorch or scikit-learn) trained model configuration.
  • Evaluation metrics to compute.
  • [Optional] Features or values for slicing the data.
  • Configurations for visualizing the results.

The evaluation workflow is as follows:

  1. Ingest the data from the data source.
  2. [Optional] Apply the data processing function to prepare the data for inference (feature handling/extraction). Keep the raw data.
  3. [Optional] Run the data through the model to generate predictions.
  4. Compute the evaluation metrics on data slices and cache the results.
  5. Generate visualizations of the model performance.
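
A runnable sketch of these five steps, using scikit-learn and pandas directly in place of the planned CyclOps components; the synthetic data, column names and slice definition are illustrative only:

```python
# Sketch of the evaluation workflow; sklearn stands in for the planned
# CyclOps metrics, and the data/slices are made up for illustration.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score

# 1. Ingest the data (a synthetic frame standing in for a data source).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "age": rng.integers(20, 90, 500),
    "lab_value": rng.normal(size=500),
    "label": rng.integers(0, 2, 500),
})

# 2. Prepare features for inference; keep the raw frame for slicing.
X = data[["age", "lab_value"]].to_numpy()
y = data["label"].to_numpy()

# 3. Run the data through the model to generate predictions.
model = LogisticRegression().fit(X, y)  # stands in for a trained model
preds = model.predict(X)

# 4. Compute metrics on data slices and cache the results.
slices = {
    "all": np.ones(len(data), bool),
    "age>=65": (data["age"] >= 65).to_numpy(),
}
results = {
    name: {
        "accuracy": accuracy_score(y[mask], preds[mask]),
        "recall": recall_score(y[mask], preds[mask]),
    }
    for name, mask in slices.items()
}

print(results)  # 5. these results would feed the visualization step
```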

Components

Metrics

This package includes functions and classes for computing metrics given arrays of targets and predictions.

Specifications
  • The style for this package will mirror the torchmetrics API by extending functions in the sklearn.metrics module such that (1) metrics can be computed with predictions in the form of logits and probabilities, including the ability to threshold the scores and select the top k scores, and (2) metrics can be accumulated over several batches.
  • Following the torchmetrics style, this package will include a 'functional' subpackage where functions from sklearn.metrics will be wrapped together with logic that supports (1). Each of the functions will also be wrapped in a class, inheriting from a base class which provides a template for accumulating metric states over batches (see the sketch below).
  • The metrics will only accept numerical targets and predictions as input. This means that the data must be transformed to numerical format prior to using the metrics.
  • Multidimensional inputs (more than 2 dimensions) are currently not supported.
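
To make the intended layering concrete, here is a minimal sketch assuming binary classification; the function and class names are illustrative, not the final CyclOps API:

```python
# Sketch of the torchmetrics-style layering over sklearn.metrics.
# Names (binary_precision, Metric, BinaryPrecision) are hypothetical.
import numpy as np
from sklearn.metrics import precision_score


def binary_precision(target, preds, threshold: float = 0.5) -> float:
    """Functional form: accepts scores and thresholds them before
    delegating to sklearn.metrics (point (1) above)."""
    preds = np.asarray(preds)
    if preds.dtype.kind == "f":  # float scores are treated as probabilities
        preds = (preds >= threshold).astype(int)
    return precision_score(target, preds)


class Metric:
    """Base class: the template for accumulating state over batches
    (point (2) above)."""

    def __init__(self):
        self._targets, self._preds = [], []

    def update(self, target, preds) -> None:
        self._targets.append(np.asarray(target))
        self._preds.append(np.asarray(preds))

    def compute(self) -> float:
        raise NotImplementedError


class BinaryPrecision(Metric):
    def __init__(self, threshold: float = 0.5):
        super().__init__()
        self.threshold = threshold

    def compute(self) -> float:
        return binary_precision(
            np.concatenate(self._targets),
            np.concatenate(self._preds),
            self.threshold,
        )
```

With this layering, calling update on two batches and then compute accumulates both batches before a single metric value is produced.
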
Supported metrics
  • Classification
    • Precision
    • Recall
    • Specificity
    • F-beta (and F1) score
    • Accuracy
    • ROC
    • AUROC
    • AUPRC
    • Confusion matrix
    • Matthews Correlation Coefficient
    • Threat Score
Issues
  • Support for distributed evaluation.
  • Data handling
    • Currently, data is mainly handled through the process API. Data can be ingested either using the query API or the cyclops.utils.file module, which includes functions for loading and saving CSV, Parquet, pickle and npy files.
    • The process API provides data containers and functions that can be used to prepare the data for modeling and inference.
    • Non-tabular data (predominantly image data) is currently handled via a torch.utils.data.Dataset object.
  • Supporting large datasets.
  • Standardizing columns for computing metrics.
  • Handling both continuous and retrospective data.
  • Handling non-tabular datasets (e.g. medical images).

Evaluator

The evaluator class brings together the data and model(s) for evaluation purposes.

Specifications
  • Input: Dataset, model(s), metric(s), slice configuration
  • Output: Metric values
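
A minimal sketch of what such a class could look like, assuming numpy arrays, models with a sklearn-style predict method, metric callables, and boolean-mask slices; all names here are illustrative:

```python
# Hypothetical evaluator bringing together dataset, model(s), metric(s)
# and slice configuration; not a committed CyclOps interface.
from typing import Callable, Dict

import numpy as np


class Evaluator:
    """Compute metric values per model, per data slice."""

    def __init__(
        self,
        X: np.ndarray,
        y: np.ndarray,
        models: Dict[str, object],
        metrics: Dict[str, Callable[[np.ndarray, np.ndarray], float]],
        slices: Dict[str, np.ndarray],  # boolean masks over the rows
    ):
        self.X, self.y = X, y
        self.models, self.metrics, self.slices = models, metrics, slices

    def run(self) -> Dict[str, Dict[str, Dict[str, float]]]:
        """Return metric values keyed by model, slice and metric name."""
        results: Dict[str, Dict[str, Dict[str, float]]] = {}
        for model_name, model in self.models.items():
            preds = model.predict(self.X)
            results[model_name] = {
                slice_name: {
                    metric_name: metric_fn(self.y[mask], preds[mask])
                    for metric_name, metric_fn in self.metrics.items()
                }
                for slice_name, mask in self.slices.items()
            }
        return results
```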

models package

Overview

The models package will provide a familiar scikit-learn estimator API for PyTorch and scikit-learn models. This will be achieved by using the decorator pattern and wrapper classes to dynamically extend the functionality of the models, providing enough flexibility to support other ML/DL frameworks such as TensorFlow in the future (whether added by the CyclOps team or by users). Overall, the models package will have the following features:

  • Reference implementations of published and established ML/DL network architectures as well as custom low-level building blocks that are used for creating such networks.
  • Wrapper classes that provide a uniform API for scikit-learn and PyTorch models (e.g. fit and predict functions).

Components

Wrappers

Specifications
  • All wrappers will support the following methods (a skeleton sketch follows the list):

    • partial_fit: Fit the model on the given data incrementally.
    • fit: Fit the model on the given data.
    • predict: Predict the output of the model for the given input.
    • predict_proba: Return the model's output probabilities for the given input.
    • find_best: Find the best model from hyperparameter search.
    • save_model: Save model to file.
    • load_model: Load a saved model.
    • get_params: Get parameters for the wrapper.
    • set_params: Set the parameters of the wrapper.
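
A skeleton of what a scikit-learn-backed wrapper exposing these methods could look like; the SKModelWrapper name and internals are illustrative, not the CyclOps implementation:

```python
# Hypothetical wrapper giving a uniform estimator-style interface
# around a scikit-learn model.
import pickle


class SKModelWrapper:
    """Wrap a scikit-learn model behind the methods listed above."""

    def __init__(self, model, **params):
        self.model = model
        self.set_params(**params)

    def fit(self, X, y):
        self.model.fit(X, y)
        return self

    def partial_fit(self, X, y, classes=None):
        # Only valid for estimators that implement partial_fit.
        self.model.partial_fit(X, y, classes=classes)
        return self

    def predict(self, X):
        return self.model.predict(X)

    def predict_proba(self, X):
        return self.model.predict_proba(X)

    def find_best(self, X, y, param_grid):
        # One possible realization: a grid search over hyperparameters.
        from sklearn.model_selection import GridSearchCV

        search = GridSearchCV(self.model, param_grid).fit(X, y)
        self.model = search.best_estimator_
        return self

    def save_model(self, path):
        with open(path, "wb") as f:
            pickle.dump(self.model, f)

    def load_model(self, path):
        with open(path, "rb") as f:
            self.model = pickle.load(f)
        return self

    def get_params(self):
        return self.model.get_params()

    def set_params(self, **params):
        if params:
            self.model.set_params(**params)
        return self
```

A PyTorch wrapper would expose the same methods, with fit/partial_fit implemented as training loops, so downstream code (e.g. the evaluator) can treat both frameworks uniformly.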

report package

Overview

The core offerings of the CyclOps framework are rigorous evaluation and monitoring of machine learning models/datasets. Reports are an essential component of these core offerings as they provide an easy-to-understand, visual overview of the model performance across various axes (e.g. time, sub-population, different metrics) as well as dataset shift. In other words, a report provides machine learning engineers, data scientists and other decision makers with information to decide if a model in production is operating within desired margins.

Functional Requirements

  • Build on the model card framework, as model cards are more widely accepted (e.g. by UNICEF).
    • See the Hugging Face landscape analysis for an overview of documentation frameworks in ML. IBM's FactSheets is notable.

  • Support the inclusion of multiple graphs/plots, tables and rich media (e.g. images).

  • Support the comparison of two or more models on various factors, including:
    • Versions (e.g. the same model architecture trained with different hyperparameters).
    • Time (e.g. snapshots of model performance over time).

  • Support the merging of two full reports, or of sections of reports.
    • Example 1: A user creates a report with only evaluation results; dataset drift test results are added at a later point.
    • Example 2: Reports generated at different points in time are merged to create a new report that shows trends in model performance.

Non-Functional Requirements

  • Easy to understand for different stakeholders (technical and non-technical). Flexible.
  • Easy for software/ML developers to customize (even if cosmetic) for their use-case.
  • Easy to adapt as the standards for reporting in ML evolve (e.g. moving towards system cards).
  • Able to handle many plots/tables in the report.
  • Portable output (small file size) even with many plots/tables/rich media (METRIC: can it be sent as email attachment?).
  • Two types of data in reports: Experimental (Fairness Indicators, Drift Experiments) and Testing (Pass/Fail criteria with thresholds).

Some Challenges

  • Some fields can be filled in automatically while others require human input.
  • Access management for different sections (e.g. a proprietary model architecture is accessible internally but not publicly, while results are publicly available). One possible solution is to create different templates for different viewers/stakeholders.

Environmental Scan

  • Model Card Generation Tools

Other notes

Tech stack

  • Schema: Protobuf/JSON
  • Template Engine: Jinja
  • VerifyML
    • Tech stack: same as the Model Card Toolkit for the backend; also includes a frontend.

Template Engine
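
As a minimal illustration of the Jinja-based templating approach listed in the tech stack (the template and its fields are made up, not the actual model card schema):

```python
# Hypothetical report snippet rendered with Jinja; field names are
# illustrative, not the real report schema.
from jinja2 import Template

template = Template("""
<h1>{{ model_name }}</h1>
<ul>
{% for metric, value in metrics.items() %}
  <li>{{ metric }}: {{ "%.3f"|format(value) }}</li>
{% endfor %}
</ul>
""")

html = template.render(
    model_name="Risk Model v2",
    metrics={"accuracy": 0.912, "recall": 0.874},
)
print(html)
```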

Integration with CyclOps

  • Query API: info on data source(s).

  • Process API: info on transforms.