MLBlocks

An Open Source Project from the Data to AI Lab at MIT

Pipelines and Primitives for Machine Learning and Data Science.

Documentation: https://mlbazaar.github.io/MLBlocks
Github: https://github.com/MLBazaar/MLBlocks
License: MIT
Development Status: Pre-Alpha

Overview

MLBlocks is a simple framework for composing end-to-end tunable Machine Learning Pipelines by seamlessly combining tools from any Python library through a simple, common and uniform interface.

Features include:

Build Machine Learning Pipelines combining any Machine Learning Library in Python.
Access a repository of hundreds of primitives and pipelines, carefully curated by Machine Learning and domain experts, that are ready to use with little to no Python code.
Extract machine-readable information about which hyperparameters can be tuned and within which ranges, allowing automated integration with Hyperparameter Optimization tools like BTB.
Build complex multi-branch pipelines and DAG configurations, with an unlimited number of inputs and outputs per primitive.
Easily save and load pipelines using JSON annotations.
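As a minimal, hypothetical sketch of the hyperparameter feature above, the snippet below collects tunable hyperparameters from primitive annotations. The annotation layout (a `hyperparameters` section split into `fixed` and `tunable` entries, each with a `type`, a `default` and a `range` or list of `values`), the `name#N` block naming and the `get_tunable` helper are assumptions made for illustration, loosely modelled on MLPrimitives-style JSON annotations. The values are illustrative, not copied from MLPrimitives.

```python
import json
from collections import Counter

# Hypothetical, illustrative annotations (not copied from MLPrimitives).
ANNOTATIONS = {
    "sklearn.impute.SimpleImputer": {
        "hyperparameters": {
            "fixed": {"copy": {"type": "bool", "default": True}},
            "tunable": {
                "strategy": {
                    "type": "str",
                    "default": "mean",
                    "values": ["mean", "median", "most_frequent"],
                }
            },
        }
    },
    "xgboost.XGBClassifier": {
        "hyperparameters": {
            "fixed": {"n_jobs": {"type": "int", "default": -1}},
            "tunable": {
                "n_estimators": {"type": "int", "default": 100, "range": [10, 1000]},
                "max_depth": {"type": "int", "default": 6, "range": [3, 10]},
            },
        }
    },
}


def get_tunable(primitives, annotations):
    """Return {block_name: {hyperparameter: spec}}, skipping fixed hyperparameters."""
    counts = Counter()
    tunable = {}
    for name in primitives:
        counts[name] += 1
        # Assumed naming scheme: repeated primitives get #1, #2, ...
        block_name = f"{name}#{counts[name]}"
        specs = annotations[name]["hyperparameters"]["tunable"]
        tunable[block_name] = dict(specs)
    return tunable


result = get_tunable(["sklearn.impute.SimpleImputer", "xgboost.XGBClassifier"], ANNOTATIONS)
print(json.dumps(result, indent=2))
```

A dictionary like this is the kind of machine-readable description a tuner such as BTB can work from, since it states each hyperparameter's type and search space.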

Install

Requirements

MLBlocks has been developed and tested on Python 3.8, 3.9, 3.10, 3.11, 3.12 and 3.13.

Install with `pip`

The easiest and recommended way to install MLBlocks is using pip:

pip install mlblocks

This will pull and install the latest stable release from PyPI.

If you want to install from source or contribute to the project, please read the Contributing Guide.

MLPrimitives

In order to be usable, MLBlocks requires a compatible primitives library.

The official library, which is required to follow the MLBlocks tutorial below, is MLPrimitives. You can install it with this command:

pip install mlprimitives
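The primitive names used in the Quickstart below, such as `sklearn.impute.SimpleImputer`, are fully qualified Python paths. As a rough sketch, assuming only that a primitives library must turn such a name into a Python object (this is not MLBlocks' actual loading code), the standard `importlib` module is enough. `import_object` is a hypothetical helper name.

```python
import importlib


def import_object(dotted_path):
    """Resolve a dotted path like 'sklearn.impute.SimpleImputer' to the object it names."""
    module_name, _, object_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {dotted_path!r}")
    module = importlib.import_module(module_name)  # may raise ImportError
    return getattr(module, object_name)  # may raise AttributeError


# Standard-library paths keep the example runnable without extra installs.
mean = import_object("statistics.mean")
print(mean([1, 2, 3, 4]))  # 2.5
```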

Quickstart

Below is a short example showing how to use MLBlocks to solve the Adult Census Dataset classification problem with a pipeline that combines primitives from MLPrimitives, scikit-learn and xgboost.

import pandas as pd
from mlblocks import MLPipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Adult Census dataset and separate the target column
dataset = pd.read_csv('http://mlblocks.s3.amazonaws.com/census.csv')
label = dataset.pop('label')

# Hold out a test split that keeps the same class balance as the full dataset
X_train, X_test, y_train, y_test = train_test_split(dataset, label, stratify=label)

# Primitives are referenced by name and run in this order
primitives = [
    'mlprimitives.custom.preprocessing.ClassEncoder',             # encode labels as integers
    'mlprimitives.custom.feature_extraction.CategoricalEncoder',  # encode categorical features
    'sklearn.impute.SimpleImputer',                               # fill in missing values
    'xgboost.XGBClassifier',                                      # train the classifier
    'mlprimitives.custom.preprocessing.ClassDecoder'              # map predictions back to labels
]
pipeline = MLPipeline(primitives)

pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_test)

print(accuracy_score(y_test, predictions))
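The Quickstart code does not show how data moves between blocks. Conceptually, a pipeline like this keeps a shared context of named variables (such as `X`, `y` and the label `classes`) that each block reads its inputs from and writes its outputs to, which is roughly how a decoder step at the end can translate predictions back into the original labels. The toy sketch below illustrates that idea in pure Python. `ToyPipeline`, `MajorityClassifier` (a stand-in for `XGBClassifier`), the simplified encoder and decoder, and the rule of matching inputs by parameter name are all hypothetical simplifications, not the real `MLPipeline` implementation.

```python
import inspect
from collections import Counter


def _call(method, context):
    """Call `method`, passing the context variables whose names match its parameters."""
    parameters = inspect.signature(method).parameters
    kwargs = {name: context[name] for name in parameters if name in context}
    return method(**kwargs)


class ClassEncoder:
    """Learn the label classes and encode labels as integer indices."""

    def fit(self, y):
        self.classes = sorted(set(y))

    def produce(self, y=None):
        outputs = {"classes": self.classes}  # shared with later blocks via the context
        if y is not None:  # labels are only available while fitting
            index = {label: i for i, label in enumerate(self.classes)}
            outputs["y"] = [index[label] for label in y]
        return outputs


class MajorityClassifier:
    """Stand-in estimator: always predicts the most frequent encoded label."""

    def fit(self, X, y):
        self.prediction = Counter(y).most_common(1)[0][0]

    def produce(self, X):
        return {"y": [self.prediction] * len(X)}


class ClassDecoder:
    """Map integer predictions back to the original labels using `classes`."""

    def produce(self, y, classes):
        return {"y": [classes[i] for i in y]}


class ToyPipeline:
    """Run blocks in order, reading inputs from and writing outputs to a shared context."""

    def __init__(self, blocks):
        self.blocks = blocks

    def fit(self, X, y):
        context = {"X": X, "y": y}
        last = len(self.blocks) - 1
        for position, block in enumerate(self.blocks):
            if hasattr(block, "fit"):
                _call(block.fit, context)
            if position < last:  # only later blocks need this block's outputs
                context.update(_call(block.produce, context))

    def predict(self, X):
        context = {"X": X}
        for block in self.blocks:
            context.update(_call(block.produce, context))
        return context["y"]


pipeline = ToyPipeline([ClassEncoder(), MajorityClassifier(), ClassDecoder()])
pipeline.fit([[0], [1], [2]], ["<=50K", ">50K", "<=50K"])
predictions = pipeline.predict([[3], [4]])
print(predictions)  # ['<=50K', '<=50K']
```

The sketch skips the last block's `produce` call during `fit`, since nothing downstream consumes its output.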

What's Next?

If you want to learn more about how to tune pipeline hyperparameters, save and load pipelines using JSON annotations, or build complex multi-branched pipelines, please check our documentation site.

Also, do not forget to have a look at the notebook tutorials!
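To illustrate the idea behind saving and loading pipelines as JSON, here is a minimal sketch that round-trips a pipeline specification through a file using only the standard library. The `primitives` and `init_params` keys, the `xgboost.XGBClassifier#1` block name, the `n_estimators` value and the helpers `save_spec` and `load_spec` are illustrative assumptions; the documentation describes the exact JSON format MLBlocks reads and writes.

```python
import json
import os
import tempfile

# Hypothetical pipeline specification; key names and values are illustrative.
spec = {
    "primitives": [
        "mlprimitives.custom.preprocessing.ClassEncoder",
        "mlprimitives.custom.feature_extraction.CategoricalEncoder",
        "sklearn.impute.SimpleImputer",
        "xgboost.XGBClassifier",
        "mlprimitives.custom.preprocessing.ClassDecoder",
    ],
    "init_params": {
        "xgboost.XGBClassifier#1": {"n_estimators": 300},
    },
}


def save_spec(spec, path):
    """Write a pipeline specification to disk as human-readable JSON."""
    with open(path, "w") as f:
        json.dump(spec, f, indent=4)


def load_spec(path):
    """Read a pipeline specification back from a JSON file."""
    with open(path) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "census_pipeline.json")
    save_spec(spec, path)
    loaded = load_spec(path)

assert loaded == spec  # the round trip preserves the specification
```

Keeping the specification as plain data means it can be versioned, diffed and shared without pickling any fitted model.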

Citing MLBlocks

If you use MLBlocks for your research, please consider citing our related papers.

For the current design of MLBlocks and its usage within the larger Machine Learning Bazaar project at the MIT Data to AI Lab, please see:

Micah J. Smith, Carles Sala, James Max Kanter, and Kalyan Veeramachaneni. "The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development." arXiv Preprint 1905.08942. 2019.

@article{smith2019mlbazaar,
  author = {Smith, Micah J. and Sala, Carles and Kanter, James Max and Veeramachaneni, Kalyan},
  title = {The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development},
  journal = {arXiv e-prints},
  year = {2019},
  eid = {arXiv:1905.08942},
  pages = {arXiv:1905.08942},
  archivePrefix = {arXiv},
  eprint = {1905.08942},
}

For the first MLBlocks version from 2015, designed only for multi-table, multi-entity temporal data, please refer to Bryan Collazo’s thesis:

Machine learning blocks. Bryan Collazo. Master's thesis, MIT EECS, 2015.

With the recent availability of a multitude of libraries and tools, we decided it was time to integrate them and expand the library to address other data types (images, text, graphs and time series) and to integrate with deep learning libraries.
