From 3ca1026c19480b6a8d691865b3351d105b1afb84 Mon Sep 17 00:00:00 2001
From: Micah Smith
Date: Fri, 25 Jun 2021 13:41:06 -0400
Subject: [PATCH 1/2] Update readme
---
.editorconfig | 3 ++
README.md | 109 +++++++++++++++++++++++++++-----------------------
2 files changed, 61 insertions(+), 51 deletions(-)
diff --git a/.editorconfig b/.editorconfig
index f742251..7345ce1 100644
--- a/.editorconfig
+++ b/.editorconfig
@@ -10,6 +10,9 @@ insert_final_newline = true
charset = utf-8
end_of_line = lf
+[*.md]
+max_line_length = 99
+
[*.py]
max_line_length = 99
diff --git a/README.md b/README.md
index 3a65000..0dd8a08 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
[![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
-[![PyPi](https://img.shields.io/pypi/v/autobazaar.svg)](https://pypi.python.org/pypi/autobazaar)
+[![PyPi](https://img.shields.io/pypi/v/autobazaar.svg)](https://pypi.Python.org/pypi/autobazaar)
[![Tests](https://github.com/MLBazaar/AutoBazaar/workflows/Run%20Tests/badge.svg)](https://github.com/MLBazaar/AutoBazaar/actions?query=workflow%3A%22Run+Tests%22+branch%3Amaster)
[![Downloads](https://pepy.tech/badge/autobazaar)](https://pepy.tech/project/autobazaar)
@@ -14,15 +14,15 @@
* Development Status: [Pre-Alpha](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
* Documentation: https://MLBazaar.github.io/AutoBazaar/
* Homepage: https://github.com/MLBazaar/AutoBazaar
-* Paper: https://arxiv.org/pdf/1905.08942.pdf
+* Paper: [ml-bazaar-paper]
## Overview
-AutoBazaar is an AutoML system created using [The Machine Learning Bazaar](https://arxiv.org/abs/1905.08942),
-a research project and framework for building ML and AutoML systems by the Data To AI Lab at MIT.
+*AutoBazaar* is an AutoML system created using [The Machine Learning Bazaar](https://mlbazaar.github.io),
+a research project and framework for building ML and AutoML systems by the [Data To AI Lab](https://dai.lids.mit.edu) at MIT.
See [below](#citing-autobazaar) for more references.
-It comes in the form of a python library which can be used directly inside any other python
+It comes in the form of a Python library which can be used directly inside any other Python
project, as well as a CLI which allows searching for pipelines to solve a problem directly
from the command line.
@@ -30,18 +30,18 @@ from the command line.
## Requirements
-**AutoBazaar** has been developed and tested on [Python 3.6 and 3.7](https://www.python.org/downloads/)
+AutoBazaar has been developed and tested on [Python 3.6 and 3.7](https://www.Python.org/downloads/)
Also, although it is not strictly required, the usage of a
[virtualenv](https://virtualenv.pypa.io/en/latest/) is highly recommended in order to avoid
-interfering with other software installed in the system where **AutoBazaar** is run.
+interfering with other software installed in the system where AutoBazaar is run.
## Install with pip
-The easiest and recommended way to install **AutoBazaar** is using
+The easiest and recommended way to install AutoBazaar is using
[pip](https://pip.pypa.io/en/stable/):
-```
+```bash
pip install autobazaar
```
@@ -69,13 +69,12 @@ demonstration purposes:
- [185_baseball](https://github.com/MLBazaar/AutoBazaar/tree/master/input/185_baseball): Single Table Regression
- [196_autoMpg](https://github.com/MLBazaar/AutoBazaar/tree/master/input/196_autoMpg): Single Table Classification
-
+Additionally, you can find a collection with ~450 datasets already in the D3M Schema in the [ML Bazaar Task Suite](https://mlbazaar.github.io/#datasets-and-tasks) (please request access [here](https://mlbazaar.github.io/#how-can-i-request-access-to-the-datasets)).
# Quickstart
In this short tutorial we will guide you through a series of steps that will help you getting
-started with **AutoBazaar** using its CLI command `abz`.
+started with AutoBazaar using its CLI command `abz`.
For more details about its usage and the available options, please execute `abz --help`
on your command line.
@@ -83,7 +82,7 @@ on your command line.
## 1. Prepare your Data
Make sure to have your data prepared in the [Data Format](#data-format) explained above inside
-and uncompressed folder in a filesystem directly accessible by **AutoBazaar**.
+and uncompressed folder in a filesystem directly accessible by AutoBazaar.
In order to check, whether your dataset is available and ready to use, you can execute
the `abz` command in your command line with its `list` subcommand.
@@ -94,8 +93,8 @@ the path to the folder that contains your dataset.
Assuming that the data is inside a folder called `input` within your current folder,
you can run:
-```
-$ abz list -i /path/to/your/datasets/folder
+```bash
+$ abz list -i path/to/your/datasets/folder
```
The output should be a table which includes the details of all the datasets found inside
@@ -111,8 +110,8 @@ dataset
60_jester single_table collaborative_filtering meanAbsoluteError 44M 880719
```
-**Note:** If you see an error saying that `No matching datasets found`, please review your
-dataset format and make sure to have indicated the right path.
+> :bulb: If you see an error saying that `No matching datasets found`, please review your
+> dataset format and make sure you have indicated the right path.
For the rest of this quickstart, we will be using the `185_baseball` dataset that you can
find inside the [input folder](https://github.com/MLBazaar/AutoBazaar/tree/master/input)
@@ -121,43 +120,45 @@ contained in this repository.
## 2. Start the search process
Once your data is ready, you can start the AutoBazaar search process using the `abz search`
-command.
-To do this, you will need to provide again the path to where your datasets are contained, as
+command. To do this, you will need to provide again the path to where your datasets are contained, as
well as the name of the datasets that you want to process.
-For example if you want to search for the best
+Without further configuration, the search process will evaluate only the default pipeline without performing additional tuning iteration on it.
-```
-$ abz search -i /path/to/your/datasets/folder name_of_your_dataset
+```bash
+abz search -i path/to/your/datasets/folder name_of_your_dataset
```
-This will evaluate the default pipeline without performing additional tuning iteration on it.
-
-In order to start an actual tuning process, you will need to provide at least one of the
+In order to start a real search process, you will need to provide at least one of the
following additional options:
-* `-b, --budget`: Maximum number of tuning iterations to perform.
-* `-t, --timeout`: Maximum time that the system needs to run, in seconds.
-* `-c, --checkpoints`: Comma separated string containing the different checkpoints where
- the best pipeline so far must be stored and evaluated against the test dataset. There must be
- no spaces between the checkpoint times. For example, to store the best pipeline every 10 minutes
- until 30 minutes have passed, you would use the option `-c 600,1200,1800`.
-
-For example, to search process the `185_baseball` dataset during 30 seconds evaluating the
-best pipeline so far every 10 seconds but with a maximum of 10 tuning iterations, we would
+* `-b, --budget`:
+ Maximum number of tuning iterations to perform.
+* `-c, --checkpoints`:
+ Comma-separated string containing the different checkpoints, in seconds,
+ where the best pipeline so far must be stored and evaluated against the
+ test dataset. There must be no spaces between the checkpoint times. For
+ example, to store the best pipeline every 10 minutes until 30 minutes have
+ elapsed, you would use the option `-c 600,1200,1800`. If checkpoints are
+ provided, the system will terminate at the time of the final checkpoint.
+* `-t, --timeout`:
+ Maximum time for the system to run, in seconds. Ignored if checkpoints are
+ given.
+
+For example, to search over the `185_baseball` dataset for a 30-second period, evaluating the
+best pipeline so far every 10 seconds, but with a maximum of 10 tuning iterations, we would
use the following command:
```bash
abz search 185_baseball -c10,20,30 -b10
```
-For further details about the available options, please execute `abz search --help` in your
-terminal.
+For further details about the available options, run `abz search --help`.
## 3. Explore the results
-Once the **AutoBazaar** has finished searching for the best pipeline, a table will be printed
-in stdout with a summary of the best pipeline found for each dataset.
+Once AutoBazaar has finished searching for the best pipeline, a table will be printed
+to stdout with a summary of the best pipeline found for each dataset.
If multiple checkpoints were provided, details about the best pipeline in each checkpoint
will also be included.
@@ -180,22 +181,28 @@ abz search 185_baseball -c10,20,30 -b10 -r results.csv
## What's next?
-For more details about **AutoBazaar** and all its possibilities and features, please check the
+For more details about AutoBazaar and all its possibilities and features, please check the
[project documentation site](https://MLBazaar.github.io/AutoBazaar/)!
## Citing AutoBazaar
-If you use AutoBazaar for your research, please consider citing the following paper (https://arxiv.org/pdf/1905.08942.pdf):
-
-```
-@article{smith2019mlbazaar,
- author = {Smith, Micah J. and Sala, Carles and Kanter, James Max and Veeramachaneni, Kalyan},
- title = {The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development},
- journal = {arXiv e-prints},
- year = {2019},
- eid = {arXiv:1905.08942},
- pages = {arxiv:1904.09535},
- archivePrefix = {arXiv},
- eprint = {1905.08942},
+If you use AutoBazaar for your research, please consider citing
+[our paper about ML Bazaar][ml-bazaar-paper]
+
+```bibtex
+@inproceedings{smith2020machine,
+ author = "Smith, Micah J. and Sala, Carles and Kanter, James Max and Veeramachaneni, Kalyan",
+ title = "The {{Machine Learning Bazaar}}: {{Harnessing}} the {{ML Ecosystem}} for {{Effective System Development}}",
+ booktitle = "Proceedings of the 2020 {{ACM SIGMOD International Conference}} on {{Management}} of {{Data}}",
+ year = "2020",
+ pages = "785--800",
+ publisher = "{Association for Computing Machinery}",
+ address = "{Portland, OR, USA}",
+ doi = "10.1145/3318464.3386146",
+ isbn = "978-1-4503-6735-6",
+ language = "en",
+ series = "{{SIGMOD}} '20"
}
```
+
+[ml-bazaar-paper]: https://doi.org/10.1145/3318464.3386146
From 9b4ea2bf0275a9c504169273d59081c0d0558868 Mon Sep 17 00:00:00 2001
From: Micah Smith
Date: Fri, 25 Jun 2021 13:55:19 -0400
Subject: [PATCH 2/2] More updates to README
---
README.md | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/README.md b/README.md
index 0dd8a08..45e0f95 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
[![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
-[![PyPi](https://img.shields.io/pypi/v/autobazaar.svg)](https://pypi.Python.org/pypi/autobazaar)
+[![PyPi](https://img.shields.io/pypi/v/autobazaar.svg)](https://pypi.python.org/pypi/autobazaar)
[![Tests](https://github.com/MLBazaar/AutoBazaar/workflows/Run%20Tests/badge.svg)](https://github.com/MLBazaar/AutoBazaar/actions?query=workflow%3A%22Run+Tests%22+branch%3Amaster)
[![Downloads](https://pepy.tech/badge/autobazaar)](https://pepy.tech/project/autobazaar)
@@ -14,7 +14,7 @@
* Development Status: [Pre-Alpha](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
* Documentation: https://MLBazaar.github.io/AutoBazaar/
* Homepage: https://github.com/MLBazaar/AutoBazaar
-* Paper: [ml-bazaar-paper]
+* Paper: [here][ml-bazaar-paper]
## Overview
@@ -30,7 +30,7 @@ from the command line.
## Requirements
-AutoBazaar has been developed and tested on [Python 3.6 and 3.7](https://www.Python.org/downloads/)
+AutoBazaar has been developed and tested on [Python 3.6 and 3.7](https://www.python.org/downloads/)
Also, although it is not strictly required, the usage of a
[virtualenv](https://virtualenv.pypa.io/en/latest/) is highly recommended in order to avoid
@@ -85,9 +85,9 @@ Make sure to have your data prepared in the [Data Format](#data-format) explaine
and uncompressed folder in a filesystem directly accessible by AutoBazaar.
In order to check, whether your dataset is available and ready to use, you can execute
-the `abz` command in your command line with its `list` subcommand.
+the `abz list` subcommand.
If your dataset is in a different place than inside a folder called `data` within your
-current working directory, do not forget to add the `-i` argument to your command indicating
+current working directory, add the `-i` argument to your command indicating
the path to the folder that contains your dataset.
Assuming that the data is inside a folder called `input` within your current folder,
@@ -123,7 +123,7 @@ Once your data is ready, you can start the AutoBazaar search process using the `
command. To do this, you will need to provide again the path to where your datasets are contained, as
well as the name of the datasets that you want to process.
-Without further configuration, the search process will evaluate only the default pipeline without performing additional tuning iteration on it.
+Without further configuration, the search process will evaluate only the default pipeline without performing additional tuning iterations on it.
```bash
abz search -i path/to/your/datasets/folder name_of_your_dataset
@@ -187,7 +187,7 @@ For more details about AutoBazaar and all its possibilities and features, please
## Citing AutoBazaar
If you use AutoBazaar for your research, please consider citing
-[our paper about ML Bazaar][ml-bazaar-paper]
+[our paper about ML Bazaar][ml-bazaar-paper]:
```bibtex
@inproceedings{smith2020machine,