From 3ca1026c19480b6a8d691865b3351d105b1afb84 Mon Sep 17 00:00:00 2001
From: Micah Smith
Date: Fri, 25 Jun 2021 13:41:06 -0400
Subject: [PATCH 1/2] Update readme

---
 .editorconfig |   3 ++
 README.md     | 109 +++++++++++++++++++++++++++-----------------------
 2 files changed, 61 insertions(+), 51 deletions(-)

diff --git a/.editorconfig b/.editorconfig
index f742251..7345ce1 100644
--- a/.editorconfig
+++ b/.editorconfig
@@ -10,6 +10,9 @@ insert_final_newline = true
 charset = utf-8
 end_of_line = lf
 
+[*.md]
+max_line_length = 99
+
 [*.py]
 max_line_length = 99
 
diff --git a/README.md b/README.md
index 3a65000..0dd8a08 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@

 [![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
-[![PyPi](https://img.shields.io/pypi/v/autobazaar.svg)](https://pypi.python.org/pypi/autobazaar)
+[![PyPi](https://img.shields.io/pypi/v/autobazaar.svg)](https://pypi.Python.org/pypi/autobazaar)
 [![Tests](https://github.com/MLBazaar/AutoBazaar/workflows/Run%20Tests/badge.svg)](https://github.com/MLBazaar/AutoBazaar/actions?query=workflow%3A%22Run+Tests%22+branch%3Amaster)
 [![Downloads](https://pepy.tech/badge/autobazaar)](https://pepy.tech/project/autobazaar)
 
@@ -14,15 +14,15 @@
 * Development Status: [Pre-Alpha](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
 * Documentation: https://MLBazaar.github.io/AutoBazaar/
 * Homepage: https://github.com/MLBazaar/AutoBazaar
-* Paper: https://arxiv.org/pdf/1905.08942.pdf
+* Paper: [ml-bazaar-paper]
 
 ## Overview
 
-AutoBazaar is an AutoML system created using [The Machine Learning Bazaar](https://arxiv.org/abs/1905.08942),
-a research project and framework for building ML and AutoML systems by the Data To AI Lab at MIT.
+*AutoBazaar* is an AutoML system created using [The Machine Learning Bazaar](https://mlbazaar.github.io),
+a research project and framework for building ML and AutoML systems by the [Data To AI Lab](https://dai.lids.mit.edu) at MIT.
 See [below](#citing-autobazaar) for more references.
 
-It comes in the form of a python library which can be used directly inside any other python
+It comes in the form of a Python library which can be used directly inside any other Python
 project, as well as a CLI which allows searching for pipelines to solve a problem directly
 from the command line.
 
@@ -30,18 +30,18 @@ from the command line.
 
 ## Requirements
 
-**AutoBazaar** has been developed and tested on [Python 3.6 and 3.7](https://www.python.org/downloads/)
+AutoBazaar has been developed and tested on [Python 3.6 and 3.7](https://www.Python.org/downloads/)
 
 Also, although it is not strictly required, the usage of a
 [virtualenv](https://virtualenv.pypa.io/en/latest/) is highly recommended in order to avoid
-interfering with other software installed in the system where **AutoBazaar** is run.
+interfering with other software installed in the system where AutoBazaar is run.
 
 ## Install with pip
 
-The easiest and recommended way to install **AutoBazaar** is using
+The easiest and recommended way to install AutoBazaar is using
 [pip](https://pip.pypa.io/en/stable/):
 
-```
+```bash
 pip install autobazaar
 ```
 
@@ -69,13 +69,12 @@ demonstration purposes:
 - [185_baseball](https://github.com/MLBazaar/AutoBazaar/tree/master/input/185_baseball): Single Table Regression
 - [196_autoMpg](https://github.com/MLBazaar/AutoBazaar/tree/master/input/196_autoMpg): Single Table Classification
 
-
+Additionally, you can find a collection with ~450 datasets already in the D3M Schema in the [ML Bazaar Task Suite](https://mlbazaar.github.io/#datasets-and-tasks) (please request access [here](https://mlbazaar.github.io/#how-can-i-request-access-to-the-datasets)).
 
 # Quickstart
 
 In this short tutorial we will guide you through a series of steps that will help you getting
-started with **AutoBazaar** using its CLI command `abz`.
+started with AutoBazaar using its CLI command `abz`.
 
 For more details about its usage and the available options, please execute `abz --help`
 on your command line.
@@ -83,7 +82,7 @@ on your command line.
 ## 1. Prepare your Data
 
 Make sure to have your data prepared in the [Data Format](#data-format) explained above inside
-and uncompressed folder in a filesystem directly accessible by **AutoBazaar**.
+and uncompressed folder in a filesystem directly accessible by AutoBazaar.
 
 In order to check, whether your dataset is available and ready to use, you can execute
 the `abz` command in your command line with its `list` subcommand.
@@ -94,8 +93,8 @@ the path to the folder that contains your dataset.
 Assuming that the data is inside a folder called `input` within your current folder,
 you can run:
 
-```
-$ abz list -i /path/to/your/datasets/folder
+```bash
+$ abz list -i path/to/your/datasets/folder
 ```
 
 The output should be a table which includes the details of all the datasets found inside
@@ -111,8 +110,8 @@ dataset
 60_jester single_table collaborative_filtering meanAbsoluteError 44M 880719
 ```
 
-**Note:** If you see an error saying that `No matching datasets found`, please review your
-dataset format and make sure to have indicated the right path.
+> :bulb: If you see an error saying that `No matching datasets found`, please review your
+> dataset format and make sure you have indicated the right path.
 
 For the rest of this quickstart, we will be using the `185_baseball` dataset that you can find
 inside the [input folder](https://github.com/MLBazaar/AutoBazaar/tree/master/input)
@@ -121,43 +120,45 @@ contained in this repository.
 ## 2. Start the search process
 
 Once your data is ready, you can start the AutoBazaar search process using the `abz search`
-command.
-To do this, you will need to provide again the path to where your datasets are contained, as
+command. To do this, you will need to provide again the path to where your datasets are contained, as
 well as the name of the datasets that you want to process.
 
-For example if you want to search for the best
+Without further configuration, the search process will evaluate only the default pipeline without performing additional tuning iteration on it.
 
-```
+```bash
-$ abz search -i /path/to/your/datasets/folder name_of_your_dataset
+abz search -i path/to/your/datasets/folder name_of_your_dataset
 ```
 
-This will evaluate the default pipeline without performing additional tuning iteration on it.
-
-In order to start an actual tuning process, you will need to provide at least one of the
+In order to start a real search process, you will need to provide at least one of the
 following additional options:
 
-* `-b, --budget`: Maximum number of tuning iterations to perform.
-* `-t, --timeout`: Maximum time that the system needs to run, in seconds.
-* `-c, --checkpoints`: Comma separated string containing the different checkpoints where
-  the best pipeline so far must be stored and evaluated against the test dataset. There must be
-  no spaces between the checkpoint times. For example, to store the best pipeline every 10 minutes
-  until 30 minutes have passed, you would use the option `-c 600,1200,1800`.
-
-For example, to search process the `185_baseball` dataset during 30 seconds evaluating the
-best pipeline so far every 10 seconds but with a maximum of 10 tuning iterations, we would
+* `-b, --budget`:
+  Maximum number of tuning iterations to perform.
+* `-c, --checkpoints`:
+  Comma separated string containing the different checkpoints, in seconds,
+  where the best pipeline so far must be stored and evaluated against the
+  test dataset. There must be no spaces between the checkpoint times. For
+  example, to store the best pipeline every 10 minutes until 30 minutes have
+  elapsed, you would use the option `-c 600,1200,1800`. If checkpoints are
+  provided, the system will terminate at the time of the final checkpoint.
+* `-t, --timeout`:
+  Maximum time for the system to run, in seconds. Ignored if checkpoints are
+  given.
+
+For example, to search over the `185_baseball` dataset for a 30 second period, evaluating the
+best pipeline so far every 10 seconds, but with a maximum of 10 tuning iterations, we would
 use the following command:
 
 ```bash
 abz search 185_baseball -c10,20,30 -b10
 ```
 
-For further details about the available options, please execute `abz search --help` in your
-terminal.
+For further details about the available options, run `abz search --help`.
 
 ## 3. Explore the results
 
-Once the **AutoBazaar** has finished searching for the best pipeline, a table will be printed
-in stdout with a summary of the best pipeline found for each dataset.
+Once AutoBazaar has finished searching for the best pipeline, a table will be printed
+to stdout with a summary of the best pipeline found for each dataset.
 If multiple checkpoints were provided, details about the best pipeline in each checkpoint
 will also be included.
 
@@ -180,22 +181,28 @@ abz search 185_baseball -c10,20,30 -b10 -r results.csv
 
 ## What's next?
 
-For more details about **AutoBazaar** and all its possibilities and features, please check the
+For more details about AutoBazaar and all its possibilities and features, please check the
 [project documentation site](https://MLBazaar.github.io/AutoBazaar/)!
 
 ## Citing AutoBazaar
 
-If you use AutoBazaar for your research, please consider citing the following paper (https://arxiv.org/pdf/1905.08942.pdf):
-
-```
-@article{smith2019mlbazaar,
-  author = {Smith, Micah J. and Sala, Carles and Kanter, James Max and Veeramachaneni, Kalyan},
-  title = {The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development},
-  journal = {arXiv e-prints},
-  year = {2019},
-  eid = {arXiv:1905.08942},
-  pages = {arxiv:1904.09535},
-  archivePrefix = {arXiv},
-  eprint = {1905.08942},
+If you use AutoBazaar for your research, please consider citing
+[our paper about ML Bazaar][ml-bazaar-paper]
+
+```bibtex
+@inproceedings{smith2020machine,
+  author = "Smith, Micah J. and Sala, Carles and Kanter, James Max and Veeramachaneni, Kalyan",
+  title = "The {{Machine Learning Bazaar}}: {{Harnessing}} the {{ML Ecosystem}} for {{Effective System Development}}",
+  booktitle = "Proceedings of the 2020 {{ACM SIGMOD International Conference}} on {{Management}} of {{Data}}",
+  year = "2020",
+  pages = "785--800",
+  publisher = "{Association for Computing Machinery}",
+  address = "{Portland, OR, USA}",
+  doi = "10.1145/3318464.3386146",
+  isbn = "978-1-4503-6735-6",
+  language = "en",
+  series = "{{SIGMOD}} '20"
 }
 ```
+
+[ml-bazaar-paper]: https://doi.org/10.1145/3318464.3386146

From 9b4ea2bf0275a9c504169273d59081c0d0558868 Mon Sep 17 00:00:00 2001
From: Micah Smith
Date: Fri, 25 Jun 2021 13:55:19 -0400
Subject: [PATCH 2/2] More updates to README

---
 README.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 0dd8a08..45e0f95 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@

 [![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
-[![PyPi](https://img.shields.io/pypi/v/autobazaar.svg)](https://pypi.Python.org/pypi/autobazaar)
+[![PyPi](https://img.shields.io/pypi/v/autobazaar.svg)](https://pypi.python.org/pypi/autobazaar)
 [![Tests](https://github.com/MLBazaar/AutoBazaar/workflows/Run%20Tests/badge.svg)](https://github.com/MLBazaar/AutoBazaar/actions?query=workflow%3A%22Run+Tests%22+branch%3Amaster)
 [![Downloads](https://pepy.tech/badge/autobazaar)](https://pepy.tech/project/autobazaar)
 
@@ -14,7 +14,7 @@
 * Development Status: [Pre-Alpha](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
 * Documentation: https://MLBazaar.github.io/AutoBazaar/
 * Homepage: https://github.com/MLBazaar/AutoBazaar
-* Paper: [ml-bazaar-paper]
+* Paper: [here][ml-bazaar-paper]
 
 ## Overview
 
@@ -30,7 +30,7 @@ from the command line.
 
 ## Requirements
 
-AutoBazaar has been developed and tested on [Python 3.6 and 3.7](https://www.Python.org/downloads/)
+AutoBazaar has been developed and tested on [Python 3.6 and 3.7](https://www.python.org/downloads/)
 
 Also, although it is not strictly required, the usage of a
 [virtualenv](https://virtualenv.pypa.io/en/latest/) is highly recommended in order to avoid
@@ -85,9 +85,9 @@ Make sure to have your data prepared in the [Data Format](#data-format) explaine
 and uncompressed folder in a filesystem directly accessible by AutoBazaar.
 
 In order to check, whether your dataset is available and ready to use, you can execute
-the `abz` command in your command line with its `list` subcommand.
+the `abz list` subcommand.
 If your dataset is in a different place than inside a folder called `data` within your
-current working directory, do not forget to add the `-i` argument to your command indicating
+current working directory, add the `-i` argument to your command indicating
 the path to the folder that contains your dataset.
 
 Assuming that the data is inside a folder called `input` within your current folder,
@@ -123,7 +123,7 @@ Once your data is ready, you can start the AutoBazaar search process using the `
 command. To do this, you will need to provide again the path to where your datasets are contained, as
 well as the name of the datasets that you want to process.
 
-Without further configuration, the search process will evaluate only the default pipeline without performing additional tuning iteration on it.
+Without further configuration, the search process will evaluate only the default pipeline without performing additional tuning iterations on it.
 
 ```bash
 abz search -i path/to/your/datasets/folder name_of_your_dataset
@@ -187,7 +187,7 @@ For more details about AutoBazaar and all its possibilities and features, please
 ## Citing AutoBazaar
 
 If you use AutoBazaar for your research, please consider citing
-[our paper about ML Bazaar][ml-bazaar-paper]
+[our paper about ML Bazaar][ml-bazaar-paper]:
 
 ```bibtex
 @inproceedings{smith2020machine,
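The `-c, --checkpoints` option documented in the first patch takes checkpoint times in seconds as a comma-separated string with no spaces, which is easy to mistype by hand. Below is a minimal sketch, assuming only the `-i`, `-c` and `-b` flags shown in the README text: `checkpoint_option` and `search_command` are hypothetical helpers, not part of AutoBazaar, that generate evenly spaced checkpoints and assemble an `abz search` argument list.

```python
def checkpoint_option(interval, total):
    """Hypothetical helper: build the value for `abz search -c`.

    Checkpoints are seconds, comma-separated with no spaces,
    e.g. interval=600, total=1800 -> "600,1200,1800".
    """
    if interval <= 0 or total < interval:
        raise ValueError("expected 0 < interval <= total")
    # range() stops before total + 1, so total itself is included when it is a
    # multiple of interval; otherwise the last checkpoint falls before total.
    return ",".join(str(t) for t in range(interval, total + 1, interval))


def search_command(dataset, interval, total, budget=None, input_dir=None):
    """Hypothetical helper: assemble an `abz search` argument list."""
    args = ["abz", "search"]
    if input_dir is not None:
        # -i points at the folder that contains the datasets
        args += ["-i", input_dir]
    args.append(dataset)
    # Same compact form as the README example: -c10,20,30
    args.append("-c" + checkpoint_option(interval, total))
    if budget is not None:
        # Maximum number of tuning iterations, e.g. -b10
        args.append(f"-b{budget}")
    return args


if __name__ == "__main__":
    print(checkpoint_option(600, 1800))  # 600,1200,1800
    print(" ".join(search_command("185_baseball", 10, 30, budget=10)))
    # abz search 185_baseball -c10,20,30 -b10
```

On a machine where `abz` is installed, the returned list could be passed to `subprocess.run(args, check=True)`; building an argument list instead of a shell string sidesteps quoting issues. Because the README states that the search terminates at the final checkpoint, choosing a `total` that is not a multiple of `interval` would end the run early.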