From 0474b2ee69a74b6f61384f74fdb7ccf2295a129e Mon Sep 17 00:00:00 2001 From: Joeri Hermans Date: Tue, 18 Oct 2016 17:35:07 +0200 Subject: [PATCH] Add documentation --- README.md | 12 ++----- docs/index.md | 20 ++++++++---- docs/optimizers.md | 79 ++++++++++++++++++++++++++++++++++++++++++++++ mkdocs.yml | 3 +- 4 files changed, 96 insertions(+), 18 deletions(-) create mode 100644 docs/optimizers.md diff --git a/README.md b/README.md index a0cf26e..a34b64b 100644 --- a/README.md +++ b/README.md @@ -58,21 +58,13 @@ If you want to run the examples using Apache Spark 2.0.0 and higher. You will ne ### Single Trainer -A single trainer is in all simplicity a trainer which will use a single Spark executor to train a model. This trainer is usually used as a baseline metrics for new distributed optimizers. +A single trainer is in all simplicity a trainer which will use a single Spark thread to train a model. This trainer is usually used as a baseline metrics for new distributed optimizers. ```python SingleTrainer(keras_model, num_epoch=1, batch_size=1000, features_col="features", label_col="label") ``` -### Ensemble Trainer - -This trainer will employ [ensemble learning](https://en.wikipedia.org/wiki/Ensemble_learning) to build a classifier. There are two modes, in the first mode, you will get a list of Keras models which have been trained on different partitions of the data. In the other mode, all the models will be merged by adding an averaging layer to the networks. The resulting model will thus have the same outputs as the specified Keras model, with the difference that the actual output is the averaged output of the parallely trained models. - -```python -EnsembleTrainer(keras_model, num_models=2, merge_models=False, features_col="features", label_col="label") -``` - -### EASGD +### EASGD (currently recommended, testing AEASGD) The distinctive idea of EASGD is to allow the local workers to perform more exploration (small rho) and the master to perform exploitation. This approach differs from other settings explored in the literature, and focus on how fast the center variable converges [[1]](https://arxiv.org/pdf/1412.6651.pdf) . diff --git a/docs/index.md b/docs/index.md index 9292390..1fbd543 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,12 +1,9 @@ # Distributed Keras -Distributed Keras (DK) is a **distributed deep learning framework** built op top of Apache Spark and Keras with the goal to significantly reduce the training time using distributed machine learning algorithms. We designed the framework in such a way that a developer could implement a new distributed optimizer with ease, thus enabling a person to focus on research and model development. Several distributed methods are implemented, such as, but not restricted to the training of **ensemble models**, and **data parallel** models. +Distributed Keras (DK) is a **distributed deep learning framework** built op top of Apache Spark and Keras with the goal to significantly reduce the training time using distributed machine learning algorithms. We designed the framework in such a way that a developer could implement a new distributed optimizer with ease, thus enabling a person to focus on research and model development. As mentioned above, most of our methods follow the data parallel approach as described in the paper on [Large Scale Distributed Deep Networks](http://papers.nips.cc/paper/4687-large-scale-distributed-deep-networks.pdf). In this paradigm, replicas of a model are distributed over several "trainers", and every model replica will be trained on a different partition of the dataset. The gradient (or all network weights, depending on the implementation details) will be communicated with the parameter server after every gradient update. The parameter server is responsible for handling the gradient updates of all workers and incorperating all gradient updates into a single master model which will be returned to the user after the training procedure is complete. -!!! warning - Since this is alpha software, we have hardcoded the loss in the workers for now. You can change this easily by modifying the compilation arguments of the models. - ## Installation We rely on [Keras](https://keras.io) for the construction of models, and thus following the Keras dependencies. Furthermore, PySpark is also a dependency for this project since DK is using Apache Spark for the distribution of the data and the model replicas. @@ -21,7 +18,7 @@ pip install git+https://github.com/JoeriHermans/dist-keras.git ### Git -However, if you would like to play with the examples and install the framework, it is recommended to use to following approach. +However, if you would like to play with the examples and notebooks, simply install the framework using the approach described below. ```bash git clone https://github.com/JoeriHermans/dist-keras @@ -29,10 +26,17 @@ cd dist-keras pip install -e . ``` -## Architecture - ## Getting Started +We recommend starting with the `workflow` notebook located in the `examples` directory. This Python notebook will guide you through all general steps which should need to perform. This includes setting up a Spark Context, reading the data, applying preprocessing, training and evaluation of your model in a distributed way. + +!!! Note + Running the **workflow.ipyn** notebook can be run on your local machine. However, we recommend running the notebook on a Spark cluster since the distributed trainers start to outperform the *SingleTrainer* when the number of workers (cores multiplied by executors) is usually higher than 10. + +## Support + +For issues, bugs, questions, and suggestions. Please use the appropriate channels on [GitHub](https://github.com/JoeriHermans/dist-keras/). + ## References * Zhang, S., Choromanska, A. E., & LeCun, Y. (2015). Deep learning with elastic averaging SGD. In Advances in Neural Information Processing Systems (pp. 685-693). @@ -41,6 +45,8 @@ pip install -e . * Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Mao, M., ... & Ng, A. Y. (2012). Large scale distributed deep networks. In Advances in neural information processing systems (pp. 1223-1231). +* Pumperla, M. (2015). Elephas. Github Repository https://github.com/maxpumperla/elephas/. [4] + ## Licensing ![GPLv3](images/gpl_v3.png) ![CERN](images/cern_logo.jpg) diff --git a/docs/optimizers.md b/docs/optimizers.md new file mode 100644 index 0000000..3d4ff1a --- /dev/null +++ b/docs/optimizers.md @@ -0,0 +1,79 @@ +# Optimizers + +Optimizers, or trainers, are the main component in Distributed Keras (DK). All trainers share a single interface, which is the `Trainer` class, defined in `distkeras/distributed.py`. This class also contains the `serialized model`, the `loss`, and the `Keras optimizer` the workers need to use. Generally, a trainer will run on a single worker. In the context of Apache Spark, this means that the thread which is responsible for doing the foreachPartition or mapPartitions will have been assigned a trainer. In reality however, the training of the model itself will utilise more physical cores. In fact, it will employ all available cores, and thus bypassing resource managers such as YARN. + +## Single Trainer + +A single trainer is in all simplicity a trainer which will use a single thread (as discussed above) to train a model. This trainer is usually used as a baseline metric for new distributed optimizers. + +```python +SingleTrainer(keras_model, worker_optimizer, loss, num_epoch=1, + batch_size=32, features_col="features", label_col="label") +``` +**Parameters**: + +- **keras_model**: The Keras model which should be trained. +- **worker_optmizer**: Keras optimizer for workers. +- **num_epoch**: Number of epoch iterations over the data. +- **batch_size**: Mini-batch size. +- **features_col**: Column of the feature vector in the Spark Dataframe. +- **label_col**: Column of the label in the Spark Dataframe. + +## EASGD + +The distinctive idea of EASGD is to allow the local workers to perform more exploration (small rho) and the master to perform exploitation. This approach differs from other settings explored in the literature, and focus on how fast the center variable converges [(paper)](https://arxiv.org/pdf/1412.6651.pdf) . + +We want to note the basic version of EASGD is a synchronous algorithm, i.e., once a worker is done processing a batch of the data, it will wait until all other workers have submitted their variables (this includes the weight parameterization, iteration number, and worker id) to the parameter server before starting the next data batch. + +```python +EASGD(keras_model, worker_optimizer, loss, num_workers=2, features_col="features", label_col="label", + rho=5.0, learning_rate=0.01, batch_size=32, num_epoch=1, master_port=5000) +``` + +**Parameters**: + +TODO + +## Asynchronous EASGD + +In this section we propose the asynchronous version of EASGD. Instead of waiting on the synchronization of other trainers, this method communicates the elastic difference (as described in the paper), with the parameter server. The only synchronization mechanism that has been implemented, is to ensure no race-conditions occur when updating the center variable. + +```python +AsynchronousEASGD(keras_model, worker_optimizer, loss, num_workers=2, batch_size=1000, + features_col="features", label_col="label", communication_window=3, + rho=0.01, learning_rate=0.01, master_port=5000, num_epoch=1) +``` + +**Parameters**: + +TODO + +## DOWNPOUR + +An asynchronous stochastic gradient descent procedure supporting a large number of model replicas and leverages adaptive learning rates. This implementation is based on the pseudocode provided by [Zhang et al.](https://arxiv.org/pdf/1412.6651.pdf) + +```python +DOWNPOUR(keras_model, worker_optimizer, loss, num_workers=2, batch_size=1000, + features_col="features", label_col="label", communication_window=5, + master_port=5000, num_epoch=1, learning_rate=0.01)) +``` + +**Parameters**: + +TODO + +## Custom distributed optimizer + +TODO + +### Synchronized Distributed Trainer + +TODO + +### Asynchronous Distributed Trainer + +TODO + +### Implementing a custom worker + +TODO diff --git a/mkdocs.yml b/mkdocs.yml index ace6155..b667ad1 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -7,10 +7,11 @@ site_url: 'http://dist-keras.joerihermans.com' # Page definitions. pages: - Home: index.md +- Optimizers: optimizers.md - License: license.md # Documentation and theme configuration -theme: material +theme: readthedocs docs_dir: 'docs' markdown_extensions: - admonition