This repository has been archived by the owner on Nov 8, 2018. It is now read-only.

Commit

Add documentation

Joeri Hermans committed Oct 18, 2016
1 parent 8f3c45f commit 0474b2e
Showing 4 changed files with 96 additions and 18 deletions.
12 changes: 2 additions & 10 deletions README.md
@@ -58,21 +58,13 @@ If you want to run the examples using Apache Spark 2.0.0 and higher. You will ne

### Single Trainer

A single trainer is simply a trainer which will use a single Spark executor to train a model. This trainer is usually used as a baseline metric for new distributed optimizers.
A single trainer is simply a trainer which will use a single Spark thread to train a model. This trainer is usually used as a baseline metric for new distributed optimizers.

```python
SingleTrainer(keras_model, num_epoch=1, batch_size=1000, features_col="features", label_col="label")
```

### Ensemble Trainer

This trainer will employ [ensemble learning](https://en.wikipedia.org/wiki/Ensemble_learning) to build a classifier. There are two modes: in the first mode, you get a list of Keras models which have been trained on different partitions of the data. In the other mode, all models are merged by adding an averaging layer to the networks. The resulting model thus has the same outputs as the specified Keras model, with the difference that the actual output is the averaged output of the models trained in parallel.

```python
EnsembleTrainer(keras_model, num_models=2, merge_models=False, features_col="features", label_col="label")
```

### EASGD
### EASGD (currently recommended, testing AEASGD)

The distinctive idea of EASGD is to allow the local workers to perform more exploration (small rho) and the master to perform exploitation. This approach differs from other settings explored in the literature, which focus on how fast the center variable converges [[1]](https://arxiv.org/pdf/1412.6651.pdf).

20 changes: 13 additions & 7 deletions docs/index.md
@@ -1,12 +1,9 @@
# Distributed Keras

Distributed Keras (DK) is a **distributed deep learning framework** built on top of Apache Spark and Keras, with the goal of significantly reducing training time using distributed machine learning algorithms. We designed the framework so that a developer can implement a new distributed optimizer with ease, and thus focus on research and model development. Several distributed methods are implemented, such as, but not restricted to, the training of **ensemble models** and **data parallel** models.
Distributed Keras (DK) is a **distributed deep learning framework** built on top of Apache Spark and Keras, with the goal of significantly reducing training time using distributed machine learning algorithms. We designed the framework so that a developer can implement a new distributed optimizer with ease, and thus focus on research and model development.

As mentioned above, most of our methods follow the data parallel approach described in the paper on [Large Scale Distributed Deep Networks](http://papers.nips.cc/paper/4687-large-scale-distributed-deep-networks.pdf). In this paradigm, replicas of a model are distributed over several "trainers", and every model replica is trained on a different partition of the dataset. The gradient (or all network weights, depending on the implementation details) is communicated with the parameter server after every gradient update. The parameter server is responsible for handling the gradient updates of all workers and for incorporating them into a single master model, which is returned to the user after the training procedure is complete.
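
As a hedged illustration of this paradigm (not dist-keras code; all names are invented for the example), the toy sketch below shows a parameter server incorporating gradient updates from several workers into a single master weight vector.

```python
import numpy as np

# Toy parameter server for the data parallel paradigm described above.
# Illustrative only: a real implementation communicates over the network
# and deals with serialization, staleness, and fault tolerance.
class ToyParameterServer(object):

    def __init__(self, initial_weights, learning_rate=0.01):
        self.center_weights = np.copy(initial_weights)
        self.learning_rate = learning_rate

    def push_gradient(self, gradient):
        # Incorporate a worker's gradient update into the master model.
        self.center_weights -= self.learning_rate * gradient

    def pull_weights(self):
        # Workers fetch the current master weights before the next batch.
        return np.copy(self.center_weights)

# Two "workers", each computing a gradient on its own data partition.
server = ToyParameterServer(initial_weights=np.zeros(5))
for partition_gradient in [np.ones(5), 2 * np.ones(5)]:
    server.push_gradient(partition_gradient)
print(server.pull_weights())  # master model after both updates
```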

!!! warning
Since this is alpha software, we have hardcoded the loss in the workers for now. You can change this easily by modifying the compilation arguments of the models.

## Installation

We rely on [Keras](https://keras.io) for the construction of models, and thus inherit the Keras dependencies. Furthermore, PySpark is also a dependency of this project, since DK uses Apache Spark for the distribution of the data and the model replicas.
@@ -21,18 +18,25 @@ pip install git+https://github.com/JoeriHermans/dist-keras.git

### Git

However, if you would like to play with the examples and install the framework, it is recommended to use the following approach.
However, if you would like to play with the examples and notebooks, simply install the framework using the approach described below.

```bash
git clone https://github.com/JoeriHermans/dist-keras
cd dist-keras
pip install -e .
```

## Architecture

## Getting Started

We recommend starting with the `workflow` notebook located in the `examples` directory. This Python notebook guides you through all the general steps you need to perform: setting up a Spark Context, reading the data, applying preprocessing, and training and evaluating your model in a distributed way.

!!! Note
The **workflow.ipynb** notebook can be run on your local machine. However, we recommend running it on a Spark cluster, since the distributed trainers only start to outperform the *SingleTrainer* when the number of workers (cores multiplied by executors) exceeds roughly 10.
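
As a rough orientation, a minimal workflow could look like the sketch below. This is a hedged outline, not a copy of the notebook: the CSV path, the column names, the module path of `SingleTrainer`, and the `train()` call are assumptions based on the description above, so consult `workflow.ipynb` for the exact API.

```python
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
from pyspark.ml.feature import VectorAssembler

from keras.models import Sequential
from keras.layers import Dense

# Module path assumed from the Trainer location mentioned in the optimizer docs.
from distkeras.distributed import SingleTrainer

# 1. Set up a Spark context.
conf = SparkConf().setAppName("dist-keras-getting-started")
sc = SparkContext(conf=conf)
sql_context = SQLContext(sc)

# 2. Read the data and apply preprocessing (assemble a feature vector).
raw = sql_context.read.format("csv") \
                 .options(header="true", inferSchema="true") \
                 .load("data/training.csv")  # path and columns are illustrative
assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
dataset = assembler.transform(raw)

# 3. Define a Keras model and train it in a distributed fashion.
model = Sequential()
model.add(Dense(10, input_dim=3, activation="relu"))
model.add(Dense(1, activation="sigmoid"))

trainer = SingleTrainer(model, worker_optimizer="adam", loss="binary_crossentropy",
                        num_epoch=1, batch_size=32,
                        features_col="features", label_col="label")
trained_model = trainer.train(dataset)  # train() returning a Keras model is assumed
```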

## Support

For issues, bugs, questions, and suggestions, please use the appropriate channels on [GitHub](https://github.com/JoeriHermans/dist-keras/).

## References

* Zhang, S., Choromanska, A. E., & LeCun, Y. (2015). Deep learning with elastic averaging SGD. In Advances in Neural Information Processing Systems (pp. 685-693).
@@ -41,6 +45,8 @@ pip install -e .

* Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Mao, M., ... & Ng, A. Y. (2012). Large scale distributed deep networks. In Advances in neural information processing systems (pp. 1223-1231).

* Pumperla, M. (2015). Elephas. GitHub repository: https://github.com/maxpumperla/elephas/.

## Licensing

![GPLv3](images/gpl_v3.png) ![CERN](images/cern_logo.jpg)
79 changes: 79 additions & 0 deletions docs/optimizers.md
@@ -0,0 +1,79 @@
# Optimizers

Optimizers, or trainers, are the main components in Distributed Keras (DK). All trainers share a single interface, the `Trainer` class, defined in `distkeras/distributed.py`. This class also holds the serialized model, the loss, and the Keras optimizer the workers need to use. Generally, a trainer runs on a single worker. In the context of Apache Spark, this means that the thread responsible for executing the `foreachPartition` or `mapPartitions` call has been assigned a trainer. In reality, however, the training of the model itself will utilise more physical cores. In fact, it will employ all available cores, thus bypassing resource managers such as YARN.
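
The plain PySpark sketch below only illustrates this mechanism; it is not the dist-keras implementation, and `train_partition` is a hypothetical stand-in for a real trainer.

```python
from pyspark import SparkContext

sc = SparkContext(appName="trainer-per-partition-sketch")

def train_partition(rows):
    # Runs inside a single Spark task, i.e., one "trainer" in the text above.
    # A deep learning backend invoked here may spawn threads over all physical
    # cores of the executor machine, outside of the resource manager's view.
    data = list(rows)
    # ... a real trainer would deserialize the Keras model, train on `data`,
    # and communicate weights or gradients with the parameter server ...
    yield len(data)

# Every partition of the dataset is handled by its own trainer instance.
dataset = sc.parallelize(range(1000), numSlices=4)
print(dataset.mapPartitions(train_partition).collect())  # e.g. [250, 250, 250, 250]
```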

## Single Trainer

A single trainer is simply a trainer which uses a single thread (as discussed above) to train a model. This trainer is usually used as a baseline metric for new distributed optimizers.

```python
SingleTrainer(keras_model, worker_optimizer, loss, num_epoch=1,
              batch_size=32, features_col="features", label_col="label")
```
**Parameters**:

- **keras_model**: The Keras model which should be trained.
- **worker_optimizer**: Keras optimizer for the workers.
- **loss**: Loss function the workers use when compiling the model.
- **num_epoch**: Number of epoch iterations over the data.
- **batch_size**: Mini-batch size.
- **features_col**: Column of the feature vector in the Spark Dataframe.
- **label_col**: Column of the label in the Spark Dataframe.

## EASGD

The distinctive idea of EASGD is to allow the local workers to perform more exploration (small rho) and the master to perform exploitation. This approach differs from other settings explored in the literature, which focus on how fast the center variable converges [(paper)](https://arxiv.org/pdf/1412.6651.pdf).

We want to note that the basic version of EASGD is a synchronous algorithm, i.e., once a worker is done processing a batch of data, it will wait until all other workers have submitted their variables (including the weight parameterization, iteration number, and worker id) to the parameter server before starting on the next batch.
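
To make one synchronous round concrete, the toy sketch below applies the EASGD update rule from the paper to a single parameter vector. The values of `eta` and `rho` are illustrative, and this is not dist-keras code; the actual constructor follows below.

```python
import numpy as np

# One synchronous EASGD round on a toy parameter vector (illustrative only).
eta, rho = 0.01, 5.0
center = np.zeros(3)                               # center ("master") variable
workers = [np.random.randn(3) for _ in range(4)]   # local model replicas
gradients = [np.random.randn(3) for _ in workers]  # gradients on local batches

elastic = [w - center for w in workers]
# Workers explore, but are pulled towards the center by the elastic force rho.
workers = [w - eta * (g + rho * e) for w, g, e in zip(workers, gradients, elastic)]
# The center exploits by moving towards the average of the workers.
center = center + eta * rho * sum(elastic)
print(center)
```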

```python
EASGD(keras_model, worker_optimizer, loss, num_workers=2, features_col="features", label_col="label",
      rho=5.0, learning_rate=0.01, batch_size=32, num_epoch=1, master_port=5000)
```

**Parameters**:

TODO

## Asynchronous EASGD

In this section we propose the asynchronous version of EASGD. Instead of waiting for the synchronization of other trainers, this method communicates the elastic difference (as described in the paper) with the parameter server. The only synchronization mechanism that has been implemented is to ensure no race conditions occur when updating the center variable.
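
The toy sketch below illustrates only this synchronization aspect: several threads push an elastic difference asynchronously, and a single lock protects the center variable. All names are invented for the example; this is not the dist-keras parameter server.

```python
import threading
import numpy as np

center = np.zeros(3)            # center variable on the parameter server
center_lock = threading.Lock()  # the only synchronization mechanism

def push_elastic_difference(elastic_difference, alpha=0.01):
    # Asynchronous workers call this whenever they have finished a batch.
    global center
    with center_lock:  # prevent race conditions on the center variable update
        center = center + alpha * elastic_difference

workers = [threading.Thread(target=push_elastic_difference,
                            args=(np.random.randn(3),)) for _ in range(4)]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
print(center)
```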

```python
AsynchronousEASGD(keras_model, worker_optimizer, loss, num_workers=2, batch_size=1000,
                  features_col="features", label_col="label", communication_window=3,
                  rho=0.01, learning_rate=0.01, master_port=5000, num_epoch=1)
```

**Parameters**:

TODO

## DOWNPOUR

DOWNPOUR is an asynchronous stochastic gradient descent procedure which supports a large number of model replicas and leverages adaptive learning rates. This implementation is based on the pseudocode provided by [Zhang et al.](https://arxiv.org/pdf/1412.6651.pdf)
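
As a hedged illustration of the worker loop implied by the `communication_window` parameter below (not the dist-keras implementation; the server object and names are invented), the sketch accumulates gradients locally and only exchanges them with the parameter server every few mini-batches.

```python
import numpy as np

class ToyServer(object):
    # Stand-in for a parameter server; illustrative only.
    def __init__(self, dim):
        self.weights = np.zeros(dim)

    def push(self, accumulated_gradient, learning_rate=0.01):
        self.weights -= learning_rate * accumulated_gradient

    def pull(self):
        return np.copy(self.weights)

def downpour_worker(server, num_batches=20, communication_window=5):
    weights = server.pull()
    accumulated = np.zeros_like(weights)
    for batch in range(num_batches):
        gradient = np.random.randn(*weights.shape)  # stand-in for a real gradient
        weights -= 0.01 * gradient                  # local descent step
        accumulated += gradient
        if (batch + 1) % communication_window == 0:
            server.push(accumulated)   # send the accumulated update
            weights = server.pull()    # fetch the latest parameters
            accumulated = np.zeros_like(weights)

server = ToyServer(dim=3)
downpour_worker(server)
print(server.weights)
```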

```python
DOWNPOUR(keras_model, worker_optimizer, loss, num_workers=2, batch_size=1000,
         features_col="features", label_col="label", communication_window=5,
         master_port=5000, num_epoch=1, learning_rate=0.01)
```

**Parameters**:

TODO

## Custom distributed optimizer

TODO

### Synchronized Distributed Trainer

TODO

### Asynchronous Distributed Trainer

TODO

### Implementing a custom worker

TODO
3 changes: 2 additions & 1 deletion mkdocs.yml
@@ -7,10 +7,11 @@ site_url: 'http://dist-keras.joerihermans.com'
# Page definitions.
pages:
- Home: index.md
- Optimizers: optimizers.md
- License: license.md

# Documentation and theme configuration
theme: material
theme: readthedocs
docs_dir: 'docs'
markdown_extensions:
- admonition
