Provide Dockers and final check for 0.1 release. (#178)
* provide a Docker file for KungFu GPU.

* update Docker files.

* fix ldconfig

* Add GOBIN

* Update README.

* README for 0.1 release.

* add commands for docker run.
luomai authored Oct 26, 2019
1 parent ba9ab1e commit d77a7d6
Showing 6 changed files with 50 additions and 23 deletions.
12 changes: 11 additions & 1 deletion CONTRIBUTING.md
@@ -35,7 +35,7 @@ All source code is under `./srcs/<lang>/` where `<lang> := cpp | go | python`.

* Graph: A directed graph, which may contain self loops. The vertices are numbered from 0 to n - 1.

## Useful commands for development
## Useful commands

### Format code

@@ -50,6 +50,16 @@ All source code is under `./srcs/<lang>/` where `<lang> := cpp | go | python`.
pip3 wheel -vvv --no-index .
```

### Docker

```bash
# Run the following command in the KungFu folder
docker build -f docker/Dockerfile.tf-gpu -t kungfu:gpu .

# Run the built image
docker run -it kungfu:gpu
```
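
If your machine has NVIDIA GPUs, you can pass them through to the container. This is a minimal sketch, assuming Docker 19.03+ with the NVIDIA container toolkit installed; older setups typically use `nvidia-docker` instead.

```bash
# Expose all host GPUs to the container (assumes Docker 19.03+ and the
# NVIDIA container toolkit; adapt to your local GPU runtime)
docker run --gpus all -it kungfu:gpu
```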

## Use NVIDIA NCCL

KungFu can use [NCCL](https://developer.nvidia.com/nccl) to leverage GPU-GPU direct communication.
12 changes: 8 additions & 4 deletions README.md
@@ -6,7 +6,8 @@ Easy, adaptive and fast distributed machine learning.

KungFu enables users to achieve *fast* and *adaptive* distributed machine learning. This is important because machine learning systems must cope with increasingly complex models and increasingly complicated deployment environments. KungFu has the following unique features:

* Simplicity: KungFu permits distributed training by adding only one line of code in the training program. KungFu is easy to deploy. It does not require partitioning resources as in parameter servers and heavy dependency like MPI in Horovod.
* Simplicity: KungFu permits distributed training by adding only one line of code in your existing training program.
* Easy to deploy: KungFu has minimal dependencies. It does not require a heavy dependency like MPI in Horovod or external resources like parameter servers. Check the [GPU](docker/Dockerfile.tf-gpu) and [CPU](docker/Dockerfile.tf-cpu) Dockerfiles.
* Adaptive distributed training: KungFu provides many advanced [distributed optimizers](srcs/python/kungfu/optimizers/__init__.py) such as
communication-efficient [AD-PSGD](https://arxiv.org/abs/1710.06952) and small-batch-efficient [SMA](http://www.vldb.org/pvldb/vol12/p1399-koliousis.pdf) to help you address the cases in which [Synchronous SGD](https://papers.nips.cc/paper/4687-large-scale-distributed-deep-networks.pdf) does not scale.
* Monitoring: KungFu supports [distributed SGD metrics](srcs/python/kungfu/optimizers/sync_sgd.py) such as [gradient variance](https://en.wikipedia.org/wiki/Variance) and [gradient noise scale](https://openai.com/blog/science-of-ai/) to help understand the training process with low overhead.
@@ -23,7 +24,7 @@ To use KungFu to scale out your TensorFlow training program, you simply need to
1. Wrap the optimizer in ``SynchronousSGDOptimizer`` or another [distributed optimizer](srcs/python/kungfu/optimizers/__init__.py).

2. Run ``distributed_initializer()`` after calling ``global_variables_initializer()``.
The distributed initializer synchronizes the initial variables on all workers.
The distributed initializer ensures the initial variables on all workers are consistent.

```python
import tensorflow as tf
@@ -71,7 +72,8 @@ kungfu-run -np $NUM_GPUS \

## Install

KungFu requires [Python 3](https://www.python.org/downloads/), [CMake 3.5+](https://cmake.org/install/), [Golang 1.11+](https://golang.org/dl/) and [TensorFlow <=1.13.2](https://www.tensorflow.org/install/pip#older-versions-of-tensorflow).
KungFu requires [Python 3](https://www.python.org/downloads/), [CMake 3.5+](https://cmake.org/install/), [Golang 1.13+](https://golang.org/dl/) and [TensorFlow <=1.13.2](https://www.tensorflow.org/install/pip#older-versions-of-tensorflow).
You can then install KungFu with the following commands, assuming the above prerequisites are installed.

```bash
# Install tensorflow CPU
@@ -96,6 +98,8 @@ GOBIN=$(pwd)/bin go install -v ./srcs/go/cmd/kungfu-run
./bin/kungfu-run -help
```

You can also use KungFu within a Docker container. Check the Dockerfiles for [GPU](docker/Dockerfile.tf-gpu) and [CPU](docker/Dockerfile.tf-cpu) machines.
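
For illustration, a minimal sketch of building and trying the CPU image from the repository root (the image tag is arbitrary; the Dockerfile installs `kungfu-run` into `/usr/bin` inside the image):

```bash
# Build the CPU image from the root of the KungFu repository
docker build -f docker/Dockerfile.tf-cpu -t kungfu:cpu .

# kungfu-run is on the PATH inside the image; print its help to verify the build
docker run -it kungfu:cpu kungfu-run -help
```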

## Benchmark

We benchmark the performance of KungFu in a cluster that has 16 V100 GPUs hosted by 2 DGX-1 machines.
@@ -116,7 +120,7 @@ All benchmark scripts are available [here](benchmarks/system/).

## Convergence

The synchronization algorithms (``SynchronousSGDOptimizer``, ``PairAveragingOptimizer`` and ``SynchronousAveragingOptimizer``)
The distributed optimizers (``SynchronousSGDOptimizer``, ``PairAveragingOptimizer`` and ``SynchronousAveragingOptimizer``)
can reach the same evaluation accuracy as Horovod. We validated this with the ResNet-50 and ResNet-101 models in the [TensorFlow benchmark](https://github.com/luomai/benchmarks/tree/cnn_tf_v1.12_compatible_kungfu).
You can also add your own KungFu distributed optimizer to the benchmark by adding one line of code, see [here](https://github.com/luomai/benchmarks/blob/1eb102a81cdcd42cdbea56d2d19f36a8018e9f80/scripts/tf_cnn_benchmarks/benchmark_cnn.py#L1197).

10 changes: 8 additions & 2 deletions benchmarks/system/README.md
@@ -2,7 +2,13 @@

Distributed training benchmark of KungFu, Horovod and Parameter Servers.

We assume the benchmark runs on a server with 4 GPUs. The Tensorflow version is 1.13.1.
## Intro

This benchmark requires TensorFlow <=1.13.2, KungFu and Horovod.
We have run this benchmark on two clusters: one has two DGX-1 machines (each with 8 V100 GPUs) and the other has 16 P100 machines. You can see the benchmark results [here](result/).

In the following, we provide sample commands to run the benchmark.
We assume the benchmark runs on a server with 4 GPUs.
The benchmark imports models from [tf.keras.applications](https://www.tensorflow.org/api_docs/python/tf/keras/applications). You can freely choose different models
and batch sizes.
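
For example, assuming `benchmark_kungfu.py` accepts the `--model` flag shown in the sample commands below, another `tf.keras.applications` model can be selected like this (illustrative only):

```bash
# Illustrative: run the KungFu benchmark with MobileNetV2 instead of ResNet50
# (assumes benchmark_kungfu.py accepts --model values named after tf.keras.applications models)
kungfu-run -np 4 python3 benchmark_kungfu.py --kungfu=async-sgd --model=MobileNetV2
```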

@@ -48,7 +54,7 @@ kungfu-run -np 4 python3 benchmark_kungfu.py --kungfu=async-sgd --model=ResNet50
Use the following shell script to run the parameter server benchmark.

```bash
# Configure 1 local parameter server (You can create more parameter servers)
# Configure 1 local parameter server (we suggest a 1:1 ratio between parameter servers and workers)
PS_HOSTS="localhost:2220"

# Configure four training workers
6 changes: 3 additions & 3 deletions docker/Dockerfile.builder-ubuntu18
@@ -16,8 +16,8 @@ RUN apt update && \
ARG PY_MIRROR='-i https://pypi.tuna.tsinghua.edu.cn/simple'
RUN pip3 install ${PY_MIRROR} tensorflow

RUN wget -q https://dl.google.com/go/go1.11.linux-amd64.tar.gz && \
tar -C /usr/local -xf go1.11.linux-amd64.tar.gz && \
rm go1.11.linux-amd64.tar.gz
RUN wget -q https://dl.google.com/go/go1.13.linux-amd64.tar.gz && \
tar -C /usr/local -xf go1.13.linux-amd64.tar.gz && \
rm go1.13.linux-amd64.tar.gz

ENV PATH=${PATH}:/usr/local/go/bin
13 changes: 13 additions & 0 deletions docker/Dockerfile.tf-cpu
@@ -0,0 +1,13 @@
FROM tensorflow/tensorflow:1.13.1-py3

RUN apt update && apt install -y cmake wget
RUN wget -q https://dl.google.com/go/go1.13.linux-amd64.tar.gz && \
tar -C /usr/local -xf go1.13.linux-amd64.tar.gz && \
rm go1.13.linux-amd64.tar.gz
ENV PATH=${PATH}:/usr/local/go/bin

ADD . /src/kungfu
WORKDIR /src/kungfu

RUN pip3 install --no-index -U .
RUN GOBIN=/usr/bin go install -v ./srcs/go/cmd/kungfu-run
20 changes: 7 additions & 13 deletions docker/Dockerfile.tf-gpu
@@ -1,19 +1,13 @@
FROM tensorflow/tensorflow:1.12.0-gpu-py3 AS builder
FROM tensorflow/tensorflow:1.13.1-gpu-py3

ADD docker/sources.list.aliyun /etc/apt/sources.list
RUN rm -fr /etc/apt/sources.list.d/* && \
apt update && \
apt install -y cmake wget
RUN wget -q https://dl.google.com/go/go1.11.linux-amd64.tar.gz && \
tar -C /usr/local -xf go1.11.linux-amd64.tar.gz && \
rm go1.11.linux-amd64.tar.gz
RUN apt update && apt install -y cmake wget
RUN wget -q https://dl.google.com/go/go1.13.linux-amd64.tar.gz && \
tar -C /usr/local -xf go1.13.linux-amd64.tar.gz && \
rm go1.13.linux-amd64.tar.gz
ENV PATH=${PATH}:/usr/local/go/bin

ADD scripts /src/scripts
RUN PREFIX=/usr /src/scripts/install-nccl.sh && \
rm /usr/lib/x86_64-linux-gnu/libnccl.so.2

ADD . /src/kungfu
WORKDIR /src/kungfu

# RUN pip3 install --no-index -U .
RUN ldconfig /usr/local/cuda-10.0/targets/x86_64-linux/lib/stubs && pip3 install --no-index -U .
RUN GOBIN=/usr/bin go install -v ./srcs/go/cmd/kungfu-run
