Provide Dockers and final check for 0.1 release. (#178)
* provide a Docker file for KungFu GPU.

* update Docker files.

* fix ldconfig

* Add GOBIN

* Update README.

* README for 0.1 release.

* add commands for docker run.
luomai authored Oct 26, 2019
1 parent ba9ab1e commit d77a7d6
Showing 6 changed files with 50 additions and 23 deletions.
12 changes: 11 additions & 1 deletion CONTRIBUTING.md
@@ -35,7 +35,7 @@ All source code is under `./srcs/<lang>/` where `<lang> := cpp | go | python`.

* Graph: A directed graph, which may contain self loops. The vertices are numbered from 0 to n - 1.

## Useful commands for development
## Useful commands

### Format code

@@ -50,6 +50,16 @@ All source code is under `./srcs/<lang>/` where `<lang> := cpp | go | python`.
pip3 wheel -vvv --no-index .
```

### Docker

```bash
# Run the following command in the KungFu folder
docker build -f docker/Dockerfile.tf-gpu -t kungfu:gpu .

# Run the built image
docker run -it kungfu:gpu
```
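
If your machine has NVIDIA GPUs, you can pass them through to the container. This is a minimal sketch, assuming Docker 19.03+ with the NVIDIA container toolkit installed; older setups typically use `nvidia-docker` instead.

```bash
# Expose all host GPUs to the container (assumes Docker 19.03+ and the
# NVIDIA container toolkit; adapt to your local GPU runtime)
docker run --gpus all -it kungfu:gpu
```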

## Use NVIDIA NCCL

KungFu can use [NCCL](https://developer.nvidia.com/nccl) to leverage GPU-GPU direct communication.
12 changes: 8 additions & 4 deletions README.md
@@ -6,7 +6,8 @@ Easy, adaptive and fast distributed machine learning.

KungFu enables users to achieve *fast* and *adaptive* distributed machine learning. This is important because machine learning systems must cope with increasingly complex models and increasingly complicated deployment environments. KungFu has the following unique features:

* Simplicity: KungFu permits distributed training by adding only one line of code in the training program. KungFu is easy to deploy. It does not require partitioning resources as in parameter servers and heavy dependency like MPI in Horovod.
* Simplicity: KungFu permits distributed training by adding only one line of code in your existing training program.
* Easy to deploy: KungFu has minimal dependencies. It does not require a heavy dependency like MPI in Horovod or external resources like parameter servers. Check the [GPU](docker/Dockerfile.tf-gpu) and [CPU](docker/Dockerfile.tf-cpu) Dockerfiles.
* Adaptive distributed training: KungFu provides many advanced [distributed optimizers](srcs/python/kungfu/optimizers/__init__.py) such as
communication-efficient [AD-PSGD](https://arxiv.org/abs/1710.06952) and small-batch-efficient [SMA](http://www.vldb.org/pvldb/vol12/p1399-koliousis.pdf) to help you address the cases in which [Synchronous SGD](https://papers.nips.cc/paper/4687-large-scale-distributed-deep-networks.pdf) does not scale.
* Monitoring: KungFu supports [distributed SGD metrics](srcs/python/kungfu/optimizers/sync_sgd.py) such as [gradient variance](https://en.wikipedia.org/wiki/Variance) and [gradient noise scale](https://openai.com/blog/science-of-ai/) to help understand the training process with low overhead.
@@ -23,7 +24,7 @@ To use KungFu to scale out your TensorFlow training program, you simply need to
1. Wrap the optimizer in ``SynchronousSGDOptimizer`` or another [distributed optimizer](srcs/python/kungfu/optimizers/__init__.py).

2. Run ``distributed_initializer()`` after calling ``global_variables_initializer()``.
The distributed initializer synchronizes the initial variables on all workers.
The distributed initializer ensures the initial variables on all workers are consistent.

```python
import tensorflow as tf
@@ -71,7 +72,8 @@ kungfu-run -np $NUM_GPUS \

## Install

KungFu requires [Python 3](https://www.python.org/downloads/), [CMake 3.5+](https://cmake.org/install/), [Golang 1.11+](https://golang.org/dl/) and [TensorFlow <=1.13.2](https://www.tensorflow.org/install/pip#older-versions-of-tensorflow).
KungFu requires [Python 3](https://www.python.org/downloads/), [CMake 3.5+](https://cmake.org/install/), [Golang 1.13+](https://golang.org/dl/) and [TensorFlow <=1.13.2](https://www.tensorflow.org/install/pip#older-versions-of-tensorflow).
You can then install KungFu with the following commands, assuming the above prerequisites are installed.

```bash
# Install tensorflow CPU
@@ -96,6 +98,8 @@ GOBIN=$(pwd)/bin go install -v ./srcs/go/cmd/kungfu-run
./bin/kungfu-run -help
```

You can also use KungFu within a Docker container. Check the Dockerfiles for [GPU](docker/Dockerfile.tf-gpu) and [CPU](docker/Dockerfile.tf-cpu) machines.
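
For illustration, a minimal sketch of building and trying the CPU image from the repository root (the image tag is arbitrary; the Dockerfile installs `kungfu-run` into `/usr/bin` inside the image):

```bash
# Build the CPU image from the root of the KungFu repository
docker build -f docker/Dockerfile.tf-cpu -t kungfu:cpu .

# kungfu-run is on the PATH inside the image; print its help to verify the build
docker run -it kungfu:cpu kungfu-run -help
```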

## Benchmark

We benchmark the performance of KungFu in a cluster that has 16 V100 GPUs hosted by 2 DGX-1 machines.
@@ -116,7 +120,7 @@ All benchmark scripts are available [here](benchmarks/system/).

## Convergence

The synchronization algorithms (``SynchronousSGDOptimizer``, ``PairAveragingOptimizer`` and ``SynchronousAveragingOptimizer``)
The distributed optimizers (``SynchronousSGDOptimizer``, ``PairAveragingOptimizer`` and ``SynchronousAveragingOptimizer``)
can reach the same evaluation accuracy as Horovod. We validated this with the ResNet-50 and ResNet-101 models in the [TensorFlow benchmark](https://github.com/luomai/benchmarks/tree/cnn_tf_v1.12_compatible_kungfu).
You can also add your own KungFu distributed optimizer to the benchmark by adding one line of code, see [here](https://github.com/luomai/benchmarks/blob/1eb102a81cdcd42cdbea56d2d19f36a8018e9f80/scripts/tf_cnn_benchmarks/benchmark_cnn.py#L1197).

10 changes: 8 additions & 2 deletions benchmarks/system/README.md
@@ -2,7 +2,13 @@

Distributed training benchmark of KungFu, Horovod and Parameter Servers.

We assume the benchmark runs on a server with 4 GPUs. The Tensorflow version is 1.13.1.
## Intro

This benchmark requires TensorFlow <=1.13.2, KungFu and Horovod.
We have run this benchmark on two clusters: one has two DGX-1 machines (each with 8 V100 GPUs) and the other has 16 P100 machines. You can see the benchmark results [here](result/).

In the following, we provide sample commands to run the benchmark.
We assume the benchmark runs on a server with 4 GPUs.
The benchmark imports models from [tf.keras.applications](https://www.tensorflow.org/api_docs/python/tf/keras/applications). You can freely choose different models
and batch sizes.
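
For example, assuming `benchmark_kungfu.py` accepts the `--model` flag shown in the sample commands below, another `tf.keras.applications` model can be selected like this (illustrative only):

```bash
# Illustrative: run the KungFu benchmark with MobileNetV2 instead of ResNet50
# (assumes benchmark_kungfu.py accepts --model values named after tf.keras.applications models)
kungfu-run -np 4 python3 benchmark_kungfu.py --kungfu=async-sgd --model=MobileNetV2
```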

@@ -48,7 +54,7 @@ kungfu-run -np 4 python3 benchmark_kungfu.py --kungfu=async-sgd --model=ResNet50
Use the following shell script to run the parameter server benchmark.

```bash
# Configure 1 local parameter server (You can create more parameter servers)
# Configure 1 local parameter server (we suggest a 1:1 ratio between parameter servers and workers)
PS_HOSTS="localhost:2220"

# Configure four training workers
6 changes: 3 additions & 3 deletions docker/Dockerfile.builder-ubuntu18
@@ -16,8 +16,8 @@ RUN apt update && \
ARG PY_MIRROR='-i https://pypi.tuna.tsinghua.edu.cn/simple'
RUN pip3 install ${PY_MIRROR} tensorflow

RUN wget -q https://dl.google.com/go/go1.11.linux-amd64.tar.gz && \
tar -C /usr/local -xf go1.11.linux-amd64.tar.gz && \
rm go1.11.linux-amd64.tar.gz
RUN wget -q https://dl.google.com/go/go1.13.linux-amd64.tar.gz && \
tar -C /usr/local -xf go1.13.linux-amd64.tar.gz && \
rm go1.13.linux-amd64.tar.gz

ENV PATH=${PATH}:/usr/local/go/bin
13 changes: 13 additions & 0 deletions docker/Dockerfile.tf-cpu
@@ -0,0 +1,13 @@
FROM tensorflow/tensorflow:1.13.1-py3

RUN apt update && apt install -y cmake wget
RUN wget -q https://dl.google.com/go/go1.13.linux-amd64.tar.gz && \
tar -C /usr/local -xf go1.13.linux-amd64.tar.gz && \
rm go1.13.linux-amd64.tar.gz
ENV PATH=${PATH}:/usr/local/go/bin

ADD . /src/kungfu
WORKDIR /src/kungfu

RUN pip3 install --no-index -U .
RUN GOBIN=/usr/bin go install -v ./srcs/go/cmd/kungfu-run
20 changes: 7 additions & 13 deletions docker/Dockerfile.tf-gpu
@@ -1,19 +1,13 @@
FROM tensorflow/tensorflow:1.12.0-gpu-py3 AS builder
FROM tensorflow/tensorflow:1.13.1-gpu-py3

ADD docker/sources.list.aliyun /etc/apt/sources.list
RUN rm -fr /etc/apt/sources.list.d/* && \
apt update && \
apt install -y cmake wget
RUN wget -q https://dl.google.com/go/go1.11.linux-amd64.tar.gz && \
tar -C /usr/local -xf go1.11.linux-amd64.tar.gz && \
rm go1.11.linux-amd64.tar.gz
RUN apt update && apt install -y cmake wget
RUN wget -q https://dl.google.com/go/go1.13.linux-amd64.tar.gz && \
tar -C /usr/local -xf go1.13.linux-amd64.tar.gz && \
rm go1.13.linux-amd64.tar.gz
ENV PATH=${PATH}:/usr/local/go/bin

ADD scripts /src/scripts
RUN PREFIX=/usr /src/scripts/install-nccl.sh && \
rm /usr/lib/x86_64-linux-gnu/libnccl.so.2

ADD . /src/kungfu
WORKDIR /src/kungfu

# RUN pip3 install --no-index -U .
RUN ldconfig /usr/local/cuda-10.0/targets/x86_64-linux/lib/stubs && pip3 install --no-index -U .
RUN GOBIN=/usr/bin go install -v ./srcs/go/cmd/kungfu-run
