Skip to content

GetStarted_Standalone

leewyang edited this page Sep 28, 2017 · 19 revisions

Running TensorFlowOnSpark on a Spark Standalone cluster

We illustrate how to use TensorFlowOnSpark on a Spark Standalone cluster running on a single machine. Note that TensorFlowOnSpark does not work in Spark Local (single-process) mode, since it expects the executors to be running in separate processes.

Clone TensorFlowOnOnSpark code

git clone https://github.com/yahoo/TensorFlowOnSpark.git
cd TensorFlowOnSpark
export TFoS_HOME=$(pwd)

Install Spark

Download Apache Spark per instructions. Note: we use version 1.6.0 here, but you can install a later version if you like.

${TFoS_HOME}/scripts/local-setup-spark.sh
rm spark-1.6.0-bin-hadoop2.6.tar
export SPARK_HOME=$(pwd)/spark-1.6.0-bin-hadoop2.6
export PATH=${SPARK_HOME}/bin:${PATH}

Install TensorFlow and TensorFlowOnSpark

Please build and install TensorFlow per instructions.

For example, using the pip install method, you should be able to install TensorFlow as follows:

sudo pip install tensorflow
sudo pip install tensorflowonspark

To view the installed packages:

pip list

Download MNIST data

mkdir ${TFoS_HOME}/mnist
pushd ${TFoS_HOME}/mnist
curl -O "http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz"
curl -O "http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz"
curl -O "http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz"
curl -O "http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz"
popd
export MASTER=spark://$(hostname):7077
export SPARK_WORKER_INSTANCES=2
export CORES_PER_WORKER=1 
export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES})) 
${SPARK_HOME}/sbin/start-master.sh; ${SPARK_HOME}/sbin/start-slave.sh -c $CORES_PER_WORKER -m 3G ${MASTER}

Go to MasterWebUI, make sure that you have the exact number of workers launched.

Test Pypark, TensorFlow, and TensorFlowOnSpark

Start a pyspark shell and import tensorflow and tensorflowonspark. If everything is setup correctly, you shouldn't see any errors.

pyspark
>>> import tensorflow as tf
>>> from tensorflowonspark import TFCluster
>>> exit()

Convert the MNIST zip files using Spark

cd ${TFoS_HOME}
# rm -rf examples/mnist/csv
${SPARK_HOME}/bin/spark-submit \
--master ${MASTER} \
${TFoS_HOME}/examples/mnist/mnist_data_setup.py \
--output examples/mnist/csv \
--format csv
ls -lR examples/mnist/csv

Run distributed MNIST training (using feed_dict)

# rm -rf mnist_model
${SPARK_HOME}/bin/spark-submit \
--master ${MASTER} \
--py-files ${TFoS_HOME}/examples/mnist/spark/mnist_dist.py \
--conf spark.cores.max=${TOTAL_CORES} \
--conf spark.task.cpus=${CORES_PER_WORKER} \
--conf spark.executorEnv.JAVA_HOME="$JAVA_HOME" \
${TFoS_HOME}/examples/mnist/spark/mnist_spark.py \
--cluster_size ${SPARK_WORKER_INSTANCES} \
--images examples/mnist/csv/train/images \
--labels examples/mnist/csv/train/labels \
--format csv \
--mode train \
--model mnist_model

ls -l mnist_model

Run distributed MNIST inference (using feed_dict)

# rm -rf predictions
${SPARK_HOME}/bin/spark-submit \
--master ${MASTER} \
--py-files ${TFoS_HOME}/examples/mnist/spark/mnist_dist.py \
--conf spark.cores.max=${TOTAL_CORES} \
--conf spark.task.cpus=${CORES_PER_WORKER} \
--conf spark.executorEnv.JAVA_HOME="$JAVA_HOME" \
${TFoS_HOME}/examples/mnist/spark/mnist_spark.py \
--cluster_size ${SPARK_WORKER_INSTANCES} \
--images examples/mnist/csv/test/images \
--labels examples/mnist/csv/test/labels \
--mode inference \
--format csv \
--model mnist_model \
--output predictions

less predictions/part-00000

The prediction result should look like:

2017-02-10T23:29:17.009563 Label: 7, Prediction: 7
2017-02-10T23:29:17.009677 Label: 2, Prediction: 2
2017-02-10T23:29:17.009721 Label: 1, Prediction: 1
2017-02-10T23:29:17.009761 Label: 0, Prediction: 0
2017-02-10T23:29:17.009799 Label: 4, Prediction: 4
2017-02-10T23:29:17.009838 Label: 1, Prediction: 1
2017-02-10T23:29:17.009876 Label: 4, Prediction: 4
2017-02-10T23:29:17.009914 Label: 9, Prediction: 9
2017-02-10T23:29:17.009951 Label: 5, Prediction: 6
2017-02-10T23:29:17.009989 Label: 9, Prediction: 9
2017-02-10T23:29:17.010026 Label: 0, Prediction: 0

Interactive Learning with Jupyter Notebook

Install additional software required by Jupyter Notebooks.

sudo pip install jupyter jupyter[notebook]

Launch IPython notebook on Master node.

pushd ${TFoS_HOME}/examples/mnist
PYSPARK_DRIVER_PYTHON="jupyter" \
PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --ip=`hostname`" \
pyspark  --master ${MASTER} \
--conf spark.cores.max=${TOTAL_CORES} \
--conf spark.task.cpus=${CORES_PER_WORKER} \
--py-files ${TFoS_HOME}/examples/mnist/spark/mnist_dist.py \
--conf spark.executorEnv.JAVA_HOME="$JAVA_HOME"

Launch a browser to access notebook

Shutdown Spark cluster

${SPARK_HOME}/sbin/stop-slave.sh; ${SPARK_HOME}/sbin/stop-master.sh
Clone this wiki locally