
GetStarted_Standalone

leewyang edited this page Oct 11, 2019 · 19 revisions

Running TensorFlowOnSpark on a Spark Standalone cluster (Single Host)

We illustrate how to use TensorFlowOnSpark on a Spark Standalone cluster running on a single machine. While this is not a true distributed cluster, it is useful for small-scale development and testing of distributed Spark applications. Once your application works in this environment, it should run on a true distributed Spark cluster with minimal changes. Note that a Spark Standalone cluster running on multiple machines requires a distributed file system (such as HDFS) that is accessible from every executor/worker.

Install Spark

Install Apache Spark per its instructions and make sure that you can successfully run some of the basic examples. Also set the following environment variables:

export SPARK_HOME=<path to Spark>
export PATH=${SPARK_HOME}/bin:${PATH}
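Before moving on, you can confirm that both variables took effect. The following is a minimal sketch of such a check in Python. `check_spark_env` is a hypothetical helper written for illustration, not part of Spark, and it only inspects environment variables and the file system.

```python
import os


def check_spark_env(environ=None):
    """Return a list of problems with SPARK_HOME/PATH; an empty list means OK.

    Hypothetical helper (not part of Spark): it only inspects environment
    variables and the file system and never starts Spark.
    """
    env = os.environ if environ is None else environ
    spark_home = env.get("SPARK_HOME")
    if not spark_home:
        return ["SPARK_HOME is not set"]

    problems = []
    spark_bin = os.path.join(spark_home, "bin")
    # Normalize PATH entries so a trailing slash does not cause a false alarm.
    path_entries = {
        os.path.normpath(p) for p in env.get("PATH", "").split(os.pathsep) if p
    }
    if os.path.normpath(spark_bin) not in path_entries:
        problems.append(f"{spark_bin} is not on PATH")
    if not os.path.isdir(spark_bin):
        problems.append(f"{spark_bin} does not exist")
    return problems


if __name__ == "__main__":
    for problem in check_spark_env():
        print("WARNING:", problem)
```

An empty result means `SPARK_HOME` is set, its `bin` directory exists, and that directory is on `PATH`.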

Install TensorFlow and TensorFlowOnSpark

Install TensorFlow per instructions. For example, using the pip install method, you should be able to install TensorFlow and TensorFlowOnSpark as follows:

pip install tensorflow
pip install tensorflowonspark

View the installed packages and confirm that tensorflow and tensorflowonspark are listed:

pip list
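`pip list` prints every installed package. If you only care about the two packages above, a short Python check can be easier to read. This sketch uses the standard-library `importlib.metadata` module (Python 3.8+) and assumes the distribution names `tensorflow` and `tensorflowonspark` from the `pip install` commands; `installed_versions` is a hypothetical helper.

```python
from importlib import metadata


def installed_versions(packages=("tensorflow", "tensorflowonspark")):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed under this exact distribution name.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

A `None` for `tensorflow` may simply mean you installed a variant such as `tensorflow-gpu`, which is registered under a different distribution name.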

Launch Spark Standalone cluster

export MASTER=spark://$(hostname):7077
export SPARK_WORKER_INSTANCES=2
export CORES_PER_WORKER=1
export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES}))
${SPARK_HOME}/sbin/start-master.sh
${SPARK_HOME}/sbin/start-slave.sh -c $CORES_PER_WORKER -m 3G ${MASTER}
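The shell variables above are related by simple arithmetic: two workers with one core each give `TOTAL_CORES=2`. A common reason to compute `TOTAL_CORES` is to pass it to `spark-submit` as `spark.cores.max`, together with `spark.task.cpus` set to `CORES_PER_WORKER`, so that each executor runs at most one task (and hence one TensorFlow node) at a time. The sketch below mirrors that derivation in Python. The function name `standalone_settings` is hypothetical, and pairing these variables with those two Spark properties is an assumption about typical usage rather than something this page prescribes.

```python
import socket


def standalone_settings(worker_instances=2, cores_per_worker=1, host=None, port=7077):
    """Derive the master URL and core settings for the single-host cluster.

    Hypothetical helper mirroring MASTER, SPARK_WORKER_INSTANCES,
    CORES_PER_WORKER and TOTAL_CORES from the shell commands.
    """
    if worker_instances < 1 or cores_per_worker < 1:
        raise ValueError("need at least one worker with at least one core")
    host = host or socket.gethostname()  # same value as $(hostname)
    total_cores = worker_instances * cores_per_worker
    return {
        "MASTER": f"spark://{host}:{port}",
        "TOTAL_CORES": total_cores,
        # Cap the application at the cores the whole cluster offers.
        "spark.cores.max": str(total_cores),
        # With one executor per worker (the standalone default when
        # spark.executor.cores is unset), each executor runs one task at a time.
        "spark.task.cpus": str(cores_per_worker),
    }
```

For example, `standalone_settings(host="myhost")` returns `MASTER` = `spark://myhost:7077` and `TOTAL_CORES` = 2, matching the shell values.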

Browse to the Spark master web UI (by default at http://localhost:8080) to view your Spark cluster along with your application logs. In particular, each TensorFlow node in a TensorFlowOnSpark cluster runs on a Spark executor/worker, so its logs are available in the stderr log of its associated executor/worker.
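Besides the HTML page, the standalone master's web UI also serves its state as JSON under `/json/`, which is handy for scripting a quick health check. The sketch below assumes the default UI address `http://localhost:8080` and the `workers`, `state` and `cores` fields of that JSON, so check them against your Spark version. The helper names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_master_state(url="http://localhost:8080/json/", timeout=5):
    """Download the standalone master's state (assumes the default UI port 8080)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarize_master_state(state):
    """Count the workers reported as ALIVE and the cores they offer."""
    alive = [w for w in state.get("workers", []) if w.get("state") == "ALIVE"]
    return {
        "alive_workers": len(alive),
        "alive_cores": sum(w.get("cores", 0) for w in alive),
    }


if __name__ == "__main__":
    try:
        print(summarize_master_state(fetch_master_state()))
    except OSError as exc:  # URLError and socket timeouts subclass OSError
        print("Could not reach the Spark master web UI:", exc)
```

With the two workers started above, a healthy cluster should report two alive workers and two cores, matching `TOTAL_CORES`.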

Test PySpark, TensorFlow, and TensorFlowOnSpark

Start a pyspark shell and import tensorflow and tensorflowonspark. If everything is set up correctly, you shouldn't see any errors.

pyspark --master $MASTER
>>> import tensorflow as tf
>>> import tensorflowonspark as tfos
>>> from tensorflowonspark import TFCluster
>>> tf.__version__
>>> tfos.__version__
>>> exit()
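The imports above run only in the driver process. Executors run their own Python processes, so to check that they can also import both packages, you can run a tiny Spark job from the same shell (before calling `exit()`). This is a minimal sketch: `probe_modules` and `check_executors` are hypothetical helpers, `sc` is the SparkContext that the pyspark shell creates, and `num_tasks=2` assumes the two one-core workers started above.

```python
import importlib
import socket


def probe_modules(names):
    """Return (hostname, {module: version or error}) for the current process."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError as exc:
            report[name] = f"ImportError: {exc}"
    return socket.gethostname(), report


def check_executors(sc, names=("tensorflow", "tensorflowonspark"), num_tasks=2):
    """Run probe_modules inside Spark tasks and return the distinct reports."""
    names = list(names)
    rdd = sc.parallelize(range(num_tasks), num_tasks)
    reports = rdd.map(lambda _: probe_modules(names)).collect()
    # Identical reports (e.g. every task ran on the same host) are collapsed.
    unique = []
    for report in reports:
        if report not in unique:
            unique.append(report)
    return unique
```

Paste the definitions into the pyspark shell and run `for host, report in check_executors(sc): print(host, report)`. Any value starting with `ImportError` points at a package missing from the executors' Python environment.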

Run the MNIST examples

Once your Spark Standalone cluster is set up, you should be able to run the MNIST examples. Note: if you are using TensorFlow 1.x, please use the examples from the v1.4.4 tag.
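TensorFlowOnSpark applications such as the MNIST examples are started with `TFCluster.run`, which reserves one TensorFlow node per Spark executor and calls a user function on each of them. Before running the full examples, a trivial cluster can confirm that the nodes are able to start. This is a minimal sketch for the pyspark shell, assuming TensorFlowOnSpark 2.x with TensorFlow 2.x (no parameter servers, so `num_ps=0`) and the two-executor cluster above. `describe_node`, `main_fun` and `smoke_test` are hypothetical names, and `args` is an empty `argparse.Namespace` because this function ignores it.

```python
import argparse


def describe_node(ctx):
    """Format the role TensorFlowOnSpark assigned to this node."""
    return f"{ctx.job_name}:{ctx.task_index}"


def main_fun(args, ctx):
    """Runs once per executor; a real job would build and train a model here."""
    # This line should appear in the executor's stderr log in the Spark web UI.
    print("TensorFlowOnSpark node started as", describe_node(ctx))


def smoke_test(sc, num_executors=2):
    """Start and stop a trivial TensorFlowOnSpark cluster (TFoS 2.x assumed)."""
    from tensorflowonspark import TFCluster  # imported lazily, only needed here

    args = argparse.Namespace()  # main_fun ignores its arguments
    cluster = TFCluster.run(
        sc, main_fun, args, num_executors, num_ps=0,
        tensorboard=False, input_mode=TFCluster.InputMode.TENSORFLOW)
    cluster.shutdown()
```

In the pyspark shell, paste the definitions and call `smoke_test(sc)`. If it returns without errors, each executor's stderr log should contain a line such as `TensorFlowOnSpark node started as worker:0`.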

Shut down the Spark cluster

When you're done with the local Spark Standalone cluster, shut it down as follows:

${SPARK_HOME}/sbin/stop-slave.sh; ${SPARK_HOME}/sbin/stop-master.sh
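To confirm that the master has actually stopped, you can check that nothing is listening on its port anymore. A minimal sketch, assuming the default master port 7077 from the `MASTER` URL above; `is_port_open` is a hypothetical helper.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or unresolvable host
        return False


if __name__ == "__main__":
    host = socket.gethostname()  # same value as $(hostname) in MASTER
    state = "still running" if is_port_open(host, 7077) else "stopped"
    print(f"Spark master on {host}:7077 appears {state}")
```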