
GetStarted_Standalone

leewyang edited this page Aug 28, 2019 · 19 revisions

Running TensorFlowOnSpark on a Spark Standalone cluster (Single Host)

We illustrate how to use TensorFlowOnSpark on a Spark Standalone cluster running on a single machine. While this is not a true distributed cluster, it is useful for small-scale development and testing of distributed Spark applications. After your application is working in this environment, it should run in a true distributed Spark cluster with minimal changes. Note that a Spark Standalone cluster running on multiple machines requires a distributed file system that is accessible from each of the executors/workers.

Install Spark

Install Apache Spark per instructions. Make sure that you can successfully run some of the basic examples. Also make sure you set the following environment variables:

export SPARK_HOME=<path to Spark>
export PATH=${SPARK_HOME}/bin:${PATH}

Install TensorFlow and TensorFlowOnSpark

Install TensorFlow per instructions. For example, using the pip install method, you should be able to install TensorFlow and TensorFlowOnSpark as follows:

pip install tensorflow
pip install tensorflowonspark

View the installed packages:

pip list
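You can also verify the installation programmatically. The following small check uses only the Python standard library (it just confirms the packages are importable from your current environment):

```python
import importlib.util

# Check whether each package can be found on the Python path
for pkg in ("tensorflow", "tensorflowonspark"):
    spec = importlib.util.find_spec(pkg)
    print(pkg, "OK" if spec else "MISSING")
```

If either package prints `MISSING`, double-check that `pip` installed into the same Python environment that Spark will use.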

Launch Spark Standalone cluster

export MASTER=spark://$(hostname):7077
export SPARK_WORKER_INSTANCES=2
export CORES_PER_WORKER=1 
export TOTAL_CORES=$((${CORES_PER_WORKER}*${SPARK_WORKER_INSTANCES})) 
${SPARK_HOME}/sbin/start-master.sh; ${SPARK_HOME}/sbin/start-slave.sh -c $CORES_PER_WORKER -m 3G ${MASTER}

You can browse to the Spark Web UI to view your Spark cluster along with your application logs. In particular, each of the TensorFlow nodes in a TensorFlowOnSpark cluster will be "running" on a Spark executor/worker, so its logs will be available in the stderr logs of its associated executor/worker.

Test PySpark, TensorFlow, and TensorFlowOnSpark

Start a pyspark shell and import tensorflow and tensorflowonspark. If everything is set up correctly, you shouldn't see any errors.

pyspark --master $MASTER
>>> import tensorflow as tf
>>> import tensorflowonspark as tfos
>>> from tensorflowonspark import TFCluster
>>> tf.__version__
>>> tfos.__version__
>>> exit()
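As a non-interactive alternative, you can run a small job that performs the imports on the executors as well as the driver. The following is a sketch (the file name `check_imports.py` is hypothetical); it assumes your standalone cluster from the previous step is still running:

```python
# check_imports.py -- run with: spark-submit --master $MASTER check_imports.py
from pyspark import SparkContext

def check(_):
    # Runs on an executor: verify both packages import there too
    import tensorflow as tf
    import tensorflowonspark as tfos
    return "tf %s / tfos %s" % (tf.__version__, tfos.__version__)

sc = SparkContext(appName="import_check")
# One task per worker, so each executor performs the imports
print(sc.parallelize(range(2), 2).map(check).distinct().collect())
sc.stop()
```

If an executor's Python environment is missing either package, the failure will show up in that executor's stderr log in the Spark Web UI.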

Run the MNIST examples

Once your Spark Standalone cluster is set up, you should be able to run the MNIST examples.
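Before diving into the full examples, it may help to see the overall shape of a TensorFlowOnSpark job. The following is a minimal sketch (not the actual example code) of the `TFCluster` API: the map function runs once per executor, and `ctx` tells each node its role in the TensorFlow cluster. The argument values shown are illustrative:

```python
from pyspark import SparkContext
from tensorflowonspark import TFCluster

def map_fun(args, ctx):
    # Runs once per Spark executor. ctx describes this node's role:
    # ctx.job_name ("ps" or "worker") and ctx.task_index.
    import tensorflow as tf
    # ... build and train your model here ...

sc = SparkContext(appName="tfos_sketch")
cluster = TFCluster.run(sc, map_fun, None, num_executors=2, num_ps=1,
                        tensorboard=False,
                        input_mode=TFCluster.InputMode.TENSORFLOW)
cluster.shutdown()   # wait for the TensorFlow nodes to finish
sc.stop()
```

The MNIST examples follow this same pattern, with the model code in the map function and the training data fed either directly by TensorFlow or via a Spark RDD, depending on the `input_mode`.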

Interactive Learning with Jupyter Notebook

If you'd like to work with Spark (and TensorFlowOnSpark) interactively, you can run Spark inside a Jupyter Notebook using the following instructions (these commands assume that ${TFoS_HOME} points to your TensorFlowOnSpark clone):

# Install Jupyter
pip install jupyter

# Launch Jupyter notebook on Spark master node.
pushd ${TFoS_HOME}/examples/mnist
PYSPARK_DRIVER_PYTHON="jupyter" \
PYSPARK_DRIVER_PYTHON_OPTS="notebook" \
pyspark  --master ${MASTER} \
--conf spark.cores.max=${TOTAL_CORES} \
--conf spark.task.cpus=${CORES_PER_WORKER} \
--py-files ${TFoS_HOME}/examples/mnist/spark/mnist_dist.py \
--conf spark.executorEnv.JAVA_HOME="$JAVA_HOME"

This should launch Jupyter in a browser. Open the mnist_spark.ipynb notebook and follow the instructions within.

Shutdown Spark cluster

When you're done with the local Spark Standalone cluster, shut it down as follows:

${SPARK_HOME}/sbin/stop-slave.sh; ${SPARK_HOME}/sbin/stop-master.sh