Slightly modified solution by Marco (
Run docker build -t cluster-apache-spark:3.0.2 . from the directory where the Dockerfile resides.
Let's bring the cluster to life with docker-compose up -d and verify that http://localhost:9090 works.
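The check on http://localhost:9090 can also be scripted, since the UI may take a few seconds to come up after docker-compose returns. Below is a minimal sketch using only the Python standard library; the function name wait_for_ui and the timeout values are assumptions made for illustration, not part of the original setup.

```python
import time
import urllib.request


def wait_for_ui(url: str, timeout_s: float = 60.0, interval_s: float = 2.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with urllib.request.urlopen(url, timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            # urllib.error.URLError (connection refused, HTTP errors) is an
            # OSError subclass: the UI is not reachable yet, so retry.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Calling wait_for_ui("http://localhost:9090") returns True as soon as the page responds and False if it stays unreachable for the whole timeout.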
The script is made up of a few functions which are called from main(). Data is downloaded from a URL, transformed into a Spark DataFrame and written to MySQL.
The SQL statement that prepares the table is embedded in prep_table().
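For orientation, the final step of such a job (writing a Spark DataFrame into MySQL) could look roughly like the sketch below. It is not the original script: the helper names jdbc_url, jdbc_options and write_to_mysql are hypothetical, and the host name demo-database is an assumption inferred from the container name. The database, table and credentials are the ones used elsewhere in this post, and the driver class is the one shipped in the Connector/J 8.x jar.

```python
from typing import Dict

# Driver class provided by mysql-connector-java 8.x (the jar passed via --jars).
MYSQL_DRIVER = "com.mysql.cj.jdbc.Driver"


def jdbc_url(host: str, database: str, port: int = 3306) -> str:
    """Build a MySQL JDBC URL."""
    return f"jdbc:mysql://{host}:{port}/{database}"


def jdbc_options(host: str, database: str, table: str,
                 user: str, password: str) -> Dict[str, str]:
    """Collect the options expected by Spark's JDBC data source."""
    return {
        "url": jdbc_url(host, database),
        "driver": MYSQL_DRIVER,
        "dbtable": table,
        "user": user,
        "password": password,
    }


def write_to_mysql(df, options: Dict[str, str], mode: str = "append") -> None:
    """Write a Spark DataFrame through the JDBC data source."""
    df.write.format("jdbc").options(**options).mode(mode).save()


# Assumed host name (compose service); the other values come from this post.
OPTIONS = jdbc_options("demo-database", "titanic_db", "titanic_stats",
                       "iamuser", "iampass")
```

Inside the job, write_to_mysql(df, OPTIONS) would append the rows of df to titanic_db.titanic_stats. Keeping the options in a plain dictionary makes it easy to change the host or credentials without touching the write logic.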
The code below submits the job. Run it once logged in to the master node (docker exec -it apache_spark_2_mysql_spark-master_1 bash).
Simultaneously, open the MySQL instance for querying data (docker exec -it apache_spark_2_mysql_demo-database_1 bash, then mysql -uiamuser -piampass).
/opt/spark/bin/spark-submit --master spark://spark-master:7077 \
  --jars /opt/spark-apps/mysql-connector-java-8.0.28.jar \
  --driver-memory 1G \
  --executor-memory 1G \
  /opt/spark-apps/
http://localhost:9090 will also list the job in the Running Applications section.
A simple select * from titanic_db.titanic_stats; will retrieve the data.
Once finished, wrap up with:
docker container stop $(docker container ls --all --quiet --filter name=apache_spark_2_mysql)
docker system prune --all --force --volumes