A slightly modified version of the solution by Marco (https://dev.to/mvillarrealb/creating-a-spark-standalone-cluster-with-docker-and-docker-compose-2021-update-6l4).
Run `docker build -t cluster-apache-spark:3.0.2 .` from the directory where the Dockerfile resides.
Bring the cluster to life with `docker-compose up -d` and verify that http://localhost:9090 works.
`function.py` consists of a few functions that are referenced in `main()`. Data is downloaded from a URL, transformed into a Spark DataFrame, and written into MySQL. The SQL `CREATE` statement is embedded in the `prep_table()` function.
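For orientation, a minimal sketch of that structure is shown below. The helper names, source URL, and table schema are illustrative assumptions (not the actual contents of `function.py`), as is the availability of a Python MySQL client such as mysql-connector-python on the image; the host, database, and credentials mirror the compose setup used elsewhere in this walkthrough.

```python
# Sketch only: helper names, URL, and schema are assumptions, not the real function.py.
import urllib.request

import mysql.connector  # assumes mysql-connector-python is installed on the driver image
from pyspark.sql import SparkSession


def prep_table():
    """Create the target database/table; the CREATE statement is embedded here."""
    conn = mysql.connector.connect(host="demo-database", user="iamuser", password="iampass")
    cur = conn.cursor()
    cur.execute("CREATE DATABASE IF NOT EXISTS titanic_db")
    cur.execute(
        """CREATE TABLE IF NOT EXISTS titanic_db.titanic_stats (
               Survived INT, Pclass INT, Sex VARCHAR(10), Age DOUBLE, Fare DOUBLE
           )"""
    )
    conn.commit()
    conn.close()


def download_data(url, path="/tmp/titanic.csv"):
    """Download the raw CSV so Spark can read it locally."""
    urllib.request.urlretrieve(url, path)
    return path


def main():
    spark = SparkSession.builder.appName("titanic-to-mysql").getOrCreate()
    prep_table()
    csv_path = download_data("https://example.com/titanic.csv")  # placeholder URL
    df = spark.read.csv(csv_path, header=True, inferSchema=True)  # transform as needed
    (df.write.format("jdbc")
       .option("url", "jdbc:mysql://demo-database:3306/titanic_db")
       .option("driver", "com.mysql.cj.jdbc.Driver")
       .option("dbtable", "titanic_stats")
       .option("user", "iamuser")
       .option("password", "iampass")
       .mode("append")
       .save())


if __name__ == "__main__":
    main()
```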
The command below submits the job. Run it after logging into the master node (`docker exec -it apache_spark_2_mysql_spark-master_1 bash`).
In a separate terminal, open the MySQL instance for querying the data (`docker exec -it apache_spark_2_mysql_demo-database_1 bash`, then `mysql -uiamuser -piampass`).
/opt/spark/bin/spark-submit --master spark://spark-master:7077 \
  --jars /opt/spark-apps/mysql-connector-java-8.0.28.jar \
  --driver-memory 1G \
  --executor-memory 1G \
  /opt/spark-apps/function.py
or
/opt/spark/bin/spark-submit --master spark://spark-master:7077 --jars /opt/spark-apps/mysql-connector-java-8.0.28.jar --driver-memory 1G --executor-memory 1G /opt/spark-apps/function.py
http://localhost:9090 will also list the job in the Running Applications section.
A simple `select * from titanic_db.titanic_stats;` will retrieve the data.
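If you prefer to check from the Spark side instead, a quick read-back over JDBC works as a sanity check; the snippet below is a sketch reusing the same connection details and assumes a pyspark shell started on the master with the same `--jars` option.

```python
# Read the table back over JDBC as a sanity check (same connection details as above).
df = (spark.read.format("jdbc")
      .option("url", "jdbc:mysql://demo-database:3306/titanic_db")
      .option("driver", "com.mysql.cj.jdbc.Driver")
      .option("dbtable", "titanic_stats")
      .option("user", "iamuser")
      .option("password", "iampass")
      .load())
df.show(5)
```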
Once finished, wrap up with:
docker container stop $(docker container ls --all --quiet --filter name=apache_spark_2_mysql-)
docker system prune --all --force --volumes