With Hadoop, many companies have succeeded in handling huge amounts of data and creating added value for their business. With Spark, data analysis has become easier and faster. Nowadays, more and more enterprises are also migrating to cloud-based services, for various reasons we will not go into here.
In the cloud, Hadoop's HDFS plays a much smaller role in the ecosystem and overall architecture, because data are stored in the cloud providers' deep storage services, such as S3 on AWS, ADLS on Azure and Cloud Storage on GCP.
As a result, people are trying to bring Spark out of the Hadoop ecosystem by using other resource managers, such as Apache Mesos or Kubernetes.
In this project, we will make Spark run on a Kubernetes cluster, a topic that has attracted a lot of interest recently.
Unlike other blogs or GitHub projects that hand you an all-in-one script, I will do everything manually, step by step, to show you how to realize it and help you understand how it works!
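To give a first taste of where we are heading, here is roughly what submitting a Spark job to a Kubernetes cluster looks like. This is a minimal sketch: the API server address and the container image name are placeholders, and we will build up the real values step by step in the chapters below.

```bash
# Sketch of a Spark-on-Kubernetes submission (placeholder values in <angle brackets>).
spark-submit \
  --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.0.0.jar
```

The `k8s://` prefix tells Spark to use the Kubernetes scheduler instead of YARN or Mesos, and in cluster mode the driver itself runs as a pod inside the cluster.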
We will work through the following chapters:
- Chapter 1: Simple Spark on Kubernetes on a local PC with default settings
- Chapter 2: Simple Spark on Kubernetes on a local PC with some advanced settings
- Chapter 3: AWS EMR on AWS EKS with some advanced settings
- Chapter 4: Spark on AWS EKS
For each of the local-PC chapters, I will work with both Spark v2.4.5 and Spark v3.0.0. I assume you already understand the basics of Kubernetes, Docker and Spark.