Custom Spark-Kafka Cluster
This project sets up a custom Spark-Kafka cluster using Docker and Docker Compose. It includes Apache Spark, Apache Kafka, PostgreSQL, Hive Metastore, LocalStack for S3, Prometheus, and Grafana.
The repository contains the following key files:
- hive/conf/.hiverc: Hive initialization script that adds the required JAR files.
- hive/conf/hive-site.xml: Configuration file for Hive Metastore.
- hive/conf/jars/: Directory containing necessary JAR files for Hive.
- prometheus/prometheus.yml: Configuration file for Prometheus.
- spark/metrics.properties: Configuration file for Spark metrics.
- spark/run.sh: Script to start a Spark master or worker, or to submit jobs.
- docker-compose.yml: Docker Compose configuration file to set up the entire cluster.
- Dockerfile: Dockerfile to build the Spark cluster image.
To run the cluster you will need:
- Docker
- Docker Compose
First, build the Docker image for the Spark cluster:
docker build -t my-spark-cluster:3.5.0 .
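If the build succeeds, the tagged image should be available locally. Listing it is a quick sanity check before starting the cluster (this assumes docker-compose.yml references the image by this name and tag):

```bash
# Confirm the image was built with the expected name and tag
docker images my-spark-cluster
```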
Start the cluster using Docker Compose:
docker-compose up
This command starts every service defined in docker-compose.yml in the foreground; once the containers are up, the services are exposed at the endpoints listed below.
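For day-to-day use it can be more convenient to run the stack in the background and follow individual service logs. A minimal sketch, assuming docker-compose.yml defines a service named spark-master (adjust to the actual service names in this repository):

```bash
# Start all services detached
docker-compose up -d

# Check that every container is up and healthy
docker-compose ps

# Follow the logs of a single service (service name is an assumption)
docker-compose logs -f spark-master
```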
- Spark Master: http://localhost:9090
- Spark Worker A: http://localhost:9091
- Spark Worker B: http://localhost:9093
- Kafka: Accessible on port 9092
- S3 (LocalStack): Accessible on port 4566
- PostgreSQL: Accessible on port 5432
- Hive Metastore: Accessible on port 9083
- Spark Thrift Server: Accessible on port 10000
- Grafana: http://localhost:3000
- Prometheus: http://localhost:19090
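A few quick ways to exercise these endpoints from the host. Service names, credentials, and tool availability are assumptions based on a typical setup, not guaranteed by this repository:

```bash
# Kafka: list topics from inside the broker container (service name "kafka" is an assumption)
docker-compose exec kafka kafka-topics.sh --bootstrap-server localhost:9092 --list

# LocalStack S3: create a bucket through the S3-compatible endpoint
aws --endpoint-url=http://localhost:4566 s3 mb s3://demo-bucket

# PostgreSQL: open a psql session (user and database are assumptions)
psql -h localhost -p 5432 -U postgres

# Spark Thrift Server: connect with beeline over JDBC
beeline -u jdbc:hive2://localhost:10000
```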
Prometheus is configured to scrape metrics from the Spark Master, Workers, and Executors.
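Spark typically exposes these metrics through the sinks configured in spark/metrics.properties. The sketch below shows what a configuration based on Spark's built-in PrometheusServlet sink commonly looks like (this repository's actual file may differ), followed by a quick check that Prometheus is scraping its targets:

```bash
# Hypothetical contents of spark/metrics.properties using Spark's PrometheusServlet sink
cat <<'EOF'
*.sink.prometheusServlet.class=org.apache.spark.metrics.sink.PrometheusServlet
*.sink.prometheusServlet.path=/metrics/prometheus
master.sink.prometheusServlet.path=/metrics/master/prometheus
applications.sink.prometheusServlet.path=/metrics/applications/prometheus
EOF

# Ask Prometheus which targets it is scraping and whether they are up
curl -s http://localhost:19090/api/v1/targets
```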
Grafana is set up to visualize the metrics collected by Prometheus. Access it at http://localhost:3000.
- Ensure that the specified volumes and paths exist and are accessible by Docker.
- Customize the provided configurations as needed for your specific use case.
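After editing any of the configuration files or docker-compose.yml, it can help to validate the merged Compose configuration and rebuild only the affected service. A sketch, with the service name as an assumption:

```bash
# Validate and print the effective Compose configuration
docker-compose config

# Rebuild the image and recreate a single service after changing its configuration
docker-compose up -d --build spark-master
```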