CodelyTV/spark-ecosystem_docker-example

🎇 Spark ecosystem docker example

Custom Spark-Kafka Cluster

This project sets up a custom Spark-Kafka cluster using Docker and Docker Compose. It includes Apache Spark, Apache Kafka, PostgreSQL, Hive Metastore, LocalStack for S3, Prometheus, and Grafana.

Project Structure

  • hive/conf/.hiverc: Hive initialization script to add necessary JAR files.
  • hive/conf/hive-site.xml: Configuration file for Hive Metastore.
  • hive/conf/jars/: Directory containing necessary JAR files for Hive.
  • prometheus/prometheus.yml: Configuration file for Prometheus.
  • spark/metrics.properties: Configuration file for Spark metrics.
  • spark/run.sh: Script that starts a Spark Master or Worker, or submits Spark jobs.
  • docker-compose.yml: Docker Compose configuration file to set up the entire cluster.
  • Dockerfile: Dockerfile to build the Spark cluster image.

Setup Instructions

Prerequisites

  • Docker
  • Docker Compose
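
A quick way to confirm both prerequisites before building is sketched below. The function name check_prereqs is hypothetical and not part of this repository. It assumes Compose may be installed either as the standalone docker-compose binary or as the "docker compose" plugin.

```shell
# Report whether the two prerequisites are available; returns non-zero if any is missing.
check_prereqs() {
  missing=0
  if command -v docker >/dev/null 2>&1; then
    echo "docker: found"
  else
    echo "docker: missing"
    missing=1
  fi
  # Compose ships either as the standalone binary or as the "docker compose" plugin.
  if command -v docker-compose >/dev/null 2>&1 || docker compose version >/dev/null 2>&1; then
    echo "docker compose: found"
  else
    echo "docker compose: missing"
    missing=1
  fi
  return "$missing"
}

check_prereqs || echo "Install the missing tools before continuing."
```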

Building the Docker Image

First, build the Docker image for the Spark cluster:

docker build -t my-spark-cluster:3.5.0 .

Running the Cluster

Start the cluster using Docker Compose:

docker-compose up

This starts every service defined in docker-compose.yml and keeps them attached to the terminal; press Ctrl+C to stop them.
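
For day-to-day use it can help to run the cluster in the background and inspect it separately. The wrappers below are a minimal sketch: the function names and the COMPOSE variable are hypothetical conveniences, not something this repository provides. They call only the standard up -d, ps, logs -f, and down subcommands.

```shell
# COMPOSE can be overridden, e.g. COMPOSE="docker compose" for the Compose v2 plugin.
cluster_up()     { ${COMPOSE:-docker-compose} up -d; }         # start all services in the background
cluster_status() { ${COMPOSE:-docker-compose} ps; }            # list containers and their state
cluster_logs()   { ${COMPOSE:-docker-compose} logs -f "$@"; }  # follow logs, optionally for named services
cluster_down()   { ${COMPOSE:-docker-compose} down; }          # stop and remove the containers
```

After sourcing these, run cluster_up, check the containers with cluster_status, and use cluster_down when finished. Service names passed to cluster_logs must match those declared in docker-compose.yml.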

Accessing Services

Monitoring and Metrics

Prometheus

Prometheus is configured to scrape metrics from the Spark Master, Workers, and Executors.

Grafana

Grafana is set up to visualize the metrics collected by Prometheus. Access it at http://localhost:3000.
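
Once the containers are up, the monitoring services may take a few seconds to accept requests. The helper below is a sketch that polls a URL with curl until it responds. The name wait_for_http is hypothetical, and the Prometheus port 9090 in the example is an assumption based on the Prometheus default, so check docker-compose.yml for the actual mapping.

```shell
# Poll a URL until it answers with a successful HTTP status or the attempts run out.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_http() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      echo "ready: $url"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "not ready: $url" >&2
  return 1
}

# Example calls (commented out so that sourcing this file does not block):
# wait_for_http http://localhost:3000/api/health   # Grafana health endpoint
# wait_for_http http://localhost:9090/-/ready      # Prometheus readiness; port 9090 is an assumption
```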

Additional Notes

  • Ensure that the specified volumes and paths exist and are accessible by Docker.
  • Customize the provided configurations as needed for your specific use case.
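
As a sketch of the first note, the check below confirms that the files listed under Project Structure are present before starting the cluster. The function name check_paths is hypothetical, and the path list is taken from this README rather than from the volume mounts in docker-compose.yml, which may reference additional paths.

```shell
# Print every expected path that is missing under the given repository root.
# Returns non-zero if anything is missing.
check_paths() {
  root=${1:-.}
  rc=0
  for p in \
    hive/conf/.hiverc \
    hive/conf/hive-site.xml \
    hive/conf/jars \
    prometheus/prometheus.yml \
    spark/metrics.properties \
    spark/run.sh \
    docker-compose.yml \
    Dockerfile
  do
    if [ ! -e "$root/$p" ]; then
      echo "missing: $p"
      rc=1
    fi
  done
  return "$rc"
}

# Example: run from the repository root.
# check_paths . && echo "all expected paths are present"
```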
