This project implements an end-to-end data pipeline, from data ingestion to processing and storage, using Apache Airflow, Python, Apache Kafka, Zookeeper, Apache Spark, and Cassandra. The entire setup is containerized with Docker. It automates data ingestion, processing, and storage, improving efficiency and scalability, and delivers real-time insights with minimal manual intervention.
- Data Source: Fetches random user data from the randomuser.me API (see the ingestion sketch after this list).
- Apache Airflow: Orchestrates the pipeline and stores data in PostgreSQL.
- Apache Kafka and Zookeeper: Stream data from PostgreSQL to Spark.
- Control Center and Schema Registry: Provide Kafka stream monitoring and schema management.
- Apache Spark: Processes the data using master and worker nodes.
- Cassandra: Stores the processed data.
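
As a reference point, here is a minimal sketch of the ingestion step: fetching one record from the randomuser.me API and flattening it into the fields the pipeline might keep. The function names (`get_data`, `format_data`) and the exact field selection are illustrative assumptions, not the project's actual code.

```python
import json
import requests

RANDOMUSER_URL = "https://randomuser.me/api/"  # public API used as the data source

def get_data() -> dict:
    """Fetch a single random user record from the randomuser.me API."""
    response = requests.get(RANDOMUSER_URL, timeout=10)
    response.raise_for_status()
    return response.json()["results"][0]

def format_data(raw: dict) -> dict:
    """Flatten the nested API response into a simple record (assumed subset of fields)."""
    location = raw["location"]
    return {
        "first_name": raw["name"]["first"],
        "last_name": raw["name"]["last"],
        "gender": raw["gender"],
        "address": f"{location['street']['number']} {location['street']['name']}, "
                   f"{location['city']}, {location['country']}",
        "email": raw["email"],
        "username": raw["login"]["username"],
        "registered_date": raw["registered"]["date"],
    }

if __name__ == "__main__":
    print(json.dumps(format_data(get_data()), indent=2))
```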
- Apache Airflow: Manages workflow orchestration and task scheduling, and simplifies complex workflows with dependency management (see the DAG sketch below).
- Apache Kafka: Streams data between components and handles high-throughput, real-time data streaming (see the producer sketch below).
- Zookeeper: Manages Kafka brokers and ensures distributed coordination and configuration management.
- Apache Spark: Processes data in parallel for fast processing across multiple nodes (see the streaming job sketch below).
- PostgreSQL: Stores the initially ingested data before it is streamed onward.
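
For orientation, a hedged sketch of the Airflow side: a single DAG with a PythonOperator that runs the ingestion task on a schedule. The DAG id, owner, schedule, and task callable are assumptions, not the project's exact definitions.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def stream_user_data():
    """Placeholder for the ingestion logic (fetch, format, then publish/store)."""
    ...

# Hypothetical DAG definition; ids and schedule are assumptions.
with DAG(
    dag_id="user_automation",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"owner": "airflow"},
) as dag:
    ingest_task = PythonOperator(
        task_id="stream_data_from_api",
        python_callable=stream_user_data,
    )
```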
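A minimal sketch of pushing formatted records onto a Kafka topic with `kafka-python`; the broker address and topic name (`users_created`) are assumptions based on a typical local Docker setup.

```python
import json

from kafka import KafkaProducer

# Broker address assumes Kafka is reachable on localhost:9092 from the host.
producer = KafkaProducer(
    bootstrap_servers=["localhost:9092"],
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_user(record: dict, topic: str = "users_created") -> None:
    """Send one formatted user record to the given Kafka topic."""
    producer.send(topic, value=record)
    producer.flush()  # ensure the message leaves the client buffer
```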
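Finally, a hedged sketch of the Spark side: a Structured Streaming job that reads the Kafka topic, parses the JSON payload, and writes the result to a Cassandra table via the Spark Cassandra connector. The connector coordinates, hostnames, schema, keyspace, and table names are all assumptions for a local Docker setup.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

# Package versions and hostnames are assumptions, not pinned project settings.
spark = (
    SparkSession.builder
    .appName("UserStreamProcessor")
    .config("spark.jars.packages",
            "com.datastax.spark:spark-cassandra-connector_2.12:3.4.1,"
            "org.apache.spark:spark-sql-kafka-0-10_2.12:3.4.1")
    .config("spark.cassandra.connection.host", "localhost")
    .getOrCreate()
)

# Assumed subset of the ingested fields.
schema = StructType([
    StructField("first_name", StringType()),
    StructField("last_name", StringType()),
    StructField("email", StringType()),
])

# Read the Kafka topic and parse the JSON value column.
users = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")
    .option("subscribe", "users_created")
    .load()
    .select(from_json(col("value").cast("string"), schema).alias("data"))
    .select("data.*")
)

# Write the parsed stream to Cassandra (keyspace/table names are assumptions).
query = (
    users.writeStream
    .format("org.apache.spark.sql.cassandra")
    .option("keyspace", "spark_streams")
    .option("table", "created_users")
    .option("checkpointLocation", "/tmp/checkpoint")
    .start()
)
query.awaitTermination()
```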