An end-to-end real-time data streaming pipeline that leverages Kafka and Spark Streaming to analyze social media sentiment trends.
Project still in progress...
Data Sources:
- Twitter and Reddit are the current data sources; additional sources can be added.
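The repository does not show how the sources are polled; the sketch below is one possible approach for the Reddit side using PRAW, with placeholder credentials and an illustrative subreddit. Everything here is an assumption, not the project's actual ingestion code.

```python
import praw  # assumed client library for the Reddit source; not confirmed by the repo

# Placeholder credentials; supply real ones via environment variables or config in practice.
reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="sentiment-pipeline",
)

# Stream new comments as they arrive; each one becomes a record to publish to Kafka.
for comment in reddit.subreddit("all").stream.comments(skip_existing=True):
    print({"source": "reddit", "text": comment.body, "created_utc": comment.created_utc})
```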
Stream Data Ingestion:
- Apache Kafka handles the incoming data streams. Each source is published to its own topic (`twitter-topic` and `reddit-topic`); see the producer sketch below.
- Apache ZooKeeper manages the Kafka brokers.
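A minimal producer sketch using the kafka-python client, assuming a broker at `localhost:9092` and JSON-encoded messages; the record fields are illustrative, not the project's actual schema.

```python
import json

from kafka import KafkaProducer  # kafka-python client (an assumption, not confirmed by the repo)

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Illustrative record; the project's actual message format may differ.
record = {"source": "twitter", "text": "Example post", "created_at": "2024-01-01T00:00:00Z"}

# Route the record to the topic that matches its source.
topic = "twitter-topic" if record["source"] == "twitter" else "reddit-topic"
producer.send(topic, value=record)
producer.flush()
```

Keeping one topic per source lets downstream consumers subscribe to a single platform or to both, without filtering inside the stream.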
Stream Processing:
- Apache Spark processes the streaming data using Spark Structured Streaming (a minimal reader sketch follows this list).
- Processed data is stored in:
  - Object Storage for long-term persistence.
  - Redis for short-term persistence.
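A minimal Structured Streaming reader, assuming a broker at `localhost:9092`, JSON messages, and an illustrative schema; the real job would score sentiment and persist results to object storage and Redis rather than printing to the console. Running it also requires the `spark-sql-kafka-0-10` package on the Spark classpath.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType

spark = SparkSession.builder.appName("sentiment-stream").getOrCreate()

# Illustrative message schema; the project's actual fields may differ.
schema = StructType([
    StructField("source", StringType()),
    StructField("text", StringType()),
    StructField("created_at", StringType()),
])

# Subscribe to both topics and parse the JSON payloads.
raw = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # assumed broker address
    .option("subscribe", "twitter-topic,reddit-topic")
    .load()
)
posts = raw.select(from_json(col("value").cast("string"), schema).alias("post")).select("post.*")

# Write parsed records to the console; the real job would score sentiment
# and write to object storage and Redis instead.
query = posts.writeStream.outputMode("append").format("console").start()
query.awaitTermination()
```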
Real-Time Dashboard:
- Redis serves as a short-term cache for fast data retrieval.
- Flask handles backend operations for the dashboard (a minimal endpoint sketch follows this list).
- The frontend is built with HTML, CSS, and JavaScript.
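One way the Flask backend could serve cached results, assuming Redis runs on `localhost:6379` and the Spark job writes the latest score under a key such as `sentiment:twitter`; the route and key layout are illustrative, not taken from the repository.

```python
import redis
from flask import Flask, jsonify

app = Flask(__name__)
cache = redis.Redis(host="localhost", port=6379, decode_responses=True)  # assumed Redis address

@app.route("/api/sentiment/<source>")
def latest_sentiment(source):
    # Assumes the Spark job stores the latest score under a key like "sentiment:twitter".
    score = cache.get(f"sentiment:{source}")
    return jsonify({"source": source, "score": score})

if __name__ == "__main__":
    app.run(port=5000)
```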
End Users:
- Users access the dashboard via a web interface served by Nginx.
Running the Project:
- Ensure Docker is installed and running.
- Run `docker-compose up --build` to build and start all services.