HassanRady/stream_eda_text_analysis

Streaming Event Driven Microservice Architecture

What is it?

It is a real-time subreddit text analysis dashboard.

Architecture

This project implements an event-driven microservice architecture for streaming subreddit data. Kafka ingests the data and Spark processes it. The results are stored in Cassandra as the batch storage and in Redis as the speed layer for analysis in Dash. Each component is its own microservice.
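
The data flow described above can be illustrated with a small in-memory sketch. It is not the project's code: the `Pipeline` class, its method names, and the word-count processing step are hypothetical stand-ins, with a `deque` in place of the Kafka topic, a list in place of Cassandra, and a `Counter` in place of Redis. It only shows the shape of the pattern, where each processed record is written both to a full-history batch store and to a fast, aggregated speed layer.

```python
from collections import Counter, deque
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """In-memory stand-in for the Kafka -> Spark -> Cassandra/Redis flow."""

    topic: deque = field(default_factory=deque)            # stands in for a Kafka topic
    batch_store: list = field(default_factory=list)        # stands in for Cassandra
    speed_layer: Counter = field(default_factory=Counter)  # stands in for Redis

    def ingest(self, text: str) -> None:
        """Producer side: append a raw subreddit text to the topic."""
        self.topic.append(text)

    def process(self) -> int:
        """Consumer side: drain the topic, clean each record, write to both stores."""
        processed = 0
        while self.topic:
            raw = self.topic.popleft()
            words = [w.strip(".,!?").lower() for w in raw.split()]
            words = [w for w in words if w]
            self.batch_store.append({"raw": raw, "words": words})  # full history
            self.speed_layer.update(words)                         # running word counts
            processed += 1
        return processed


pipeline = Pipeline()
pipeline.ingest("Spark streams data. Spark scales!")
pipeline.ingest("Kafka feeds Spark")
processed = pipeline.process()
top_word, top_count = pipeline.speed_layer.most_common(1)[0]
print(processed, top_word, top_count)  # prints: 2 spark 3
```

A dashboard would read only from the speed layer for live views, while the batch store keeps every record for later reprocessing.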

Idea

To keep up with trending hashtags and topics, the dashboard shows keywords, entities, subreddit sentiment, subreddit emotions, and frequent words for a given hashtag/topic.

Microservices

  • SparkStream is a Python package (SparkStream-pypi): a simple Spark streaming handler that listens to a Kafka topic, processes the data, and stores it in Cassandra and Redis. Accessible via an API and deployed in a Docker container. SparkStream-github

  • Named-Entity-Recognition is a service for extracting named entities from text with spaCy. Accessible via an API and deployed in a Docker container. NER-github

  • Keyword-Extraction is a service for extracting keywords from text with YAKE. Accessible via an API and deployed in a Docker container. Keyword-github

  • Sentiment-Model is a service for predicting a tweet's sentiment. Developed with TensorFlow Extended and deployed with TensorFlow Serving. Sentiment-github

  • Emotion-Model is a service for predicting a tweet's emotions. Developed with TensorFlow Extended and deployed with TensorFlow Serving. Emotion-github

  • Dashboard is a GUI for graphs and text analysis, built with Dash. Dashboard-github
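
As a rough illustration of how another service might call the TensorFlow Serving models listed above, the sketch below builds a request for TensorFlow Serving's REST predict endpoint. The host `localhost:8501`, the model name `sentiment`, and the assumption that the model accepts a plain list of strings as `instances` are all hypothetical; the real model names and input signatures are defined in the Sentiment-Model and Emotion-Model repositories.

```python
import json
from urllib import request


def build_predict_request(host: str, model_name: str, texts: list[str]) -> request.Request:
    """Build a POST request for TensorFlow Serving's REST predict endpoint."""
    url = f"http://{host}/v1/models/{model_name}:predict"
    # Assumption: the served model takes one string per instance (row format).
    body = json.dumps({"instances": texts}).encode("utf-8")
    return request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(req: request.Request, timeout: float = 5.0) -> list:
    """Send the request and return the 'predictions' field of the response."""
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["predictions"]


# Hypothetical host and model name; building the request needs no running server.
req = build_predict_request("localhost:8501", "sentiment", ["great post"])
print(req.get_method(), req.full_url)
```

Calling `predict(req)` needs a running TensorFlow Serving container that exposes a model under that name.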


Technologies:

  • Asyncpraw
  • Apache Kafka
  • Apache Spark
  • Redis
  • Dash
  • TensorFlow Extended
  • FastAPI
  • spaCy
  • NLTK
  • Yake
  • Docker

Data:

  • Trending subreddits come from the trend places endpoint of the PRAW API.
  • Subreddit streaming data comes from the stream endpoint of the Asyncpraw API.
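
A minimal sketch of reading a subreddit stream with Asyncpraw and turning each item into a message for a Kafka producer might look like the following. Streaming comments rather than submissions, the `to_event` and `stream_subreddit` helpers, and the JSON field names are all assumptions made for illustration; the repository may structure its events differently.

```python
import json
from types import SimpleNamespace


def to_event(comment) -> bytes:
    """Serialize the fields of interest as a JSON message (e.g. a Kafka value)."""
    payload = {
        "id": comment.id,
        "text": comment.body,
        "created_utc": comment.created_utc,
    }
    return json.dumps(payload).encode("utf-8")


async def stream_subreddit(name: str, credentials: dict):
    """Yield one serialized event per new comment in the given subreddit."""
    import asyncpraw  # third-party; imported lazily so to_event works without it

    # credentials is expected to hold client_id, client_secret, and user_agent.
    reddit = asyncpraw.Reddit(**credentials)
    try:
        subreddit = await reddit.subreddit(name)
        # skip_existing=True ignores comments posted before the stream started.
        async for comment in subreddit.stream.comments(skip_existing=True):
            yield to_event(comment)
    finally:
        await reddit.close()


# Offline check of the serializer with a stand-in comment object.
fake = SimpleNamespace(id="abc123", body="Hello", created_utc=1700000000.0)
event = to_event(fake)
print(event.decode("utf-8"))
```

In a running service, each value yielded by `stream_subreddit` would be passed to a Kafka producer for the topic that the Spark job listens to.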
