Update README.md
UsamaI000 authored Jul 9, 2020
1 parent 58edb3a commit b6eb268
Showing 1 changed file with 84 additions and 2 deletions.
86 changes: 84 additions & 2 deletions README.md
@@ -35,10 +35,92 @@ In this project, sentiment analysis is done on Covid19 related tweets from diffe
<b> Link: https://drive.google.com/drive/folders/1dVr4yYlptJefiooO_lyvGzyPSha44QNF?usp=sharing </b>

## Proposed Solution
The first step is to clean up the raw text data. In tweets, stop words (e.g. prepositions) as well as mentions, hashtags, URLs, etc. need to be removed. After cleanup, the data needs to be converted to vector form so that it can be fed to a deep neural network. For the word2vec conversion, the skip-gram model is used. This model learns vector representations from the raw data using the similarity between words based on their context. The objective of the Skip-gram model is to learn word representations that are useful for predicting the nearby words in a document. Formally, given a sequence of training words (a sentence), the objective of the Skip-gram model is to maximize the average log probability of the context words given each center word.
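
For reference, the skip-gram objective is commonly written as below. The notation is the standard word2vec one and is an assumption here, since this README does not define symbols: $w_1, \dots, w_T$ are the training words, $c$ is the context window size, $v_w$ and $v'_w$ are the input and output vectors of word $w$, and $W$ is the vocabulary size.

```latex
\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p(w_{t+j} \mid w_t),
\qquad
p(w_O \mid w_I) = \frac{\exp\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{W} \exp\left({v'_{w}}^{\top} v_{w_I}\right)}
```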

<p align="center">
<img src="https://github.com/UsamaI000/G2H_Project_DLSpring2020/blob/master/images/w2v.png">
</p>
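
The cleanup step and the "nearby words" idea described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual preprocessing code: the function names, the regular expressions, and the tiny stop-word list are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"a", "an", "the", "in", "on", "at", "of", "to", "for", "is", "are", "and"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_tweet(text, remove_stop_words=True):
    """Lowercase a tweet, strip URLs, mentions, hashtags and punctuation, return tokens."""
    text = text.lower()
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE, NON_ALPHA_RE):
        text = pattern.sub(" ", text)
    tokens = text.split()
    if remove_stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    return tokens


def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        pairs.extend((center, tokens[j]) for j in range(lo, hi) if j != i)
    return pairs


if __name__ == "__main__":
    tokens = clean_tweet("Stay safe @WHO! The #covid19 cases are rising in Lahore https://t.co/abc")
    print(tokens)
    print(skipgram_pairs(tokens, window=1))
```

The `remove_stop_words` flag mirrors the "with stop words" and "without stop words" variants listed under Experiments. The pairs returned by `skipgram_pairs` are the (center word, nearby word) examples a skip-gram model would be trained to predict.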

After the words are converted to a usable representation, the next step is to feed them to a classifier. RNNs and LSTMs are commonly used to extract global information from the data. RCNN, on the other hand, also maintains local information, which signifies the prominent features within the limited context of the document. In this way, an overall response can be pooled at the end, which helps during classification. In this model, we use a recurrent architecture, a bidirectional recurrent network, to capture the contexts. The recurrent structure obtains the left context (c_l) of every word in a forward scan of the text and the right context (c_r) in a backward scan of the text. After we obtain the representation of each word, we pass it to the max-pooling layer, which picks out the most dominant features; these are then passed to the FC layer for classification.

<p align="center">
<img src="https://github.com/UsamaI000/G2H_Project_DLSpring2020/blob/master/images/architecture.jpeg">
</p>
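
The pooling and classification steps at the end of the architecture can be illustrated without a deep learning framework. The sketch below is only a toy version under stated assumptions: the per-word feature vectors are taken as given (in the real model they would come from the bidirectional recurrent layer), and the function names, weights, and numbers are hypothetical.

```python
def max_over_time(word_features):
    """Element-wise maximum over the word axis.

    word_features: list of equal-length feature vectors, one per word.
    Returns one vector holding the largest value seen for each feature.
    """
    if not word_features:
        raise ValueError("need at least one word vector")
    return [max(column) for column in zip(*word_features)]


def fully_connected(vector, weights, bias):
    """Plain linear layer: one score per class (one row of `weights` per class)."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)]


if __name__ == "__main__":
    # Three words, three features each (made-up values).
    features = [[0.1, 0.9, -0.3], [0.7, 0.2, 0.0], [0.4, 0.5, -0.1]]
    pooled = max_over_time(features)
    scores = fully_connected(pooled, [[1, 0, 0], [0, 1, 0]], [0.0, 0.5])
    print(pooled, scores, scores.index(max(scores)))
```

Because the maximum is taken per feature across all words, the pooled vector has a fixed size regardless of tweet length, which is what lets a single FC layer classify tweets of different lengths.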

## Training Setup
We trained this RCNN model on the Covid-19 UCD dataset, which has five emotion classes: anger, fear, sadness, confident, and analytical. We performed a total of 5 experiments. The first two experiments compared LSTM and RCNN on a Twitter sentiment dataset, Sentiment140. The other experiments were done on the Covid-19 UCD data with two loss functions, Cross Entropy and Focal Loss. The last experiment used Weighted Cross Entropy to handle dataset imbalance.

The following configuration was used for final model training:
- Batch Size: 64
- Embedding Dimension: 300
- Embedding Layers: 3
- Learning rate: 0.005
- Optimizer: SGD
- Loss: Weighted Cross Entropy
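
To make the loss choices concrete, here is a small pure-Python sketch of weighted cross entropy and focal loss for a single example. It is not the training code used in this project: the inverse-frequency weighting scheme, the `gamma` and `alpha` defaults, and all function names are assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inverse_frequency_weights(class_counts):
    """One possible weighting (an assumption): total / (num_classes * count)."""
    total = sum(class_counts)
    k = len(class_counts)
    return [total / (k * c) for c in class_counts]


def weighted_cross_entropy(logits, target, class_weights):
    """Cross entropy for one example, scaled by the weight of its true class."""
    p = softmax(logits)[target]
    return -class_weights[target] * math.log(p)


def focal_loss(logits, target, gamma=2.0, alpha=1.0):
    """Focal loss for one example: down-weights examples the model already gets right."""
    p = softmax(logits)[target]
    return -alpha * (1.0 - p) ** gamma * math.log(p)


if __name__ == "__main__":
    logits = [2.0, 0.5, 0.1, -1.0, 0.0]       # made-up scores for five classes
    weights = inverse_frequency_weights([500, 120, 300, 900, 80])  # made-up counts
    print(weighted_cross_entropy(logits, 1, weights))
    print(focal_loss(logits, 1))
```

With `gamma=0` and `alpha=1` the focal loss reduces to ordinary cross entropy. Rare classes get larger weights under the inverse-frequency scheme, so mistakes on them cost more.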

### Experiments
- Experiment 1: Sentiment140 Dataset Performed on LSTM and RCNN
  - With stop words
  - Without stop words
- Experiment 2: Covid UCD Challenge Performed on LSTM and RCNN
  - With stop words
  - Without stop words
- Experiment 3: Training model on Covid-19 UCD data using Focal Loss
- Experiment 4: Training model on Covid-19 UCD data using Weighted Cross Entropy and Focal Loss to handle imbalanced data.
  - With stop words
  - Without stop words
- Experiment 5: Training on Best performing model.

## Results
<p align="center">
<img src="https://github.com/UsamaI000/G2H_Project_DLSpring2020/blob/master/images/Capture.PNG">
</p>

## Analysis

### Date wise trend
We analyzed the predicted tweet data to see how people felt (anger, sadness, fear, etc.) in different countries during Covid-19. The tweets were gathered from February to July. The figures below show the trend for each country.

<p align="center"> <img src="https://github.com/UsamaI000/G2H_Project_DLSpring2020/blob/master/images/datewise_country_emotion_Pakistan.png"> </p>


<p align="center"> <img src="https://github.com/UsamaI000/G2H_Project_DLSpring2020/blob/master/images/datewise_country_emotion_Canada.png"> </p>


<p align="center"> <img src="https://github.com/UsamaI000/G2H_Project_DLSpring2020/blob/master/images/datewise_country_emotion_India.png"> </p>


<p align="center"> <img src="https://github.com/UsamaI000/G2H_Project_DLSpring2020/blob/master/images/datewise_country_emotion_Nigeria.png"> </p>


<p align="center"> <img src="https://github.com/UsamaI000/G2H_Project_DLSpring2020/blob/master/images/datewise_country_emotion_United Kingdom.png"> </p>


<p align="center"> <img src="https://github.com/UsamaI000/G2H_Project_DLSpring2020/blob/master/images/datewise_country_emotion_United States.png"> </p>
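
Trend figures like the ones above are typically built from per-day emotion counts. The sketch below shows one way such counts could be computed from model predictions. The record layout `(country, date, emotion)`, the function names, and the sample rows are hypothetical and are not taken from the project's data.

```python
from collections import Counter, defaultdict


def emotion_counts_by_date(records):
    """Group predicted emotions as {country: {date: Counter(emotion -> count)}}."""
    counts = defaultdict(lambda: defaultdict(Counter))
    for country, date, emotion in records:
        counts[country][date][emotion] += 1
    return counts


def dominant_emotion(counter):
    """Return the most frequent emotion in a Counter of predictions."""
    return counter.most_common(1)[0][0]


if __name__ == "__main__":
    # Made-up predictions, for illustration only.
    records = [
        ("Pakistan", "2020-03-01", "fear"),
        ("Pakistan", "2020-03-01", "fear"),
        ("Pakistan", "2020-03-01", "confident"),
        ("Canada", "2020-03-01", "confident"),
    ]
    counts = emotion_counts_by_date(records)
    for country, by_date in counts.items():
        for date, counter in sorted(by_date.items()):
            print(country, date, dict(counter), dominant_emotion(counter))
```

Summing the per-date counters for a country gives country-level totals of the kind a per-country emotion comparison would use.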


### Emotion in different countries
The plot below shows the emotions of people in different countries towards Covid-19. It shows that most people in these countries were confident during the global pandemic. Fear was also present, and its level kept changing over the timeline.

<p align="center"> <img src="https://github.com/UsamaI000/G2H_Project_DLSpring2020/blob/master/images/country_emotion.png"> </p>


### Deaths in countries
We also analyzed the deaths per day due to Covid-19 in different countries.

<p align="center"> <img src="https://github.com/UsamaI000/G2H_Project_DLSpring2020/blob/master/images/pakistan.PNG"> </p>


<p align="center"> <img src="https://github.com/UsamaI000/G2H_Project_DLSpring2020/blob/master/images/canada.PNG"> </p>


<p align="center"> <img src="https://github.com/UsamaI000/G2H_Project_DLSpring2020/blob/master/images/india.PNG"> </p>


<p align="center"> <img src="https://github.com/UsamaI000/G2H_Project_DLSpring2020/blob/master/images/nigeria.PNG"> </p>


<p align="center"> <img src="https://github.com/UsamaI000/G2H_Project_DLSpring2020/blob/master/images/uk.PNG"> </p>


<p align="center"> <img src="https://github.com/UsamaI000/G2H_Project_DLSpring2020/blob/master/images/us.PNG"> </p>
