This project aims to build an Arabic sentiment analysis system that draws on several text representation models, such as TF-IDF, Bag of Words, and Bag of Concepts, and also explores newer methods such as appraisal theory.
The general architecture of the system is the following:
The code for the system is organized into the following branches:
- AJGT: Code for Arabic sentiment analysis using classical machine learning models built using the AJGT dataset.
- ASTC: Code for Arabic sentiment analysis using classical machine learning models built using the ASTC dataset.
- ASTD: Code for Arabic sentiment analysis using classical machine learning models built using the ASTD dataset.
- LABR: Code for Arabic sentiment analysis using classical machine learning models built using the LABR dataset.
- DL: Code for Arabic sentiment analysis using deep learning.
- Appraisal: Code for Arabic sentiment analysis using appraisal features.
- Deployment: Deployment of the system using Streamlit.
Each branch contains a detailed overview of the dataset used, as well as all the performance metrics.
The following datasets are used:
- LABR (Large scale Arabic Book Reviews)
- AJGT (The Arabic Jordanian General Tweets)
- ASTC (Arabic Sentiment Twitter Corpus)
- ASTD (Arabic Sentiment Tweets Dataset)
The system supports the following text representation models:
- BoW (Bag of Words)
- TF-IDF (Term Frequency-Inverse Document Frequency)
- LSA (Latent semantic analysis)
- LDA (Latent Dirichlet allocation)
- BoC (Bag of Concepts)
- Appraisal groups
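To make the first two representations concrete, here is a minimal pure-Python sketch of Bag of Words and TF-IDF. It is an illustration only, not the code from the branches: the helper names (`tokenize`, `bag_of_words`, `tf_idf`), the whitespace tokenizer, the unsmoothed `log(N / df)` weighting, and the three toy sentences are all assumptions made for this example.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: plain whitespace splitting. Real Arabic preprocessing
    # (normalization, stemming, stop-word removal) is not shown here.
    return text.split()

def bag_of_words(docs):
    # Vocabulary = sorted set of all tokens; each document becomes a count vector.
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    vectors = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        vectors.append([counts[term] for term in vocab])
    return vocab, vectors

def tf_idf(docs):
    # tf = term count / document length; idf = log(N / document frequency).
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    df = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    weighted = []
    for row in counts:
        total = sum(row) or 1  # guard against empty documents
        weighted.append([(c / total) * w for c, w in zip(row, idf)])
    return vocab, weighted

# Hypothetical toy corpus, not taken from any of the datasets.
docs = ["الفيلم رائع جدا", "الفيلم سيء جدا", "الخدمة رائعة"]
vocab, bow = bag_of_words(docs)
_, tfidf = tf_idf(docs)
```

Terms that occur in fewer documents receive a larger weight under TF-IDF, whereas Bag of Words keeps only the raw counts.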
The previous text representation models are used to create features for the following models:
- Naive Bayes
- Logistic regression
- Support Vector Machine
- Random forest
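As a rough sketch of how features feed into these classifiers, the example below trains each of the four models on TF-IDF features. It assumes scikit-learn, which this README does not name as the library used. The toy sentences, the `pos`/`neg` labels, and the hyperparameters are hypothetical and chosen only to keep the example small.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data; the real datasets live in the branches.
texts = [
    "الفيلم رائع جدا",
    "الخدمة ممتازة",
    "الكتاب جميل",
    "الفيلم سيء جدا",
    "الخدمة سيئة",
    "الكتاب ممل",
]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

# One instance of each classifier family listed above (settings are assumptions).
classifiers = {
    "naive_bayes": MultinomialNB(),
    "logistic_regression": LogisticRegression(max_iter=1000),
    "svm": LinearSVC(),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

predictions = {}
for name, clf in classifiers.items():
    # Each pipeline vectorizes raw text with TF-IDF, then fits the classifier.
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(texts, labels)
    predictions[name] = list(model.predict(texts))
```

Swapping `TfidfVectorizer` for `CountVectorizer` would give the Bag of Words variant of the same pipeline. Predictions here are made on the training sentences only to show the call pattern, so they say nothing about real performance.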
As for deep learning, we opted for the BERT (Bidirectional Encoder Representations from Transformers) model and its variants.