## Project overview

This project aims to create an Arabic sentiment analysis system that takes advantage of different text representation models such as TF-IDF, Bag of Words, and Bag of Concepts, in addition to exploring newer methods such as appraisal theory.

## Architecture

The general architecture of the system is the following:

*(Figure: overall system architecture diagram)*

## Repository layout

The code for the system is organized into the following branches:

1. AJGT: Code for Arabic sentiment analysis using classical machine learning models, built on the AJGT dataset.
2. ASTC: Code for Arabic sentiment analysis using classical machine learning models, built on the ASTC dataset.
3. ASTD: Code for Arabic sentiment analysis using classical machine learning models, built on the ASTD dataset.
4. LABR: Code for Arabic sentiment analysis using classical machine learning models, built on the LABR dataset.
5. DL: Code for Arabic sentiment analysis using deep learning.
6. Appraisal: Code for Arabic sentiment analysis using appraisal features.
7. Deployment: Deployment of the system using Streamlit (a minimal app sketch is shown below).

Each branch contains a detailed overview of the dataset used, as well as all the performance metrics.
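As a rough illustration of the deployment layer, the sketch below shows a minimal Streamlit app that loads a trained classifier and labels user input. The model file name (`model.joblib`) and the assumption that the saved artifact is a scikit-learn pipeline accepting raw text are illustrative guesses, not details taken from the Deployment branch.

```python
# Minimal Streamlit app sketch for serving a sentiment classifier.
# The model path and label format are assumptions, not taken from
# the actual Deployment branch.
import joblib
import streamlit as st

st.title("Arabic Sentiment Analysis")

@st.cache_resource
def load_model():
    # Hypothetical artifact: a scikit-learn pipeline saved with joblib.
    return joblib.load("model.joblib")

text = st.text_area("Enter an Arabic sentence:")
if st.button("Analyze") and text:
    model = load_model()
    prediction = model.predict([text])[0]
    st.write(f"Predicted sentiment: {prediction}")
```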

## Datasets

- LABR (Large-scale Arabic Book Reviews)
- AJGT (Arabic Jordanian General Tweets)
- ASTC (Arabic Sentiment Twitter Corpus)
- ASTD (Arabic Sentiment Tweets Dataset)

## Text representation models

The system supports the following text representation models (a minimal TF-IDF sketch follows the list):

- BoW (Bag of Words)
- TF-IDF (Term Frequency-Inverse Document Frequency)
- LSA (Latent Semantic Analysis)
- LDA (Latent Dirichlet Allocation)
- BoC (Bag of Concepts)
- Appraisal groups
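As a rough sketch of how one of these representations is built, the snippet below vectorizes a toy corpus with scikit-learn's `TfidfVectorizer`. The sample sentences are illustrative only, and the exact preprocessing in each branch may differ.

```python
# Minimal sketch: turning raw Arabic text into TF-IDF features with
# scikit-learn. The two sample sentences are made up for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "الخدمة ممتازة والموظفون محترمون",  # a positive example
    "المنتج سيء جدا ولا أنصح به",       # a negative example
]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))  # unigrams and bigrams
X = vectorizer.fit_transform(corpus)  # sparse document-term matrix
print(X.shape, len(vectorizer.get_feature_names_out()))
```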

## Machine learning models

The preceding text representation models are used to create features for the following models (see the sketch after this list):

- Naive Bayes
- Logistic Regression
- Support Vector Machine
- Random Forest
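Below is a minimal sketch of one representation/model pairing (TF-IDF features feeding Logistic Regression). The toy corpus stands in for any of the four datasets; it is hypothetical data, not taken from the repository.

```python
# Minimal sketch: TF-IDF features feeding Logistic Regression, one of
# the representation/model pairings listed above. The tiny corpus below
# is hypothetical and only stands in for the real datasets.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

texts = [
    "الخدمة ممتازة والموظفون محترمون",  # positive
    "تجربة رائعة أنصح بها بشدة",        # positive
    "المنتج سيء جدا ولا أنصح به",       # negative
    "خدمة بطيئة وتعامل غير محترم",      # negative
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, stratify=labels, random_state=0
)
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```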

As for deep learning, we opted for BERT (Bidirectional Encoder Representations from Transformers) and its variants.
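For illustration, a minimal inference sketch with Hugging Face Transformers follows. The checkpoint name is an assumption (any Arabic BERT variant fine-tuned for sentiment would serve); the repository's actual deep learning code lives in the DL branch.

```python
# Minimal inference sketch with Hugging Face Transformers. The model
# checkpoint is an assumed example of an Arabic BERT variant fine-tuned
# for sentiment, not the one trained in this repository.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="CAMeL-Lab/bert-base-arabic-camelbert-da-sentiment",  # assumed checkpoint
)
print(classifier("الفيلم كان رائعا"))  # e.g. [{'label': 'positive', 'score': ...}]
```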