rohlay/sentiment-benchmarking

A benchmarking script for comparing machine learning approaches (traditional machine learning, deep learning, transformers) with lexicon-based approaches to sentiment analysis.

Execution Steps

1. Environment Setup: using Python 3.10, run `pip install -r requirements.txt` to install all required packages.

2. Initialization

  • Download the GloVe word embedding files and place them in /input_data/glove_embeddings.
  • Download the lexicon files and place them in /lexicon_files/.
  • Place the dataset as data.csv, with 'sentiment' and 'text' columns, in /datasets/raw. After preprocessing, move data-processed.csv to /datasets/processed (a minimal schema check follows this list).
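
As a quick sanity check before running the pipeline, a minimal sketch like the following (assuming the directory layout above) verifies that data.csv has the expected columns:

```python
import pandas as pd

# Sanity-check the expected input schema: data.csv must provide
# 'sentiment' and 'text' columns (path follows the layout above).
df = pd.read_csv("datasets/raw/data.csv")
missing = {"sentiment", "text"} - set(df.columns)
if missing:
    raise ValueError(f"data.csv is missing columns: {missing}")
print(df.head())
```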

3. Evaluation Process

  • For machine learning analysis (a run sketch follows this list):
    • 3.1. Adjust config_ml.json as necessary.
    • 3.2. Run models_evaluation.py.
    • 3.3. Results are written to a new folder under /experiment_outputs/ (e.g. exp_ID1).
  • For lexicon-based analysis:
    • 3.4. Adjust config_lex.json accordingly.
    • 3.5. Run lexicon_evaluation_norm.py.
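
For scripted runs, something like the sketch below can tweak the config and launch the evaluation. The "experiment_id" key is a hypothetical placeholder; the actual schema of config_ml.json is defined by the repository:

```python
import json
import subprocess

# Illustrative only: the keys of config_ml.json are not documented here,
# so "experiment_id" below is a hypothetical example key.
with open("config_ml.json") as f:
    config = json.load(f)
config["experiment_id"] = "exp_ID1"  # hypothetical key
with open("config_ml.json", "w") as f:
    json.dump(config, f, indent=2)

# Launch the evaluation; outputs land under /experiment_outputs/.
subprocess.run(["python", "models_evaluation.py"], check=True)
```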

4. Post-processing Steps

  • To normalize the machine learning regression metrics, place the metrics folder in /post_processing/machine_learning_data/metrics_ml and run normalize_continuous_metrics.py (a sketch of one plausible normalization follows).
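
The authoritative logic lives in normalize_continuous_metrics.py; the sketch below shows one plausible approach (min-max scaling into [0, 1]), with the metrics file name and column name as assumptions:

```python
import pandas as pd

# One plausible normalization: min-max scale raw MAE values into [0, 1].
# The file name "metrics.csv" and column "mae" are assumptions; see
# normalize_continuous_metrics.py for the actual logic.
path = "post_processing/machine_learning_data/metrics_ml/metrics.csv"
metrics = pd.read_csv(path)
mae = metrics["mae"]
metrics["mae_norm"] = (mae - mae.min()) / (mae.max() - mae.min())
metrics.to_csv(path.replace(".csv", "_normalized.csv"), index=False)
```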

5. Visualization Execution

  • For machine learning visualizations (an illustrative plotting sketch follows this list):
    • Ensure the metrics are located in /post_processing/machine_learning_data/metrics_ml.
    • Run either ML_plots_and_subplots.py or ML_subplots_violin_mean_median.py.
  • For lexicon visualizations:
    • Ensure the metrics are located in /post_processing/lexicon_data/metrics_lex.
    • Run LEX_plots_and_subplots.py.
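
As a rough idea of what these scripts produce, a minimal matplotlib sketch might look like this (the file name and column names are assumptions; the repository scripts generate the official figures):

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative only; ML_plots_and_subplots.py produces the real figures.
metrics = pd.read_csv(
    "post_processing/machine_learning_data/metrics_ml/metrics.csv"
)
metrics.plot(kind="bar", x="model", y="accuracy", legend=False)
plt.ylabel("Accuracy")
plt.tight_layout()
plt.savefig("ml_accuracy.png")
```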

End-to-end Sentiment Analysis Process

[Figure: end-to-end sentiment analysis pipeline]

Complete List of Methods

Lexicon methods

  • AFINN
  • WordNet (TextBlob, Pattern, SentiWordNet)
  • SenticNet
  • VADER
  • General Inquirer
  • MPQA
  • OpinionFinder
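
For a flavor of how two of these lexicons are typically queried (via NLTK's VADER implementation and TextBlob; both return polarity scores in [-1, 1]):

```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from textblob import TextBlob

nltk.download("vader_lexicon", quiet=True)

text = "The movie was surprisingly good."

# VADER: rule-based, returns a compound score in [-1, 1].
vader = SentimentIntensityAnalyzer()
print("VADER:", vader.polarity_scores(text)["compound"])

# TextBlob (Pattern-based): polarity in [-1, 1].
print("TextBlob:", TextBlob(text).sentiment.polarity)
```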

Machine Learning Methods

[Figure: AI hierarchy]

Traditional methods

  • Naive Bayes
  • Support Vector Machines (SVM)
  • Decision Trees
  • Random Forest
  • Logistic Regression / Linear Regression
  • Gradient Boosting Machines (GBMs)
  • K-Nearest Neighbours (KNN)
  • Artificial Neural Networks (ANN)
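
A minimal scikit-learn sketch of one traditional method, TF-IDF features feeding a Naive Bayes classifier, on toy data (the benchmark itself trains on the SST datasets):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy data for illustration only.
texts = ["great film", "terrible plot", "loved it", "waste of time"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# TF-IDF features feeding Naive Bayes, one of the methods listed above.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["an absolute waste"]))
```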

Deep learning methods

  • Multilayer Perceptron (MLP)
  • Convolutional Neural Networks (CNN)
  • Recurrent Neural Networks (RNN)
  • Long Short-Term Memory (LSTM)
  • Gated Recurrent Units (GRU)
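
A minimal Keras sketch of one deep learning method, an LSTM over word embeddings. The hyperparameters are placeholders; in the repository, embeddings come from the GloVe files placed in /input_data/glove_embeddings:

```python
import numpy as np
import tensorflow as tf

# Placeholder hyperparameters; the benchmark's settings live in its configs.
vocab_size, embed_dim = 10_000, 100

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embed_dim),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# Forward pass on dummy token IDs to confirm the output shape.
dummy = np.random.randint(0, vocab_size, size=(2, 50))
print(model(dummy).shape)  # (2, 1)
```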

Transformer models

  • BERT
  • DistilBERT
  • ELECTRA
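
A quick illustration with the Hugging Face pipeline API, using an off-the-shelf DistilBERT checkpoint fine-tuned on SST-2 (the benchmark fine-tunes its own models, so this is only for orientation):

```python
from transformers import pipeline

# Off-the-shelf checkpoint fine-tuned on SST-2; illustrative only.
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)
print(classifier("A gorgeous, witty, seductive movie."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```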

Results on the SST dataset

Dataset Information

SST dataset variants:

  • Same underlying textual content
  • Different text granularity (phrases vs. sentences)
  • Different sentiment granularity (2 vs. 5 classes)

Source: Hugging Face

| Dataset | Labels         | Content  | Size/Data points |
|---------|----------------|----------|------------------|
| SST2p   | 2 (binary)     | Phrase   |                  |
| SST2    | 2 (binary)     | Sentence |                  |
| SST5    | 5 (multiclass) | Sentence |                  |
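
The SST variants are available on the Hugging Face Hub; a minimal loading sketch (the exact dataset identifiers used by the benchmark are not specified here):

```python
from datasets import load_dataset

# SST-2 (binary, sentence-level) from the Hugging Face Hub.
sst2 = load_dataset("stanfordnlp/sst2")
print(sst2["train"][0])  # {'idx': 0, 'sentence': ..., 'label': ...}
```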

Classes (Classification Accuracy Score)

[Figure: ML classification results across the AI hierarchy]

| Dataset | Best ML Model        | Avg. ML Models | Best DL Model (MLP) | Avg. DL Models | Best TF Model (ELECTRA) | Avg. TF Models |
|---------|----------------------|----------------|---------------------|----------------|-------------------------|----------------|
| SST2p   | 0.91 (Random Forest) | 0.85           | 0.84                | 0.77           | 0.93                    | 0.93           |
| SST2    | 0.78 (Naïve Bayes)   | 0.72           | 0.75                | 0.70           | 0.85                    | 0.83           |
| SST5    | 0.40 (SVM)           | 0.34           | 0.37                | 0.35           | 0.44                    | 0.43           |

Continuous Values (Normalized Mean Absolute Error, 0.0–1.0)
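
Here lower is better: 0.0 means perfect predictions. One common convention, which the tables below are assumed to follow, divides MAE by the target range:

```python
import numpy as np

def normalized_mae(y_true, y_pred):
    """MAE divided by the target range, giving a 0-1 score (lower is
    better). A common convention; the repository's exact formula may
    differ (see normalize_continuous_metrics.py)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mae = np.mean(np.abs(y_true - y_pred))
    return mae / (y_true.max() - y_true.min())

# Example: 5-class labels treated as a continuous target.
print(normalized_mae([0, 1, 2, 3, 4], [0, 1, 3, 3, 4]))  # 0.05
```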

Lexicon

| Dataset | AFINN | TextBlob | Pattern | SenticNet | VADER | General Inquirer | MPQA | OpinionFinder | SentiWordNet | Avg. |
|---------|-------|----------|---------|-----------|-------|------------------|------|---------------|--------------|------|
| SST5    | 0.26  | 0.25     | 0.25    | 0.25      | 0.25  | 0.27             | 0.25 | 0.26          | 0.27         | 0.26 |
| SST2    | 0.48  | 0.44     | 0.44    | 0.45      | 0.41  | 0.50             | 0.45 | 0.47          | 0.48         | 0.46 |
| SST2p   | 0.47  | 0.43     | 0.43    | 0.40      | 0.40  | 0.50             | 0.40 | 0.44          | 0.47         | 0.44 |

ML regression

Traditional Machine Learning

| Dataset | SVM  | Decision Tree | Random Forest | Linear Regression | Gradient Boosting | KNN  | ANN  | Avg. |
|---------|------|---------------|---------------|-------------------|-------------------|------|------|------|
| SST5    | 0.22 | 0.27          | 0.23          | 0.37              | 0.25              | 0.25 | 0.31 | 0.27 |
| SST2    | 0.36 | 0.32          | 0.33          | 0.55              | 0.45              | 0.40 | 0.46 | 0.40 |
| SST2p   | 0.18 | 0.11          | 0.14          | 0.26              | 0.45              | 0.19 | 0.26 | 0.21 |

Deep Learning

| Dataset | MLP  | CNN  | RNN  | LSTM | GRU  | Avg. |
|---------|------|------|------|------|------|------|
| SST5    | 0.23 | 0.25 | 0.23 | 0.25 | 0.26 | 0.24 |
| SST2    | 0.36 | 0.41 | 0.36 | 0.41 | 0.43 | 0.39 |
| SST2p   | 0.27 | 0.36 | 0.32 | 0.34 | 0.31 | 0.32 |

Transformers

| Dataset | bert-base-uncased | distilbert-base-uncased | electra-base-discriminator | Avg. |
|---------|-------------------|-------------------------|----------------------------|------|
| SST5    | 0.20              | 0.20                    | 0.20                       | 0.20 |
| SST2    | 0.22              | 0.26                    | 0.22                       | 0.23 |
| SST2p   | 0.10              | 0.15                    | 0.10                       | 0.11 |
