Sentiment classifier for Twitter, predicting positive, negative, or neutral labels.
Training data: 45,101 tweets (tab-separated: `tweetID \t sentiment \t text`)
Test data: 3,531 tweets (same format)
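For reference, files in this format can be loaded with pandas. A minimal sketch, assuming the files have no header row and a path like `data/train.tsv` (the actual filenames may differ):

```python
import pandas as pd

train = pd.read_csv(
    "data/train.tsv",          # hypothetical path; adjust to the real file
    sep="\t",
    header=None,
    names=["tweetID", "sentiment", "text"],
    quoting=3,                 # csv.QUOTE_NONE: tweets may contain stray quotes
    dtype={"tweetID": str},
)
```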
Best classifier: logistic regression with word2vec embeddings and TF-IDF unigrams as features.
Corresponding result: macro-averaged F1-score of 0.649.
Note that the report in the PDF document was written before this result was obtained.
I use word2vec Twitter embeddings, pretrained on 400M tweets, which can be found here: https://www.fredericgodin.com/software/. The code also handles word2vec embeddings trained on the GoogleNews dataset.
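Either set of embeddings can be loaded with gensim's `KeyedVectors`. A minimal sketch; the filenames below are assumptions based on the standard distributions, so substitute the names of the files you actually downloaded:

```python
from gensim.models import KeyedVectors

# Godin et al. Twitter embeddings (binary word2vec format).
# unicode_errors="ignore" sidesteps occasional malformed bytes in this file.
twitter_vectors = KeyedVectors.load_word2vec_format(
    "word2vec_twitter_model.bin", binary=True, unicode_errors="ignore"
)

# Alternatively, the GoogleNews embeddings (300-dimensional):
# google_vectors = KeyedVectors.load_word2vec_format(
#     "GoogleNews-vectors-negative300.bin", binary=True
# )
```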
- In this repo, download the pre-trained word embeddings using the link above.
- Run `classification.py`. This will run every classifier in the list on line 50, including the best classifier for this dataset. Feature choices have to be entered as boolean arguments to `feature_pipeline()`.
The default feature arguments in `feature_pipeline()` are:
- `word2vec = False`
- `lexicons = False`
- `unigrams = False`
- `bigrams = False`
- `tfidf_unigrams = False`
- `tfidf_bigrams = False`
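For example, reproducing the best configuration (word2vec + TF-IDF unigrams) would look roughly like this. The exact signature lives in `feature_generation.py`, so treat the positional argument and its name as assumptions rather than the definitive interface:

```python
from feature_generation import feature_pipeline

# Hypothetical call: only the boolean flag names come from the defaults above.
features = feature_pipeline(
    tweets,               # the Twitter data described above
    word2vec=True,        # averaged word2vec embeddings per tweet
    tfidf_unigrams=True,  # TF-IDF-weighted unigram counts
)
```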
Here are some descriptions of the other files and folders in the submission:
- `feature_generation.py` -- includes the sklearn `TransformerMixin` classes for adding features (an illustrative sketch follows this list). It also includes the `feature_pipeline()` function, which takes the Twitter data and the desired features as inputs, and returns a transformed array of features for classification.
- `preprocess.py` -- includes all of the preprocessing methods.
- `evaluation.py` -- computes the macro-averaged F1-score for given test set results.
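To illustrate the `TransformerMixin` pattern used in `feature_generation.py`, here is a minimal sketch of a transformer that averages word2vec vectors over a tweet's tokens. The class name and whitespace tokenization are illustrative, not the submission's exact code:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class MeanWord2Vec(BaseEstimator, TransformerMixin):
    """Average the word2vec vectors of a tweet's tokens (zeros if none found)."""

    def __init__(self, vectors):
        self.vectors = vectors          # a gensim KeyedVectors instance
        self.dim = vectors.vector_size

    def fit(self, X, y=None):
        return self                     # stateless: nothing to learn

    def transform(self, X):
        rows = []
        for text in X:
            vecs = [self.vectors[w] for w in text.split() if w in self.vectors]
            rows.append(np.mean(vecs, axis=0) if vecs else np.zeros(self.dim))
        return np.vstack(rows)
```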
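The metric in `evaluation.py` corresponds to sklearn's macro-averaged F1, i.e. the unweighted mean of the per-class F1 scores. A sketch, assuming the standard macro average over all three classes:

```python
from sklearn.metrics import f1_score

# Macro-averaging computes F1 separately for positive, negative, and neutral,
# then takes the unweighted mean, so minority classes count equally.
score = f1_score(y_true, y_pred, average="macro")
```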