An Artificial Intelligence (AI) project for course CS5100 at Northeastern University.

In this project, we developed machine learning models that classify the sentiment of user-written movie reviews.
- Extract `dataset/raw_reviews.zip` and `dataset/dataset.zip` in the `main` directory.
- Execute `python NBtrain.py '../main/train'` and then `python NBtest.py '../main/test'` for the main implementation.
- Execute `python NaiveBayes_bigrams.py` and `python NaiveBayes_TFIDF.py` for the bigram and TF-IDF variants, respectively.
- Execute `python review_polarity.py`.
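
For readers unfamiliar with the technique behind the Naive Bayes scripts listed above, here is a minimal sketch of a multinomial Naive Bayes sentiment classifier with add-one smoothing. It is not the code in `NBtrain.py` or `NBtest.py`; the function names (`tokenize`, `train`, `classify`), the labels `pos` and `neg`, and the toy reviews are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately simple tokenizer."""
    return text.lower().split()


def train(labeled_reviews):
    """Count words per class and reviews per class."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter()              # label -> number of reviews
    for text, label in labeled_reviews:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        # Log prior: fraction of training reviews carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            if word not in vocab:
                continue  # ignore words never seen in training
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for illustration only.
toy_reviews = [
    ("a wonderful and moving film", "pos"),
    ("great acting and a great story", "pos"),
    ("boring plot and terrible acting", "neg"),
    ("a dull and terrible film", "neg"),
]
model = train(toy_reviews)
prediction = classify("a great and wonderful story", model)
```

Working in log space avoids numeric underflow on long reviews, and add-one smoothing keeps a word that is missing from one class from zeroing out that class's score.
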
We built our custom datasets with a DFS (depth-first search) crawler that we implemented. Refer to `dataset_generation/` for the code and `dataset/` for the extracted dataset.
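
As a rough illustration of depth-first crawling, the sketch below walks a small in-memory link graph. It is not the crawler in `dataset_generation/`; `dfs_crawl`, `get_links`, `max_pages`, and the `toy_site` paths are hypothetical names, and a real crawler would fetch and parse pages rather than read links from a dict.

```python
def dfs_crawl(start, get_links, max_pages=100):
    """Visit pages depth-first from `start`; return them in visit order."""
    visited = []
    seen = set()
    stack = [start]
    while stack and len(visited) < max_pages:
        page = stack.pop()
        if page in seen:
            continue  # already visited via another path
        seen.add(page)
        visited.append(page)
        # Push links in reverse so the first link on a page is explored first.
        for link in reversed(get_links(page)):
            if link not in seen:
                stack.append(link)
    return visited


# Hypothetical link graph standing in for real pages.
toy_site = {
    "/movies": ["/movies/1", "/movies/2"],
    "/movies/1": ["/movies/1/reviews"],
    "/movies/2": ["/movies/2/reviews", "/movies/1"],
    "/movies/1/reviews": [],
    "/movies/2/reviews": [],
}
order = dfs_crawl("/movies", lambda page: toy_site.get(page, []))
```

The explicit stack avoids Python's recursion limit on deep link chains, and the `seen` set keeps pages that link back to each other from being visited twice.
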

| n = 12000 | Predicted: Positive | Predicted: Negative |
|---|---|---|
| Actual: Positive | 4803 | 5348 |
| Actual: Negative | 1197 | 652 |

True Positives: 10,151

F1 Score: 0.8242 | Accuracy: 84.59%
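
The accuracy figure can be reproduced from the numbers above. This is a small sketch that assumes the 10,151 count is the number of correctly classified reviews (it equals 4803 + 5348) out of n = 12000; the variable names are illustrative.

```python
total_reviews = 12000        # n from the table
correct_predictions = 10151  # reported count; equals 4803 + 5348

# Accuracy is the share of reviews whose predicted label matches the actual one.
accuracy = correct_predictions / total_reviews
accuracy_percent = round(accuracy * 100, 2)
```
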
Additional datasets compatible with the project: Large Movie Review Dataset