This project is based on data from the Rotten Tomatoes movie review dataset. Our goal is to model the relationship between phrases in movie reviews and the sentiment score. We tested different supervised and unsupervised methods to determine what strategy produces the most accurate model.
The data is a mixture of labeled and unlabeled phrases from movie reviews. Each phrase has a sentiment [0,1,2,3,4] or is unlabeled [-100].
- Logistic Regression
- K Nearest Neighbor (KNN)
- Gaussian Mixture (GMM)
- K-means Clustering
To build and test each model, download all dependencies and execute main.py