Improving the Yelp Review Experience by Standardizing Reviewer Sentiment

Team:

Angela Detweiler
Hee Kang
Alexander Lam
Behesteh Mostaghni

Dataset link: Yelp Dataset on Kaggle, with a focus on restaurants: https://www.kaggle.com/yelp-dataset/yelp-dataset

Problem: When you are researching restaurants on Yelp, do you look at the star rating or do you read the review? Do you look at both? Given that reviews are highly subjective, and star ratings can be influenced by various aspects of business performance, can we use machine learning to standardize the interpretation of reviews?

Goal: Our goal is to feed Natural Language Processing (NLP) features, along with other features from the Yelp reviews, into a model that outputs a new 5-star rating, so that there is less discrepancy between review text and star ratings. To make the model more robust, we will also incorporate new star ratings from users who read a review but did not write it (that is, ratings based on the review text alone), so that the model better reflects the review sentiment.

Hypothesis: We hypothesize that automating star ratings based on NLP of restaurant reviews will improve the Yelp review experience by normalizing reviewer sentiment.

ML algorithms:

Naive Bayes
k-NN
K-Means
LSTM
N-Gram
TF-IDF
Linear Regression
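
As a rough illustration of the TF-IDF weighting listed above, here is a minimal pure-Python sketch. It assumes whitespace tokenization, length-normalized term frequency, and idf = log(N / df); the function name `tf_idf` and the sample reviews are made up for illustration, and library implementations such as scikit-learn's use a different smoothing.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumption: tf is count / document length and idf is log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Made-up sample reviews, not taken from the Yelp dataset.
reviews = [
    "great food great service",
    "terrible food slow service",
    "great patio",
]
weights = tf_idf(reviews)
```

Terms that appear in fewer reviews (such as "terrible") receive a higher weight than terms shared across reviews, which is what makes the weights useful as model features.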

Libraries:

NumPy
SciPy
scikit-learn
Pandas
Matplotlib
NLTK
PySpark
Keras
HTML/CSS/Bootstrap
Tableau

Sentiment Analysis Lexicon:

AFINN
VADER
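
To show how a lexicon such as AFINN can be applied, here is a minimal sketch of lexicon-based scoring. AFINN assigns each word an integer valence; the tiny `TOY_LEXICON` below is a made-up stand-in rather than the real word list, and `lexicon_score` is a hypothetical helper name.

```python
# Toy lexicon in the AFINN style (integer valence per word).
# These entries are illustrative and not copied from the real AFINN list.
TOY_LEXICON = {
    "great": 3, "good": 2, "love": 3,
    "slow": -1, "bad": -2, "terrible": -3,
}

def lexicon_score(text, lexicon=TOY_LEXICON):
    """Sum the valence of every known word; unknown words count as 0."""
    words = (w.strip(".,!?;:").lower() for w in text.split())
    return sum(lexicon.get(w, 0) for w in words)

mixed = lexicon_score("Great food, terrible service!")
positive = lexicon_score("Love it! Great patio.")
negative = lexicon_score("slow and bad")
```

A review with mixed praise and complaints can sum to zero under this scheme, which is one reason a plain lexicon score may need to be combined with other features.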

Project components, steps, analyses, and final products:

Components and final products
- ML algorithms
- Game (user rates reviews)/HTML page
- Database with game data to be reincorporated into model
- Model output/visualizations in Jupyter Notebook
Steps and analyses
- Select and clean restaurant/food category data from Yelp
- Cluster reviews into 5 categories (one per star rating)
- Use NLP to train model
- Test Yelp rating/review data (user inputs both)
- Incorporate new user star-rating from game into the model
- Other...
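
One possible way to connect the steps above is sketched below: map a sentiment score to a 1-5 star rating, then measure how far those predicted ratings fall from the stars the reviewers originally gave. The score range of [-1, 1] (the range of VADER's compound score), the linear binning, and the helper names `score_to_stars` and `mean_abs_gap` are all assumptions for illustration; the project's actual model may differ.

```python
def score_to_stars(score, low=-1.0, high=1.0):
    """Map a sentiment score in [low, high] to an integer rating from 1 to 5."""
    clamped = max(low, min(high, score))
    fraction = (clamped - low) / (high - low)  # 0.0 .. 1.0
    return 1 + round(fraction * 4)             # 1 .. 5

def mean_abs_gap(original_stars, predicted_stars):
    """Average absolute difference between two equal-length rating lists."""
    pairs = list(zip(original_stars, predicted_stars))
    return sum(abs(o - p) for o, p in pairs) / len(pairs)

# Made-up example values, not real Yelp data.
sentiment_scores = [0.9, -0.8, 0.0]
original_stars = [5, 3, 3]
predicted_stars = [score_to_stars(s) for s in sentiment_scores]
gap = mean_abs_gap(original_stars, predicted_stars)
```

A smaller gap would mean the review text and the star ratings agree more closely; the same comparison could be run against the star ratings collected from the game.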

Questions/Topics of Interest:

(ML) Are Yelp reviews highly correlated with restaurant quality (based on star rating)? In other words, are the reviews useful?
What percentage of reviews talk about the quality of the food versus the quality of the service?
Correlate photo captions to reviews.
(ML) Is there consistency in review style for a particular user?
Distribution of ratings (stars): Is it a bell curve, or does it peak at the extremes (1- and/or 5-star ratings)?
(ML) Is there a pattern to Yelp Elite status? Elite vs non-elite.
Patterns in ratings/review sentiment correlated to business attributes? (Outdoor seating, live music, etc.)
Patterns in 'useful' reviews?
Use NLP to train the model, test it, then have humans rate the same reviews and compare the difference.
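
The question about the distribution of star ratings can be explored with a simple share-per-star count. This is a minimal sketch using made-up ratings; `star_distribution` is a hypothetical helper, and with the real dataset the list would come from the `stars` values of the review records.

```python
from collections import Counter

def star_distribution(stars):
    """Return the share of ratings at each star level from 1 to 5."""
    counts = Counter(stars)
    total = len(stars)
    return {s: counts.get(s, 0) / total for s in range(1, 6)}

# Made-up ratings for illustration only.
sample_stars = [5, 5, 1, 4, 5, 1, 3, 5]
distribution = star_distribution(sample_stars)

# Text histogram: one '#' per rating at that star level.
for star, share in distribution.items():
    print(f"{star} stars: {'#' * round(share * len(sample_stars))} ({share:.0%})")
```

Peaks at both 1 and 5 stars in such a histogram would suggest a distribution weighted toward the extremes rather than a bell curve.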
