Our project is an extension of https://github.com/Sandipan99/amazonReview
Goal: study the effect of author gender on the perceived helpfulness of reviews.

Datasets:
- Yelp: https://www.yelp.com/dataset
- Reddit: http://files.pushshift.io/reddit/comments/
- StackExchange: https://data.stackexchange.com/
Pipeline:
- Place your dataset under `datasets/DATASETNAME`, where `DATASETNAME` is a folder name of your choice.
- Clean the dataset with one of the notebooks whose names end in `Dataset.ipynb`, producing the training, test, and validation sets.
- Go to `models/HAN` and run `preprocess.py` to preprocess the datasets.
- Go to `models/GRU` and run `RNN_model_batch.py` to train the model (a sketch of such a classifier follows this list). After training is done, run `inference.py` to infer gender labels for the undisclosed dataset.
- Open `MajorityVoting.ipynb` and apply majority voting to the predicted labels on the undisclosed dataset (see the aggregation sketch after this list).
- Open `SentimentReadabilityLengthCal.ipynb` to compute each review's sentiment, length, and readability (see the feature sketch after this list). This step produces four datasets: Signaling Man (SM), Signaling Woman (SW), Performing Man (PM), and Performing Woman (PW).
- Open `NaturalExptCategory(not_pairwise).ipynb` to find matches for each pair of groups: (SM, SW), (SM, PM), and (SW, PW). Once the matches for each pair are found, analyze their helpfulness scores to draw conclusions (see the comparison sketch after this list).
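
The gender classifier trained by `RNN_model_batch.py` is GRU-based; the exact architecture and hyperparameters live in the repo, so the following is only a minimal sketch, assuming PyTorch, padded integer token sequences, and a binary label set:

```python
import torch
import torch.nn as nn

class GRUGenderClassifier(nn.Module):
    """Minimal GRU-based binary classifier over tokenized review text."""
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) padded integer tensor
        embedded = self.embedding(token_ids)       # (batch, seq_len, embed_dim)
        _, last_hidden = self.gru(embedded)        # last_hidden: (1, batch, hidden_dim)
        return self.fc(last_hidden.squeeze(0))     # (batch, num_classes) logits

# Toy usage with random token ids (real inputs come from the preprocessing step)
model = GRUGenderClassifier(vocab_size=10_000)
logits = model(torch.randint(1, 10_000, (4, 50)))  # batch of 4 reviews, 50 tokens each
predicted_gender = logits.argmax(dim=1)            # label-to-gender mapping is an assumption
```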
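
`MajorityVoting.ipynb` collapses the per-review predictions into one gender label per author. A minimal sketch with pandas, using hypothetical `author_id` and `predicted_gender` column names (the repo's actual columns may differ):

```python
import pandas as pd

# Hypothetical per-review predictions produced by inference.py
predictions = pd.DataFrame({
    "author_id": ["a1", "a1", "a1", "a2", "a2"],
    "predicted_gender": ["M", "F", "M", "F", "F"],
})

# Majority vote per author: keep the most frequent predicted label,
# breaking ties by taking the first label in sorted order.
author_gender = (
    predictions.groupby("author_id")["predicted_gender"]
    .agg(lambda labels: labels.mode().iloc[0])
    .rename("majority_gender")
    .reset_index()
)
print(author_gender)
```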
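
For the sentiment, length, and readability measures computed in `SentimentReadabilityLengthCal.ipynb`, here is a rough sketch using TextBlob and textstat (the notebook may rely on different libraries and metrics):

```python
import pandas as pd
from textblob import TextBlob   # pip install textblob
import textstat                 # pip install textstat

def review_features(text: str) -> dict:
    """Sentiment polarity, word-count length, and Flesch reading ease for one review."""
    return {
        "sentiment": TextBlob(text).sentiment.polarity,     # -1 (negative) .. 1 (positive)
        "length": len(text.split()),                        # crude token count
        "readability": textstat.flesch_reading_ease(text),  # higher = easier to read
    }

reviews = pd.DataFrame({"text": [
    "Great product, works exactly as described.",
    "Terrible. Broke after two days and support never replied.",
]})
features = reviews["text"].apply(review_features).apply(pd.Series)
print(pd.concat([reviews, features], axis=1))
```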
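
The matching and helpfulness comparison in `NaturalExptCategory(not_pairwise).ipynb` depends on project-specific criteria; the sketch below only illustrates the general idea, pairing reviews from two groups on a hypothetical `product_id` column and comparing their helpfulness scores with a paired t-test:

```python
import pandas as pd
from scipy import stats   # pip install scipy

def compare_groups(group_a: pd.DataFrame, group_b: pd.DataFrame) -> None:
    """Match reviews on product_id, then compare helpfulness across the two groups."""
    matched = group_a.merge(group_b, on="product_id", suffixes=("_a", "_b"))
    t, p = stats.ttest_rel(matched["helpfulness_a"], matched["helpfulness_b"])
    print(f"{len(matched)} matched pairs | "
          f"mean A={matched['helpfulness_a'].mean():.2f}, "
          f"mean B={matched['helpfulness_b'].mean():.2f}, p={p:.3f}")

# Toy example: Signaling Man (SM) vs. Signaling Woman (SW) reviews of the same products
sm = pd.DataFrame({"product_id": [1, 2, 3], "helpfulness": [5, 2, 4]})
sw = pd.DataFrame({"product_id": [1, 2, 3], "helpfulness": [3, 2, 5]})
compare_groups(sm, sw)
```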