GitHub - cja5553/attention-driven-imitation-in-consumer-reviews: Code for the manuscript titled "Attention-driven imitation in consumer reviews" by Charles Alba, Mikhail Spektor and Lukasz Walasek

By Charles Alba, Drs. Lukasz Walasek, Mikhail Spektor

Cite as: Alba, C., Walasek, L., & Spektor, M. (2024). "Attention-driven imitation in consumer reviews". Decision (Special Issue on interface between ML, AI and JDM research). doi: 10.1037/dec0000238

Abstract summary:

How do reviewers decide what to write about? How much do reviews written by others influence one’s own contribution? We predicted that reviews will be more semantically similar to the most successful, salient, and readily accessible reviews written by others. To investigate this hypothesis, we extracted over 3 million reviews from STEAM. We reverse-engineered and traced the reviews that were displayed to each reviewer at the time each review was being composed. Using word embeddings from fastText, we quantified the cosine similarity between a given review and other reviews that were visible (or not) to a user. We found that reviewers imitate the most helpful reviews written by others, especially those that are visually salient. At the same time, reviewers avoid imitating the content of the most recent (and not necessarily highly rated) reviews, even if these reviews are salient at the time when they compose their review. Our findings suggest that the default sorting and display format of reviews on online platforms will have a pronounced effect on the style and content of new reviews.

The code runs in the following order:

1. Data Scraping

In this stage we scraped the data from STEAM. The data was gathered through the STEAM API. Please abide by its terms of use at all times: https://steamcommunity.com/dev/apiterms
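As a rough illustration of this stage, the sketch below pages through reviews for one game. It assumes the public store endpoint `appreviews/<app_id>` with cursor-based paging; whether the notebooks use this exact endpoint or these parameters is an assumption, and the helper names `build_url`, `http_get_json` and `iter_reviews` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Public store endpoint for user reviews (assumed here; not confirmed by the source).
BASE_URL = "https://store.steampowered.com/appreviews/{app_id}"


def build_url(app_id, cursor="*", per_page=100):
    """Build the request URL for one page of reviews."""
    params = {
        "json": 1,
        "filter": "recent",        # sort by creation time
        "language": "all",
        "num_per_page": per_page,
        "cursor": cursor,          # "*" requests the first page
    }
    return BASE_URL.format(app_id=app_id) + "?" + urllib.parse.urlencode(params)


def http_get_json(url):
    """Default fetcher: perform the HTTP request and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def iter_reviews(app_id, fetch=http_get_json, max_pages=1000):
    """Yield review dicts page by page until a page comes back empty."""
    cursor = "*"
    seen_cursors = set()
    for _ in range(max_pages):
        page = fetch(build_url(app_id, cursor))
        reviews = page.get("reviews", [])
        if not reviews:
            break
        for review in reviews:
            yield review
        cursor = page.get("cursor")
        # Stop if the API gives no cursor or repeats one we already used.
        if not cursor or cursor in seen_cursors:
            break
        seen_cursors.add(cursor)
```

Passing the fetcher in as an argument keeps the paging logic testable offline and makes it easy to add rate limiting so that requests stay within the terms of use.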

2. Validation of review sorting algorithms

As mentioned in the abstract, we aimed to trace the reviews that would have been visible to each gamer while they were writing their review. We do not know exactly how STEAM's algorithm decides which reviews are displayed to each gamer at that moment, but we can reverse-engineer it and form a hypothesis. In this stage we test and validate our hypothesized, reverse-engineered algorithm.
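One simple way to check a hypothesized display rule is to compare the order it predicts against the order actually observed on a page. The sketch below shows two such checks; the function names and the choice of metrics are illustrative assumptions, not the validation procedure used in the notebooks.

```python
def agreement_at_k(predicted_ids, observed_ids, k=10):
    """Share of the first k observed reviews that the hypothesis also ranks in its first k."""
    observed_top = observed_ids[:k]
    if not observed_top:
        return 0.0
    predicted_top = set(predicted_ids[:k])
    hits = sum(1 for rid in observed_top if rid in predicted_top)
    return hits / len(observed_top)


def exact_order_match(predicted_ids, observed_ids, k=10):
    """True when the hypothesis reproduces the first k observed reviews in the same order."""
    return predicted_ids[:k] == observed_ids[:k]


# Hypothetical review ids: what the page showed vs. what the rule predicts.
observed = ["r7", "r2", "r9"]
predicted = ["r7", "r9", "r2", "r4"]
same_set = agreement_at_k(predicted, observed, k=3)
same_order = exact_order_match(predicted, observed, k=3)
```

The set-based score tolerates small ordering differences, while the exact match is the stricter test of whether the rule reproduces the display.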

3. Data Wrangling

Having tested and validated our hypothesized algorithm, we then trace what each reviewer would have seen when writing their own review.
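A minimal sketch of this tracing step follows. It assumes, purely for illustration, that the main bar shows the most helpful earlier reviews and the sidebar shows the most recent ones; the actual validated rules live in the notebooks. The `Review` fields and the `visible_reviews` helper are hypothetical, and the sketch uses a single vote count per review, whereas a faithful reconstruction would need the count as it stood when the focal review was written.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    review_id: str
    created: int     # Unix timestamp when the review was posted
    votes_up: int    # helpfulness votes (simplification: one fixed count)


def visible_reviews(focal, all_reviews, k_main=10, k_side=10):
    """Return (main_bar_ids, sidebar_ids) that the writer of `focal` might have seen."""
    # Only reviews posted before the focal review can have been on screen.
    earlier = [r for r in all_reviews if r.created < focal.created]
    # Assumed main-bar rule: most helpful first, ties broken by recency.
    main = sorted(earlier, key=lambda r: (-r.votes_up, -r.created))[:k_main]
    # Assumed sidebar rule: most recent first.
    side = sorted(earlier, key=lambda r: -r.created)[:k_side]
    return [r.review_id for r in main], [r.review_id for r in side]


reviews = [
    Review("a", 100, 5),
    Review("b", 200, 9),
    Review("c", 300, 1),
    Review("d", 400, 7),   # posted after the focal review, so never visible to it
]
focal = Review("e", 350, 0)
main_ids, side_ids = visible_reviews(focal, reviews, k_main=2, k_side=2)
```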

4. Text Mining

Here we perform text pre-processing and use fastText embeddings with cosine-similarity matrices to determine how close each review is to the reviews that the reviewer would have seen.
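The similarity computation can be sketched in a few lines. The example below uses toy three-dimensional vectors in place of real fastText embeddings and represents a review as the average of its word vectors; both the toy values and the averaging step are assumptions made for illustration, since the source does not say how word vectors are aggregated.

```python
import math

# Toy word vectors standing in for real fastText embeddings (hypothetical values).
TOY_VECTORS = {
    "great": [0.9, 0.1, 0.0],
    "fun": [0.8, 0.2, 0.1],
    "game": [0.1, 0.9, 0.2],
    "buggy": [-0.7, 0.1, 0.6],
}


def review_vector(tokens, vectors):
    """Average the vectors of known tokens; return None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


focal = review_vector(["great", "game"], TOY_VECTORS)
similar = review_vector(["fun", "game"], TOY_VECTORS)
dissimilar = review_vector(["buggy", "game"], TOY_VECTORS)
sim_close = cosine_similarity(focal, similar)
sim_far = cosine_similarity(focal, dissimilar)
```

With real embeddings the same comparison would be repeated for every pair of a focal review and a review that was (or was not) visible to its author, which yields the similarity matrices mentioned above.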

5. Statistical Analysis

Here we perform the statistical analysis to estimate and test our effects.

Repository files:

01_data_scraping.ipynb
01b_inspecting_distribution_of_games_with_no_reviews_returned.ipynb
02_algorith_validation_set01.ipynb
02_algorith_validation_set02.ipynb
02_algorith_validation_set03.ipynb
02_algorith_validation_set04.ipynb
03a_data_wrangling_main_bar.ipynb
03b_data_wrangling_sidebar_reviews.ipynb
03c_data_wrangling_control_reviews.ipynb
04a_text_mining_preprocessing.ipynb
04b_text_mining_lematization.ipynb
04c_text_mining_fast_text_and_cosine_similarity_embeddings_for_main_and_sidebar.ipynb
05_preparing_data_for_analysis.ipynb
06_summary_statistics.ipynb
07_main_analysis.ipynb
README.md
