The purpose of this project is to analyze Amazon reviews written by members of the paid Amazon Vine program.
The Amazon Vine program is a service that allows manufacturers and publishers to receive reviews for their products. Companies like SellBy pay a small fee to Amazon and provide products to Amazon Vine members, who are then required to publish a review.
I chose the Pet Products reviews dataset from approximately 50 Amazon review datasets. I used PySpark to perform the ETL (extract, transform, load) process: I extracted the dataset, transformed the data, connected to an Amazon Web Services (AWS) Relational Database Service (RDS) instance, and loaded the transformed data into the PostgreSQL database, which I managed with pgAdmin. Then I used Pandas to determine whether the dataset shows any bias toward favorable reviews from Vine members.
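The extract step reads the reviews file, which is tab-separated. The sketch below uses pandas instead of PySpark so it runs without a Spark session; in PySpark the comparable call would be `spark.read.csv(path, sep="\t", header=True, inferSchema=True)`. The column names follow the public Amazon Customer Reviews dataset schema and the three sample rows are made up, so treat both as assumptions rather than the exact data used here.

```python
import io

import pandas as pd

# Assumed subset of the Amazon reviews TSV columns; the sample rows are hypothetical.
COLUMNS = [
    "customer_id", "review_id", "product_id", "product_parent", "product_title",
    "star_rating", "helpful_votes", "total_votes", "vine", "verified_purchase",
    "review_date",
]
ROWS = [
    ["101", "R1", "P1", "501", "Dog Bed", "5", "3", "4", "N", "Y", "2015-08-31"],
    ["102", "R2", "P2", "502", "Cat Toy", "4", "0", "0", "Y", "N", "2015-08-30"],
    ["101", "R3", "P2", "502", "Cat Toy", "2", "1", "2", "N", "Y", "2015-08-29"],
]
SAMPLE_TSV = "\n".join("\t".join(row) for row in [COLUMNS, *ROWS]) + "\n"


def extract_reviews(source) -> pd.DataFrame:
    """Read a tab-separated reviews file from a path, URL, or file-like object."""
    # With the default compression="infer", pandas decompresses ".gz" paths automatically.
    return pd.read_csv(source, sep="\t")


reviews_df = extract_reviews(io.StringIO(SAMPLE_TSV))
print(reviews_df.shape)  # (3, 11)
```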
I extracted the dataset and transformed it into four DataFrames.
The following image shows the extracted data for the `customers_table`:
After I renamed the column, `customers_table_df` matched the schema of the `customers_table` in pgAdmin:
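The transform step can be sketched as follows, again with pandas standing in for PySpark. Only `customers_table` is named above; the other three table names and all column lists are hypothetical, based on the public dataset's columns. The customers table counts reviews per customer. Here `reset_index(name="customer_count")` names the count column directly, whereas in PySpark `groupBy("customer_id").count()` produces a column called `count` that would then be renamed with `withColumnRenamed("count", "customer_count")`. The target name `customer_count` is an assumption about the database schema.

```python
import pandas as pd


def transform_reviews(df: pd.DataFrame) -> dict:
    """Split raw review rows into four DataFrames shaped like the database tables."""
    # One row per review, linking the review to its customer and product.
    review_id_df = df[
        ["review_id", "customer_id", "product_id", "product_parent", "review_date"]
    ].copy()
    review_id_df["review_date"] = pd.to_datetime(review_id_df["review_date"])

    # One row per product: the same product appears in many reviews.
    products_df = (
        df[["product_id", "product_title"]].drop_duplicates().reset_index(drop=True)
    )

    # Number of reviews per customer, stored in an (assumed) customer_count column.
    customers_df = df.groupby("customer_id").size().reset_index(name="customer_count")

    # Review-level rating and Vine information used in the analysis.
    vine_df = df[
        ["review_id", "star_rating", "helpful_votes", "total_votes", "vine", "verified_purchase"]
    ].copy()

    # Hypothetical table names, apart from customers_table.
    return {
        "review_id_table": review_id_df,
        "products_table": products_df,
        "customers_table": customers_df,
        "vine_table": vine_df,
    }


raw = pd.DataFrame(
    {
        "customer_id": [101, 102, 101],
        "review_id": ["R1", "R2", "R3"],
        "product_id": ["P1", "P2", "P2"],
        "product_parent": [501, 502, 502],
        "product_title": ["Dog Bed", "Cat Toy", "Cat Toy"],
        "star_rating": [5, 4, 2],
        "helpful_votes": [3, 0, 1],
        "total_votes": [4, 0, 2],
        "vine": ["N", "Y", "N"],
        "verified_purchase": ["Y", "N", "Y"],
        "review_date": ["2015-08-31", "2015-08-30", "2015-08-29"],
    }
)
tables = transform_reviews(raw)
print(tables["customers_table"])
```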
I uploaded the transformed data into the appropriate tables and ran queries in pgAdmin to confirm that the data was uploaded.
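A minimal sketch of the load-and-confirm step follows. The project loaded the data into PostgreSQL on RDS from PySpark (typically through its JDBC writer, `DataFrame.write.jdbc`); to keep the example runnable without a database server, it swaps in an in-memory SQLite database and pandas' `DataFrame.to_sql`. The table contents are hypothetical, and the `SELECT COUNT(*)` check only mirrors the kind of confirmation query one might run in pgAdmin.

```python
import sqlite3

import pandas as pd


def load_tables(tables: dict, con: sqlite3.Connection) -> dict:
    """Append each DataFrame to the table of the same name and return row counts."""
    counts = {}
    for name, frame in tables.items():
        # if_exists="append" keeps an existing table and adds rows;
        # if the table does not exist yet, pandas creates it.
        frame.to_sql(name, con, if_exists="append", index=False)
        # Confirmation query: how many rows the table now holds.
        counts[name] = con.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
    return counts


con = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database on RDS
sample = {
    "customers_table": pd.DataFrame({"customer_id": [101, 102], "customer_count": [2, 1]}),
    "vine_table": pd.DataFrame(
        {"review_id": ["R1", "R2", "R3"], "star_rating": [5, 4, 2], "vine": ["N", "Y", "N"]}
    ),
}
print(load_tables(sample, con))  # {'customers_table': 2, 'vine_table': 3}
```

Appending rather than replacing matches a setup where the tables and their schemas were created in the database first.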
- How many Vine reviews and non-Vine reviews were there?
There were 10,215 Vine reviews and 2,633,399 non-Vine reviews for Pet Products in this dataset.
- How many Vine reviews were 5 stars? How many non-Vine reviews were 5 stars?
Of all the 5-star reviews, 4,343 were Vine reviews and 1,641,210 were non-Vine reviews.
- What percentage of Vine reviews were 5 stars? What percentage of non-Vine reviews were 5 stars?
Of the Vine reviews, 42.52% were 5 stars, compared with 62.32% of non-Vine reviews.
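The three questions above reduce to group counts and a ratio. The sketch below computes them with pandas on a `vine_table`-shaped DataFrame, assuming the `vine` column holds `"Y"`/`"N"` and `star_rating` holds integers from 1 to 5; these names and values are assumptions, not taken from the author's notebook. The last two lines re-derive the reported percentages from the reported counts.

```python
import pandas as pd


def vine_five_star_summary(vine_df: pd.DataFrame) -> pd.DataFrame:
    """Total reviews, 5-star reviews, and 5-star percentage per Vine group."""
    summary = (
        vine_df.assign(is_five_star=vine_df["star_rating"].eq(5))
        .groupby("vine")
        .agg(
            total_reviews=("star_rating", "count"),
            five_star_reviews=("is_five_star", "sum"),
        )
        # Assumed encoding: "Y" marks a Vine review, "N" a non-Vine review.
        .rename(index={"Y": "Vine", "N": "non-Vine"})
    )
    # Share of each group's own reviews that are 5 stars, as a percentage.
    summary["five_star_pct"] = (
        summary["five_star_reviews"] / summary["total_reviews"] * 100
    ).round(2)
    return summary


sample = pd.DataFrame({"star_rating": [5, 4, 5, 5, 1], "vine": ["Y", "Y", "N", "N", "N"]})
print(vine_five_star_summary(sample))

# Re-deriving the percentages reported above from the reported counts.
print(round(4343 / 10215 * 100, 2))  # 42.52
print(round(1641210 / 2633399 * 100, 2))  # 62.32
```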
Is there any positivity bias for reviews in the Vine program?
Because a smaller share of paid Vine reviews than unpaid non-Vine reviews received 5 stars (42.52% versus 62.32%), this dataset does not provide strong evidence of a positivity bias in the Vine program. That is not to say that there is no positivity bias.
Additional analyses I could perform on the dataset to support this conclusion include:
- Analyzing only verified purchases, using the `verified_purchase` column, to better judge whether a positivity bias exists.
- Finding the percentage of Vine and non-Vine reviews that are 3-star and 4-star reviews to see how those numbers compare.
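Both follow-up analyses can be sketched with one pandas function: optionally filter to verified purchases, then compute the percentage of Vine and non-Vine reviews at each star rating and read off the 3-star and 4-star rows. The `verified_purchase` and `vine` columns are assumed to use `"Y"`/`"N"` values, and the sample data is hypothetical.

```python
import pandas as pd


def star_distribution(vine_df: pd.DataFrame, verified_only: bool = False) -> pd.DataFrame:
    """Percentage of each vine group's reviews at each star rating."""
    df = vine_df
    if verified_only:
        # Assumed encoding: "Y" marks a verified purchase.
        df = df[df["verified_purchase"].eq("Y")]
    # Rows are star ratings, columns are vine values ("N"/"Y");
    # normalize="columns" makes each column sum to 1 before scaling to percent.
    return (pd.crosstab(df["star_rating"], df["vine"], normalize="columns") * 100).round(2)


sample = pd.DataFrame(
    {
        "star_rating": [3, 4, 5, 3, 4, 4],
        "vine": ["Y", "Y", "Y", "N", "N", "N"],
        "verified_purchase": ["Y", "N", "Y", "Y", "Y", "N"],
    }
)
# 3- and 4-star rows only; reindex fills any absent rating with 0.
print(star_distribution(sample).reindex([3, 4], fill_value=0))
print(star_distribution(sample, verified_only=True).reindex([3, 4], fill_value=0))
```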