For each Amazon product, there exists a distribution of star ratings (number of ratings per star value). To compute the probability of positive experience (i.e. satisfaction probability) with the product, we'd like to assign 4 and 5 stars the binary label of "positive" and {1, 2, 3} stars the binary label of "not positive". Given this assignment, we assume that each rating can be viewed as an independent Bernoulli trial with fixed (but unknown) probability of success. Taking this assumption, we can use the Beta distribution to compute a confidence interval on our measurement in a straightforward way.
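As a sketch of the histogram approach, here is a minimal stdlib-only Python example. The star counts are hypothetical, and the interval uses a normal approximation to the Beta(successes + 1, failures + 1) posterior (uniform prior) rather than exact Beta quantiles, to avoid a scipy dependency:

```python
import math

def beta_interval(successes: int, failures: int, z: float = 1.96):
    """Approximate 95% interval for a Bernoulli success probability,
    using the mean and variance of a Beta(successes+1, failures+1)
    posterior with a normal approximation (no scipy needed)."""
    a, b = successes + 1, failures + 1
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    half = z * math.sqrt(var)
    return max(0.0, mean - half), min(1.0, mean + half)

# Hypothetical star histogram: {star value: number of ratings}.
# Stars 4 and 5 are labeled "positive"; 1-3 are "not positive".
hist = {1: 10, 2: 5, 3: 15, 4: 40, 5: 130}
positive = hist[4] + hist[5]
negative = hist[1] + hist[2] + hist[3]
lo, hi = beta_interval(positive, negative)
```

With exact Beta quantiles (e.g. `scipy.stats.beta.ppf`) the endpoints differ slightly, but for the rating counts typical of Amazon products the normal approximation is close.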
However, it is not feasible to access the full histogram of star ratings for a full page of products immediately on page load (Amazon locks you out if you fetch these results too quickly, and rate limiting results in a poor user experience). Instead, the only variables readily available to us are the average star rating and the number of ratings for each product. Given these two pieces of information, we can obtain an alternative satisfaction probability and confidence interval by linearly scaling the average star rating in the range [1, 5] to a success probability in the range [0, 1], then building a confidence interval from that proportion in the same way as before.
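The scaled-average alternative can be sketched the same way. This is an assumption-laden illustration: it treats the rescaled average as a binomial proportion and uses fractional pseudo-counts with the same normal-approximated Beta interval as the histogram version:

```python
import math

def scaled_rating_interval(avg_stars: float, n_ratings: int, z: float = 1.96):
    """Satisfaction interval from only the average star rating and the
    rating count: linearly rescale [1, 5] -> [0, 1], then treat the
    result as a binomial proportion with fractional pseudo-counts and
    apply a normal approximation to the Beta posterior."""
    p = (avg_stars - 1.0) / 4.0          # linear rescale to [0, 1]
    successes = p * n_ratings            # fractional pseudo-successes
    failures = (1.0 - p) * n_ratings
    a, b = successes + 1, failures + 1   # Beta(a, b), uniform prior
    mean = a / (a + b)
    var = (a * b) / ((a + b) ** 2 * (a + b + 1))
    half = z * math.sqrt(var)
    return max(0.0, mean - half), min(1.0, mean + half)

# Hypothetical product: 4.4 average stars across 200 ratings.
lo, hi = scaled_rating_interval(4.4, 200)
```

Note the scaled average and the histogram proportion generally disagree: a product whose ratings are all 3 stars scales to p = 0.5, while its true proportion of 4-or-5-star ratings is 0.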
What are the consequences of choosing the "average star rating" as a proxy for "proportion of 4 or 5 star ratings"? Can the Beta-distribution-derived confidence interval still be trusted?
Interesting! I had no idea. I think you might be right then that it’s probably better to use the average star rating rather than the raw star distribution.