Designing Better Restaurant Metrics – Pt 2

Another phenomenon that’s all too common on Yelp is the restaurant with a single five-star review and no other data. Was this review written by the owner? Was this experience typical, or was this the only customer who had an experience good enough to bother rating? We just don’t know.

Only one Yelp review! What am I supposed to make of this?

Insufficient data

Combining data sources:

One of the easiest solutions to this problem is to combine data sources. Yelp, Tripadvisor, and Google may each have too few reviews on their own, but combined they’d probably provide reasonable data on something like 99% of restaurants. How to combine these sources properly is an open question.
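To make the idea concrete, here’s a minimal sketch of one naive way to pool sources: a mean weighted by review count. This is only an illustration, not a settled answer to the open question. It assumes every source uses the same 1-to-5 star scale and that each review counts equally; the `SourceRating` class, the `pooled_rating` function, and the numbers are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class SourceRating:
    source: str        # label only, e.g. "Yelp"
    average: float     # mean star rating reported by that source
    num_reviews: int   # number of reviews behind that average


def pooled_rating(ratings):
    """Review-count-weighted mean across sources, or None if there are no reviews."""
    total_reviews = sum(r.num_reviews for r in ratings)
    if total_reviews == 0:
        return None
    weighted_sum = sum(r.average * r.num_reviews for r in ratings)
    return weighted_sum / total_reviews


# Made-up numbers for one hypothetical restaurant.
example = [
    SourceRating("Yelp", 5.0, 1),
    SourceRating("Tripadvisor", 4.0, 12),
    SourceRating("Google", 4.2, 30),
]
print(round(pooled_rating(example), 2))
```

With this weighting, the lone five-star Yelp review barely moves the pooled score, because the other two sources contribute 42 of the 43 reviews.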

Rewarding statistical confidence:

A slightly more complex way to account for insufficient data is to reward the higher statistical confidence that comes with additional reviews. There are a lot of ways to do this, but we’ll look at one little idea.

My first idea for this was some sort of multiple. For example, multiply each restaurant’s score by (1 + log(numberOfReviews)/20), where log is the base-10 logarithm.

So, let’s say we have three restaurants all rated 4.0. Restaurant 1 has 1 review. Restaurant 2 has 40 reviews. Restaurant 3 has 500 reviews.

Restaurant 1 will have a multiple of 1, and an adjusted rating of 4.

Restaurant 2 will have a multiple of 1.08 and a rating of 4.32.

Restaurant 3 will have a multiple of about 1.135 and a rating of about 4.54.
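Here’s a short sketch of that calculation, so the numbers are easy to reproduce. It assumes the log is base 10, which is what the multiples for 40 and 500 reviews imply; the function names are mine, chosen just for this example.

```python
import math


def confidence_multiple(num_reviews, divisor=20):
    """Return 1 + log10(num_reviews) / divisor, for num_reviews >= 1."""
    if num_reviews < 1:
        raise ValueError("num_reviews must be at least 1")
    return 1 + math.log10(num_reviews) / divisor


def adjusted_rating(rating, num_reviews, divisor=20):
    """Scale a raw rating by the confidence multiple."""
    return rating * confidence_multiple(num_reviews, divisor)


# The three restaurants from the example, all rated 4.0.
for n in (1, 40, 500):
    print(n, round(confidence_multiple(n), 3), round(adjusted_rating(4.0, n), 2))
```

If the multiple isn’t rounded before multiplying, the 500-review restaurant comes out at roughly 4.54.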

In retrospect, I might weaken this a bit, perhaps by increasing the divisor from 20 to 30, but I do think there’s real value in being confident about the accuracy of a rating, and that is worth accounting for.
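To see how much a divisor of 30 would soften the boost, here’s a quick comparison for the same three restaurants. As before, this is a rough sketch that assumes a base-10 log, and the `adjusted` helper is just a name picked for the example.

```python
import math


def adjusted(rating, num_reviews, divisor):
    # Same form as the multiple above, assuming a base-10 log.
    return rating * (1 + math.log10(num_reviews) / divisor)


# Adjusted score for a 4.0 restaurant under each divisor.
for n in (1, 40, 500):
    row = [round(adjusted(4.0, n, d), 2) for d in (20, 30)]
    print(n, row)
```

With a divisor of 30, the 500-review restaurant lands around 4.36 rather than 4.54, so review count still matters but is less likely to swamp the underlying rating.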