Learning to Cook (a little)

I’m not sure I knew how to use the microwave until I was 11 or 12. Sad. Lately, though, I’ve been into cooking. I bought a pot and a pan, and I actually use them. (Crazy, right?)

I started cooking because the food options around me got a bit tiring, and sometimes I just don’t want to leave my apartment in 0-degree weather. It helps that I live above a Target, so when I need a kitchen utensil or an ingredient, it’s within easy reach.

It turns out that cooking is fun, at least sometimes. It gives me the same sense of pride in building things that I get from programming, but I don’t have to stare at a screen (unless I’m looking up a recipe). It requires attention, but it’s also a bit mindless and free. I can listen to a podcast or an audiobook while I’m stirring a pot of sauce or cooking an onion.

The parts I don’t like are pretty guessable. Cleaning up (and washing dishes) is a pain. Cooking and cleanup require a good deal of time (and preplanning of ingredients). Those two things detract from the spontaneity of it.

Still, it’s a really enjoyable use of time so far, and I look forward to exploring it more in 2017.

Here’s some fresh pasta I made from scratch (as in, from the flour to the bowl).

Made fresh pasta today and it was actually pretty good #italian #eeeeeats #pasta


Designing Better Restaurant Metrics – Pt 4

Promising results! Yesterday I wrote about an idea I had: using Bayesian filtering (the statistical technique used to detect spam) to personalize restaurant preferences.

My hypothesis was that reviewers with taste preferences similar to mine would share a vocabulary that was markedly different from that of reviewers with opposing views. By dividing reviews into two groups (restaurants I liked and restaurants I didn’t), I could use statistics to calculate the probability that the wording of a given review came from a restaurant I liked rather than one I didn’t. Those are the basics of Bayes.

I ran a small-scale test today, focusing only on diners. I scraped the 20 most recent five-star reviews from six diners I’ve eaten at multiple times, and split each review set into an array of words. Two of these diners are from my hometown in Connecticut, two are in downtown Chicago, and two are in Hyde Park. I like three of the six.

I used the convenient “bayes” package from NPM to train and run my model. I trained my model on two diners I love and one I hate, and I tested my model on two I hate and one I love. It was 100% effective at judging my preferences for the diners.
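For anyone curious what the package is doing under the hood, here’s a minimal sketch of the same word-frequency Bayes idea in plain JavaScript. The training sentences below are made up for illustration (they aren’t my actual scraped reviews), and the npm “bayes” package handles this bookkeeping for you with a similar `learn`/`categorize` interface:

```javascript
// Minimal naive Bayes text classifier, sketching the idea behind the
// npm "bayes" package. Training data here is invented for illustration.
function makeClassifier() {
  const counts = { liked: {}, disliked: {} }; // word -> count per category
  const totals = { liked: 0, disliked: 0 };   // total words per category
  const docs = { liked: 0, disliked: 0 };     // documents per category
  const vocab = new Set();

  const tokenize = (text) => text.toLowerCase().match(/[a-z']+/g) || [];

  return {
    learn(text, category) {
      docs[category] += 1;
      for (const word of tokenize(text)) {
        counts[category][word] = (counts[category][word] || 0) + 1;
        totals[category] += 1;
        vocab.add(word);
      }
    },
    categorize(text) {
      let best = null;
      let bestScore = -Infinity;
      for (const category of ["liked", "disliked"]) {
        // log prior + sum of log likelihoods, with Laplace smoothing
        let score = Math.log(docs[category] / (docs.liked + docs.disliked));
        for (const word of tokenize(text)) {
          const count = counts[category][word] || 0;
          score += Math.log((count + 1) / (totals[category] + vocab.size));
        }
        if (score > bestScore) {
          bestScore = score;
          best = category;
        }
      }
      return best;
    },
  };
}

const clf = makeClassifier();
clf.learn("crispy hash browns and fluffy pancakes, friendly staff", "liked");
clf.learn("great coffee, generous portions, cozy booths", "liked");
clf.learn("greasy food, slow service, dirty tables", "disliked");
clf.categorize("the pancakes were fluffy and the staff was friendly"); // → "liked"
```

The real package adds niceties like serialization, but the core is exactly this: per-category word frequencies turned into log probabilities.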

Out of curiosity, I also tried applying my model to two non-diner restaurants, but it guessed wrong both times. This result wasn’t entirely surprising. I used two intentionally confusing restaurants. One was mediocre but served very similar menu items to a diner. The other was pretty good, but served very different items.

I think this idea has a lot of promise, but the flawed judgments it made with those two restaurants emphasize the importance of having an adequate dataset on each user’s preferences, including an adequate variety of cuisines. If a user only shares his favorite diners, the model is going to have a very strong preference towards diners.

Designing Better Restaurant Metrics – Pt 3

So far on my quest to create better restaurant metrics I’ve tackled two issues: better weighting of reviews by star count, and accounting for the confidence level of any rating. There’s a bigger issue than that, though: whose reviews should we trust, and how much should we trust (i.e., weight) them?

This is a really confusing question, because the answer varies from person to person and region to region. As I’ve learned after more than a year in the Midwest, Midwesterners and Northeasterners have different tastes. Even within this region, there’s incredible variation: taste buds vary immensely between Hyde Park, Peoria, and the Loop. On top of that, there’s a lot of review fraud, with restaurant owners buying good reviews for their own establishments and bad reviews for their competitors.

At first I thought of using a Flesch-Kincaid grade-level algorithm to give more weight to better-written reviews. That seems pretty elitist, though, and I’m not sure there’s any real correlation, past a certain point, between a review’s quality and the grade level it’s written at.

Lately I’ve come to the conclusion that this should probably be individualized, and I’ve been desperately trying to figure out how this could be done. I’m still far from a solution, but I have some ideas.

I’ve read a lot in recent weeks about Bayes filters (for email spam). These comb datasets of spam and non-spam emails to develop individual probabilities that particular words and phrases appear in a spam email. I feel like this could be applicable to our situation too, and for more than just fraud detection.

What I’m envisioning is this:

  • Users input seven restaurants they love, and seven they hate.
  • Our code scrapes the reviews for these restaurants, and divides them into four datasets:
    • 1-3 star reviews at places the user loves
    • 4-5 star reviews at places the user hates
    • 1-3 star reviews at places the user hates
    • 4-5 star reviews at places the user loves
  • Using the scraped reviews, we can create individual probabilities that certain words would be used in reviews the user would agree or disagree with. With these probabilities, we could infer for any restaurant (even outside our dataset) whether our user is more or less likely to like it than the average reviewer.
  • We can weight individual reviews based on the likelihood that they have similar tastes to our user, and create individual probabilities that our user will like a certain restaurant.
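One way to read step two: the four datasets collapse naturally into two classifier categories, language the user would agree with (high ratings at places they love, low ratings at places they hate) and language they’d disagree with. Here’s a sketch of that split; the restaurant names, review objects, and loved/hated lists are all made-up placeholders standing in for the scraping step:

```javascript
// Sketch: sort scraped reviews into the four buckets described above,
// collapsed into two training categories for a Bayes classifier.
// Review objects and restaurant names are illustrative placeholders.
function bucketReviews(reviews, loved, hated) {
  const buckets = { agree: [], disagree: [] };
  for (const review of reviews) {
    const positive = review.stars >= 4; // 4-5 stars vs. 1-3 stars
    if (loved.includes(review.restaurant)) {
      (positive ? buckets.agree : buckets.disagree).push(review.text);
    } else if (hated.includes(review.restaurant)) {
      (positive ? buckets.disagree : buckets.agree).push(review.text);
    }
  }
  return buckets;
}

const buckets = bucketReviews(
  [
    { restaurant: "Lou's", stars: 5, text: "perfect hash browns" },
    { restaurant: "Lou's", stars: 2, text: "went downhill lately" },
    { restaurant: "Sal's", stars: 5, text: "best burger in town" },
  ],
  ["Lou's"], // loved
  ["Sal's"]  // hated
);
// buckets.agree    → ["perfect hash browns"]
// buckets.disagree → ["went downhill lately", "best burger in town"]
```

Feeding `agree` and `disagree` into a classifier then gives us, for any new review, the probability that it was written by someone whose tastes match the user’s.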

A lot has been done using Bayes to filter out spam. I think it can do more than that though.

I still have a lot to learn on different filter types, but this seems like a really interesting way to use data to personalize restaurant reviews.

Designing Better Restaurant Metrics – Pt 2

Another phenomenon that’s all-too-common on Yelp is the restaurant with a single five-star review and no other data. Was this review written by the owner? Was this experience typical, or was this the only customer whose experience moved them to rate it? We just don’t know.

[Image: a restaurant listing with only one Yelp review. What am I supposed to make of this?]

Insufficient data

  • Combining data sources:

One of the easiest solutions to this problem is to combine data sources. Yelp, TripAdvisor, and Google may each have sparse data on their own, but combined they’d provide reasonable data on 99% of restaurants. How to combine these sources properly is an open question.

  • Rewarding statistical confidence:

A slightly more complex way to account for insufficient data is to reward the higher statistical confidence that comes with additional reviews. There are a lot of ways to do this, but we’ll look at one little idea.

My first idea for this was some sort of multiplier. For example, multiply each restaurant’s score by (1 + log(numberOfReviews)/20), using a base-10 log.

So, let’s say we have three restaurants all rated 4.0. Restaurant 1 has 1 review. Restaurant 2 has 40 reviews. Restaurant 3 has 500 reviews.

Restaurant 1 will have a multiplier of 1, and an adjusted rating of 4.

Restaurant 2 will have a multiplier of about 1.08 and a rating of about 4.32.

Restaurant 3 will have a multiplier of about 1.13 and a rating of about 4.54.
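Spelled out in code, the multiplier is a one-liner (I’m assuming a base-10 log here, since that’s what reproduces these numbers):

```javascript
// Confidence multiplier: reward restaurants that have more reviews.
// Assumes a base-10 log, which is what reproduces the figures above.
const adjustedRating = (avgStars, numReviews) =>
  avgStars * (1 + Math.log10(numReviews) / 20);

adjustedRating(4.0, 1);   // → 4 (log10(1) = 0, so no boost)
adjustedRating(4.0, 40);  // → ~4.32
adjustedRating(4.0, 500); // → ~4.54
```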

In retrospect I might weaken this a bit, perhaps by increasing the divisor from 20 to 30, but I do think there’s real value in being confident about the accuracy of a rating, and that’s worth accounting for.

Designing Better Restaurant Metrics – Pt 1

Let’s admit something: star ratings aren’t that helpful. We’ve all eaten at that terrible restaurant with four stars on Yelp and that delicious corner joint with two. Lately I’ve been very curious about more effective metrics, and I’ve been testing a few different ideas. Today is part one of a series where I’ll explore the potential of some of these ideas.

The end goal of this series is to define a system for finding restaurants where I’m most likely to have an enjoyable or very enjoyable meal in any given area.

When it comes to designing measurements, you always have to make choices, and I’ve decided that consistency should matter.

Consider the following two restaurants.

Restaurant 1 has two chefs who alternate days. Chef 1 is terrible. Expect food poisoning. Chef 2 is amazing. Unsurprisingly, this restaurant’s 10 Yelp reviews are evenly divided between 1-star and 5-star reviews.

Restaurant 2 has only one chef. He’s entirely adequate, and consistently so. His food will never wow you, but it will never make you sick. Every review of his restaurant gives it exactly 3 stars.

You have the option to go to either restaurant one night, but the only data available is their Yelp rating. Both restaurants are rated 3: Restaurant 2’s ten 3-star reviews and Restaurant 1’s five 1-star plus five 5-star reviews both average out to 3. You choose the restaurant where the top reviews read “amazing” and wind up with food poisoning.

Scenarios like this actually happen (OK, maybe a little less extreme than this).

So, we’re going to account for it with some middle-school-style math inspired by Reddit. Reddit uses a logarithmic voting scale to minimize the impact of outsized numbers on the system.

Rather than assign star ratings based on a simple average of star reviews, we’re going to assign each review the value of LOG(StarNumber + 1), using a base-10 log (I added the constant because even 1-star reviews can have redeeming qualities). For now we’ll take the average of these figures and multiply by five.

Under this new rating system, it’s readily apparent which restaurant you should choose. Restaurant 1 comes in with a score of 2.698; Restaurant 2 comes in at 3.01.

Let’s say Chef 1’s slightly-less-terrible sister works as the chef two-thirds of the time at another restaurant, Restaurant 3. Chef 2’s also-amazing sister works the other third of the time, and their nine Yelp ratings consist of six 2-star reviews and three 5-star reviews.

Under our system, Restaurant 3 achieves a score of 2.887, reflecting the slightly lower risk of getting poisoned compared to Restaurant 1.
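The three scores above can be reproduced in a few lines (again assuming a base-10 log, which is what matches the figures):

```javascript
// Log-scaled rating: average log10(stars + 1) across all reviews, times five.
// starCounts maps a star value to the number of reviews at that value.
const logRating = (starCounts) => {
  let sum = 0;
  let reviews = 0;
  for (const [stars, count] of Object.entries(starCounts)) {
    sum += count * Math.log10(Number(stars) + 1);
    reviews += count;
  }
  return 5 * (sum / reviews);
};

logRating({ 1: 5, 5: 5 }); // Restaurant 1 → ~2.698
logRating({ 3: 10 });      // Restaurant 2 → ~3.010
logRating({ 2: 6, 5: 3 }); // Restaurant 3 → ~2.887
```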

It’s not a perfect system, but it improves on the simple average by a lot (assuming that it’s better not to be poisoned 50% of the time).

[Image: graphs of logarithmic restaurant rankings]