So far on my quest to create better restaurant metrics I’ve tackled two issues: better weighting of reviews by star count, and accounting for the confidence level of any rating. There’s a bigger issue than that though: Whose reviews should we trust, and how much should we trust them (i.e., how heavily should we weight them)?
This is a really confusing question, because the answer varies from person to person and region to region. As I’ve learned after more than a year in the midwest, midwesterners and northeasterners have different tastes. Even within this region there’s incredible variation: taste buds differ immensely between Hyde Park, Peoria, and the Loop. On top of that there’s a lot of review fraud, with restaurant owners buying good reviews for their own establishments and bad reviews for their competitors.
At first I thought of using a Flesch-Kincaid grade-level algorithm to give more weight to better-written reviews. That seems pretty elitist though, and I’m not sure there’s any real correlation, past a certain point, between a review’s quality and the grade level it’s written at.
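For reference, the Flesch-Kincaid grade level is just a weighted ratio of words per sentence and syllables per word. Here’s a rough sketch of how you might score a review with it; the syllable counter is a crude vowel-group heuristic I made up for illustration, not part of the official formula:

```python
import re

def count_syllables(word):
    # Crude heuristic: count vowel groups, with a silent-e adjustment.
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_kincaid_grade(text):
    # FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Even with a perfect syllable counter, though, this only tells you how elaborately a review is written, not whether its opinion is worth anything.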
Lately I’ve come to the conclusion that this should probably be individualized, and I’ve been desperately trying to figure out how this could be done. I’m still far from a solution, but I have some ideas.
I’ve read a lot in recent weeks about Bayes filters (for email spam). These comb datasets of spam and non-spam emails to develop individual probabilities that particular words and phrases appear in a spam email. I feel like this could be applicable in our situation too, and for more than just fraud detection.
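The core of those filters is simpler than it sounds. A minimal sketch of the per-word statistic (a simplified version of the approach popularized for spam filtering, with my own hypothetical function names) looks like this:

```python
from collections import Counter

def word_spam_probabilities(spam_docs, ham_docs):
    """For each word, estimate P(spam | word) by comparing how often
    the word appears in the spam corpus vs. the non-spam corpus."""
    spam_counts = Counter(w for d in spam_docs for w in d.lower().split())
    ham_counts = Counter(w for d in ham_docs for w in d.lower().split())
    n_spam, n_ham = len(spam_docs), len(ham_docs)
    probs = {}
    for word in set(spam_counts) | set(ham_counts):
        s = spam_counts[word] / n_spam  # rate in spam emails
        h = ham_counts[word] / n_ham    # rate in non-spam emails
        probs[word] = s / (s + h)
    return probs
```

Swap “spam” and “non-spam” for “reviews our user would agree with” and “reviews they wouldn’t,” and the same machinery starts to look useful for taste-matching.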
What I’m envisioning is this:
- Users input seven restaurants they love, and seven they hate.
- Our code scrapes the reviews for these restaurants, and divides them into four datasets:
- 1-3 star reviews at places the user loves
- 4-5 star reviews at places the user hates
- 1-3 star reviews at places the user hates
- 4-5 star reviews at places the user loves
- Using the scraped reviews, we can create individual probabilities that certain words would be used in reviews the user would agree or disagree with. With these probabilities, we could infer for any restaurant (even outside our dataset) whether our user is more or less likely to like it than the average reviewer.
- We can weight individual reviews based on the likelihood that they have similar tastes to our user, and create individual probabilities that our user will like a certain restaurant.
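The steps above can be sketched in code. This is only a toy version under a lot of assumptions: the `(restaurant, stars, text)` review schema is hypothetical, and I’m using a basic naive Bayes word model with add-one smoothing in place of whatever filter variant would actually work best:

```python
import math
from collections import Counter

def build_corpora(reviews, loved, hated):
    """Split reviews into the buckets above: 'agree' holds reviews that
    match our user's opinion of a restaurant, 'disagree' holds the rest.
    `reviews` is a list of (restaurant, stars, text) tuples (hypothetical
    schema); `loved` and `hated` are sets of restaurant names."""
    agree, disagree = [], []
    for restaurant, stars, text in reviews:
        if restaurant in loved:
            (agree if stars >= 4 else disagree).append(text)
        elif restaurant in hated:
            (agree if stars <= 3 else disagree).append(text)
    return agree, disagree

def taste_score(review_text, agree, disagree):
    """Log-odds that a review was written by someone with tastes similar
    to our user's, via naive Bayes with add-one smoothing. Positive means
    similar tastes; we could use this score to weight the review."""
    a_counts = Counter(w for d in agree for w in d.lower().split())
    d_counts = Counter(w for d in disagree for w in d.lower().split())
    a_total, d_total = sum(a_counts.values()), sum(d_counts.values())
    vocab = len(set(a_counts) | set(d_counts))
    score = 0.0
    for w in review_text.lower().split():
        p_a = (a_counts[w] + 1) / (a_total + vocab)
        p_d = (d_counts[w] + 1) / (d_total + vocab)
        score += math.log(p_a / p_d)
    return score
```

The nice property is the last step: once every review at any restaurant gets a taste score, turning those into a weighted average rating (or a probability our user will like the place) is straightforward, even for restaurants nowhere near the original fourteen.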
A lot has been done using Bayes to filter out spam. I think it can do more than that though.
I still have a lot to learn about different filter types, but this seems like a really interesting way to use data to personalize restaurant reviews.