The Brilliance of TextRank

Posted on

Lately, by coincidence, I’ve run into a lot of really cool algorithms (see my recent series on Bayesian filtering). I’m not a math or CS major, but surprisingly often I wonder how something is done, do a Google search, and come across a solution so simple and intuitive that even I could implement it in an afternoon.

One of these is TextRank. It’s what’s known as a “graph-based ranking algorithm”, and it’s designed to summarize long texts. It can shorten a 7,000-word article to 1,300 words and still retain most of the piece’s key ideas. I would have expected algorithms like this to be very complex, but TextRank and other summarization algorithms are remarkably simple. Basically, they break a text down into sentences and paragraphs, compare each sentence with every other sentence to see how much their content overlaps, and rank them: a sentence that overlaps heavily with many other sentences, especially other highly ranked ones, is treated as the most valuable (the same idea behind Google’s PageRank). Then they cut out all but the most valuable sentences.
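To make that concrete, here is a minimal Python sketch of TextRank-style extractive summarization. It is a simplified illustration, not a reference implementation, and it rests on a few assumptions: sentences are split naively on end punctuation, similarity is the number of shared words divided by the sum of the log sentence lengths (in the spirit of the measure from the original TextRank paper, with a +1 inside each log added here so one-word sentences don’t divide by zero), and scores come from a PageRank-style iteration with a damping factor of 0.85.

```python
import math
import re


def split_sentences(text):
    """Naively split text into sentences on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence):
    """Lowercased set of words in a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def similarity(a, b):
    """Shared words, normalized by log sentence length so long sentences don't always win."""
    wa, wb = words(a), words(b)
    # +1 keeps the logs positive for one-word sentences (an assumption of this sketch).
    denom = math.log(len(wa) + 1) + math.log(len(wb) + 1)
    return len(wa & wb) / denom if denom else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """PageRank-style scores over a graph whose edge weights are sentence similarities."""
    n = len(sentences)
    weights = [[similarity(sentences[i], sentences[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence collects score from its neighbors, in proportion to edge weight.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n) if out_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, keep=3):
    """Keep the `keep` highest-scoring sentences, in their original order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:keep]
    return [sentences[i] for i in sorted(best)]
```

A sentence that shares no words with any other sentence never receives score from a neighbor, so it stays at the baseline of 1 - damping and is the first to be cut.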

I don’t know what use I’ll ever make of this, but it’s cool to understand how such a simple little idea can be used so powerfully.

Shutting off Facebook For a Week

It’s finals week here at UChicago, and I’m trying something I haven’t really done in years: I’m shutting off Facebook proper. I uninstalled the Facebook app from my phone and logged out on my laptop (I’ll still be using FB Messenger and Events, which are thankfully separate apps).

Facebook has its merits. It’s an immensely powerful communications platform, and a great way to discover articles. It’s all-consuming, though. The news feed just sucks time out of my day, and I can’t say I’m necessarily better for the constant stream of information available to me. Plus, it’s finals week, and I really shouldn’t be spending time on unnecessary communications, so I figured now was a great time to take a break. We’ll see how it goes.

The Journalism Machine

As computing gets better and the average quality of professional writing gets worse and more formulaic, it seems inevitable that we’ll reach a crossover point, where journalism is entirely or largely computer-generated and we can’t tell the difference.

In fact, most of what the average millennial spends their time reading has been arranged for them by a computer. Facebook feeds, Instagram feeds, and Google search results may be sharing human-generated content, but at a certain point, which we’re approaching, the editorial voice of the algorithm overwhelms the voice of even the most powerful human influencer.

This scares me a little, but it also intrigues me, which is why I’ve started working on a new side project I’m calling “Journalism Machine”. Journalism Machine takes small snippets of human-generated content and larger templates for modern writing, and attempts to turn them into a successful, automated publication, complete with social media interactions. Social media generates a custom feed for every individual’s tastes. Why can’t we do the same thing with journalism? Why can’t we, in the near future, have articles customized to what we know, what we believe, and what we’d share?

Right now the mechanics of my project are pretty simple:

  • Build a database of content snippets.
  • Build a set of social and article templates.
  • Build a script for turning all of those database snippets into articles, posting them, and sharing them with the right audience.
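The first three steps can be sketched in a few lines of Python. This is a hypothetical illustration, not the project’s actual code: the snippet database is stood in for by an in-memory dictionary of placeholder strings, the templates use the `$placeholder` syntax of Python’s built-in `string.Template`, and the names `SNIPPETS`, `fill`, and `generate` are invented for this sketch. Posting and audience targeting are left out, since this post doesn’t say how they work.

```python
import random
from string import Template

# Hypothetical snippet database: each slot maps to interchangeable,
# human-written snippets (placeholder strings here).
SNIPPETS = {
    "headline": ["Headline option A", "Headline option B"],
    "hook": ["Hook sentence A.", "Hook sentence B."],
    "body": ["Body paragraph A.", "Body paragraph B."],
}

# Hypothetical article and social templates using $slot placeholders.
ARTICLE_TEMPLATES = [
    Template("$headline\n\n$hook $body"),
    Template("$headline\n\n$body $hook"),
]
SOCIAL_TEMPLATES = [
    Template("New post: $headline"),
    Template("$hook Read more: $headline"),
]


def fill(templates, fields, rng):
    """Pick one template at random and substitute the chosen snippets."""
    # substitute() raises KeyError if a template uses a slot we didn't fill.
    return rng.choice(templates).substitute(fields)


def generate(rng):
    """Assemble one article plus a matching social post from the same snippets."""
    fields = {slot: rng.choice(options) for slot, options in SNIPPETS.items()}
    article = fill(ARTICLE_TEMPLATES, fields, rng)
    social = fill(SOCIAL_TEMPLATES, fields, rng)
    return article, social


if __name__ == "__main__":
    article, social = generate(random.Random())
    print(article)
    print("---")
    print(social)
```

Filling the article and the social post from the same dictionary keeps the post’s headline consistent with the article it promotes.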

In the future though, there are some even more interesting experimental features I’m thinking about adding.

First of all, I’m considering ways to use small, cheap human-interface tasks to generate more snippets, and more articles. APIs like Mechanical Turk and Scale would bring the cost of generating sentences down to just a few cents apiece, versus hiring a journalist to write an article for 10x that.

Second, I’m thinking about ways to make it less noticeable when content is re-used from article to article. Search engines don’t like duplicate content, and neither do readers.

Third, I’m thinking about how machine learning techniques could help produce 10x the content for each human-generated sentence. If journalism is dissolved into a series of patterns, what’s to stop me from using a pattern-matching algorithm to generate sentences without a human writer?

I don’t want journalism to dissolve like this. I want human journalists chasing leads, and writing stories for the public, not for one person. This is where the future is going though, so I’m determined to understand it.

10,000 is small.

I remember when I launched my first websites. I was nine or ten years old. At the bottom of each site I installed a “view counter”.

Youngsters may not remember these, but they were the Google Analytics of their day. They measured every time a page loaded, and displayed the number for all to see. Every day I would check my counter, and every day it would creep up by four or five views.

When I got into blogging, the number of visitors went from a handful to dozens, and I thought I was on top of the world. Then it climbed into the hundreds, and then the thousands. I come from a town of 10,000, so 10,000 always seemed like a big number to me. When one of my blogs hit 10,000, I thought, wow, my audience is the size of my entire town.

Little did I know.

Working on the web gives you perspective. I’ve seen traffic sources (cough, cough, Reddit) send 150,000 views in a single day. I’ve seen the minuscule profits that a website serving 20,000 visitors a month can earn, and I’ve come to see even numbers like 1 million as small. The medium-sized publishing companies of 2016 bring in 100 million impressions per month. DeRay McKesson’s Twitter account brings in 150 million impressions per month. 1 million people in one place is incomprehensible in scale. The numbers are so large, and yet the value of all these hundreds of millions of views is so small.

So, I’ve had to reclassify how I define value. I take pride in having helped more than a million people with their tech problems. I take pride in saving people time, and providing millions of good short stories to eager readers. I still don’t know what value is though, what it means on the web, and how to classify each of my efforts. All I know is that 10,000, big enough to fill many arenas, is very small.

The subject line that anyone will open

Cold-emailing has been a passion of mine for a long while. I’ve gotten pretty good at getting responses from everyone imaginable, from Nobel laureates to big tech CEOs and journalists.

One of the things I’ve learned a lot about is the headline. More specifically, the “headline tradeoff”.

This is the tradeoff between descriptiveness, honesty, and effectiveness. If you’re emailing a professor at smalltown-university, a headline like “Growth curve question” or “Your book” might get you a response, but if you’re emailing the co-founder of the biggest startup in your city, you’re going to need something more potent.

The headline “Quick question” is both honest and effective. It’s gotten me interviews with Mark Cuban and Walt Mossberg, among others.

It’s descriptive, but short enough to not take up the entire subject line space. It leaves some “white space” around it, and white space is incredibly helpful in attracting the eye. Still, sometimes you need something stronger to guarantee a response.

For those ultracompetitive inboxes I desperately need to connect with, I’ve developed something of a “nuclear warhead” of a headline. I use it extremely sparingly, but it’s incredibly effective. It has not only a 100% open rate, but a 100% response rate from some very busy people.

Ready for it?

“SOS! Aliens taking over the world”

Now you see why I use it sparingly. It’s a tad dishonest, and a bit crazy… BUT it’s also impossible to ignore. There’s clearly a punchline, but the punchline is so non-obvious that the recipient feels a pathological need to read the email to find it.

This headline takes the so-called “curiosity gap” concept mastered by Upworthy to its illogical extreme. It’s implausible, but not spammy (I have yet to see a spammer use alien invasions in their emails).

In the body of any email using this subject line, I always immediately apologize, and acknowledge how busy they are. Then, I get to the point of the email.

I get to the point really quickly, to emphasize that I respect that person’s time. If I have something of value to them, I bring it up immediately. It’s a nuclear warhead, but with a dose of honey, and it has gotten me some great advice, and even a client or two.

I was not shocked by the open rate, but I’m still shocked at the response rate. That’s why I keep using it.

For most emails, though, it’s better to be short and descriptive. An email to my 8,000-member email list with the subject line “Our bot”, about our new Facebook bot, saw an open rate of more than 40%.

Still, it’s always nice to have a nuke in your email arsenal.

Something to Say

For a few months now, I’ve had this urge to write again. When a publication invited me to write something though, the words just wouldn’t come out. Every few weeks something would come up organically, and I would write about it, but otherwise my keyboard was silent.

Looking back, I’ve realized that this stems from how sour I’ve become on media in general. Most of it (op-eds, articles, videos) is just noise. Most of the assignments I’ve been given have amounted to “Make noise”, “React”, “Write a headline to waste someone’s time”, “Make them click”, and I’m just not for that anymore. I’ve only been able to write when I actually had something to say.

90% of online writing nowadays is superfluous. It’s driven by editors who demand their writers play catch-up with competitors. It’s driven by Facebook ads, and meaningless word wars. Another 8% is people writing about things they don’t know about. 2% remains for original writing. That’s the 2% I love, and the 2% I want to continue doing.

Progressive Radio

Today I took a long Uber ride, and my driver happened to be listening to some progressive radio station. Every few minutes he’d hear something and nod “Uh huh” or “Damn right”. Meanwhile, I sat and listened in stunned silence.

I couldn’t comment, for fear of it lowering my Uber rating (it matters), but this show was one of the most delusional and disappointing things I’ve heard in my political life.

Even as a Conservative, I’ve always recognized that Conservative radio has its share of crazy conspiracy theories. This experience, though, confirmed for me for the first time that there’s full parity in craziness. Both sides really do have crazy, delusional images of the other side.

On this show, I learned of “the definitive evidence that Donald Trump is being blackmailed by Vladimir Putin” and that it’s a certainty we’ll be at war with Iran or North Korea within the year. Oh, and Donald Trump promised his boss Putin (through Rex Tillerson, “Putin’s right-hand man”) the right to invade all the Baltic states. I’m not kidding! This is what the radio host claimed, and my Uber driver believed every word.

This is the sort of discourse that I find dangerous. It sets up the other side as more than a political opponent: it casts them as a devilish ideological enemy, never to be trusted. From a partisan standpoint, it makes sense: scare your audience away from ever leaving you. This sort of thing is damaging to democracy, though.

“How would you describe social media to your grandparents in 3 sentences?”

I saw a very interesting question on an internship application today: “How would you describe social media to your grandparents in 3 sentences?”

My first instinct for this was to make an analogy to a village. I don’t have a version of this answer saved, but it compared Facebook to a village square, and other social networks to common gathering places like the local school, the pub, and the home. It was an interesting analogy, but after reading it, all I could think was “Great. But what’s social media?”

While that answer sort of encompassed the mechanisms of social media, it wasn’t very helpful for introducing novices to social media.

So I started again.

This time I asked myself, “What affordances does grandma already know that are most closely related to social media?”

Communication mediums. My grandma knows the telephone, and the letter, and the newspaper, and even email to an extent (though I felt this might be a stretch for grandmas, so I decided not to rely on them knowing what email was).

By using these, I was able to craft an answer that was less reliant on connecting some weird analogy to reality. I’m still going to work on it some (I’d like it to be 80 words or fewer), but I think it’s a lot better than it was.

Here are my three sentences:

  • “Social media” is nothing more than the internet’s combination of all the communication tools you’re used to, sped up and personalized to your tastes. The most popular “social network”, Facebook, is a bit like a real-time newspaper written for you, with photos and stories from your friends, family, and favorite celebrities. You can contribute to this by sharing your own photos or stories, or you can use one-to-one social networks (which are more like a letter or a phone call) to chat privately with your friends.

Learning to Cook (a little)

I don’t know if I knew how to use the microwave until I was 11 or 12. Sad. Lately, I’ve been into cooking, though. I bought a pot and a pan, and I actually use them. (Crazy, right?)

I started cooking because the food options around me got a bit tiring, and sometimes I just don’t want to leave my apartment in 0-degree weather. It helps that I live above a Target, so whenever I need a kitchen utensil or an ingredient, it’s within easy reach.

It turns out that cooking is fun, at least sometimes. It gives me the same sense of pride in building things that I get from programming, but I don’t have to stare at a screen (unless I’m looking up a recipe). It requires attention, but it’s also a bit mindless and free. I can listen to a podcast or an audiobook while I’m stirring a pot of sauce or cooking an onion.

The parts I don’t like are pretty predictable. Cleaning up (and washing dishes) is a pain. Cooking and cleanup take a good deal of time (and require planning ingredients ahead). Those two things detract from the spontaneity of it.

Still, it’s a really enjoyable use of time so far, and I look forward to exploring it more in 2017.

Here’s some fresh pasta I made from scratch (as in, from the flour to the bowl).

[Instagram photo by Michael Sitver (@msitver): “Made fresh pasta today and it was actually pretty good #italian #eeeeeats #pasta”]

Designing Better Restaurant Metrics – Pt 4

Promising results! Yesterday I wrote about an idea I had: using Bayesian filtering (the statistical technique used to detect spam) to model personal restaurant preferences.

My hypothesis was that reviewers whose tastes are similar to mine would share a vocabulary markedly different from that of reviewers with opposing tastes. By dividing reviews into two groups (reviews of restaurants I liked and reviews of restaurants I didn’t), I could use statistics to calculate the probability that a given review’s wording came from a restaurant I liked rather than one I didn’t. That’s the basics of Bayes.
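I used the `bayes` package from NPM, whose API isn’t reproduced here. As a stand-in, here is a from-scratch multinomial naive Bayes classifier in Python that illustrates the same idea, assuming each review has already been split into a list of lowercase words. It uses add-one (Laplace) smoothing so that a word never seen under one label doesn’t zero out that label’s probability, and it compares log-probabilities to avoid numeric underflow on long reviews. The labels and the two tiny training “reviews” are made up for illustration.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes over bags of words, with add-one smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()           # every word seen in training

    def train(self, words, label):
        self.word_counts.setdefault(label, Counter()).update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def log_prob(self, words, label):
        """log P(label) + sum of log P(word | label), up to a constant shared by all labels."""
        counts = self.word_counts[label]
        total = sum(counts.values())
        prior = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        v = len(self.vocab)
        return prior + sum(math.log((counts[w] + 1) / (total + v)) for w in words)

    def classify(self, words):
        """Return the label whose log-probability is highest for these words."""
        return max(self.word_counts, key=lambda label: self.log_prob(words, label))


# Made-up toy reviews, one per label, purely for illustration.
model = NaiveBayes()
model.train("crispy hash browns bottomless coffee friendly waitress".split(), "liked")
model.train("soggy fries cold eggs rude service".split(), "disliked")
print(model.classify("great coffee and crispy hash browns".split()))  # liked
```

Training on whole review sets per restaurant, as in the experiment here, would just mean calling `train` once per review with that restaurant’s label.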

I ran a small-scale test today, focusing only on diners. I scraped the 20 most recent five-star reviews from each of six diners I’ve eaten at multiple times, and split each review set into an array of words. Two of these diners are in my hometown in Connecticut, two are in downtown Chicago, and two are in Hyde Park. I like three of the six.

I used the convenient “bayes” package from NPM to train and run my model. I trained it on two diners I love and one I hate, and tested it on two I hate and one I love. It correctly judged my preference for all three test diners.

Out of curiosity, I also tried applying my model to two non-diner restaurants, but it guessed wrong both times. This result wasn’t entirely surprising. I used two intentionally confusing restaurants. One was mediocre but served very similar menu items to a diner. The other was pretty good, but served very different items.

I think this idea has a lot of promise, but the flawed judgments it made with those two restaurants emphasize the importance of having an adequate dataset on each user’s preferences, including an adequate variety of cuisines. If a user only shares his favorite diners, the model is going to have a very strong preference towards diners.