Lately, by coincidence, I’ve run into a lot of really cool algorithms (see my recent series on Bayes filtering). I’m not a math or a CS major, but surprisingly often, I wonder how something is done, do a Google search, and come across a solution so simple and intuitive that even I could implement it in an afternoon.

One of these is TextRank. It’s what’s known as a “graph-based ranking algorithm,” and it’s designed to summarize long texts. It can shorten a 7000-word article to 1300 words and retain most of the most important ideas in the piece. I would have expected algorithms like these to be very complex, but TextRank, and other summarization algorithms like it, are remarkably simple. Basically, they break a text down into sentences, compare those sentences to see which ones overlap the most in content with the others (making them the most valuable), and then cut out all but the most valuable sentences.
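The core overlap idea can be sketched in a few lines of Python. This is not full TextRank (which builds a similarity graph and runs a PageRank-style iteration over it); it’s just the intuition described above: score each sentence by how many words it shares with every other sentence, then keep the top scorers. The function name and scoring are my own simplification.

```python
import re

def summarize(text, max_sentences=3):
    # Naive sentence split on sentence-ending punctuation.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    # Turn each sentence into a set of lowercase words.
    word_sets = [set(re.findall(r'\w+', s.lower())) for s in sentences]
    # Score each sentence by its word overlap with every other sentence.
    scores = []
    for i, words in enumerate(word_sets):
        score = sum(len(words & other)
                    for j, other in enumerate(word_sets) if i != j)
        scores.append((score, i))
    # Keep the top-scoring sentences, restored to their original order.
    top = sorted(sorted(scores, reverse=True)[:max_sentences],
                 key=lambda pair: pair[1])
    return ' '.join(sentences[i] for _, i in top)
```

Run on a paragraph, it drops the sentence that has the least in common with the rest, which is exactly the “cut out all but the most valuable sentences” step.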

I don’t know what use I’ll ever make of this, but it’s cool to understand how such a simple little idea can be used so powerfully.