Here's an experiment: try thinking of a song not as a song but as a collection of distinct musical attributes. Maybe the song has political lyrics. That would be an attribute. Maybe it has a police siren in it, or a prominent banjo part, or paired vocal harmony, or punk roots. Any one of those would be an attribute. A song can have as many as 400 attributes; those are just a few of the ones filed under p.
This curious idea originated with Tim Westergren, one of the founders of an Internet radio service based in Oakland, Calif., called Pandora. Every time a new song comes out, someone on Pandora's staff (a specially trained musician or musicologist) goes through a list of possible attributes and assigns the song a numerical rating for each one. Analyzing a song takes about 20 minutes.
The people at Pandora (no relation to the alien planet) analyze 10,000 songs a month. They've been doing it for 10 years now, and so far they've amassed a database containing detailed profiles of 740,000 different songs. Westergren calls this database the Music Genome Project.
There is a point to all this, apart from settling bar bets about which song has the most prominent banjo part ever. The purpose of the Music Genome Project is to make predictions about what kind of music you're going to like next. Pandora uses the Music Genome Project to power what's known in the business as a recommendation engine: one of those pieces of software that gives you advice about what you might enjoy listening to or watching or reading next, based on what you just listened to or watched or read. Tell Pandora you like Spoon and it'll play you Modest Mouse. Tell it you like Cajun accordion virtuoso Alphonse "Bois Sec" Ardoin and it'll try you out on some Iry LeJeune. Enough people like telling Pandora what they like that the service adds 2.5 million new users a month.
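Pandora has never published the details of how it matches those attribute profiles to listeners, but the general idea it describes (score each song on hundreds of attributes, then find songs whose profiles lie close together) can be sketched in a few lines. Everything below is invented for illustration: the attribute names, the ratings, and the choice of cosine similarity as the distance measure are all assumptions, not Pandora's actual method.

```python
import math

# Hypothetical attribute vectors: each song rated 0-5 on a few of the
# hundreds of attributes the article describes (names are invented).
songs = {
    "Song A": {"political lyrics": 4, "prominent banjo": 0, "vocal harmony": 3},
    "Song B": {"political lyrics": 5, "prominent banjo": 1, "vocal harmony": 2},
    "Song C": {"political lyrics": 0, "prominent banjo": 5, "vocal harmony": 1},
}

def cosine_similarity(a, b):
    """Cosine of the angle between two attribute vectors (1.0 = identical direction)."""
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend(liked, catalog):
    """Rank every other song in the catalog by similarity to the liked one."""
    return sorted(
        (title for title in catalog if title != liked),
        key=lambda title: cosine_similarity(catalog[liked], catalog[title]),
        reverse=True,
    )

print(recommend("Song A", songs))  # most similar song first
```

With these made-up numbers, a listener who likes "Song A" (political, harmony-heavy, no banjo) gets "Song B" ahead of the banjo-driven "Song C" — the toy-scale version of "tell it you like Spoon and it plays you Modest Mouse."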
Over the past decade, recommendation engines have become quietly ubiquitous. At the appropriate moment (generally when you're about to consummate a retail purchase), they appear at your shoulder, whispering suggestively in your ear. Amazon was the pioneer of automated recommendations, but Netflix, Apple, YouTube and TiVo have them too. In the music space alone, Pandora has dozens of competitors. A good recommendation engine is worth a lot of money. According to a report by industry analyst Forrester, one-third of customers who notice recommendations on an e-commerce site wind up buying something based on them.
The trouble with recommendation engines is that they're really hard to build. They look simple on the outside (if you liked X, you'll love Y!), but they're actually doing something fiendishly complex. They're processing astounding quantities of data and doing so with seriously high-level math. That's because they're attempting to second-guess a mysterious, perverse and profoundly human form of behavior: the personal response to a work of art. They're trying to reverse-engineer the soul.
They're also changing the way our culture works. We used to learn about new works of art from friends and critics and video-store clerks (from people, in other words). Now we learn about them from software. There's a new class of tastemakers, and they're not human.
Learning to Love Dolph Lundgren
Pandora makes recommendations the same way people do, more or less: by knowing something about the music it's recommending and something about your musical taste. But that's actually pretty unusual. It's a very labor-intensive approach. Most recommendation engines work backward instead, using information that comes not from the art but from its audience.
It's a technique called collaborative filtering, and it works on the principle that the behavior of a lot of people can be used to make educated guesses about the behavior of a single individual. Here's the idea: if, statistically speaking, most people who liked the first Sex and the City movie also like Mamma Mia!, then if we know that a particular individual liked Sex and the City, we can make an educated guess that that individual will also like Mamma Mia!
It sounds simple enough, but the closer you look, the weirder and more complicated it gets. Take Netflix's recommendation engine, which it has dubbed Cinematch. The algorithmic guts of a recommendation engine are usually a fiercely guarded trade secret, but in 2006 Netflix decided it wasn't completely happy with Cinematch, and it took an unusual approach to solving the problem. The company made public a portion of its database of movie ratings (around 100 million of them) and offered a prize of $1 million to anybody who could improve its engine by 10%.