The Facebook Data Cambridge Analytica Took Was Either Extremely Valuable or Totally Worthless

March 22, 2018 2:40 PM EDT

A voter-profiling firm that worked with Donald Trump’s campaign improperly obtained information on 50 million Facebook users, but experts say the data was both extremely valuable and possibly worthless.

Cambridge Analytica, which worked with the presidential campaigns of both Trump and Texas Sen. Ted Cruz, got access to information on Facebook users and their friends through a researcher who misled the social network giant, saying the information he was gathering would be used strictly for academic research, according to recent reports.

The trove included information like people’s names, locations, genders and things users have “liked” on Facebook, which a whistleblower from the company said the firm planned to use to exploit “the mental vulnerabilities of people” with targeted political messages.

Because there would be no other way to get that data through Facebook, which does not sell personal information, it’s hard to put a price tag on how much it’s worth, sources in politics and tech say. While personal data is clearly valuable — especially for advertisers who want to reach specific mailboxes and inboxes — they also raised doubts about whether the profiles would really be that much more helpful than information already for sale in the public square.

Tom Bonier, CEO of political data and strategy firm TargetSmart, said that licensing a robust set of “voter files” at a similar scale might cost a company around $500,000. Voter files are snapshots that often combine government records on whether you’re registered to vote with consumer data about things you have bought, as well as educated guesses on your race and income. The commercial data brokers that sell it might even scrape together tidbits like how much you owe on your mortgage or whether you have a hunting license.

Voter files are commonly used by campaigns, though in some ways they’re arguably inferior to a Facebook profile: While a voter file might try to assess your religion or level of education based on other known factors, that kind of information is often explicit on social media. And better data is more valuable. Directly collecting information on 50 million people, rather than using models to predict it, would “certainly” run up a bill of millions of dollars, Bonier said.

While the revelations have sparked a controversy over how Facebook protects information on its users, Bonier said that the way Cambridge Analytica obtained the information is more shocking than the fact that a political firm had personal data about individual U.S. voters. “What should be the most disconcerting component of this is really that breach of trust,” he said.

On Wednesday, Facebook CEO Mark Zuckerberg posted a response to the situation after days of silence, acknowledging that users might feel their privacy was compromised. “We have a responsibility to protect your data, and if we can’t then we don’t deserve to serve you,” he wrote on the platform.

The researcher working for Cambridge Analytica also captured a lot of signals that aren’t in voter files. According to news accounts, the researcher, Aleksandr Kogan, developed a Facebook app with a personality quiz. When users opted in, as about 300,000 eventually did, they gave him access to information in their profiles. At the time, circa 2013, Facebook also permitted developers to access information about such users’ friends, which is how the number of profiles in question ballooned into an enormous sampling of U.S. voters. (Facebook has since changed that policy and cut that access off.)

Facebook says there was no “data breach.” The problem, the company says, is that Kogan misled the company and users about how he would use the information, improperly sharing the data with a commercial firm. According to reports, Cambridge Analytica paid Kogan’s firm something in the neighborhood of $1 million — covering the costs to do the research as well as incentives for people to take the quiz — to get the information.

Cambridge Analytica has denied using the Facebook data, though it has admitted to possessing it. Reports from the New York Times and the Guardian detail the firm’s intention to use “likes” to help build algorithms that can predict the personality traits of voters. Armed with that kind of data, a firm could in theory tailor political messages that play on factors like people’s neuroticism or openness.

Research has found that “likes” can predict many traits, often with better accuracy than the factors that voter files use to make guesses about people’s attributes. For example, “likes” can predict the race of a Facebook user with 95% accuracy. But when Pew recently did a study on voter files, seeing how accurate data brokers were in their records about individuals, some companies had accuracy rates on race closer to 75%. With education, the rates dipped to 27%.

The personality quiz, with added questions about political leanings, had the potential to make connections between “likes” and voters’ mental makeup. In theory, the researchers could assess which “likes” and other attributes are associated with people who scored high on neuroticism questions, for example, and use those correlations to figure out how other people think.

But a number of experts have cast doubt on whether this would all add up to something that could really change voter behavior. Antonio Garcia Martinez, a Silicon Valley veteran and former employee of Facebook who writes about these issues for Wired, likens the scheme to creating a “mental horoscope.” Even if a firm knew that neurotic people could be persuaded to vote for a candidate with a certain type of ad, he says, how would a firm target ads to neurotic people? Would they buy ads on Facebook that target people who tend to “like” things that neurotic people tend to “like”?

“It’s too many leaps,” Martinez says. “You can’t target a state of mind on Facebook.”

In that case, is all this extra data really worth very little? Martinez argues that the 50 million profiles Cambridge Analytica ended up with might actually be noisier and “less useful” than traditional voter files. Bonier likens it to past fads in politics, like pollsters wanting to draw overblown conclusions about voters based on whether they own a Prius.

That’s not to say there aren’t lots of useful signals for companies like his in social media, he says. Political firms might use “social listening,” gathering publicly available information about who has used an #ImWithHer or #MAGA hashtag that can serve the same kind of predictive purpose.

And even if it’s not happening today, political operatives could find a way to psychologically profile the masses and use that to sway elections. One of the major lessons of the 2016 election was that brilliant, powerful tools like Facebook can be used in ways their founders never intended.

“Politics has always been about persuasion and if persuasion goes far enough it can veer into manipulation, playing on people’s greatest fears and desires. And I think the Internet’s making that explicit for the first time,” says Matt Mahan, CEO of Brigade, a startup that is trying to use data analytics to make it easier for people to engage in the political process. “People’s personal preferences and fears and hopes and neuroses, none of those things were ever explicit down to an individual basis,” he says, “before the rise of social media.”
