Data Mining for Personally Targeted Politics

Regular Lingua Franca readers may recall that I am a skeptic about both machine intelligence and the dangers of computers invading our privacy. But do not imagine that I am dismissive of all developments in the fields bracketed under the misnomer “artificial intelligence”: Some of the claims made about what computers can do are true, even a little scary.

I know I mocked the pathetic artificial stupidity exhibited by the devices that purport to communicate linguistically with us. I dissed 2013-vintage Google Translate for not really doing translation at all (things have improved since then). I floccinaucinihilipilificated the notion that Gmail threatens your privacy by reading your emails. I even dared to express the (very unpopular) view that government storage of phone call metadata is not a threat.

But some recent developments suggest that current technology can utilize data about your thoughts and preferences for political ends. This recent Guardian article and this New York Times article report on matters that I think we should be worried about.

Robert Mercer receives his award from the Association for Computational Linguistics.

Robert Mercer worked with Fred Jelinek at IBM on speech and on machine translation, earning a lifetime-achievement award from the Association for Computational Linguistics in 2014. The IBM team pioneered “big data” approaches, which means building huge statistical models based on gigantic quantities of raw data, not on understanding anything about speech or language or meaning.

Jelinek was actually interested in linguistics, but he notoriously remarked on several occasions that it seemed each time he replaced a linguist with an engineer the performance of the speech system improved. (Some say Jelinek said it but didn’t really believe it; Mercer didn’t say it but freely admits to believing it.) Anyway, crucially, the brute-force statistics-and-engineering approach worked: IBM’s team pulled far ahead of other labs in performance.

And the mathematical techniques they employed turned out to be applicable elsewhere. In 1993 Mercer walked away from IBM to join Renaissance Technologies, where (under a heavy veil of secrecy) statistical techniques were applied to predicting the stock market. The results were staggering. A private hedge fund developed at Renaissance, the Medallion Fund, achieved 35 percent average annual return over 20 years. The employees of Renaissance became extraordinarily wealthy. Mercer himself was able to donate about $35 million to conservative political campaigns (first Cruz, then Trump), and to invest $11 million in Breitbart News.

Mercer also provided major funding for a private company called Cambridge Analytica (the American arm of a British firm 20 years older, Strategic Communication Laboratories Group), which applies data mining and information analysis to election campaigns. Its services were provided free to the (successful) campaign to get the British people to vote to leave the European Union. It was also involved with the Trump campaign, but its influence on the 2016 American election remains unknown.

Cambridge Analytica utilizes the fact that huge numbers of voters now use social media, and sites like Facebook offer gigantic amounts of readily mined public information about them. From the Facebook posts, comments, or pictures that made you click the “like” button, valuable information about your preferences and attitudes can be deduced. Michal Kosinski, lead scientist at Cambridge University’s Psychometrics Centre (exaggerating perhaps a little), reckons that from 150 of someone’s likes you can make a personality profile that predicts them better than their spouse would, and from 300 you can understand them better than they understand themselves.

Even if you don’t have a Facebook account, traces of you will show up in the online activities of your spouse, children, friends, neighbors, and colleagues. And there’s more to be mined than just likes. A personality test that went viral on Facebook produced results on six million test-takers, which formed the basis for a psychometric model produced at Cambridge University’s Psychometrics Centre. Cambridge Analytica is thought to have obtained access to the data. The company claims to have some sort of psychological profiling data on 220 million Americans. The goal is to be able to locate people with particular political views and pipe material to them that will be maximally likely to influence their vote in a specific direction.

I like to think I’ll be able to resist the influence of juicy ads or fake news stories piped at me, even if they have been sculpted to nudge people with exactly my sort of opinions toward some selected political decision. But who knows? I’m probably just as malleable as anyone else. The techniques emerging from IBM’s statistical methods in the 1980s, honed to a sharp edge at Renaissance, could probably identify my likes and dislikes accurately enough to permit a political campaign to punch my buttons with clinical precision. Yours too. As Veronica Quaife (Geena Davis) says after she learns a bit about “insect politics” in The Fly, be afraid. Be very afraid.

Update: See now this more detailed article on Cambridge Analytica and related companies.
