In 2009, David Lazer sounded the call for a fresh approach to social science. By analyzing large-scale data about human behavior—from social-network profiles to transit-card swipes—researchers could “transform our understanding of our lives, organizations, and societies,” Mr. Lazer, a professor of political science and computer science at Northeastern University, wrote in Science. The professor, joined by 14 co-authors, dubbed this field “computational social science.”
This month Mr. Lazer published a new Science article that seemed to dump a bucket of cold water on such data-mining excitement. The paper dissected the failures of Google Flu Trends, a flu-monitoring system that became a Big Data poster child. The technology, which mines people’s flu-related search queries to detect outbreaks, had been “persistently overestimating” flu prevalence, Mr. Lazer and three colleagues wrote. Its creators suffered from “Big Data hubris.” An onslaught of headlines and tweets followed. The reaction, from some, boiled down to this: Aha! Big Data has been overhyped. It’s bunk.
Not so, says Mr. Lazer, who remains “hugely” bullish on Big Data. “I would be quite distressed if this resulted in less resources being invested in Big Data,” he says in an interview. Mr. Lazer calls the episode “a good moment for Big Data, because it reflects the fact that there’s some degree of maturing. Saying ‘Big Data’ isn’t enough. You gotta be about doing Big Data right.”
Among the academics reading and sharing it, Mr. Lazer’s article has fed a conversation about what it means, exactly, to “do Big Data right.” How do you study data gathered by companies that constantly tweak their services for business reasons? How do you make such data transparent? How do you train people who can bridge the divide between social science and computer science?
The conversation comes as other scholars, too, are puncturing inflated claims for Big Data. In a recent Pacific Standard article, two political-science professors, John Sides and Lynn Vavreck, debunked a media meme from President Obama’s 2012 campaign: that he won re-election thanks to his operatives’ data “wizardry.” And two other researchers have exposed flaws in studies that mine social-media behavior to discover people’s demographic traits.
Unstable Algorithm
Google Flu Trends made its debut in 2008. By analyzing the flu-related searches of Google users, the system estimates the prevalence of flu outbreaks in almost real time—information that, in theory, could be used to direct resources and save lives. By contrast, there is a lag of roughly two weeks in flu estimates produced by the Centers for Disease Control and Prevention, which are based on reports from labs across the country. “We can accurately estimate the current level of weekly influenza activity in each region of the United States, with a reporting lag of about one day,” Google scientists wrote in a widely cited 2009 Nature paper.
Last year, though, the Google project experienced what Mr. Lazer describes as a “Dewey Beats Truman” moment. Nature reported that Google Flu Trends was estimating more than double the proportion of flu-related physician visits that the CDC was reporting. In reality, according to Mr. Lazer’s new research, Google Flu Trends has been “systematically overshooting by wide margins for three years.”
So how did it go off the rails? One likely explanation is the unstable nature of Google’s search algorithm, Mr. Lazer says, which was adjusted in various ways that probably increased the number of flu-related searches. (Google has tried to make it easier for its users to search for health-related information.) Google Flu Trends should have been modified accordingly. It wasn’t. And that reflects a core problem facing researchers who hope to use such data: “Most big data that have received popular attention,” Mr. Lazer wrote in Science, “are not the output of instruments designed to produce valid and reliable data amenable for scientific analysis.”
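The basic mechanism, and how it breaks, can be illustrated with a toy model. The sketch below uses made-up numbers and a deliberately simple regression; it is not Google’s actual system, only a rough picture of how a nowcast fitted to historical search volumes will overshoot if search behavior shifts and the model is never refit.

```python
# Toy illustration (synthetic data, not Google's model): regress flu activity
# on flu-related search volume, then show what happens when an un-modeled
# change in search behavior inflates query counts.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Two years of weekly "ground truth": percent of physician visits for
# influenza-like illness, and the search volume it plausibly drives.
weeks = 104
true_ili = 1.5 + 1.2 * np.sin(np.linspace(0, 4 * np.pi, weeks)) ** 2
query_volume = 40 * true_ili + rng.normal(0, 5, weeks)  # flu searches per 10k users

# Fit the nowcasting model on the historical data.
model = LinearRegression().fit(query_volume.reshape(-1, 1), true_ili)

# A hypothetical interface tweak (say, suggested health-related searches)
# boosts flu queries by 30 percent even though actual flu activity is flat.
actual_ili = 2.0
boosted_queries = np.array([[40 * actual_ili * 1.3]])

print(f"actual flu activity: {actual_ili:.2f}%")
print(f"model's estimate:    {model.predict(boosted_queries)[0]:.2f}%")  # overshoots
```

The overshoot here comes entirely from the changed relationship between searching and being sick, which is exactly the kind of drift a model trained once and left alone cannot see.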
Related problems have emerged with efforts to use Twitter as a source of research data. Sociologists and market researchers, among others, are interested in developing tools to infer the demographic attributes—often not explicitly stated—of various online populations. Can you train a machine to glean a Twitter user’s political beliefs, for example, from his use of terms like “Obamacare” or “#p2,” a progressive hashtag? Early on, computer scientists reported that their automated tools could infer political orientation with upward of 95 percent accuracy.
Derek Ruths, an assistant professor of computer science at McGill University, wrote one of those optimistic papers. But then it dawned on him that these computer models, purportedly so accurate, were based on analysis of the most partisan, politically active Twitter users—a minority of the population. He and a master’s student, Raviv Cohen, conceived a study that corrected for that. What would happen when their models were tested on “normal” Twitter users, the kind of people who don’t tweet about politics so much and don’t use such partisan language?
The results were sobering, as Mr. Ruths and Mr. Cohen reported last year in a paper titled, “Classifying Political Orientation on Twitter: It’s Not Easy!” Past researchers had been “systematically overoptimistic” in the claims they made about machines’ ability to infer political orientation. When standard techniques were tested on the “normal” population of Twitter users, methods that had reported greater than 90 percent accuracy achieved barely 65 percent.
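A bare-bones sketch in Python, with invented tweets rather than the researchers’ data or method, shows why the drop happens: a bag-of-words classifier trained on politically vocal users leans on partisan vocabulary that ordinary users rarely produce.

```python
# Toy illustration (hypothetical tweets, not the Cohen-Ruths pipeline):
# a classifier trained on politically active users has little to go on
# when it meets users who don't tweet in partisan vocabulary.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tweets from politically vocal users (1 = conservative, 0 = progressive).
train_texts = [
    "repeal obamacare now #tcot", "lower taxes smaller government #tcot",
    "defend the second amendment #tcot", "stop the spending #tcot",
    "protect obamacare #p2", "raise the minimum wage #p2",
    "medicare for all #p2", "climate action now #p2",
]
train_labels = [1, 1, 1, 1, 0, 0, 0, 0]

classifier = make_pipeline(CountVectorizer(), LogisticRegression())
classifier.fit(train_texts, train_labels)

# "Normal" users' tweets contain almost none of the partisan cues the model
# learned from, so its predictions drift toward coin-flipping.
normal_texts = [
    "just watched the game with friends",
    "coffee first, then email",
    "my cat knocked the plant off the shelf",
    "traffic was terrible this morning",
]
print(classifier.predict(normal_texts))
```

Benchmarked on users like those in the training set, such a model can look highly accurate; benchmarked on the broader population, the signal largely disappears.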
The emerging problems highlight another challenge: bridging the “Grand Canyon,” as Mr. Lazer calls it, between “social scientists who aren’t computationally talented and computer scientists who aren’t social-scientifically talented.” As universities are set up now, he says, “it would be very weird” for a computer scientist to teach courses to social-science doctoral students, or for a social scientist to teach research methods to information-science students. Both, he says, should be happening.
Mr. Lazer and others see the potential for enormous rewards if they do. Nicholas Christakis, a social scientist and physician who directs the Human Nature Lab at Yale University, ticks off a list of questions that Big Data can address: What are the origins of tastes and norms? Where do people’s desires come from? How do collective phenomena emerge from individual actions? “We’re witnessing the birth of a new kind of social science,” he says. Current struggles are “the birthing pains of that process.”
Erez Lieberman Aiden, a biologist and computer scientist who has mined Google Books to study linguistic and cultural evolution, points out what’s on the horizon. People are beginning to use Google Glass to record everything they see. The “Human Speechome Project”—an MIT Media Lab effort conceived to study language development by recording nearly everything a single child hears and sees from birth to age 3—provides a glimpse of how scientists might profit from such media records. Mr. Aiden speculates that someone will be born in the next 20 years whose life generates both a biography and a complete visual transcript.
In other words, Big Data techniques are here to stay. And they’re going to get bigger.