Big-Data Project on 1918 Flu Reflects Key Role of Humanists

U. of Kentucky

Soldiers with the Spanish flu are hospitalized inside the U. of Kentucky gym in 1918. In one prevention method examined in a new study, New Yorkers were advised to refrain from kissing “except through a handkerchief.”
February 27, 2015

A deadly virus arrives in America, carried by travelers from abroad. Health officials scramble to contain the threat, imposing quarantines and other strict measures even as they seek to reassure the public.

It sounds like the Ebola outbreak of 2014. But this scenario played out almost a hundred years ago, during the Spanish-influenza pandemic of 1918. Now a team of humanists and computer scientists has combined early-20th-century primary sources and 21st-century big-data analysis to better understand how America responded to the viral threat in 1918. It’s a study in the possibilities as well as the pitfalls of interdisciplinary work, and a model-in-progress for how data-driven analysis and close reading can enhance each other.

It’s also a historically minded project that speaks to the understandable contemporary obsession with fearsome diseases and how we respond to the threat they pose. That’s one reason the National Endowment for the Humanities helped support the work through its Digging Into Data grant program, administered by the agency’s Office of Digital Humanities.

The Spanish-flu project "really demonstrated how historical research in the humanities could address a very pertinent contemporary challenge in our society—namely, how public-health policies influence the spread of pandemic diseases," says Brett Bobley, director of the digital-humanities office, via email.

Essdras M. Suarez for The Chronicle

Thomas Ewing, a history professor at Virginia Tech, has been a leader of the project: "It’s changed the kinds of questions I ask."

The flu investigation has been led by E. Thomas Ewing, a professor of history at Virginia Tech, working with Naren Ramakrishnan, a professor of engineering in the university’s computer-science department and the director of its Discovery Analytics Center, and other researchers in the center, the English department, and the library, as well as the National Library of Medicine.

The team began with two questions: How did reporting on the Spanish flu spread in 1918? And how big a role did one influential person play in shaping how the outbreak was handled?

Royal S. Copeland was the health commissioner of New York City in August 1918, when a ship arrived in New York Harbor from Europe with flu victims aboard. Like Thomas R. Frieden, the current director of the Centers for Disease Control and Prevention and a central figure in the Ebola response, Copeland helped set the tone for how the nation reacted to a viral threat—and has been the subject of debate among historians ever since, with competing camps arguing about whether he did enough.

Courtesy of the National Library of Medicine

An influenza-prevention ad published in October 1918. The new study is using data-mining techniques to examine how word of such prevention techniques spread across the United States.

Copeland figures prominently in the unfolding of the pandemic, which lasted until 1920. Officials elsewhere looked to New York City for cues on how to respond to the outbreak.

To understand Copeland’s influence, historians studying the Spanish flu usually turn to his public statements and comments made about him. That kind of primary-source analysis, the historian’s version of close reading, is a tried-and-true method of investigation. But large-scale digital databases, adroitly mined, can help historians pinpoint specific sources that are worth that kind of close look.

As a subject for digitally enabled scholarship, the Spanish flu has advantages: It struck before 1922, which means relevant material is out of copyright, and it was well documented in the popular press of the day. Using the Library of Congress’s Chronicling America database of historical newspapers, the HathiTrust Digital Library, and other sources, the Virginia Tech researchers sought out direct and indirect evidence of Copeland’s role: mentions and quotations, references to flu-containment strategies he promoted. "You can see his influence even if his name’s not used," Mr. Ewing says.

At first the New York health commissioner was almost cavalier about the flu’s arrival: "We have not felt and do not feel any anxiety about what people call ‘Spanish influenza,’" he told the New York Tribune on August 17. In order to avoid catching the flu, he said, New Yorkers should refrain from kissing "except through a handkerchief," advice that made its way into more than a dozen American newspapers by the end of September.

The researchers discovered traces of Copeland by identifying and then algorithmically searching for particular terms the health commissioner used ("influenza" and "kissing," for instance). Health officials in other locales, including St. Louis and Salt Lake City, transmitted Copeland’s antiflu advice to their citizens. By October, newspapers far from New York were sharing it. The Roanoke Times, for instance, ran an ad sharing Copeland’s assertion that theaters should be kept open as long as they were well ventilated and clean.
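A term-based search of this kind can be sketched in a few lines of Python. This is only an illustration, not the Virginia Tech team's actual code: the `Page` record, the `find_traces` helper, and the sample newspaper pages are all invented here, and real scanned text would need far more cleanup.

```python
import re
from dataclasses import dataclass
from datetime import date


@dataclass
class Page:
    """One digitized newspaper page (hypothetical record layout)."""
    paper: str
    published: date
    text: str


def mentions_all(text, terms):
    """Return True if every term appears as a whole word, ignoring case."""
    return all(
        re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE)
        for term in terms
    )


def find_traces(pages, terms):
    """Keep pages that contain all terms, ordered by publication date."""
    hits = [page for page in pages if mentions_all(page.text, terms)]
    return sorted(hits, key=lambda page: page.published)


# Invented sample pages, for illustration only.
pages = [
    Page("Paper A", date(1918, 9, 20),
         "Avoid kissing, except through a handkerchief, to escape influenza."),
    Page("Paper B", date(1918, 8, 17),
         "No anxiety over Spanish influenza, says the commissioner."),
    Page("Paper C", date(1918, 8, 30),
         "Influenza warning: kissing is discouraged."),
]

hits = find_traces(pages, ["influenza", "kissing"])
for page in hits:
    print(page.published.isoformat(), page.paper)
```

Sorting the matches by date gives a rough timeline of when a phrase surfaced in each paper, which is the kind of lead a historian could then follow up with a close reading of the page itself.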

As the number of cases rose, however, the tone of coverage changed from reassuring or explanatory "to what we classify as warning or alarmist reports," Mr. Ewing says. By year’s end, more than 20,000 New Yorkers would be dead from the flu, with many more fatalities caused by flu-related illnesses such as pneumonia.

Copeland avoided some of the draconian measures adopted in other cities, such as closing schools and theaters. Given the high body count, though, was he wrong? "I’m inclined to think he made the right decision," Mr. Ewing says. "If you follow the statements step by step, you can see that he’s adjusting and responding to the changing circumstances, but also being more targeted in the recommendations."

Code and Context

To produce useful results, this kind of investigation depends on customized algorithms. But coming up with a good algorithm involves both code and context, a mingling of the complementary strengths of computer scientists and humanists.

A historian looks at a period newspaper and sees part of a broader cultural moment, but also a source that is hardly straightforward as evidence. "The level of writing in a 1918 newspaper is actually pretty complicated," full of compound sentences and ironic shifts, Mr. Ewing says.

Essdras M. Suarez for The Chronicle

Naren Ramakrishnan, an engineering professor and director of Virginia Tech's Discovery Analytics Center: "We are very interested in working with any scholar, expert, or engineer who has access to interesting data sets and who has interesting questions to pose."

Human beings recognize tone. Algorithms are better suited to sifting through data in search of keywords—like "influenza" and "kissing." But "when we see a word or something being highlighted with an algorithm, we don’t know what it means," says Mr. Ramakrishnan.

Mr. Ewing came armed with a set of "tone categories" to focus on: Were newspaper reports alarming, reassuring, factual? The group talked through the analysis that members wanted to do. "Our goal was to mimic it in an algorithm," Mr. Ramakrishnan says.
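To make the idea of mimicking tone categories concrete, here is a deliberately crude sketch that labels a passage by counting cue words. It is an assumption-laden toy, not the team's method: the three labels echo the categories named above, but the cue-word lists, the `classify_tone` function, and the tie-breaking rule are invented for illustration.

```python
import re
from collections import Counter

# Hypothetical cue words; a real project would derive these from
# passages that historians had already labeled by hand.
TONE_CUES = {
    "alarming": {"deadly", "epidemic", "danger", "warning", "deaths"},
    "reassuring": {"calm", "mild", "safe", "confident", "unworried"},
}


def classify_tone(text):
    """Label text by its dominant cue words; default to 'factual'."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for tone, cues in TONE_CUES.items():
        counts[tone] = sum(1 for word in words if word in cues)
    best, score = counts.most_common(1)[0]
    # With no cue words, or a tie between tones, fall back to 'factual'.
    if score == 0 or list(counts.values()).count(score) > 1:
        return "factual"
    return best


print(classify_tone("Deadly epidemic spreads; officials issue warning"))
print(classify_tone("Officials remain calm and say the outbreak is mild"))
print(classify_tone("The health board met on Tuesday"))
```

A word counter like this cannot detect irony or a shift in tone midway through a compound sentence, which is the gap between keyword matching and human reading that the researchers describe.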

Mr. Ewing learned to adapt as well. "It’s changed the kinds of questions I ask," he says. "The questions that are familiar and comfortable and exciting to me as a historian don’t always translate into what computer scientists are interested in." And there were basic, practical problems to deal with: hard-to-read scans, important databases that were prohibitively expensive. Researchers "have to think about what we’re leaving out," Mr. Ewing says.

Getting the right blend of questions and algorithm "takes some practice," says Mr. Ramakrishnan. It’s a set of skills that can be applied to contemporary problems as well as historical ones. Mr. Ramakrishnan and his group at the Discovery Analytics Center ("a data-mining shop," he calls it) have dug through medical data in tandem with doctors, and worked with political scientists on election data.

"We are very interested in working with any scholar, expert, or engineer who has access to interesting data sets and who has interesting questions to pose," Mr. Ramakrishnan says.

His group recently performed a quantitative analysis of tweets from the Ebola crisis, examining how false rumors travel much as real news does.

The hybrid, trial-and-error nature of the Spanish-flu investigation may say something about the current state of computer-assisted humanities work. Mr. Bobley of the NEH says he has been impressed with the flu researchers’ "candid thoughts on how computational approaches like data mining are no magic bullet," even as they expand what humanists can do. The work is a reminder, he says, that "historical documents like newspapers are rich, messy, nuanced, and complex documents that defy easy computational analysis."

Jennifer Howard writes about research in the humanities, publishing, and other topics. Follow her on Twitter @JenHoward, or email her at