Digging Into Data, Day 2: Making Tools and Using Them

Washington—The age of innovation in digital tool-making is slowing down, making room for users of those tools to take the creative lead. That’s what participants in the 2011 Digging Into Data Challenge Conference heard from one keynote speaker, Tom Jenkins, executive chairman and chief strategy officer of the software company Open Text. Throughout history, he said, “the application is far more profound” than the original idea.

The meeting, which concluded here Friday, featured eight digital-humanities research projects that showed how creative scholars have become at combining big data and digital tools. The eight projects were winners in the first Digging Into Data competition, held in 2009 to stimulate innovative digital work in the humanities and social sciences. (A second competition has just been announced.) An international group of grant-making organizations, including the National Endowment for the Humanities and research councils in Canada and Europe, supports the program.

During the two-day conference, linguists, musicologists, historians, and computer scientists showed off the results of playing, in a serious way, with enormous amounts of data. All the projects work on a scale that used to be seen only in the sciences.

One project, “Mining a Year of Speech,” pulled together a year’s worth of spoken English (at least two terabytes’ worth) collected “in the wild,” meaning not recorded in labs but taken from real-world conversations, news broadcasts, and other sources. The researchers demonstrated ways in which the huge corpus can be mined: to track varying emphases in pronunciations of a certain phrase, for instance, on a scale and at a speed that would be impossible in traditional linguistic research.

Another project, “With Criminal Intent,” works with the records of nearly 200,000 trials held at the Old Bailey, London’s central criminal court, between 1674 and 1913. The proceedings have now been digitized as the Old Bailey Proceedings Online. The “Criminal Intent” presenters showed how the database (“127 million words of trial accounts,” according to the project Web site) can be mined to answer all kinds of research questions: about the incidence of certain charges, for instance, and the rise of plea bargaining. A main focus of their work has been on using the Old Bailey records with Zotero, which enables researchers to collect and share their research sources, and TAPoR, a “text analysis portal” that helps users collect and analyze texts using different digital tools.

Stephen Ramsay, an associate professor of English at the University of Nebraska at Lincoln, gave a talk in response to the “Criminal Intent” project. He used the occasion to talk about how far digital-humanities work has come since 2002, when “text analysis was a minor act” in the digital humanities and the revolution in tool-using was in its infancy.

Back then, Mr. Ramsay said, it wasn’t clear that scholars in the digital humanities even knew how to build code. “We do now,” he said. “Or, at least, these guys do.”

Mr. Ramsay’s talk celebrated how this kind of Big Data work can enhance rather than diminish the humanities’ traditional engagement with human experience. “The Old Bailey, like the Naked City, has eight million stories. Accessing those stories involves understanding trial length, numbers of instances of poisoning, and rates of bigamy,” he said in his response. “But being stories, they find their more salient expression in the weightier motifs of the human condition: justice, revenge, dishonor, loss, trial. This is what the humanities are about. This is the only reason for an historian to fire up Mathematica or for a student trained in French literature to get into Java.”
