The Professor Who Declared, It's J.K. Rowling

"Nothing that we do is magic," says Patrick Juola, a computer scientist at Duquesne U., whose computer program analyzed linguistic style to identify a detective novel as having been written, pseudonymously, by Harry Potter's creator.
July 29, 2013

Patrick Juola has practiced stylometry, the science of linguistic style, for decades. But he was never famous for it until this month, when he helped unmask the world's best-known living author.

Mr. Juola, an associate professor of computer science at Duquesne University, was one of two academics enlisted by London's Sunday Times to confirm a tip that J.K. Rowling, creator of the Harry Potter series, had written a new detective novel under a nom de plume.

After Ms. Rowling acknowledged the ruse, Mr. Juola found himself caught up in a tale sweeping through the literary world. A digital-humanities field that bridges the gap between academic research and practical science came to sudden prominence. At a time when the Internet has made anonymous writing prevalent and potentially powerful, stylometry could make it easier for officials and others to unmask the authors behind unsigned texts. Conversely, the method could also make it easier for those writers to keep their identities hidden.

The story of how Mr. Juola helped foil Ms. Rowling's attempt to publish a new novel without the baggage of her boy-wizard legacy suggests a novelistic saga of its own—one of obsession and tireless sleuthing. In fact, Mr. Juola didn't break a sweat. The heavy lifting was done by a computer program, the Java Graphical Authorship Attribution Program (Jgaap), that he had designed to recognize writing tics undetectable by human readers.

The whole adventure unfolded rather quickly. On July 11, Mr. Juola got an e-mail from Cal Flyn, at The Sunday Times. The professor had been contacted by journalists before, but this time was different. Ms. Flyn did not want an interview; she wanted Mr. Juola to verify that Ms. Rowling was the author of The Cuckoo's Calling, a novel published this year, to some acclaim, under the name Robert Galbraith.

The case appealed to Mr. Juola, a fan of fantasy literature who, in a recent paper involving anonymous subjects, eschewed the customary John Doe-style aliases in favor of names from J.R.R. Tolkien's Lord of the Rings. The professor agreed to help. He loaded an electronic version of Cuckoo into Jgaap, along with several other texts, including The Casual Vacancy, Ms. Rowling's first post-Potter novel. (He chose that rather than a Harry Potter book because Cuckoo, like Vacancy, was written for grown-ups.)

The Duquesne professor then instructed the program to compare the sample texts to the Galbraith text using four variables: word-length distribution; the use of common words like "the" and "of"; recurring-word pairings; and the distribution of "character 4-grams," or groups of four adjacent characters, which may span whole words or only parts of words.
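For readers curious what those four variables look like in practice, here is a minimal Python sketch. It is not Jgaap's code (Jgaap is written in Java, and the article does not describe its internals). The function names, the tokenizer, and the short list of common words are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a "word" is a run of letters or apostrophes, lowercased.
    return re.findall(r"[a-z']+", text.lower())


def normalize(counts):
    # Turn raw counts into relative frequencies that sum to 1.
    total = sum(counts.values())
    return {key: n / total for key, n in counts.items()} if total else {}


def word_length_distribution(text):
    # Share of words having each length (in characters).
    return normalize(Counter(len(w) for w in tokenize(text)))


def common_word_frequencies(text, vocabulary=("the", "of", "and", "to", "a")):
    # Share of all words accounted for by each common word.
    # The vocabulary here is a hypothetical short list, not Jgaap's.
    words = tokenize(text)
    counts = Counter(w for w in words if w in vocabulary)
    return {w: counts[w] / len(words) for w in vocabulary} if words else {}


def word_pair_distribution(text):
    # Relative frequency of each pair of adjacent words.
    words = tokenize(text)
    return normalize(Counter(zip(words, words[1:])))


def char_4gram_distribution(text):
    # Relative frequency of each run of four adjacent characters,
    # including spaces, so a 4-gram can straddle two words.
    return normalize(Counter(text[i:i + 4] for i in range(len(text) - 3)))
```

Each function reduces a text to a frequency profile. Two texts can then be compared by measuring how far apart their profiles are.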

The computer analysis took only about half an hour, says Mr. Juola.

The findings were not unequivocal, but they made a pretty strong case for Ms. Rowling as the author of Cuckoo—at least compared with the three other authors of the sample texts. (There was, of course, no available writing sample from the fictional Mr. Galbraith.) Ms. Rowling's sample from Vacancy returned either the closest or second-closest match in each of the tests.
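A "closest match" of this kind can be pictured as a ranking of candidate authors by how far each one's frequency profile sits from the profile of the disputed text. The Python sketch below assumes a simple Manhattan (sum of absolute differences) distance. The article does not say which distance measure Mr. Juola used, and the function names and toy numbers are hypothetical.

```python
def manhattan_distance(profile_a, profile_b):
    # Sum of absolute differences over every feature seen in either profile;
    # a feature missing from one profile counts as frequency 0.
    keys = set(profile_a) | set(profile_b)
    return sum(abs(profile_a.get(k, 0.0) - profile_b.get(k, 0.0)) for k in keys)


def rank_candidates(unknown_profile, candidate_profiles):
    # Return candidate names ordered from closest to farthest match.
    return sorted(
        candidate_profiles,
        key=lambda name: manhattan_distance(unknown_profile, candidate_profiles[name]),
    )


# Toy, made-up profiles (relative frequencies of a few common words).
disputed = {"the": 0.5, "of": 0.5}
candidates = {
    "Author A": {"the": 0.5, "of": 0.4, "and": 0.1},
    "Author B": {"the": 0.1, "of": 0.1, "and": 0.8},
}
ranking = rank_candidates(disputed, candidates)
```

Running one such ranking per variable, and checking where a given author lands each time, mirrors the "closest or second-closest match in each of the tests" pattern described above. It says only which candidate in the lineup is nearest, not that the true author is in the lineup at all.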

Emboldened by Mr. Juola's findings—and the results of a second stylometric analysis, conducted by Peter Millican, a philosophy professor at the University of Oxford—The Sunday Times challenged Ms. Rowling. She confessed quickly, though reluctantly. The newspaper ran its article that weekend, and by Monday, Mr. Juola had become known as the computer scientist who had found out the fantasy-author-turned-mystery-novelist by doing some detective work of his own.

Stylometry for Everyone

Stylometric analyses have not always been such a breeze. In the early 1960s, the statisticians Frederick Mosteller and David Wallace set out to determine who—Alexander Hamilton, James Madison, or John Jay—had written 12 of the Federalist Papers. That project took three years, despite assistance from what was then considered a high-speed computer at the Massachusetts Institute of Technology.

Advances in stylometric theory and computing have made things easier since then. With a tool like Jgaap, a program Mr. Juola wrote and developed over the past decade, "it takes little more time to analyze a set of novels than it does to download them," he says.

In an era when the Internet has encouraged the proliferation of both digital publishing and anonymity, stylometric analysis stands to become increasingly relevant outside of academe. Stylometric analysis has become not only faster in recent years, says Mr. Juola, but also something that anybody, in theory, can do. The professor has made the Jgaap program available for download, though it requires some expertise to use well. (Mr. Millican, of Oxford, has also made a version of his own stylometric software, called Signature, freely available.)

Literary historians aren't the only group interested in revealing the authors of unsigned texts. Lawyers are another. Juola & Associates, the Duquesne professor's private consulting firm, pitches its services to attorneys looking to bring scientific evidence to bear on cases involving disputed documents. Mr. Juola says the firm's hourly rate runs in the hundreds of dollars.

Applying stylometry outside the realm of literary forensics, however, can be knotty. Mr. Juola and his partners recently worked with a lawyer for a foreign national who had sought political asylum in the United States. To persuade a judge that his client faced government retribution in his home country, the lawyer wanted to prove that the man was the author of several politically charged articles that had been published anonymously on the Web.

The analysis proved difficult. Literary whodunits tend to come with a rich, comparable data set and a limited array of possible suspects, while real-world scenarios tend to be less "clean," says Mr. Juola. In this case, he and his team had to determine—without a list of other possible authors—whether or not the asylum-seeker had written the anonymous articles. For the Cuckoo inquiry, the news reporters had done their homework, compiling a lineup of possible ghostwriters in addition to Ms. Rowling. For the asylum case, the stylometrists had to patch together additional suspects, and samples, by hand.

In the end, Juola & Associates offered what it considered statistically valid evidence that the man had, in fact, written the anonymous essays, and the judge let him stay in the country. But the stylometric detectives were reminded of an important lesson: "Authorship analysis in the field can pose substantially different challenges than in the lab," wrote Mr. Juola in a summary of the case.

Often, anonymous writers will take pains to avoid being found out, and for them, too, modern stylometry offers help. Even as Mr. Juola has been working to fine-tune technology for unmasking anonymous authors, some of his colleagues are building technology for keeping authors' identities hidden.

Rachel Greenstadt, an assistant professor of computer science at Drexel University, is one such researcher. She works in "adversarial" stylometry, which exploits blind spots in programs like Jgaap. She is developing a tool, called Anonymouth, that strips an author's writing of its stylistic markers. Such a tool could be used to give cover to authors who have good reason to remain anonymous, like whistle-blowers and political dissidents.

Scientifically, Ms. Greenstadt plays for the same team as Mr. Juola, but the Drexel professor is eager to discuss the limitations of the technology. "Stylometry works really well if nobody's trying to fool it," she says. "But if they are, it can be fooled quite effectively, and by people who are not particularly trained."

Mr. Juola, for his part, harbors no fantasies about his work.

"Nothing that we're doing is magic," he says. "What we are doing is the same type of judgment that experts have always done about reading documents and figuring out something about the author—just a lot faster, and more accurate than most."