On November 8, 1800, fire ravaged the federal War Office, in Washington. The agency’s files went up in smoke, leaving a gaping hole in the nation’s historical record.
“The most important window into the early republic had basically been boarded up,” says Christopher H. Hamner, a military historian at George Mason University.
Not anymore. Through years of shoe-leather detective work, scholars have recreated much of the archive by tracking down copies of nearly 45,000 documents. But now they face another challenge: transcribing them from digital images.
Their solution is to enlist the public to help, free of charge. The experiment, run by George Mason’s Center for History and New Media, tests an increasingly important question: How will the Wikipedia model of open participation change humanities scholarship?
Many people have taken part in crowdsourced science research, volunteering to classify galaxies, fold proteins, or transcribe old weather information from wartime ship logs for use in climate modeling. These days humanists are increasingly throwing open the digital gates, too. Civil War-era diaries, historical menus, the papers of the English philosopher Jeremy Bentham—all have been made available to volunteer transcribers in recent years. In January the National Archives released its own cache of documents to the crowd via its Citizen Archivist Dashboard, a collection that includes letters to a Civil War spy, suffrage petitions, and fugitive-slave case files.
The crowdsourcing boom is opening the ivory tower to people like Jaré Cardinal. Ms. Cardinal runs the Seneca-Iroquois National Museum, in western New York. Working in her home office overlooking the Allegheny River, she joins the 760 volunteers who have answered the call to transcribe War Office records. Their only official training is a short set of guidelines.
What motivates Ms. Cardinal is the chance to find material to use in the small museum she directs, on a reservation about 70 miles south of Buffalo. One character of particular interest is Farmer’s Brother, a Seneca leader who befriended George Washington and made an alliance with the United States in the War of 1812. She calls the George Mason archive “a dream site.”
“What better thing to do than to sit in a nice cozy office at home, with your computer, and find out what the guys from 200 years ago were saying about this or that?” Ms. Cardinal says. “Some would say that’s an awfully lonely life. But I’m a grandmother of four. … It isn’t like I’m a recluse or anything. It’s just I’m very excited about history and the fact that you can access all this stuff that you couldn’t before.”
Crowdsourcing advocates see that kind of excitement as a powerful force to improve access to material, build an engaged audience for collections, and perhaps save money. They speak of democratizing the publication of historical documents, allowing people to produce an online archive about any subject, be it a World War II regiment or a small-town mayor.
As intellectual influences, they cite thinkers like Clay Shirky, whose book Cognitive Surplus argued that technology is changing people “from consumers to collaborators, unleashing a torrent of creative production that will transform our world.” And they’re building tools to help realize Mr. Shirky’s vision. George Mason recently released Scripto, a free community-transcription program, developed using the War Office archive as a test case.
Crowdsourcing looks increasingly attractive in part because of the fiscal pressures facing historical-documents projects. The National Historical Publications and Records Commission, a key line of support housed within the National Archives, has seen its grant-making budget cut from $9.9-million in 2010 to less than $5.4-million in 2012. Some lawmakers want to eliminate it altogether. In the future, volunteers may become essential colleagues.
But is the crowd up to the task? Some raise concerns, including Edward G. Lengel, editor in chief of the Papers of George Washington at the University of Virginia, who calls crowdsourcing “an unproven concept.” He points to the Bentham crowdsourcing work as an example of “significant issues” surrounding the “cost-effectiveness, speed, and viability” of this new method.
The Bentham Project, based at University College London, has been working on a 70-volume Collected Works of Jeremy Bentham since 1958. As of 2010, some 40,000 of the philosopher’s manuscripts remained untranscribed. So, starting that September, scholars invited online volunteers to help. To date, more than 1,700 people have signed up to participate in the project, “Transcribe Bentham,” and more than 4,000 transcriptions have been completed.
Yet in a recent article in the journal Literary & Linguistic Computing, members of the Bentham team reported that crowdsourcing had seemingly not succeeded at speeding the pace of transcription, at least so far. Had team members been dedicated solely to transcribing manuscripts full time, rather than moderating submissions and handling other tasks, they could have completed two and a half times as many records as the volunteers, according to the article’s analysis of early data from the project.
Philip Schofield, director of the Bentham Project, points out, however, that “it would be virtually impossible to get significant funding for transcription alone.” Over all, the experience has been “fantastic,” he says, both in pushing the transcription forward and raising the Bentham Project’s profile.
Crowdsourcing contrasts with the meticulous, closely controlled process Mr. Lengel and others use to edit and publish historical records. The Papers of George Washington project began in 1968. A team of historians, with an annual budget of nearly $1-million, publishes two carefully annotated volumes each year. They’ve completed 64 to date and will finish with roughly 88 volumes by 2024. Mr. Lengel’s staff comes to the work with deep training in how to recognize different types of handwriting, determine the dates of documents, and situate the materials in context. They know which record to use, for example, when confronted with multiple copies of a document, like a draft, a letterbook copy (people used to copy their letters into bound books of blank pages), and a receiver’s copy.
Crowdsourcing is worth trying “if you view it as an experiment,” Mr. Lengel says. “But because members of the public who have not been trained in documentary editing are never going to be able to produce complete editions to the same level of accuracy that trained professionals will do,” he says, “I just think it can never be an alternative to traditional documentary editing for a major project.”
Other questions remain as well. Will enough volunteers participate to sustain these projects? Will the crowd care about less-sexy subjects, beyond war and famous individuals? And could transcribers’ political beliefs skew their work on documents related to sensitive historical topics?
‘A Hole in Our History’
The War Office archive now lives here, in the Fairfax, Va., headquarters of the Center for History and New Media, one of the country’s pre-eminent laboratories for digital-humanities projects. Sitting in a common room on a recent afternoon, Mr. Hamner presses his face close to his laptop, deciphering the scrawl of a January 4, 1790, letter dispatched from London to Henry Knox, the first U.S. secretary of war. The letter’s subject, in part, is a contemporary news event: the French Revolution. It’s one of the thousands of letters, speeches, and logbooks in an archive that stretches from 1784 to 1800 and features dozens of key historical figures.
During the early years of the new republic, under the administrations of George Washington and John Adams, the War Office managed Indian, veteran, army, and naval affairs. It ran what George Mason calls “the nation’s only federal social-welfare program.” It was involved in settling the West. It spent 70 percent of the federal budget.
The agency “really was the federal government,” says Mr. Hamner, editor in chief of the Papers of the War Department. It was “the main way in which citizens interacted with the new government.”
In his own research, Mr. Hamner has drawn on the archive to write about pension plans for widows and orphans of the War of Independence, and a George Mason Ph.D. student is also mining it for a dissertation on George Washington’s changing attitudes toward American Indians. Mr. Hamner expects the archive will eventually contribute to research on a variety of other topics, like early foreign policy, the history of the early navy, and economic issues in the early republic.
The collection recreates what was burned in that fire on November 8, 1800, when the War Office was in temporary rented quarters on Pennsylvania Avenue in Washington, D.C. “All the papers in my office” have been destroyed, Secretary of War Samuel Dexter wrote in a letter soon after, according to George Mason’s history of the War Office archive. “For the past two centuries,” the history says, “the official records of the War Department effectively began with Dexter’s letter.”
That situation started to change in the early 1990s, thanks to the determined sleuthing of a retired U.S. Army lieutenant colonel and military historian, Theodore J. Crackel.
Mr. Crackel had studied Thomas Jefferson’s relationship with the military, and he knew many War Office records had survived. Many people kept letters they had received from the agency. And, in the case of correspondence sent to the War Office, many senders kept copies or drafts of their original letters. Over the centuries, these documents seeped into hundreds of repositories around the world, including libraries, historical societies, and the National Archives.
By hunting down and copying those scattered records, Mr. Crackel rebuilt a substantial portion of what had been destroyed in the fire. The project involved visiting more than 200 repositories and consulting more than 3,000 collections—a quest that took him to major institutions like the British Library and obscure ones like a county museum in Maysville, Ky.
“This was a hole in our history,” says Mr. Crackel, who left the project in 2004 to become Mr. Lengel’s predecessor leading the Papers of George Washington. “It was a great challenge … finding these documents, doing something no one had ever done before.”
Citizen Archivists
The challenge now for scholars is navigating the new world of crowdsourcing. Knowing it would never get grant support to transcribe all the records, the center at George Mason opened up the process to volunteers last March. Since then, the crowd has completed 1,429 transcriptions, a fraction of the total.
And while 760 people have signed up for accounts, only 125 actively transcribed in the past 90 days. As with other projects of its kind, a handful of people do much of the work. Finding participants has been the biggest challenge, says Sharon M. Leon, director of public projects at the Center for History and New Media.
Those contributors make for an eclectic mix: people interested in American Indian issues, like Ms. Cardinal; genealogists; academics; and women looking for documents to support applications to join the Daughters of the American Revolution. One active transcriber, Nicole Salomone, is an independent scholar focusing on George Washington and his advisers. She participates in part because she loves to leave comments explaining documents whose context she knows, making them more useful to others.
George Mason editors spend about 30 minutes a day managing the work of these volunteers—comparing transcriptions to the original images, creating accounts, and answering questions. The transcriptions have been “quite clean,” Ms. Leon says.
“Lots of the fear about crowdsourcing is, Oh, it’s just going to be a mess, and people are just going to deface things,” she says. “We haven’t seen a hint of any malicious use.”
And the biggest benefit? Crowdsourcing engages scholarly projects with a larger audience, Ms. Leon says, giving volunteers a feeling of investment and participation in the work of history. In helping to prove the public value of humanities projects, that’s no small thing.