Civil War Project Shows Pros and Cons of Crowdsourcing

Volunteers transcribe handwritten diary entries such as this one shown here.

[This story was updated and some facts about the Iowa project were corrected on June 21, 2011]

The University of Iowa is the latest to use crowdsourcing to let anyone online help do work once reserved for scholars and archivists—in this case, inviting volunteers to transcribe a trove of Civil War-era diaries. Like other recent attempts to tap the crowd for scholarship, leaders of the Iowa project are learning that free online help can be a major boon, though it isn’t completely free.

The new effort, called the Civil War Diaries Transcription Project, hit the Web last month, with scanned images of more than 3,000 diary pages from the university’s archives. The library wants to convert the handwritten entries into a typed format that can be searched, and so visitors are asked to carefully read entries on the site and type the content into a Web-based form to be used by the library. Staffers check the submissions for accuracy.

Greg Prickman, the assistant head of the University of Iowa’s special collections, said that the purpose of crowdsourcing was to keep costs low, because the project did not receive any direct support. “The idea behind crowdsourcing is that if you have a lot of people doing a little bit, you get a lot of progress,” he said.

At first the university put the word out to Civil War societies and other historical organizations to recruit volunteers, and some history buffs stepped forward. But traffic spiked suddenly last week, when the project was featured prominently on Reddit, a popular social news site on which users post and vote on interesting Internet links. The site received over 32,000 unique hits, 30 times its usual traffic for the week, said Nicole Saylor, the head of Digital Library Services at the University of Iowa.

The good news: Volunteers have now completed transcriptions of more than 1,400 documents.

But the rush of users crippled the Web site for a day. “Once the site started to get that much traffic, pretty much you couldn’t get to anything in the digital library,” Mr. Prickman said.

Staff members have spent more time checking the work of volunteers than they would have needed to if the library had hired professional transcribers, Mr. Prickman said. But the amount of time has not been excessive, and the cost savings have made the project possible, he added. “I think we’ve also come to recognize some ‘power users’ who transcribe in great quantity with high accuracy,” he wrote in an email.

The quality of volunteer transcriptions has been of greater concern to Sharon Leon, director of public projects at the Center for History and New Media at George Mason University, who is conducting a similar crowdsourcing project in which volunteers transcribe handwritten documents from the now-defunct U.S. War Department of the 1800s.

“I don’t think anyone believes that there’s going to be a wholesale replacement of an awful lot of paid staff labor by crowdsourcing projects,” Ms. Leon said. “It gets the public involved, but it makes new kinds of work for existing staff.”

She also said that for crowdsourcing to be fully embraced by the professional community, projects will have to be marketed in a way that makes it clear that paid staff members are not being supplanted by free labor. Ms. Leon stressed that crowdsourcing in universities is still relatively new and that no project should go untended. “I think the accuracy question and the management burden are a lot,” said Ms. Leon. Ultimately, though, “I just think that it’s worth it.”

“We’ve developed a really nice community of folks who are interested in the content and willing to contribute to the work,” Ms. Leon said of the project she is leading. She is also developing Scripto, an open-source tool that will enable others to carry out crowdsourcing transcription projects without investing in costly software.

University College London was among the first to use crowdsourced transcription, with Transcribe Bentham in 2010, which asked the public to transcribe 40,000 pages of the philosopher Jeremy Bentham’s unpublished manuscripts. Earlier this year, the project was scaled back for lack of funding after the government grant that initially paid for vetting and computer programmers ended.
