Crowdsourcing Transcription: FromThePage and Scripto

The 2012 Annual Meeting of the American Historical Association was packed with sessions on digital history and the digital humanities. In one time slot I counted four sessions on digital archives, online tools, or other technology-related topics. One of the panels I especially enjoyed was Crowdsourcing History: Collaborative Online Transcription and Archives (Tweets available at #session138 #aha2012), which covered some of the projects and tools that rely on large-scale crowdsourcing to transcribe handwritten documents. Presenters included Valerie Wallace, who talked about the Transcribe Bentham project, James Ginther introducing T-PEN, Tim Sherratt on Invisible Australians, and Chris Lintott on Zooniverse and the Old Weather project. You can read more about the panel at the website created for it: Crowdsourcing History.

These were all inspiring projects of an impressive scale, and all of them are success stories in terms of the rewards they have reaped from crowdsourcing. For those of us in possession of our own stack of handwritten documents needing transcription, the question is how we might create our own online interface that hosts images of the documents and lets various users transcribe them, while still allowing us to maintain some control over the results. Two tools introduced during the session, FromThePage and Scripto, both meet this need.


The documentary transcription tool Scripto is the newest creation from the wonderful developers of Zotero and Omeka at the Center for History and New Media. Instead of creating a fully self-contained content management system to manage the documents to be transcribed, they have, very wisely in my opinion, decided to make a library of scripts which can tie into existing platforms such as Omeka, Drupal, and soon WordPress. This division of labor lets them focus on the key task, the transcription features, while Scripto ties in smoothly with a much richer platform that hosts the documents and other materials and serves as the front end for a project. Like some of the other CHNM projects such as Omeka, Scripto was extracted from an existing application, the Papers of the War Department, where you can see an earlier version of its features in action.

The project is in its early stages, but I was impressed by the two demos I have seen Sharon Leon give of Scripto, first at THATCamp and this time at AHA. They have already incorporated some great features for manipulating the original document, and a wiki-like interface that permits discussion of the transcription. I think it is safe to say that with the professionalism and solid institutional backing of CHNM behind it, Scripto is here to stay and will develop into a focused tool that is easy to install, use, and maintain. Learn more about the tool on their home page, or download the source from GitHub.


FromThePage first came to my attention on the DPLA mailing list, where it received a number of compliments. The developer of this collaborative transcription tool, Ben Brumfield, did a great job of demonstrating his platform, which provides a clean and simple interface for viewing, transcribing, and text coding of keywords, people, and places in a collection of documents. The open source software, which can be used as a hosted service or set up on your own Ruby on Rails-friendly server, is clearly a labor of love built by someone who set out to solve his own problem by developing a tool which many of us have only dreamed of. I understand that he is looking for collaborators and institutional support for the software going forward. With an already powerful and functioning tool to offer, FromThePage is the one new project from the past year that I would recommend above all others.

The killer feature that takes FromThePage well beyond other transcription interfaces I have seen, including Scripto, is the powerful yet simple wiki-like annotation and indexing feature. Using simple double brackets, or an optional automatic suggested markup feature, simple transcriptions immediately become “tagged” (though only for items explicit in a text, rather than arbitrary tags) in a truly powerful way. Visitors can find documents through an index of subjects, places, or people, or jump immediately from linked subjects within a text to others like it in what becomes a rich hypertext environment.
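To make the idea concrete, here is a rough sketch of how double-bracket tags in a transcription could be turned into a subject index. This is not FromThePage's own code (which is a Ruby on Rails application). The exact markup assumed here, [[Subject]] or [[Canonical Name|text as written]], the sample pages, and the helper names build_index and plain_text are all my own illustrative assumptions.

```python
import re
from collections import defaultdict

# Assumed markup (illustrative only): [[Subject]] or
# [[Canonical Name|text as written in the document]].
LINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")


def build_index(pages):
    """Map each tagged subject to the sorted list of page ids that mention it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for match in LINK.finditer(text):
            subject = match.group(1).strip()
            index[subject].add(page_id)
    return {subject: sorted(ids) for subject, ids in index.items()}


def plain_text(text):
    """Strip the markup, keeping the words as they appear in the document."""
    return LINK.sub(lambda m: (m.group(2) or m.group(1)).strip(), text)


# Hypothetical transcriptions, invented for this example.
pages = {
    "page-1": "Went to [[Richmond]] with [[John Smith|Mr. Smith]].",
    "page-2": "[[John Smith|J. Smith]] called after dinner.",
}
index = build_index(pages)
```

In this sketch, index["John Smith"] lists both pages even though the diarist wrote the name two different ways, which is the kind of cross-document linking that makes the tagged transcriptions browsable by subject.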

Together with version control and integration with documents hosted in the Internet Archive, I’m truly amazed at what Ben has put together with what seems to be relatively little outside support. I would love to see this as a simple “one-click install” tool that anyone with an off-the-shelf hosting setup could add to their own project. With a few more developers working with him or some solid institutional support, this project has huge potential. Visit the project homepage for some examples of projects that have used the platform, or watch Ben’s screencast describing some of the features.

Ben has put together a great list of online transcription projects, but are there other crowdsourced transcription tools and services, beyond those of the panel participants, which deserve mention?

Image: Diary, a Creative Commons Attribution (2.0) image from bdorfman’s photostream
