[This is a guest post by Michelle Moravec, a historian currently working on the politics of women’s culture, which you can read about at michellemoravec.com. Follow her on Twitter at @professmoravec.--@JBJ]
I think I need to write How not to be a tool about your tool: tales of an indiscriminate tool adopter #digitalhumanities #dhist
— M.M. (@ProfessMoravec) January 14, 2014
If you participate in social media and do digital humanities work, this situation may sound familiar. You are trawling through Twitter, someone mentions a bright, shiny tool, and off you go, down the rabbit hole. Repeat frequently, and the hours add up. Over the past three years, relying heavily on the hive mind of social media, I’ve adopted and discarded a wide range of tools.
This is how I became an indiscriminate tool adopter.
Confession time: my project, Visualizing Schneemann, is a hack inspired by Mapping the Republic of Letters. Visualizing Schneemann, which explores the artist’s edited correspondence, became a sort of proof-of-concept project for me. How much could I do using (mostly free) off-the-shelf tools in the very short twelve-week timeframe I had to complete the project?
Insight #1 Spend money wisely
My projects almost always begin with converting PDFs to plain text files. I initially attempted to use Adobe Export PDF (Tool #1, $19.99), but the files required extensive hand cleaning due to poor OCR. A tweet from Josh Honn directed me to ABBYY FineReader (Tool #2, $99). I ran the free trial and saw it produced excellent results. The time saved more than justified the money spent. I only wish I had not tried to cheap out with the lower-cost option, as all of that work had to be redone.
Insight #2 Don’t just use the popular tool
To visualize various relationships within the letters, I started with Gephi (Tool #3, free). Gephi is very cool, but it has a steep learning curve. I persevered until I saw a tweet from Elijah Meeks about Raw. A cursory investigation revealed that it was more than sufficient for my needs. I abandoned Gephi because Raw (Tool #4, free) was far quicker.
Insight #3 Google or ask for help when you need it
To get deeper into the letters, I experimented with Stanford University’s natural language processing named entity recognition software (Tool #5, free). The program appears daunting, but it has a very user-friendly interface. I tagged the names, places, and organizations mentioned in the letters. Extracting those tags proved more difficult. Googling led me to William G. Turkel’s excellent process for extracting them (Tool #6, free). At this point, time was getting very short, so I asked someone more familiar with Linux to do it for me. I quickly hand cleaned those results, yielding spreadsheets.
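For readers who want to try the extraction step themselves, here is a minimal sketch in Python. It is not Turkel’s process (which I did not run myself). It assumes the tagged letters were saved in an inline XML style such as `<PERSON>Jane Doe</PERSON>`, and the sample sentence, the names in it, and the function names are all invented for illustration.

```python
import csv
import re
from collections import Counter

# Assumption: entities are wrapped in inline XML-style tags,
# e.g. <PERSON>Jane Doe</PERSON>. Adjust the tag names to your output.
TAG_PATTERN = re.compile(r"<(PERSON|LOCATION|ORGANIZATION)>(.+?)</\1>", re.DOTALL)


def extract_entities(tagged_text):
    """Count each (tag, entity) pair found in the tagged text."""
    counts = Counter()
    for tag, entity in TAG_PATTERN.findall(tagged_text):
        # Collapse internal whitespace so names broken across lines still match.
        counts[(tag, " ".join(entity.split()))] += 1
    return counts


def to_rows(counts):
    """Turn the counts into rows sorted by tag, then by entity."""
    return [(tag, entity, n) for (tag, entity), n in sorted(counts.items())]


def write_rows(rows, handle):
    """Write the rows as CSV to an open, writable text file."""
    writer = csv.writer(handle)
    writer.writerow(["tag", "entity", "count"])
    writer.writerows(rows)


# Invented sample text, not taken from the letters.
sample = (
    "<PERSON>Jane Doe</PERSON> wrote from <LOCATION>London</LOCATION> about the "
    "<ORGANIZATION>Example Gallery</ORGANIZATION>; <PERSON>Jane\nDoe</PERSON> "
    "wrote again from <LOCATION>Paris</LOCATION>."
)
rows = to_rows(extract_entities(sample))
```

The resulting rows still need the same hand cleaning I describe above (variant spellings, nicknames, mis-tagged words), but they open directly in a spreadsheet once written out with `write_rows`.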
Insight #4 A better tool will always come along
To display the data from the NER, I could have used Raw, but the locations cried out for a map. I went with Google Maps Engine Pro (Tool #7, $5/month) because it was the first cheap and easy solution I found. It worked, although I had to convert the locations to latitude/longitude and feed them into the program. However, I later learned, again via a tweet, this time from Liz Timbs, about a far more visually sophisticated tool, TimeMapper (Tool #8, free), which creates lovely spatial/temporal visualizations and includes conversion to latitude and longitude.
Insight #5 Invest your time in learning methods not tools
However, that still hadn’t got me to the deeper content of the letters. For that I used corpus linguistics, which I learned about from Heather Froehlich several years ago. AntConc, the tool I use for corpus linguistics, is not hard, and there are many excellent YouTube videos by toolmaker Laurence Anthony (Tool #9, free), but the methodology is complex. Two years into doing corpus linguistics, I am still learning. There is currently an excellent MOOC that builds on the many open resources of the University Centre for Computer Corpus Research on Language at Lancaster University.
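To give a sense of what the method involves, one basic corpus linguistics view is the concordance, or keyword in context (KWIC): every occurrence of a word lined up with the words around it. The sketch below is a rough Python approximation of that idea, not how AntConc itself works. The function name, the simple tokenizer, and the sample sentence are my own assumptions.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, match, right context) for each hit on keyword."""
    # Assumption: a word is a run of letters or apostrophes; real corpus
    # tools offer far more careful tokenization.
    tokens = re.findall(r"[A-Za-z']+", text)
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# Invented sample text, not taken from the letters.
sample = "The film was shown in June. I hope the film reaches you soon."
lines = kwic(sample, "film", width=2)

for left, match, right in lines:
    # Right-align the left context so the keyword forms a column.
    print(f"{left:>20}  {match}  {right}")
```

Producing the lines is the easy part. Deciding which words to look up, and what the patterns around them mean, is the methodology that takes years to learn.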
I find it encouraging that an indiscriminate tool adopter like me can produce a project in just under three months using fewer than ten tools. Due to the small sample size of the letters, the project isn’t complete, but it worked well enough to justify the investment of more time, not in learning more tools, but in going back into the archives to digitize more letters myself.
Do you have strategies for quickly assimilating tools for short projects? Please share in the comments!
Photo “Down the Rabbit Hole” by Flickr user pasukaru76 / Creative Commons licensed BY-2.0