My recent post about using Amazon Mechanical Turk for the transcription of digital audio (a practice which may, or may not, be ethical) has left me thinking about other options for getting audio of spoken words transcribed into written words.
There are many reasons why you might not want to use the keyboard for composing text. You might suffer from carpal tunnel syndrome. You might want to add something to your to-do list but not happen to be next to your computer. You might have a conversation recorded already and want it to be available as text. And finally, you might be a geek like me who likes to see what’s possible with new hardware and software tools.
If you’re looking for ways to have humans do the transcription, then uploading your jobs to Amazon Mechanical Turk is not the only route: you could go with online services like CastingWords (which, it turns out, uses AMT), Purple Shark, or Kedrowski Transcription. These services are not cheap — much of this kind of work is driven by the fields of law and medicine, where clients can afford higher rates — but they promise accuracy and convenience.
There are also machine-based solutions, where software does the work of transcription. In this post, I cover a few of those options. I wish I could say that a wide variety of companies are working on speech recognition. However, it appears that one company, Nuance, has dominated the market with two desktop applications (Dragon NaturallySpeaking and MacSpeech Dictate), the iPhone app Dragon Dictation, and last year’s acquisition of the web service Jott. All of the products from Nuance make use of the same speech recognition engine, which can be run on a server (to which you connect from your mobile device through Dragon Dictation or Jott) or on a desktop computer (using Dragon NaturallySpeaking or MacSpeech Dictate).
First, I’ll continue the ProfHacker love affair with Google before moving on to the offerings by Nuance.
Google Voice
This service is free, which is great, but the accuracy of the transcriptions leaves something to be desired. I’d expect the accuracy to improve as Google continues to tweak the service.
With a Google Voice account, you’ll find that any voicemail left for you is transcribed and then sent to your Gmail inbox. Need to write a paragraph or two? Call your Google Voice number and leave yourself a message. Need to transcribe a long audio clip? This is probably not the way to do it.
(Here’s the company’s 00:44 YouTube video.)
Jott
Like Google Voice, Jott allows you to call a number and speak what you want transcribed. Unlike Google Voice, Jott is not free. Subscriptions are available in different plans. For $3.95 a month, subscribers get unlimited transcriptions of 15-second audio recordings. For $12.95 a month, you get unlimited transcriptions of 30-second audio clips. Finally, the pay-as-you-go option allows you to get a total of 5 minutes of audio transcribed (in 30-second clips) for $6.95.
Jott will connect to a wide variety of web services, so if you want to add to your Remember the Milk list, update your status on a social media site, or create an appointment on your Google Calendar, you can do so through a call to Jott. If what you’re looking for is something to allow you to dictate a long (or long-ish) message, though, then this is not the solution for you.
Dragon Dictation
With this free iPhone app from Nuance, you can speak into your iPhone or iPod Touch (for what appears to be a maximum of about 30 seconds) and your audio is sent to the company’s servers, where it’s processed and sent back to your device as written text. The time that it takes to do this is, in my limited experience, negligible. This is a good solution for composing a brief email, text message, or social media status update.
If you’re concerned about privacy, Dragon Dictation might make you a little nervous. Do they store your audio on their servers? What do they do with your list of contacts, which are uploaded to their servers in order to improve the accuracy of transcription? Mel Martin reports that the company has assured him users’ data is safe, but it would be nice if the company were a little more explicit in what they tell users about this issue.
(Here’s the company’s 01:02 YouTube video about Dragon Dictation.)
MacSpeech Dictate
This desktop application for the Mac environment works surprisingly well. I’ve used it occasionally and been impressed by the accuracy of its speech recognition. However, I’ve found it awkward to speak out loud the punctuation and paragraph breaks necessary for proper formatting with such a tool. The awkwardness would probably diminish with continued use, but I haven’t gotten there yet.
MacSpeech Dictate is not cheap: it retails for $199 with a headset microphone, but you can buy it from Amazon for less (and probably from other online vendors, too).
Dragon NaturallySpeaking
Because I’m a Mac user, I’ve never tried this desktop application for the Windows environment, but my understanding is that it works in essentially the same way as MacSpeech Dictate.
The standard edition of Dragon NaturallySpeaking is more affordable than MacSpeech Dictate: without a headset, it retails for $99 but is available on Amazon for less; with a headset, it retails for $199, but you’ll pay less if you make your purchase from Amazon or other online vendors.
What about you?
Do you use speech recognition tools? What’s been your experience?
[Creative Commons licensed photo by Flickr user Duchamp]