If audio content were as easy to search as text, linguists could analyze all the speech recorded over the past 100 years with little hassle. No Google-like technology exists yet for searching all recorded speech, but linguists at the University of Oxford and the University of Pennsylvania are starting by making the equivalent of one year of speech easily searchable.
With quick access to such a large and varied collection of speech, linguists could more thoroughly analyze questions such as how social status affects dialect or what strategies people use to interrupt each other in arguments. A researcher could search for something as specific as “all examples of O’s in nouns spoken by women in Birmingham over the age of 40,” says John Coleman, director of the phonetics laboratory at Oxford.
He and Mark Y. Liberman, a professor of phonetics at Pennsylvania, have developed technology that will make 9,000 hours of British and American recorded speech searchable by queries such as context, phrase, vowel, or consonant. The sound bites include conversations among women talking about the day’s newspaper, people talking to their dogs, and political speeches. Mr. Coleman is using the British National Corpus, an audio collection recorded in the early 1990s, and Mr. Liberman is primarily working with selections from the Linguistic Data Consortium, a group of universities, companies, and government research labs that creates, collects, and distributes speech recordings.
Mr. Coleman and Mr. Liberman are modifying existing speech-recognition technology that aligns transcripts of the files with the audio component. Their version marks up words, or parts of words, and adds available context like the date of the recording, the setting, and details about the speakers. Using this kind of technology for research is unusual, and applying it to such a huge set of data is unprecedented.
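To make the idea concrete, here is a minimal sketch, in Python, of what a query over time-aligned, annotated speech might look like: each word is paired with its offsets in the recording, a phonetic transcription, and speaker metadata, so a search like the Birmingham example above can return playable stretches of audio. The field names, phone labels, and sample record are hypothetical; this is not the researchers' actual system.

```python
from dataclasses import dataclass

@dataclass
class WordToken:
    word: str            # orthographic form from the transcript
    phones: list[str]    # phonetic transcription, e.g. ["OW1", "SH", "AH0", "N"]
    pos: str             # part of speech, e.g. "NOUN"
    start: float         # offset into the recording, in seconds
    end: float
    recording_id: str    # which audio file the token comes from
    speaker_sex: str
    speaker_age: int
    speaker_city: str

# A one-record toy corpus; a real index would hold millions of such tokens.
corpus = [
    WordToken("ocean", ["OW1", "SH", "AH0", "N"], "NOUN",
              start=12.4, end=12.9, recording_id="BNC-K6Y",
              speaker_sex="female", speaker_age=47, speaker_city="Birmingham"),
]

def find_tokens(tokens, phone_prefix, pos, sex, city, min_age):
    """Find tokens matching a query such as: 'O' vowels in nouns spoken
    by women in Birmingham over the age of 40."""
    return [
        t for t in tokens
        if t.pos == pos
        and t.speaker_sex == sex
        and t.speaker_city == city
        and t.speaker_age > min_age
        and any(p.startswith(phone_prefix) for p in t.phones)
    ]

# Each hit carries the recording ID and time offsets, so the matching
# stretch of speech can be cut straight out of the original audio file.
hits = find_tokens(corpus, phone_prefix="OW", pos="NOUN",
                   sex="female", city="Birmingham", min_age=40)
```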
The technology may also help achieve something more monumental, says Mr. Coleman: “It might help the field to settle upon standards for preparing audio material for large-scale distribution over the Web.”
Mr. Coleman says that a large proportion of material on the Web is audio or visual, but that there is no efficient way to search for it. If you want to find a song, he says, you look up the name of the track or the artist with a text search. He envisions using segments of sound to identify other, similar segments. “What you want to do is use a bit of language to find another bit of language,” he says.
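One rough sketch of that idea, searching by sound rather than by text, is to compare a short query clip against candidate segments by acoustic similarity. The example below uses MFCC features and dynamic time warping via the librosa library; it illustrates the general technique only, not the system Mr. Coleman and Mr. Liberman are building, and the file names are hypothetical.

```python
import librosa

def mfcc_features(path, sr=16000):
    """Load a clip and return its MFCC feature matrix (n_mfcc x frames)."""
    y, sr = librosa.load(path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

def similarity_cost(query_path, candidate_path):
    """Lower cost means more acoustically similar, after warping to align timing."""
    q = mfcc_features(query_path)
    c = mfcc_features(candidate_path)
    D, wp = librosa.sequence.dtw(X=q, Y=c)   # cumulative cost matrix, warping path
    return D[-1, -1] / len(wp)               # normalize by path length

# Rank candidate segments by how closely they match the spoken query clip.
candidates = ["segment_001.wav", "segment_002.wav", "segment_003.wav"]
ranked = sorted(candidates, key=lambda p: similarity_cost("query.wav", p))
```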