The Spoken Word, Searchable for Scholarship

By Mary Helen Miller
May 28, 2010

If audio content were as easy to search as text, linguists could analyze all the speech that has been recorded for the past 100 years with little hassle. No Google-like technology exists for searching all recorded speech yet, but linguists at the Universities of Oxford and of Pennsylvania are starting by making the equivalent of one year of speech easily searchable.

With quick access to such a large and varied collection of speech, linguists could more thoroughly analyze questions such as how social status affects dialect or what strategies people use to interrupt each other in arguments. A researcher could search for something as specific as “all examples of O’s in nouns spoken by women in Birmingham over the age of 40,” says John Coleman, director of the phonetics laboratory at Oxford.

He and Mark Y. Liberman, a professor of phonetics at Pennsylvania, have developed technology that will make 9,000 hours of British and American recorded speech searchable by criteria such as context, phrase, vowel, or consonant. The recordings include conversations among women talking about the day’s newspaper, people talking to their dogs, and political speeches. Mr. Coleman is using the British National Corpus, an audio collection recorded in the early 1990s, and Mr. Liberman is working primarily with selections from the Linguistic Data Consortium, a group of universities, companies, and government research labs that creates, collects, and distributes speech recordings.

Mr. Coleman and Mr. Liberman are modifying existing speech-recognition technology that aligns transcripts of the files with the audio component. Their version marks up words, or parts of words, and adds available context like the date of the recording, the setting, and details about the speakers. Using this kind of technology for research is unusual, and applying it to such a huge set of data is unprecedented.
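To make the idea concrete, here is a minimal sketch in Python of what a time-aligned, marked-up word record and a search over such records might look like. It is an illustration only, not the researchers' system: the `AlignedWord` fields, the phone labels, the `search` function, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedWord:
    """One word from a transcript, aligned to its position in the audio.

    All field names are hypothetical; a real corpus would define its own schema.
    """
    word: str
    pos: str            # part of speech, e.g. "NOUN"
    start: float        # start time in the recording, in seconds
    end: float          # end time in the recording, in seconds
    phones: tuple       # sound segments that make up the word
    speaker_sex: str    # "F" or "M"
    speaker_age: int
    location: str
    recording: str      # identifier of the source recording


def search(corpus, phone=None, pos=None, sex=None, older_than=None, location=None):
    """Return every aligned word that satisfies all of the given criteria."""
    hits = []
    for w in corpus:
        if phone is not None and phone not in w.phones:
            continue
        if pos is not None and w.pos != pos:
            continue
        if sex is not None and w.speaker_sex != sex:
            continue
        if older_than is not None and w.speaker_age <= older_than:
            continue
        if location is not None and w.location != location:
            continue
        hits.append(w)
    return hits


# Invented sample data, for illustration only.
corpus = [
    AlignedWord("dog", "NOUN", 12.40, 12.71, ("d", "o", "g"), "F", 52, "Birmingham", "rec_001"),
    AlignedWord("paper", "NOUN", 3.10, 3.55, ("p", "ei", "p", "er"), "F", 47, "Birmingham", "rec_001"),
    AlignedWord("vote", "VERB", 40.02, 40.38, ("v", "ou", "t"), "M", 61, "London", "rec_002"),
    AlignedWord("box", "NOUN", 7.90, 8.30, ("b", "o", "k", "s"), "F", 33, "Birmingham", "rec_003"),
]

# "All examples of O's in nouns spoken by women in Birmingham over the age of 40"
hits = search(corpus, phone="o", pos="NOUN", sex="F", older_than=40, location="Birmingham")
for h in hits:
    print(h.recording, h.word, h.start, h.end)
```

Because each record carries start and end times, a match points directly to the stretch of audio where the sound occurs, which is what alignment adds over a plain transcript.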

The technology may also help achieve something more monumental, says Mr. Coleman: “It might help the field to settle upon standards for preparing audio material for large-scale distribution over the Web.”

Mr. Coleman explains that a large proportion of material on the Web is audio or visual, but that there is no efficient way to search it. If you want to find a song, he says, you look up the name of the track or the artist with a text search. He envisions using segments of sound to identify other, similar segments. “What you want to do is use a bit of language to find another bit of language,” he says.
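One common way to compare two stretches of sound that may differ in speed is dynamic time warping, sketched below in Python. The article does not say which method the researchers have in mind, so this is an assumed, generic technique. The function names and the toy one-number-per-frame "features" are hypothetical stand-ins for real acoustic measurements.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of numbers.

    Sequences that trace the same shape at different speeds score close to zero.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the cheapest alignment of the first i items of a
    # with the first j items of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def most_similar(query, candidates):
    """Return the name of the candidate clip closest to the query."""
    return min(candidates, key=lambda name: dtw_distance(query, candidates[name]))


# Toy feature sequences (invented values, one number per audio frame).
query = [1.0, 2.0, 3.0, 2.0]
candidates = {
    "clip_a": [1.0, 1.0, 2.0, 3.0, 2.0],  # same contour as the query, spoken more slowly
    "clip_b": [5.0, 5.0, 4.0, 5.0],       # a different contour
}

print(most_similar(query, candidates))
```

In this sketch the query is itself a piece of sound rather than a typed word, which is the sense of using "a bit of language to find another bit of language."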
