The Chronicle Review

In the Digital Era, Our Dictionaries Read Us


Peter Sokolowski, editor at large at Merriam-Webster Inc.
March 11, 2013

For Peter Sokolowski, a high-profile event like the 9/11 attacks or the 2012 vice-presidential debate is not just news. It's a "vocabulary event" that sends readers racing to their dictionaries.

Sokolowski is editor at large for Merriam-Webster, whose red-and-blue-jacketed Collegiate Dictionary still sits on the desk of many a student and editor. In a print-only era, it would have been next to impossible for him to track vocabulary events. Samuel Johnson, the grand old man of the modern dictionary, "could have spent a week or a month writing a given word's definition and could never have known if anyone read it," he says.

Today, Sokolowski can and does monitor what visitors to the Merriam-Webster Web site look up—as they're doing it.

With the spread of digital technologies, dictionaries have become a two-way mirror, a record not just of words' meanings but of what we want to know. Digital dictionaries read us.

The days of displaying a thick Webster's in the parlor may be past, but dictionaries inhabit our daily lives more than we realize. "There are many more times during a day that you are interacting with a dictionary" now than ever before, says Katherine Connor Martin, head of U.S. dictionaries for Oxford University Press. Whenever you send a text or an e-mail, or read an e-book on your Nook, Kindle, or iPad, a dictionary is at your fingertips, whether or not you're aware of it.

For dictionary makers, going electronic opens up all kinds of possibilities. It's not just that digital dictionaries can be embedded in the operating systems of computers and e-readers so that they're always at hand. They can be updated far more easily and often than their print cousins, and they can incorporate material like audio pronunciations and thesauruses. Unsuccessful word "look-ups," or searches that don't produce satisfying results, can point lexicographers to terms that haven't yet made their way into a particular dictionary or whose definitions need to be amended or freshened. Online readers can click a button and contribute their own word lore, extending a tradition that dates back at least as far as the late 19th century, when James Murray and his team compiled the first Oxford English Dictionary with the help of thousands of word slips sent in by the public.

Merriam-Webster Inc. began to track what words readers search for in 1996, when it first moved some of its dictionary content online.

"The first thing we noticed were these enormous spikes of interest around a big news event," beginning with Princess Diana's death and funeral in 1997, Sokolowski says.

The royal tragedy triggered searches on the Merriam-Webster Web site for "paparazzi" and "cortege." When Michael Jackson died in 2009, "emaciated" became the most-looked-up word of the following month—July—and the second-most-looked-up word of the year. ("Admonish" took first place, Sokolowski recalls, after the White House said it would "admonish" Rep. Joe Wilson for interrupting a speech by President Obama.)

Look-ups during a major news event suggest cultural narratives. "There's something sort of poignant about what people were seeking in lexicographical terms after 9/11," he says. In the immediate aftermath, people looked up words associated with the direct, visceral nature of the event: "rubble," "triage." In the days and weeks after the attacks, as the country reacted to and tried to make sense of what had happened, users sought out more philosophical or abstract words like "surreal."

Stuart Bradford for The Chronicle Review

More recently, when Vice President Joe Biden dropped "malarkey" ("insincere or foolish talk: bunkum") into his October 2012 debate with Paul Ryan, the GOP vice-presidential candidate, look-ups of that colorful word surged on the Merriam-Webster site. The pattern reflects the public's strong interest in "public and pointed utterances," Sokolowski says.

To track that interest, he live-tweets major political debates as well as events like the National Spelling Bee. The dictionary also carries a weekly Trend Watch feature on its Web site that allows readers to see the most-searched-for words. "Plantagenet," for instance, has made a strong showing since the news broke that researchers in Britain had identified the bones of Richard III.

Sokolowski can't always figure out what specific event or public utterance causes look-ups of a particular term to soar. In such cases, he often asks on Twitter for clues: whether such-and-such a word just aired on a TV show, for instance, if he notes a spike in look-ups of it during prime time.

Some look-up patterns suggest what people are doing at the time. For instance, traffic to the mobile Merriam-Webster site "increases substantially after work hours," with the word "qi" among the most-looked-up words on the mobile site, Sokolowski says. "A reasonable conclusion is that people use their smartphones to look up words more often when they are away from work and that they play Scrabble or Words With Friends when not at the office during the day."

Word lovers need not fear for their privacy, though. Sokolowski does not track the identities of dictionary users. "I don't care at all who's looking it up," he says. "I'm simply looking at raw numbers."

The usage patterns Merriam-Webster's team tracks could be fascinating for language scholars to analyze, but so far those patterns have been examined only for in-house research. (Sokolowski points out that he and many of his colleagues are "academic refugees" with literature or linguistics backgrounds.) Outside researchers have not asked to use the look-up records, according to Sokolowski. It's not clear they'd be able to use the information if they did ask. Merriam-Webster considers it "valuable proprietary data, and we do not make it freely available to the public," he says. But, he adds, "it is conceivable that under the right circumstances we might try to find ways to work with qualified researchers and scholars."

No dictionary commands more respect than the Oxford English Dictionary. The second edition of the OED came out in 1989. A couple of years ago, a rumor spread that Oxford would not produce a print version of the third, which has been in the works for years. (Major new editions of dictionaries are not quick-turnaround projects.) According to Martin, the company's head of U.S. dictionaries, the third edition won't be completed for a decade or more, and it's far too early to say whether there will or won't be a print incarnation. "If you ask anyone who's working on a big dictionary project right now, it's the same," she says. By the time the third edition is finished, she jokes, "we may all be communicating through chips in our brains."

Oxford's dictionaries, including the OED, already have a strong online presence, though, and Martin is enthusiastic about the many possibilities digital dictionaries present.

"I can tell you it's a robustly growing business," she says. "There are some people who still really like their dictionaries in print," but that's not where the growth is. According to Martin and others, Oxford and other major dictionary publishers have been pursuing partnerships with Amazon, Apple, and other big players on the digital scene. For instance, American and British Kindle users probably don't know it, but their devices come with Oxford's New Oxford American Dictionary and its New English Dictionary embedded. "Right now the race among publishers is to have their product embedded in these platforms," says Steve Kleinedler, executive editor for the reference group at Houghton Mifflin Harcourt, which publishes the American Heritage Dictionary and recently acquired Webster's New World Dictionary.

Most dictionary publishers haven't yet gone as far as Macmillan Education, which announced in November that it would no longer make print dictionaries at all. "Exiting print is a moment of liberation, because at last our dictionaries have found their ideal medium," Editor in Chief Michael Rundell said when the news was announced.

This month Merriam-Webster unveils a new Web site for its subscription-only unabridged dictionary, a product largely supported by universities and libraries. According to Sokolowski, this marks the first time the company has really considered its print and online products as two distinct entities. "Some of the changes we're making are substantial, and there are really good reasons to do it," he says. The company will update the online Unabridged several times a year to keep it as timely as possible. He notes that the first release of updates took place March 1 and includes about 5,000 newly defined words, 100,000 new author quotations, and 200 new paragraphs on usage, not to mention "updates to thousands of existing entries." Beyond that, lexicographers have room to stretch online and add more usage guidelines and examples when they're writing new entries. That all means the online dictionary "will grow increasingly distinct" from its print relative, Sokolowski explains.

Despite the Macmillan editor in chief's argument that digital is ideal for dictionaries, no medium is perfect. Print offers pleasures that pixels don't. It's hard to electronically recreate the joy of browsing a printed page of definitions and "finding something you didn't know you were looking for," Martin says.

Dictionary makers are working on electronic simulacra. If you're using the Merriam-Webster phone app, for instance, you can turn your device horizontally and get a scrolling list of words that mimics browsing in the vicinity of a word in a print dictionary.

As Martin sees it, there are compensations for what's lost in the jump from page to screen. Online look-ups liberate dictionary users from the straitjacket of the alphabet. "For most of its life, the dictionary has been limited by alphabetical order," she says. "That was the default way to navigate through the text."

No longer. Now that the OED has an online presence, readers can explore its accumulated linguistic riches in ways that don't depend on A-B-C order. For example, the OED has played to its strength as a historical dictionary, which preserves the past uses and meanings of words, by integrating its historical thesaurus. That "puts all of the enormous content of the OED into a taxonomic structure," Martin explains. "So if you wanted to see all the terms for, say, a loose woman that were used in the 19th century, with a couple of clicks you could get all that information. It helps unlock the dictionary in a new way."

That can be a huge boon for historians, linguists, novelists, screenwriters, and anybody with an interest in how language shifts and changes.

Blending once-discrete references online creates a "kind of blossoming map of words and meaning" that readers can explore, says Ben Zimmer, a linguist and executive producer of the Web site Visual Thesaurus and its sister site. He chairs the New Words Committee of the American Dialect Society and writes columns on language for The Boston Globe. "Dictionaries are not just static entities anymore," he says. "You have to be able to react to current events, how people are going to look things up."

On the sister site, Zimmer and his colleagues serve up not just a standard dictionary definition but what he calls "blurbs," chattier and sometimes whimsical explanations designed to help a reader understand and remember what he or she looks up. Look up "hirsute," for instance, and you get this: "What do Santa Claus, Bigfoot, and unicorns have in common? Aside from the fact that they're completely real, they're also hirsute: very, very hairy creatures," the site explains. "The word is pronounced 'HER-suit,' so if you see a woman wearing a furry jacket with matching pants, you could say, 'Her suit is hirsute.' Just make sure it's actually a suit and not her real hair."

Like online versions of print dictionaries, sites like these also give users the sounds as well as the meanings of words. (Trained opera singers "are perfect for this kind of work," Zimmer says. "They know how to enunciate.") And in the handy bells-and-whistles category, quizzes and other extras reflect the enthusiasm for language-learning games that's taken hold among students and educators, he says. "You have to meet young learners on the terrain they're comfortable with."

The flexibility and expansiveness of digital dictionaries allow their makers to adapt more quickly to current usage, as well as to changes in science, technology, and culture. "The big advantage is that we issue updates" twice a year, says Kleinedler of Houghton Mifflin Harcourt.

Kleinedler's team undertakes periodic subject-area reviews to make sure that the dictionary reflects the latest terms and discoveries in certain fields. In the past six months, the publisher has updated a quarter of the biochemistry terms in the American Heritage Dictionary, he estimates. For instance, it added "aromatase inhibitor" and "docosahexaenoic acid" and will add "cathinone" and "prohormone" this spring.

The updates go beyond new words, though. "There's a lot of existing terminology that gets revised," Kleinedler says. Occasionally definitions need to be updated to do away with old cultural biases. Words like "tan" or "windburned," for instance, might have been defined with a white-skin bias in older dictionaries.

If the digital environment puts an abundance of material within easier reach of dictionary users, it channels useful information back to dictionary makers as well. "It's become a new way for us to identify gaps in our coverage," Oxford's Martin says. For instance, Oxford's lexicographers noticed that people were searching for compounds like "departure lounge" that spring up in everyday life before they make it into a dictionary. That kind of passive feedback "was never available to dictionary makers in the print age," she says.

Lexicographers have long counted on field research and what we now call crowdsourcing to collect examples of words and usages. Those practices can be continued and expanded online.

Most online dictionaries invite readers to nominate new words and slang. The OED maintains an "appeals" page where it asks readers to submit earlier records of words the editors are documenting. American Heritage's Open Dictionary Project calls on readers to suggest new words for consideration.

Beyond crowdsourcing, the digital era makes it easier to pull together "corpus data," large amounts of linguistic evidence that lexicographers draw on to analyze parts of speech and grammatical relationships between words. This is Big Data, dictionary style.

With good corpus data and the tools to analyze them, lexicographers can spot differences in idiomatic use from era to era and from place to place. Think of the differences between American English and British English. "To monitor changes like 'snuck' as the past tense of 'sneak,' which is historically irregular but has become very much accepted in American English, we're using very large segments of texts to make judgments about what's happening over time," Martin says. The texts used can be almost any form of written expression: specialized academic journals, blog posts, newspaper articles, and more.

Via e-mail, Ingrid Goldstein, Oxford's head of language technology, described the process of gathering linguistic data. Her team trawls or "spiders" the Web in search of fresh material for the corpus. "We remove foreign-language material, and remove repeated stretches of text," she explains. "We analyze the texts in order to make further information about each word available," like identifying which part of speech it is.

After these "preparatory processes," the gathered material is fed into what Goldstein calls "a corpus tool" that allows lexicographers to do many things with the words: conduct basic searches, retrieve concordances, and summarize the "collocational behavior" of each word—meaning how it's arranged with or works alongside other words. (From collocation, defined on the free Oxford Dictionaries Web site as "the habitual juxtaposition of a particular word with another word or words with a frequency greater than chance.")
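For readers curious what "a frequency greater than chance" can mean in practice, here is a minimal sketch in Python. It is not Oxford's corpus tool, whose internals the article does not describe. It assumes one common textbook measure, pointwise mutual information (PMI), applied to adjacent word pairs; the function name `collocation_scores`, the `min_count` parameter, and the toy sentences are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def collocation_scores(sentences, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Illustrative assumption only: PMI compares how often two words
    actually appear side by side with how often they would if each
    word were placed independently. A score above 0 means the pair
    co-occurs more often than chance alone would predict.
    """
    words = Counter()
    pairs = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        words.update(tokens)
        # Adjacent pairs only; pairs never span a sentence boundary.
        pairs.update(zip(tokens, tokens[1:]))

    total_words = sum(words.values())
    total_pairs = sum(pairs.values())

    scores = {}
    for (first, second), count in pairs.items():
        if count < min_count:
            continue  # too rare to judge; skip it
        p_pair = count / total_pairs
        p_first = words[first] / total_words
        p_second = words[second] / total_words
        scores[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return scores


# A made-up four-sentence "corpus", far smaller than any real one.
toy_corpus = [
    "the departure lounge was full",
    "we waited in the departure lounge",
    "the lounge was quiet",
    "the train was late",
]
for pair, score in sorted(collocation_scores(toy_corpus).items(),
                          key=lambda item: item[1], reverse=True):
    print(" ".join(pair), round(score, 2))
```

On this toy input the pair "departure lounge" receives the highest score, which is the kind of signal a lexicographer might follow up by reading the underlying sentences. Real corpus tools work with vastly more text and typically use more robust statistics than raw PMI, which tends to overrate rare pairs.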

For all the digital tools and enhancements they have to work with, dictionary makers still do a great deal of their work by hand and by eye. "It's like taking a census of the language," Sokolowski says. At Merriam-Webster, machine-reading of data comes after human reading. "We read everything. We read as much as we can," he says. "We do something which is a little anachronistic, which is we mark it by hand. We notice new words, new uses of old words."

What they notice gets entered into the Merriam-Webster database. Electronic corpora and tools support and enhance what the editors pick up on. "We just find that relying on editors to edit is the most efficient way," he says. "It also works for presenting the information, because the reader isn't an algorithm either."

The more embedded the dictionary is in our lives and devices, the more useful it will be to the casual reader—and the less likely we are to think about it. The differences among dictionaries are harder to see and appreciate when you're discovering words via a Google search.

Students, in particular, often don't discriminate between sources. An undergraduate who just wants a quick definition is much more likely to turn to a free online dictionary than to the OED, and won't see much difference between the two, says Michael Hancher, a professor of English at the University of Minnesota-Twin Cities, who organized a panel on digital dictionaries at this year's Modern Language Association meeting. "The evidence is that they make casual use of the online resources," he says.

To anyone who has paged through a thick dictionary, that sounds like a radical departure from the look-up practices of the past. But to Lisa Berglund, a professor of English at Buffalo State College, dictionaries' shift online continues an expansive Anglo-American tradition that dates back to the 18th century, when Samuel Johnson helped regularize English spelling in his great dictionary, and the early 19th, when Noah Webster created a linguistic resource for a new country. "Webster's dictionary was the best-selling book in America, after the Bible, for decades," Berglund says. It helped provide "the basis for a shared community, a shared basis of knowledge, in forming a new nation."

Dictionaries are not created equal, though, and the most readily found definition will not always be the most robust or up-to-date. Who's behind the definition that turns up on a quick Google search or embedded in your digital device? Online it can be very hard to tell how reliable a source is. Berglund serves as the executive secretary of the Dictionary Society of North America, and she thinks it's more vital than ever to equip students with the literacy skills to be able to distinguish a good source from a mediocre one. Whatever form future dictionaries take, she wants professors as well as their students to take them seriously. "We tend to forget that the dictionary is one of the most valuable tools for humanistic study," she says.

Berglund sees many advantages to online dictionaries. But some aspects of the print experience will never make the leap online.

"You lose things, you gain things," Berglund says. "You can't use your computer dictionary as a doorstop. You can't press flowers in it. You can't necessarily insert your Valentine into the page next to the word 'Love,' which is the kind of thing people do."

Correction (3/11/2013, 12:46 p.m.): This article originally stated incorrectly what part of the American Heritage Dictionary has been updated in the past six months. It is a quarter of the biochemistry terms, not a quarter of the entire dictionary. The article has been updated to reflect this correction.

Words Frequently Searched in 'Merriam-Webster' Online

When Merriam-Webster Inc. first put some of its dictionary's content online, in 1996, the editors began to track what words readers searched for. Specific events appear to lead people to search for related words. Here are some words that saw spikes in searches, followed by the events that seem to have triggered those spikes.

cortege (Princess Diana's death)

rubble (September 11, 2001, attacks)

surreal (aftermath of September 11 attacks)

emaciated (Michael Jackson's death)

malarkey (Vice President Joe Biden's debate with Paul Ryan)

Plantagenet (discovery of Richard III's bones)

Jennifer Howard is a senior reporter for The Chronicle.