
A Million Missing Words: The Search Is On


A medal for lexicographic valor would look something like this, only more Wordnik-y. (Image courtesy of Wordnik)

They are the dark matter of the lexiverse — a million words of the English language not yet recorded in any dictionary.

Words like these: farecasting, deanling, domainer, hyperloop, unfuckulate, anachronym, smokescreening.

About a million words are already on record in works like the Oxford English Dictionary, with 600,000 words, and Merriam-Webster’s Unabridged, with 470,000 (many overlapping with the OED, of course). More are in online dictionaries like those of Vocabulary.com, “the world’s smartest, fastest dictionary,” and Wordnik, “the world’s biggest online English dictionary, by number of words.”

But just as many are missing. How do we know?

A few years ago, a sophisticated search of the 361 billion words in the digitized Google Books concluded “that 52 percent of the English lexicon — the majority of the words used in English books – consists of lexical ‘dark matter’ undocumented in standard references.” (The study, “Quantitative Analysis of Culture Using Millions of Digitized Books,” was published in Science in 2010.)

The study uncovered half a million undocumented words with a frequency of at least once in a billion, that is, 361 times or more in those Google Books. Wordnik is going after them, and not just those half a million; any word occurring even once in that 361 billion will be included in Wordnik’s search, making a total of about a million words to be sought and defined.
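The arithmetic behind that threshold can be checked in a few lines. This is only an illustrative sketch using the figures quoted above; the function name `min_occurrences` is made up for this example and does not come from the study or from Wordnik.

```python
# Figures taken from the article; the names below are illustrative only.
CORPUS_WORDS = 361_000_000_000   # words in the digitized Google Books sample
ONE_PER_BILLION = 1_000_000_000  # denominator of the frequency threshold


def min_occurrences(corpus_words: int, per: int = ONE_PER_BILLION) -> int:
    """Smallest whole count that gives a frequency of at least 1 in `per` words."""
    # Ceiling division, so a fractional threshold rounds up to a whole count.
    return -(-corpus_words // per)


threshold = min_occurrences(CORPUS_WORDS)
print(threshold)
```

A word seen fewer times than this count falls below the once-in-a-billion cutoff, which is the group Wordnik says it will chase as well.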

Wordnik looks particularly for instances of “self-defining words,” like this one for farecasting from a recent issue of The New York Times: “A handful of new and updated websites and apps are trying to perfect the art of what’s known as farecasting — predicting the best date to buy a ticket.”

In the venerable crowdsourced tradition of the Oxford English Dictionary, Wordnik has decided to ask for the public’s help in finding this missing million and making the words lookupable. The nonprofit organization seeks $50,000 in donations through Kickstarter. The deadline for Kickstarter contributions is October 16, a.k.a. Dictionary Day, in honor of the birthday of Noah Webster (1758-1843).

Wordnik explains:

“Wordnik is not planning to write definitions, but is using data-mining and machine learning tools to find great, self-defining example sentences that make the meanings of these words clear.

“As rewards for our backers, we’re offering the opportunity to ‘adopt’ words, allowing the backer to publicly claim a little corner of English.

“Other rewards include being able to suggest words that Wordnik should research and add, the opportunity to be the voice of the audio pronunciation for the adopted word, and taking over the Wordnik Word of the Day email and feed for a week. (And if you’re really lost for words … Wordnik will even invent one for you!)

“Physical rewards include stickers, a poster, and an honest-to-God *medal* (for ‘lexicographic valor’).”

 
