Google Begins to Scale Back Its Scanning of Books From University Libraries

March 09, 2012

Google has been quietly slowing down its book-scanning work with partner libraries, according to librarians involved with the vast Google Books digitization project. But what that means for the company's long-term investment in the work remains unclear.

Google was not willing to say much about its plans. "We've digitized more than 20 million books to date and continue to scan books with our library partners," a Google spokeswoman told The Chronicle in an e-mailed statement.

Librarians at several of Google's partner institutions, including the University of Michigan and the University of Wisconsin systems, confirmed that the pace has slowed. "They're still scanning. They're scanning at a lower rate than the peak," said Paul N. Courant, Michigan's dean of libraries.

At Wisconsin, the scanning pace is "something less than half of what it was" in 2006, the year the work started there, said Edward V. Van Gemert, the university's interim director of libraries.

Wisconsin's agreement with Google stipulated that the scanning would continue for at least six years or until half a million works had been digitized. "We anticipated this slowdown," he said.

It will be six years as of October 2012, and 600,000 volumes have been digitized so far, Mr. Van Gemert estimates. "It would have been next to impossible for the library to come up with the resources to digitize that amount of material," he said. "So I really cast the partnership as being highly successful at a time when digitization was highly needed." He credited Google's work with helping the partner libraries and others create the HathiTrust digital repository, which now contains more than 10 million scanned volumes. That, he said, "has allowed us to think differently about out-of-copyright material and the preservation of resources in our collections."

According to Mr. Courant at Michigan, the slackening pace reflects a natural maturation of the project. "They've done about 5.5 million volumes from our collections," he said. That means "the pickings are getting kind of slim if you're worried about duplication" with what Google has scanned from other library partners.

When the work began, "Google would come in and take things by the stack row," he said. Now they've switched to a book-by-book model, scanning only volumes that fill gaps in what's been digitized so far.

Some institutions struck agreements with Google to scan only specific collections. Much of that work has now wrapped up. The University of Texas at Austin, for instance, signed on to have Google digitize its Latin American collection of about half a million volumes, said Fred Heath, vice provost and director of the University of Texas Libraries.

"We were not interested in a situation where we'd have to pick from the 10 million volumes in all of the libraries and have to ship them and then refile them," Mr. Heath said.

Google completed the work far more quickly than the university could have done by itself, according to Mr. Heath. "We figured we could do it in a hundred years," he said. Google did it in two. "They were in and out with method and efficiency and no loss" of materials, he said.

For now, the work has slowed down but continues at Michigan, Wisconsin, and other institutions with which Google has open-ended arrangements. Mr. Courant expects "it will continue for the indefinite future."

Google isn't saying whether it has pulled back from its longstanding goal of collecting all of the world's knowledge. Some of its digitization efforts have shifted to Europe. Much of the company's public focus lately has been not on mass digitization but on how to use individuals' data to create more focused advertising and online browsing. Meanwhile, a copyright-infringement lawsuit brought against it by authors' and publishers' groups drags on. HathiTrust and five universities, including Michigan and Wisconsin, face their own challenge from the Authors Guild and other groups over control of the scanned works.

The legacy of the Google scanning depends in part on what happens in court and "the ability of the libraries and the rights holders to come to agreement" on how best to use the wealth of digitized material, Mr. Courant said.