A group of major universities has been quietly working for the past two years to build one of the largest online collections of books ever assembled, by pooling the millions of volumes that Google has scanned in its partnership with university libraries.
One of the most important functions of the project, say its leaders, who plan to unveil the giant library today, is to create a stable backup of the digital books should Google go bankrupt or lose interest in the book-searching business.
The project is called HathiTrust. So far its members are the 10 campuses of the University of California system and the universities of the Committee on Institutional Cooperation, a consortium of the 11 universities in the Big Ten Conference and the University of Chicago. The University of Virginia is also joining the project, as will be announced today, and officials hope to bring in other colleges as well.
All of the member universities participate in Google’s ambitious effort to work with major libraries and with publishers to scan all the world’s books. As part of the partnership, Google employees borrow and scan millions of volumes from each participating library to add them to Google Book Search, and in return each library gets a digital copy of each of its scanned volumes. Google first announced the library program in 2005 (The Chronicle, January 7, 2005), and it has been steadily adding partners ever since.
Each university library originally planned to manage the digital copies of the scanned books on its own, but through HathiTrust, library officials are now working together to create a shared online collection.
Making Collections Last
“Google won’t be around forever,” said John P. Wilkin, an associate university librarian for the University of Michigan at Ann Arbor and executive director of HathiTrust. “This is a commitment to the permanence of the materials,” he said, noting that libraries have been around longer than any technology company has. “We’ve been doing this for a couple of hundred years, and we intend to continue doing it.”
Already HathiTrust contains the full text of more than two million books scanned by Google.
But there is an important catch. Because most of the millions of books are still under copyright protection, the libraries cannot offer the full text of the books to people off their campuses, though they can reveal details like how many pages of a given volume contain any passage that a user searches for.
Google follows a similar policy for books it scans, allowing only brief sections of copyrighted works to be displayed in search results. Even so, publishing groups have sued Google for making digital copies of books available without their permission (The Chronicle, October 28, 2005), though some experts now predict that the disputes will be settled out of court.
Only about 16 percent of the books in HathiTrust—or about 327,000 volumes—are out of copyright so that their full text can be delivered to all readers.
And even though the books are in a shared library, officials have not yet set up a global search feature, said Mr. Wilkin. So the only way to search the HathiTrust books now is through each participating university library’s search engine. And some participants have not even added their materials yet.
Mr. Wilkin said that a search engine will be added to the project’s home page soon and that members are working quickly to “ingest” their digital books into the shared library.
The librarians have already added one feature that some library leaders have been calling on Google to provide: a better sense of exactly what is in the collection. Google has refused to release such details, but HathiTrust publishes a list online, updated daily, of what its collection contains.
The librarians plan to work together to create new services to search and display the digital books that Google might not provide for its copies.
One such service will be to make content available to blind readers through screen-reading technology or Braille readers. And Mr. Wilkin said that planners hope to one day create new features that make it easier to browse the vast collection.
“Everyone bemoans the loss of the ability to browse the library” stacks in print, Mr. Wilkin said. “I think we can return to the user the browsing experience online.”
So why call the project “Hathi” (pronounced hah-TEE)—the Hindi word for elephant?
“The name resonated really well because elephants remember, elephants are large, and elephants are strong,” said Bradley C. Wheeler, chief information officer of the Indiana University system. That system, along with Michigan, is taking on much of the initial cost of building the giant digital archive, though each member library will manage adding its materials to the collection.