Now that a judge has rejected the Google Books settlement, one of the unanswered questions is what will happen to universities’ dreams of conducting research on the huge archive that Google has created.
For humanists and others interested in such “Big Data” research, the answer got a little clearer this week. Several of Google’s university book-digitization partners announced plans to build a new center for computational research on millions of digitized texts, many of them scanned by Google.
The Google Books settlement, scuttled last month, would have permitted the use of millions of in-copyright works owned by universities for “nonconsumptive” computational research, meaning large-scale data analysis that is not focused on reading texts. For example, researchers can mine such databases to study how the English language has grown or how rapidly humanity is forgetting its history. Under the legal settlement, Google had pledged to invest $5-million in one or two centers created for this kind of research.
With the Google project in legal limbo, Indiana University and the University of Illinois are moving forward with plans to set up a similar research center built around the archive maintained by the HathiTrust Digital Library. A consortium of universities created HathiTrust in part to establish a stable backup of the books that Google digitized from their libraries. The new research center will initially focus on works that are no longer protected by copyright, roughly 2.3 million books in HathiTrust’s 8-million-plus collection.
“Right now, the safe path is working with the public-domain materials,” said John Wilkin, executive director of HathiTrust. “That’s a phenomenally large amount of material.”
Researchers will not need to be affiliated with Hathi member institutions to access the center, Mr. Wilkin said.