The Review

Saving Texts From Oblivion: Oxford U. Press on the Google Book Settlement

By Tim Barton

June 29, 2009

At a focus group in Oxford University Press’s offices in New York last month, we heard that in a recent essay assignment for a Columbia University classics class, 70 percent of the undergraduates had cited a book published in 1900, even though it had not been on any reading list and had long been overlooked in the world of classics scholarship. The reason so many of the students had suddenly discovered a 109-year-old work and dragged it out of obscurity, in preference to the excellent modern works on their reading lists, is simple: The full text of the 1900 work is online, available through Google Book Search; the modern works are not.

In describing books, the Scottish-American classicist Gilbert Arthur Highet once wrote, “These are not lumps of lifeless paper, but minds alive on the shelves.” In a world in which students consult not shelves but keyboards, too many of those lively minds remain out of sight, exiled to those shelves, where, every year, there is a virtual conflagration not unlike the fire at the ancient library at Alexandria, as last copies of precious books crumble slowly to dust, or are damaged, stolen, or lost.

What once seemed at least debatable has now become irrefutable: If it’s not online, it’s invisible. While growing numbers of older, public-domain books are now fully and freely available to anyone with a browser, the vast majority of the scholarship published in book form over the last 80 years is today largely overlooked by students, who limit their research to what can be discovered on the Internet.

For most books published in the last 10 years or so, the picture is more heartening: University libraries provide students and scholars with access to a fair number of those works via services purchased directly from publishers and aggregators. Excerpts can often be viewed online free (but only as much as is allowed by publishers, with an eye toward generating sales). And many titles are available as e-books. Nonetheless, the vast majority of the scholarship published since 1923 (the date before which titles are in the public domain in the United States) is now effectively out of reach to the modern student.

As one of the world’s most prolific scholarly publishers, Oxford views the reactivation of publications long sidelined by the restrictions of a print-only existence as a core expression of its mission, and as the responsibility of all scholarly publishers. Five years ago, we published a complete, digitized back archive of our journals, enabling access to four million pages spanning a century and a half of scholarship, and we recently began a project to extend that archive to include tens of thousands of our out-of-print books.

In doing so, we immediately found ourselves confronted with myriad issues of ever-mounting complexity and difficulty. Should we engage in destructive scanning, which destroys the original but yields better results less expensively, or nondestructive scanning, which is more expensive and less effective but spares the book? How should we best clear and clarify the rights, since older contracts understandably do not mention electronic rights? What should we do about the copyrighted materials from other sources that many of the books contain (a single edited volume can include the intellectual property of dozens of chapter contributors, the volume editor, series editors, and third parties whose work is featured in the form of photographs, tables, graphs, poetry, etc.)? What level and type of functionality and metadata (behind-the-scenes information about the content) is appropriate for such a product?

As publishers were grappling with those sorts of questions, so too was Google. For years, Google has been scanning the works found in some of the world’s best and largest scholarly libraries. Google’s stated plan was to allow “snippet” views of in-copyright works, which it believed constituted “fair use.” In the eyes of many authors, agents, and publishers, however, Google was acting illegally. They complained vociferously, eventually filing two lawsuits, one a class action. Four years on, the parties to those lawsuits (the Association of American Publishers, the Authors Guild, and Google) have proposed a settlement. Its fate, and the fate of the 10 to 20 million titles that Google is rumored to be scanning, will be decided by publishers, authors, and other “rights holders,” who have until September 4, 2009, to decide whether to be part of the settlement, and by U.S. District Court Judge Denny Chin, following a hearing on October 7. The Justice Department has indicated that it is looking into whether the deal violates antitrust laws, and no one knows what that bodes.

It has taken many months for the import of the settlement to become clear. It is exceedingly complex, and its design (the result of two years of negotiations involving not just the parties but libraries as well) is, not surprisingly, imperfect. It can and should be improved. But after long months of grappling with it, what has become clear to us is that it is a remarkable and remarkably ambitious achievement.

It provides a means whereby those lost books of the last century can be brought back to life and made searchable, discoverable, and citable. That aim aligns seamlessly with the aims of a university press. It is good for readers, authors, and publishers, and, yes, for Google. If it succeeds, readers will gain access to an unprecedented amount of previously lost material, publishers will get to disseminate their work (and earn a return on their past investments), and authors will find new readers (and royalties). If it fails, most of the lost books are unlikely ever to see the light of day, which would constitute an enormous setback for scholarly communication and education.

The settlement is a step forward in solving the problem of “orphan works,” titles that are in copyright but whose rights holders cannot be found to grant permission for their use. For such books, a professor cannot include a chapter in a course pack for students; a publisher cannot include an excerpt in an anthology; and no one can offer a print or an electronic copy for sale. Making those books available again is a clear public good. Granting Google exclusive rights to use them, as the current settlement does, is not.

If the parties to the settlement cannot themselves solve this major problem, then at a minimum Congress should pass orphan-works legislation that gives others the same rights as Google, an essential step if Google is not to gain an unfair advantage. Despite significant advocacy, Congress has failed to legislate on this issue for 20 years; we at Oxford hope that the specter of Google having exclusive rights to use orphan works will spur heightened public debate and prompt Congress to act immediately.

The majority of the lost and invisible titles of post-1923 scholarship, however, are in that state not because their copyright holders are unknown. They remain in limbo because of the enormous practical obstacles involved in bringing them back to life.

Given our own digitization project, Oxford may know more about the difficulties involved in rescuing such lost publications than most publishers. And Oxford is also better suited to undertake such a project than most: We have a full-time archivist who oversees an out-of-print library that has been in place for the last century, and we have a lot of experience in online academic publishing, with products ranging from the Oxford English Dictionary, online since 2000, to Oxford Scholarship Online, a service that allows Oxford to publish its frontlist in 16 disciplines more or less simultaneously in print and online. Even so, the task of tackling our long-out-of-print list has proved formidable.

It is therefore not at all surprising that most publishers with smaller backlists have found it more fruitful to invest in new publishing rather than in attempts to revive their older, inactive titles. At Oxford, our efforts flowed naturally from our mission: publishing works that further the University of Oxford’s objective of excellence in research, scholarship, and education. The financial returns were highly uncertain; we could not be sure that the revenue would repay the effort and expense.

This is the Good Book settlement rather than the Google settlement: It belongs not only to Google but equally, or even more, to authors, readers, and publishers. It certainly promises revenue streams for Google, although one can easily exaggerate the money to be made from older backlist titles, and the split of the revenues is helpfully enshrined in the document. The pricing mechanisms and principles should ensure a reasonable approach, with the establishment of an independent Book Rights Registry, through which author and publisher representatives will set prices for the database of older titles and decide on future business models.

Google’s core business is not e-book and database retailing, and it may be a reluctant entrant into this arena, having frequently stressed that it is not in the business of creating content. So why is Google willing to make a rumored $200-million investment in scanning and to tackle the practical issues involved in restoring to life so many books, when most publishers have eschewed that opportunity? Perhaps it is that Google is playing for advertising trillions rather than publishing billions. Investments that those seeking a return from publication could not make are more understandable when potential global-advertising revenue streams are at stake. We should note too that, in extending its business model in this way, Google provides authors, publishers, and readers with another important route to market.

For those not inclined to pay to access copyrighted material, Google will, per its original plan, serve up “snippets” from titles in the settlement, or more, if rights holders allow. The settlement will also permit anyone in a public or university library to have free and full access to the titles (albeit only at one computer terminal per library).

Some publishers will be unhappy about copyrighted material being made available free: Publishers have a good argument about the need to protect copyright to secure the revenues that support future publishing. But the settlement is a compromise for everyone, and publishers need to bear in mind that it exhibits a decent respect for copyright. In the same way that Apple’s iTunes created an alternative to the copyright theft enabled by peer-to-peer file-sharing software, the agreement establishes a framework in which intellectual-property rights will be acknowledged and respected, rather than ignored.

The settlement also allows for a great deal of flexibility about the participation of copyright holders. They lose only the right to sue Google, should they participate in the settlement; they can choose not to take part in any of its programs. Indeed, many publishers who decide to be part of the settlement may choose other means of electronically publishing their frontlist and/or backlist. The institutional database product will very likely end up resembling a Swiss cheese, with plenty of holes reflecting rights holders’ decisions to republish their work in other ways.

First and foremost, the settlement is about discovery: a basic restoration of books to our literary landscape that enables readers to find what they once would have missed. The database is unlikely to offer the functionality to which modern researchers have become accustomed: The scanning quality may be poor, and the search capabilities basic. Here at Oxford, we are reviewing our own backlist archive project and trying to work out what the settlement means for us. Many publishers will have neither the mission nor the means to overcome the formidable obstacles involved in giving their print backlists an online life. But whether the lost scholarship is made available through the settlement, through publishers’ own efforts, or both, the means may differ but the end is the same. The settlement gets authors, readers, and publishers farther and faster than if we had been left solely to our own devices.

To be clear, the settlement is not perfect, and its treatment of orphan works is particularly problematic: Google should not have the exclusive ability to exploit those works, and further refinement is needed to ensure that the Book Rights Registry can license those titles to others besides Google. Yet orphan-works legislation also seems more likely to be forthcoming if the settlement goes ahead. And it is important that all of the participants in the settlement, especially Google, now publicly commit themselves to supporting the needed legislation in meaningful ways. If the program succeeds, we may also find the orphan-works problem shrinking over time as rights holders come forward.

In any event, antitrust authorities must be both vigilant and responsive to any anticompetitive effects of the settlement and anticompetitive practices that may flow from it.

With the exception of orphan and other unclaimed works, the Book Rights Registry will be able to license the scanned material to others, including Google’s competitors. Google is being unnecessarily cautious in barring the registry from giving other licensees better terms than it gives Google. Google may consider the “most favored nations” clauses in the settlement not unreasonable, given the many millions it has spent on digitizing, while those who lost trust in Google when it announced its opt-out scanning project (which required authors to ask not to be included) will see the clauses as rewarding bad behavior. Such positioning aside, as the clear search engine of choice for the years that those clauses remain in force, Google does not need those provisions to protect its position and its investment. And it should trust the registry to do the right thing.

A lot depends on the Book Rights Registry, and there are justifiable concerns about the trust that the settlement places in that new institution. The choice of its first director, Michael Healy, executive director of the Book Industry Study Group, inspires confidence, but will the publishers and authors on the registry be knowledgeable enough to represent the full range of publishing for which they are responsible? Some formal means of securing the library community’s advice on the registry’s continuing operations would also be important and welcome.

One can make a case for the settlement by imagining how bad things would be if it failed. If the court rejects the settlement, the lawsuits could be abandoned. Or they could proceed, at great expense, and except for those on the extremes of the arguments, neither victory nor defeat would be palatable. In either case, the opportunity to bring back to life those rumored 10 to 20 million titles would be lost. Victory for publishers and authors would halt Google’s scanning and use of in-copyright material, but neither group would readily want to sue the libraries that now possess the scanned files. Victory for Google would leave millions of scanned files at large, and authors and publishers more uncertain about investing time and money in new publishing.

We cannot now predict all of the places where the settlement will take us, and that should make us cautious. But even as we debate the important issues surrounding it, we must not shirk our responsibility to take forward-thinking, tangible steps now, today, by conjuring perilous futures and retreating to the safety of inaction and paralysis.

The settlement raises other interesting challenges: The scholarly world is drowning in information already, so we will need better paths through all this newly rediscovered older material. But what an enviable problem to face.

So we at Oxford University Press support the settlement, even as we recognize its imperfections and want it made better. As Voltaire said, “Le mieux est l’ennemi du bien,” the perfect is the enemy of the good. Let us not waste an opportunity to create so much good. Let us work together to solve the imperfections of the settlement. Let us work together to give students, scholars, and readers access to the written wisdom of previous generations. Let us keep those minds alive.