A library of 8.7 million digital volumes. A trove of 100,000 ocean-science photos. An archive of 57,000 Mexican-music recordings.
A common problem bedevils those different university collections. Wide online access is curtailed, in part because they contain “orphan works,” whose copyright owners can’t be found. And the institutions that hold the collections—a consortium of major research libraries and the University of California campuses at San Diego and Los Angeles—must deal with legal uncertainty in deciding how to share the works. A university that goes too far could end up facing a copyright-infringement lawsuit.
Many colleges now have the ability to digitize a wide variety of collections for broad use but frequently back away from sharing them. And that reluctance harms scholarship, because researchers end up not using valuable documents if they can’t afford to fly to a distant archive to see them.
This spring academics, advocacy groups, and government officials are paying new attention to the issue. The fresh look comes after Google’s attempt to solve the problem for books ran off the rails in March, when a judge scuttled a proposed settlement that would have allowed the company to open up access to many orphan works through its book-digitization program. Now various groups with a stake in the debate are floating proposals for Congress to achieve what Google hasn’t.
A close look at one archive shows why the mass digitization of orphan works is creating such trouble.
The UCLA library is building a Web repository for the Arhoolie Foundation’s Strachwitz Frontera Collection of Mexican and Mexican American Recordings, an archive of rare 78- and 45-rpm records that date as far back as 1905. When many of the recordings became accessible to the public on the collection’s Web site in 2009, UCLA bragged that it was the largest online archive of its kind. And the digitizing is only about halfway done. The archive is important to students and scholars who want to learn about the musical heritage of North America and the cultural development of one of the largest minority groups in the United States.
The collection grew out of a love affair between a now-79-year-old German immigrant and the Mexican tunes he would hear on the radio in California and in cantinas every time he drove through the American Southwest. Chris Strachwitz was enamored of corridos, or narrative ballads. He combed record shops, distributors, jukebox companies, and even radio stations. Among the tunes he salvaged are recordings from small, regional labels that have dropped out of sight. Mr. Strachwitz donated his records to the Arhoolie Foundation, which he leads, and in 2001 the foundation started digitizing the songs with UCLA.
But the university is sharing only a fraction of that music with the world because it believes most of the collection is made up of orphans, still covered by copyright. Full access is restricted to computers connected to the campus network. Off-campus users can hear only 50-second snippets. UCLA chose that policy based on its reading of fair-use exceptions to copyright law, which may permit reproductions for teaching and research. Going further would introduce “a level of risk that, given the current status of copyright law, was really challenging,” says Sharon E. Farb, associate university librarian for collection management and scholarly communication.
(Her concern isn’t abstract: UCLA is defending itself in a separate copyright-infringement lawsuit over its use of streaming-video technology. See article on Page A4.)
Mr. Strachwitz, for his part, rejects the idea that most of his collection is orphaned. A quick scan of Frontera’s Web site shows that many of the recordings were issued by major labels like Columbia and Victor. Mr. Strachwitz would like to see full digital copies of the music available to the world. But “UCLA is chicken to do it,” he argues, because “they don’t want to raise the ire of the record business, who could possibly—but it’s very improbable—step in and say, ‘Hmmm ... we own this stuff. Why don’t you pay us?’”
Missing Documentation
Photographs are another copyright quagmire. The University of California at San Diego library system, for example, houses more than 100,000 photos from the Scripps Institution of Oceanography Archives. Ships, sharks, scientists: Those images document the past 100 years of marine science. But many of them were donated to Scripps without copyright documentation. That has limited the number of photos that the Scripps archives can share online.
The Scripps case is peanuts compared with the bibliographic orphanage run by the HathiTrust Digital Library. The 8.7-million-volume library pools digital copies of texts that Google scanned from universities. John P. Wilkin, its executive director, estimates that HathiTrust may contain 2.5 million orphan works. HathiTrust publishes the full text of works in the public domain, but not of those that are orphaned.
Other colleges just aren’t digitizing to begin with, because of the legal uncertainty around orphans. Many will look at collections they want to preserve—tapes crumbling into goo, papers fading—and “put them back into the box and hope someone decides what to do with them next year,” says Jessica Litman, a law professor and copyright expert at the University of Michigan.
How did this mess come about?
In part, the answer lies in legal changes that have made getting and keeping copyright much easier. Until 1978, to obtain federal copyright protection for most works, you had to put notices on any publicly distributed copies saying who owned the copyright, Ms. Litman says. After 28 years, if you wanted to retain your copyright, you had to apply for renewal with the U.S. Copyright Office, a step that, she says, most folks never took.
In 1978, a new federal copyright law took effect that made copyright automatic as soon as a work was fixed in tangible form. Protection was also extended to works that had not been publicly distributed, like diaries, pictures, and private papers. And the requirement to file for renewal was essentially eliminated. In 1989, to satisfy an international treaty, the U.S. nixed the notice requirement.
Bottom line: Lots of works that don’t have any marking on them are very likely under copyright, but we can’t say for sure, since there’s no central place to look it up.
Overexposed Photographers
The government has considered the issue before. Proposed legislation, supported by a hefty 2006 report from the Copyright Office, would have allowed people to use orphan works if they failed to find the copyright owners after a reasonably diligent search.
But the legislative push aroused objections from groups that represent professional photographers. Individual photographs are almost all orphan works, Ms. Litman says. Photographers can rarely afford to register their snapshots individually, she explains, because there are just too many of them and it’s too expensive. They hope to get paid for use of their images, yet any “diligent search” would probably not lead you to the person who took a photo.
“They feel as if it would be simply giving permission to rip them off,” Ms. Litman says.
Enter Google. Impatient with the pace of change, the company decided to solve the orphan-works problem on its own. It digitized millions of books. After authors and publishers sued, Google negotiated a settlement that would have allowed it to sell institutional access to its digital library. It also agreed to pay 63 percent of digital-book revenue to a new collecting society that would distribute the money to registered rights holders and search for owners who had not claimed their works.
But a federal judge rejected that deal, in part because it would have given Google a “de facto monopoly” over the orphan books.
Which brings us to the situation brewing in Washington today.
On April 1, the librarian of Congress and the acting register of copyrights sent a letter apprising lawmakers of the post-Google landscape and inviting them to delve into various legal issues surrounding book digitization.
One idea being considered is a proposal for Congress to bring about broader digital access to out-of-print books through a system that copyright wonks call “extended collective licensing.” Pamela Samuelson, a professor of law at the University of California at Berkeley, lays out the concept in a new paper called “Legislative Alternatives to the Google Book Settlement.” Under such a regime, Congress could authorize a system for granting broad licenses to use in-copyright books “for which it would be unduly expensive to clear rights on a work-by-work basis,” she writes. The framework in some ways resembles what Google hoped to do with its settlement: commercialize many in-copyright, out-of-print books without the cost of clearing rights book by book.
The system hinges on a collecting society that would negotiate licenses for works owned by both members and nonmembers. Unclaimed money from out-of-print books could be set aside for “a period of years,” Ms. Samuelson suggests. If efforts to find owners during that time were unsuccessful, she writes, “the works should be designated orphans and made available on an open-access basis.”
The University of Michigan, meanwhile, isn’t waiting for Congress. In May its library announced a new project to identify orphan works in the HathiTrust collection, an investigation that will ultimately require hand-checking millions of volumes. The copyright sleuthing is important because it may be a step on the path to opening up broader access to HathiTrust’s orphans.
Some in the library community worry that a new law could muck up another approach that universities have been pursuing quietly: putting orphans under the shelter of fair use. For example, the University of California at San Diego Libraries shared some of those Scripps photos of uncertain copyright status after conducting a fair-use analysis, says Brian Schottlaender, its university librarian.
“If you open up orphan-works legislation, you may not get what you hoped for,” he says. “It could in fact end up undermining the now five-plus years the community has spent developing best practices.”