As Libraries Go Digital, Sharing of Data Is at Odds With Tradition of Privacy

Brian Smith for The Chronicle

Kim Dulin helps lead Harvard's Library Innovation Lab, which for a time tweeted the titles of books being checked out from campus libraries.
November 05, 2012

Colleges share many things on Twitter, but one topic can be risky to broach: the reading habits of library patrons.

Harvard librarians learned that lesson when they set up Twitter feeds broadcasting titles of books being checked out from campus libraries. It seemed harmless enough—a typical tweet read, "Reconstructing American Law by Bruce A. Ackerman," with a link to the book's library catalog entry—but the social-media experiment turned out to be more provocative than library staffers imagined.

Harvard suspended the practice after privacy concerns were raised. Even though the Twitter stream randomized checkout times and did not disclose patrons' identities, the worry was that someone might somehow use other details to identify the borrowers.

The episode points to an emerging tension as libraries embrace digital services. Historically, libraries have been staunch defenders of patrons' privacy. Yet to embrace many aspects of the modern Internet, which has grown more social and personalized, libraries will need to "tap into and encourage increased flows of personal information from their patrons," says the privacy-and-social-media scholar Michael Zimmer.

Millions of people now share what they're reading through social-networking sites like Facebook, or smaller services including Goodreads and LibraryThing. They're accustomed to the personalized recommendations that Amazon provides by tracking customers' buying and browsing habits.

Libraries are following suit. They're beginning to share data to build tools for recommending and discovering books. They're lending e-books, even though Amazon monitors reading on Kindles, and they're enabling reviews and tags in the once-sacred realm of library catalogs.

But as librarians expand digital services, they face "a Faustian bargain," warns Mr. Zimmer, an assistant professor in the School of Information Studies at the University of Wisconsin at Milwaukee. In a forthcoming paper, he writes that librarians may decide that "the benefits of these advanced data-based services outweigh the traditional protection of patron privacy."

That tradition grows out of a core belief: People should be free to explore ideas without the government or anyone else watching.

In the 1970s and '80s, the FBI tried to figure out what some scholars were studying by enticing library clerks to disclose borrowing and reading habits, says Deborah Caldwell-Stone, deputy director of the Office for Intellectual Freedom at the American Library Association. In response, many states passed laws requiring libraries to keep those data private.

It's considered "good practice" to purge the records of who borrowed particular materials, she adds. "The best way to preserve privacy is not to have a record of what somebody read."

Now the Web has put privacy in flux, and the lines are fuzzy as to what trade-offs libraries should make. When should data be used? When should the information be shielded?

One option is to use systems that allow patrons to opt in to libraries' tracking such activities as their previous checkouts.

"The privacy that libraries traditionally have been preserving is not always valued by their patrons, especially in an age of social networking," says David Weinberger, co-director of the Harvard Library Innovation Lab, which was behind the Twitter experiment.

"We have the staunchest defenders of individual privacy in the nation now engaging a set of users who increasingly default to openness and sharing," he says. "It's going to take a while to work that through."

Other librarians are watching to see how he navigates those changes.

'The New Digital Disorder'

Mr. Weinberger is a well-known Internet thinker, with a Ph.D. in philosophy and an eclectic résumé. In 2000 he helped write a best-selling book, The Cluetrain Manifesto: The End of Business as Usual, which argued that the Web is mostly a "social place," not a publishing platform. Influenced by Cluetrain, Howard Dean's campaign hired Mr. Weinberger as "senior Internet adviser."

In 2007, Mr. Weinberger published a book of particular interest to librarians, Everything Is Miscellaneous: The Power of the New Digital Disorder. The Web upends "the rules of the physical world," where "everything has its place," the book said. Information is now "a social asset and should be made public, for anyone to link, organize, and make more valuable."

Sharing information is one focus of the Library Innovation Lab, which began three years ago as a place to think about the digital future of libraries. Staff members work in the basement of Harvard's law library, sharing space with yellowing legal texts from the Ottoman Empire. Over a lunch of sandwiches and cookies, team members discussed their recent projects—and the privacy constraints they face—with The Chronicle.

One effort, called LibraryCloud, aims to help libraries share a valuable resource: metadata, or information about information. Metadata are important for finding stuff as the amount of information rapidly increases. "The solution to information overload is more information," Mr. Weinberger says.

Metadata might include a book's page count; how often it has been checked out; and how frequently it has been checked out by particular types of people, such as undergraduates or faculty members. (In one novel method for generating metadata, the lab equipped some Harvard libraries with "Awesome Boxes." Someone who checks out an item can return it to the Awesome Box rather than the regular basket, creating a data trail about what library patrons consider great. Items that have been "awesomed" are publicized via Twitter and RSS and may also be built into online book-browsing software in the future.)

Until now, a lot of metadata have been inaccessible. The idea behind the LibraryCloud software is that, by gathering metadata from different libraries, developers could use them to build new services. Mr. Weinberger calls it "an attempt to make available everything that libraries know."

In another project now being developed, StackLife, Mr. Weinberger's group offers a flavor of how you can use some of what libraries know. StackLife is library-browsing software that guides patrons to relevant works in part by looking at how the university community has used them. Say you're searching for a book. Clicking on it in StackLife displays the volume on a virtual shelf, next to other texts sorted by call number. The software color-codes books by what it calls ShelfRank, a measure of their importance to the community. That's judged by things like how many libraries own the book and how often it's checked out or put on reserve.

The traditional library catalog "doesn't reflect the usage of the community at all," says Kim Dulin, also a co-director of the lab. StackLife changes that. It visually pops out works that are core to their fields, Mr. Weinberger says, showing what members of the Harvard community have "demonstrated through their actions are important."
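To make the idea of a usage-based rank concrete, here is a minimal Python sketch. It is not the lab's actual ShelfRank formula, which the article does not describe; the function names, the equal weights, the logarithmic damping, and the ten-step color scale are all assumptions chosen for illustration. The only inputs taken from the description above are the three signals: how many libraries own a book, how often it is checked out, and how often it is put on reserve.

```python
import math


def usage_score(libraries_owning, checkouts, reserves, weights=(1.0, 1.0, 1.0)):
    """Combine three usage signals into one number (hypothetical formula).

    log1p dampens very large counts so that a single heavily borrowed
    title does not dwarf everything else on the shelf.
    """
    w_own, w_out, w_res = weights
    return (
        w_own * math.log1p(libraries_owning)
        + w_out * math.log1p(checkouts)
        + w_res * math.log1p(reserves)
    )


def shade_bucket(score, max_score, buckets=10):
    """Map a score onto a 1..buckets scale for color-coding (assumed scale)."""
    if max_score <= 0:
        return 1
    bucket = math.ceil(buckets * score / max_score)
    return min(buckets, max(1, bucket))


# Illustrative, made-up counts: (libraries owning, checkouts, reserves).
shelf = {
    "Core text": (40, 900, 25),
    "Niche monograph": (2, 5, 0),
    "Never borrowed": (1, 0, 0),
}
scores = {title: usage_score(*counts) for title, counts in shelf.items()}
top = max(scores.values())
shades = {title: shade_bucket(s, top) for title, s in scores.items()}
```

Note that nothing in this sketch touches individual borrowers: it works only on per-title totals, which is the kind of aggregate data the lab says it is willing to use.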

But, while Amazon tracks your every move, privacy concerns prevent Mr. Weinberger's team from collecting some key data. It doesn't track books borrowed together, for example. You could imagine using such data to suggest other books checked out with a given title. For instance, it could be helpful to know that patrons who checked out Darwin's On the Origin of Species also borrowed a particular book of commentary about it.

So what's the potential danger of data on books checked out together? Mr. Weinberger offers a silly example that makes the general point. Say somebody checks out "How to Blow up Federal Office Buildings," along with a repair manual for 1957 DeSotos. There's only one person on campus who owns that vehicle. "That would be a pretty good indicator that maybe the FBI wants to pay a call," Mr. Weinberger says.

Finding Common Borrowing

A British library project goes further down the recommendation route. At the University of Huddersfield, the library mines historical circulation data to generate an Amazon-style "people who borrowed this book also borrowed these books" catalog feature. The effort dates to 2005, when library staff began thinking about how they might use the two million transaction records in their database. Their recommendations draw on anonymized and aggregated data, says Dave Pattern, library-systems manager.

"We're not interested in what one student borrows—we're interested in finding the common borrowing patterns of lots of students," he says. "In particular, we want to try and ensure that books borrowed for personal reasons would never appear as a recommendation. So we've drawn a line in the sand, and we need to see a specific borrowing pattern repeated by several students before it will appear as a recommendation."
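The "line in the sand" Mr. Pattern describes can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Huddersfield's code: the function name, the input format of (borrower, title) pairs, and the threshold of three borrowers are all hypothetical, since the article says only that a pattern must be repeated by "several students." The borrower IDs are used solely to count co-borrowing and never appear in the output.

```python
from collections import defaultdict
from itertools import combinations


def co_borrow_recommendations(loans, min_borrowers=3):
    """Return {title: [(other_title, borrower_count), ...]}.

    loans: iterable of (borrower_id, title) pairs.
    A pair of titles is recommended only if at least `min_borrowers`
    distinct borrowers took out both (hypothetical threshold).
    """
    # Collect the set of titles each borrower has taken out.
    titles_by_borrower = defaultdict(set)
    for borrower, title in loans:
        titles_by_borrower[borrower].add(title)

    # Count, for every pair of titles, how many borrowers had both.
    pair_counts = defaultdict(int)
    for titles in titles_by_borrower.values():
        for a, b in combinations(sorted(titles), 2):
            pair_counts[(a, b)] += 1

    # Keep only pairs that clear the threshold; rarer pairs are dropped.
    recs = defaultdict(list)
    for (a, b), count in pair_counts.items():
        if count >= min_borrowers:
            recs[a].append((b, count))
            recs[b].append((a, count))
    for title in recs:
        recs[title].sort(key=lambda item: (-item[1], item[0]))
    return dict(recs)


# Made-up records: three students borrow Darwin plus a commentary,
# and one of them also borrows something for personal reasons.
sample_loans = [
    ("s1", "On the Origin of Species"), ("s1", "Darwin Commentary"),
    ("s2", "On the Origin of Species"), ("s2", "Darwin Commentary"),
    ("s3", "On the Origin of Species"), ("s3", "Darwin Commentary"),
    ("s1", "Personal Title"),
]
recs = co_borrow_recommendations(sample_loans, min_borrowers=3)
```

In this sample the commentary is recommended alongside Darwin because three borrowers share the pattern, while the title borrowed by a single student never surfaces. A threshold alone does not guarantee anonymity, so a real system would likely pair it with other safeguards.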

Other libraries are turning to vendors to add some Web 2.0 gloss.

One such company is LibraryThing. Tim Spalding, who had dropped out of a Ph.D. program in Greek and Latin, started LibraryThing in 2005 as a pet project to catalog his books.

To his surprise, it became an online sensation, with 1.5 million members cataloging, discussing, and reviewing their books, too. Academic departments use the site to organize their books. Members have submitted more than 91 million book labels, called tags.

In other words, Mr. Spalding oversees a megarepository of book data. So he started to sell it. Libraries pay his company to enhance their catalogs with Amazon-like book suggestions from the LibraryThing database, plus reviews and tags.

Tags can be useful browsing tools because librarians don't know what books mean to individual readers. And it can take years for categories to make it into the Library of Congress system. Sociobiology, for instance, existed for about a decade before the Library of Congress realized it was a field, says Mr. Spalding.

It was initially seen as "a radical idea," he recalls, "that you would put regular, unwashed people commenting on books in the library catalog, which is the locus of truth and fact." Yet he now has 400 library customers, roughly one-third of them academic libraries.

Libraries that contract with Mr. Spalding's company soup up their catalogs with data largely generated by LibraryThing's ordinary users, not their own patrons. But another vendor arrangement does pay close attention to library users' reading habits, raising further concerns about privacy.

Under a change that began last year, Kindle owners can borrow books from libraries via an e-book distributor called OverDrive, which signed a deal with Amazon to offer the service. But as a result, the American Library Association's Ms. Caldwell-Stone began getting complaints that patrons were receiving marketing messages from Amazon. Those messages would say that a library patron's loan period was about to expire. Would they like to buy the borrowed book, complete with any notations they had made?

"It was clear that they were collecting and keeping a lot more information about individual users and their reading habits than what libraries traditionally do," she says. Amazon requires patrons to log in and is "keeping track of what they read."

Several universities have contracted with OverDrive to offer e-book lending, including Yale University, McGill University, and the University of Pittsburgh. Todd Gilman, Yale's librarian for literature in English, acknowledges that Amazon knows which books library patrons borrow, but he points out that those borrowers already own a Kindle and maintain a relationship with Amazon. The choice is up to them. If patrons have concerns, Mr. Gilman says, "they shouldn't read on devices that require them to log in to third-party vendor Web sites like Amazon."

"It's not like the library is giving out information to anybody," he says.