Long before Edward Snowden made his revelations about the National Security Agency, even before the popular rise of Julian Assange and WikiLeaks, the University of North Carolina at Chapel Hill discovered the pain of not securing sensitive data.
Back in 2009, the technical staff of Chapel Hill’s medical school discovered spyware on a server housing the medical records of some 180,000 women, participants in a study analyzing mammography results. Though no evidence existed that hackers copied the files, the breach caused a painful feud between the university and the project’s principal investigator, each blaming the other for failing to secure the private information.
Turns out that they were both right: No one was doing enough.
Academe is well into the Internet age. On any given day, UNC-Chapel Hill will be hit with 30,000 attacks on its firewall, something its IT staff knows well. That’s not an unusual number for a large institution. But when it comes to protecting research data from malicious parties, be they foreign spies, gangsters, or hackers—or, for that matter, a federal agency—universities often rely on policies that have changed little from the days when, to secure records, you put them in a locked cabinet behind a locked door.
“We’re really just all waking up as a community to both the power and challenges of dealing with this,” said Daniel K. Nelson, director of the university’s Office of Human Research Ethics.
A decade ago, the university’s institutional review boards, the internal committees created to prevent harm to human experimental subjects, used security policies that could be called lax. They could also be called common. Largely devoid of computer expertise, the review-board members asked researchers to describe the data protections of their experiments, trusting that investigators, on their own, knew how to keep sensitive records private in the digital age.
“It was a well-intended effort, but I don’t know if it was an efficient or effective effort,” said Mr. Nelson, who oversees the boards. “It presumed that investigators have the computer expertise to know what current standards are, and that IRBs have expertise to review those proposals.”
Essentially, the boards were presuming that an expert in epidemiology or medical genomics would also have the expertise to choose 256-bit over 128-bit encryption, or have the presence of mind to maintain data stores disconnected from the Internet. Ask even the most computationally fluent researcher about such matters, let alone many long-tenured faculty members, and that’s not the story they would tell.
“Most researchers, we’re not computer scientists either,” said Elizabeth A. Buchanan, director of the Center for Applied Ethics at the University of Wisconsin-Stout and a leading scholar of IRBs and Internet research. “They’re not trained to use encryption.”
Institutional review boards are traditionally the domain of doctors, social scientists, or bioethicists. In 2006, Ms. Buchanan queried several hundred such boards to see if they included a computer scientist, IT expert, or privacy officer. Few did. Instead, when boards had technical questions, they would find colleagues from IT or engineering who could help them out.
“Even 10 years ago, it wasn’t as common to say that an IRB was responsible for data protections,” Ms. Buchanan said. Today, while many more boards see that as part of their mission, they will still often rely on the same informal consultations, she added.
One exception is Dave Dittrich, a computer scientist who has served on a review board at the University of Washington for four years. (He is the type of professor who includes his public encryption key in all e-mails.) That rarity is partly a self-inflicted problem, he said.
Whenever he hears computer scientists complaining about IRBs not getting it, he said, “I ask, ‘How many in the audience have spent time volunteering to be on their IRB?’ I normally see no hands raised.”
Health and Safety First
There’s also a cultural resistance that stems, understandably, from the boards’ treating the health and safety of research subjects as their utmost concern. IRBs have required privacy protections for decades, but even today, if you take a standard training course on conducting human research, data security is an afterthought, said David M.J. Lazer, a professor of political science and computer science at Northeastern University.
“There’s nothing about IT practices in that training,” he said. “Not one slide devoted to that. That’s just sort of the general truth. You’re not getting trained either in the issues of data security or how to create an infrastructure to guard this stuff.”
That’s even before the complications that attend the large-scale data sets increasingly penetrating the medical and social sciences, sets that carry torrents of whole-genome sequences or thousands of criminal records. These boards aren’t just catching up with the Internet era. They’re catching up with Big Data, Mr. Lazer said.
“Generally IRBs weren’t built for the kinds of issues that are relevant to Big Data,” he said. That lack of capacity extends beyond the board, too. Most universities aren’t talking about what role they have to play in providing a home for large-scale, sensitive data, he said.
What would such a home look like? Last month Mr. Lazer met with several Northeastern officials to consider that question. It would most likely be a high-powered cluster of computers physically separated from the Internet—“air-gapped,” for you nascent hackers out there—and under controlled access, perhaps at the library, though that’s an open question.
It’s a fortification. Because universities are targets.
“There needs to be a defensive mind-set here,” Mr. Lazer said. That “may be common if you’re a bank or Amazon. It’s not going to be the normal mind-set of a university.”
It’s not all doom for data security, however. As the drumbeat of breaches has grown ever louder, more universities have begun to add computer scientists and privacy officers to their review boards. And others, like the University of North Carolina and Harvard University, have found that a few policy changes can make a significant difference.
A Formula for Security
At Chapel Hill, the dispute over the mammography-data breach, which ultimately led to a settlement and the retirement of Bonnie C. Yankaskas, the lead scientist involved, had a silver lining. No longer could the university conduct its institutional reviews as it had. It had to stop trusting the researchers.
Rather than trying to evaluate the data security of each experiment coming across the transom—the university has some 4,500 active studies at any given time—the boards now ask a few questions: Are you collecting protected health information? Genetic code? Surveys on illegal activity, substance abuse, or sexual behavior?
The boards plug those answers into a formula that generates a security threat level for the study. Given these parameters, the IRB then says, you have a Level 3 study. Go see your designated IT contact to establish proper security.
“At the end of the process,” Mr. Nelson said, “rather than the investigator telling us what they’re going to do, and us pretending we know how many bytes of encryption are up to standard, we flipped it.”
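UNC has not published the formula itself, so what follows is only a minimal sketch of the approach: a handful of yes-or-no screening questions, modeled on the ones above, mapped to a numeric score and then to a security level. The category weights, the cutoffs, and the security_level function are hypothetical stand-ins, not the university’s actual system.

```python
# Illustrative sketch of a tiered data-security screen in the spirit of the
# process UNC-Chapel Hill describes. The flags, weights, and level cutoffs
# below are assumptions for illustration, not the university's formula.

SENSITIVE_FLAGS = {
    "protected_health_info": 3,  # HIPAA-covered medical records
    "genetic_code": 3,           # whole-genome or other genetic data
    "illegal_activity": 2,       # surveys on crime or substance abuse
    "sexual_behavior": 2,
}

def security_level(answers: dict) -> int:
    """Map a researcher's yes/no answers to a security level from 1 to 3."""
    score = sum(weight for flag, weight in SENSITIVE_FLAGS.items()
                if answers.get(flag, False))
    if score >= 5:
        return 3  # highest tier: see the designated IT contact before collecting data
    if score >= 2:
        return 2
    return 1

# A study collecting medical records and genetic data lands in the top tier.
print(security_level({"protected_health_info": True, "genetic_code": True}))  # -> 3
```

The point of the flip Mr. Nelson describes is in the return value: the board hands the investigator a level, and the level, rather than the investigator’s own judgment, determines which safeguards the IT contact puts in place.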
Harvard has adopted a similar system. Both are among the best practices that Ms. Buchanan has seen. “That’s been a good model,” she said. But she suspects that, given the sheer amount of online sharing people do these days, universities are also going to need to change the promises they make to subjects.
“Guaranteeing absolute confidentiality is a foolish approach,” she said. Rather, new forms should educate subjects on what the collected data could say about them, if intercepted. Promise to protect it, of course, but let them know the risks if it does escape.
Proper data security is not easy, in the end, but rewards will come with it. The volume of data, and the amount people share—intentionally and unintentionally—will seemingly only grow. But only universities that can dig their digital moats deep will reap those rewards, Mr. Lazer said, so they had better fix their policies soon.