In a plain, brick, two-story office building near the University of Virginia, several dozen computer programmers are racing to define the future of science.
Members of the nonprofit Center for Open Science, they see a critical moment. Web-based services that researchers use to create, store, analyze, and share data are being rapidly built, bought, and sold by a handful of major publishing companies and their offspring.
If uninterrupted, the thinking goes, those cyber-consolidators could reinforce an expectation that scientific data is a private asset to be amassed and hoarded. But if redirected, they could enable a new world in which data is routinely and widely shared, speeding scientific discoveries and boosting their reliability.
“We feel a sense of urgency,” said the center’s co-founder and director, Brian A. Nosek, sitting alongside the row of Mac desktops that forms the spine of his 80-person shop. The looming potential for industry to exert control over the life cycle of the research process “is a threat to what we think science needs to be,” said Mr. Nosek, who is a professor of psychology at Virginia.
The battle for openness and sharing in science has long emphasized two questions: Should open-access journals be encouraged? And if so, how? But more recently, as publishing-industry giants, including Elsevier, have acquired services that help scientists throughout the research process, more attention has turned to the data itself.
In the past few years, Elsevier has bought Mendeley, a program for managing and sharing research papers; Hivebench, a lab-management tool; and SSRN, a site for posting research articles before they appear in peer-reviewed journals. Thomson Reuters has spun off its science brands — including Web of Science, a journal-indexing service — into a new company known as Clarivate Analytics. And the owner of the Nature journals, Holtzbrinck Publishing Group, has created Digital Science, which has a suite of research tools, including Altmetric.
The Center for Open Science, meanwhile, just opened its doors in 2013. Its signature product is the Open Science Framework, a single web environment that connects a variety of the tools owned by Elsevier and others across the research life cycle. With it, a researcher who collects data with Open Sesame, stores it in Dropbox, and analyzes it with the open-source statistical package JASP, can seamlessly collaborate with a colleague who uses Mendeley, Google Drive, and GitHub to manage citations, data, and code.
The framework’s initial users include Adam P. Summers, a professor of aquatic and fisheries sciences at the University of Washington at Seattle. His lab on San Juan Island won a grant to buy a small-scale CT scanner for use in exploring fish anatomy, and he’s since been expanding its use to make three-dimensional scans of all types of vertebrates.
The Open Science Framework is proving invaluable to that effort, Mr. Summers said, because it’s flexible enough to accommodate a broad variety both in the types of data files that can be made from the scans and in the ways that users might enhance, annotate, and store them.
Members of his lab and visitors to it have now scanned and shared more than 1,000 species of vertebrates, saving what he estimates to be hundreds of thousands of federal dollars that would otherwise have been spent on needless repetitions.
“We don’t live in a time when we’ve got the money for researchers to just sort of randomly re-scan a critter that’s out there because someone was convinced that next year they’d get to dealing with it and publish it,” he said. “To me that’s just infuriating.”
A ‘Wake-Up Call’?
Private companies say they largely share that vision of open science, and they don’t regard the Center for Open Science as a competitor or an obstacle. Elsevier operates journals that charge for access to published articles, but company officials said that should not be taken as a sign that it intends to do the same with research data.
Elsevier only charges for journal access to recoup the costs of producing journal articles, said Gabrielle Appleton, director of strategy, and it does not see the compilation of data as requiring the same level of investment.
Instead, she said, Elsevier sees opportunity behind statistics suggesting that about three-fourths of the $1.7 trillion spent globally on research each year is wasted — by scientists studying the wrong questions and failing to realize what already has been tried and learned.
Encouraging fuller sharing of data is a key to fixing that problem, Ms. Appleton said. “Locking it up is not very helpful,” she said.
Still, the company has its skeptics among longtime advocates of open access. Just a few months ago, after Elsevier’s purchase of SSRN, the pro-open-access Scholarly Publishing and Academic Resources Coalition called it a “wake-up call” to the threat of industry control over the research cycle.
Digital Science, a competitor, has also made that argument. Elsevier seems to be “trying to pull people into a one-stop shop ecosystem,” said Daniel Hook, the chief executive officer of Digital Science. Mr. Hook said his company seeks a more-open model that offers researchers its own products but allows other ones to be used interchangeably. “We think that you should use whatever product fits you,” he said.
Mr. Nosek, of the Center for Open Science, admits to some concern about Elsevier’s role. The industry leader posts annual revenue of about $2.5 billion, while the Center for Open Science has attracted some $30 million in grant support since its founding. The center now works with a budget of about $6 million a year.
But Mr. Nosek said he’s pursuing his open-science mission without checking on the long-term intentions that Elsevier or other companies have toward data sharing. And publishers aren’t his only concern. Leaders of funding agencies such as the National Institutes of Health are imposing rules to require data sharing as a condition of grants, Mr. Nosek said, but are having trouble with the nuts and bolts of implementation and enforcement.
A firmly imposed data-sharing requirement from a major funder such as NIH would have a huge impact across the scientific community, Mr. Nosek said. But given the agency’s decentralized structure, “they just don’t have the mechanisms easily to do so,” he said.
Institutional Buy-In
The cooperation of institutions may ultimately be the biggest key to meaningful data sharing, Mr. Nosek said. The Open Science Framework will improve the ability of universities to see which faculty members are sharing data, and to reward them appropriately. But if the institutions refuse to make data-sharing part of their tenure-and-promotion processes, even requirements by journals and funders will fall flat, Mr. Nosek said.
The University of Southern California is already seeing that reluctance. In response to a faculty appeal, USC leaders signed the university up as one of the first to offer its researchers a portal on the Open Science Framework.
But usage is “kind of low,” said Randolph W. Hall, USC’s vice president for research.
The reason, said Morteza Dehghani, an assistant professor of psychology and computer science at USC, is that entering data into the system is simply another chore for researchers who don’t have collaborators who need quick access to it.
“The culture has been to hoard your data for as long as you can,” Mr. Dehghani said.
The university recognizes that some long-term benefits of data sharing might justify extra labor in the short term, Mr. Hall said. But it does not intend to pressure its scientists, he said. “It’s up to the faculty member to decide what they want to do.”
The Center for Open Science has divisions that travel to institutions to explain how its systems work and why adopting them might make sense. But changing traditional attitudes has been a long slog, Mr. Nosek said. He first wrote grant proposals for the Open Science Framework back in 2006 but got rejected by both the NIH and National Science Foundation. The concept was revived a few years later when a student in his lab, Jeffrey R. Spies — now chief technology officer at the center — picked it up as a dissertation project.
The center attracted perhaps its greatest public attention last year when its Reproducibility Project reran 100 studies published in 2008 by three leading psychology journals and found that only 39 percent could be replicated.
Replication projects “are the short game” in the center’s strategic plan, designed mainly to highlight the problem, Mr. Nosek said. “The long game is building the infrastructure that every scientist relies on to do their research.”
One of those key elements will take shape next week, when the center plans to inaugurate three new sites for researchers to post pre-publication versions of their work, similar to SSRN. The sites will be called SocArXiv (for the social sciences), engrXiv (for engineering) and psyArXiv (for psychology).
Such servers will let research be shared much faster, often years ahead of a peer-reviewed version in a journal, and more widely in cases when the journal requires a subscription for access. Some journals refuse to consider articles that have already appeared elsewhere, but Mr. Nosek said he’s confident that practice is waning. “There are a few that are resisting,” he said. “They will not last; they cannot last.”
Some Elsevier journals may have that requirement, Ms. Appleton said, but it is not a company policy. Elsevier bought SSRN largely to make it more efficient and to integrate it better with other services such as Mendeley, she said.
Establishing pre-print servers is among the final steps in the first of three expected phases of growth at the Center for Open Science. The center is now moving into the second phase, which involves rapid expansion. The Open Science Framework is adding about 500 new users a week, and it has 11 participating institutions with a backlog of another 30 or 40, Mr. Nosek said.
The final stage, which could take a decade to arrive, would involve transitioning the center into a long-term governance structure. That might even mean the end of the center, Mr. Nosek said, if widespread data sharing, its goal, becomes routine in academic science.
“If it can meet its mission,” he said, “then it doesn’t need to survive.”
Paul Basken covers university research and its intersection with government policy. He can be found on Twitter @pbasken, or reached by email at paul.basken@chronicle.com.
Correction (11/1/2016, 9:35 a.m.): This article originally attributed to Elsevier an annual revenue of $25 billion. The correct number is $2.5 billion, and the text has been corrected.
Paul Basken was a government policy and science reporter with The Chronicle of Higher Education, where he won an annual National Press Club award for exclusives.