Where Should You Keep Your Data?

Formal policies on information storage are finally beginning to emerge from federal grant agencies

June 23, 2015

Federal funding agencies have made it clear that grant proposals must include plans for sharing research data with other scientists. What has not been clear is how and where researchers should store their data, which can range from sensitive personal medical information to enormous troves of satellite imagery.

In the past, investigators have had to fend for themselves. Although data-sharing requirements have been in place for years at key grant agencies, universities have been slow to help principal investigators meet those obligations. As a result, many investigators keep data on individual hard drives and field requests for it on their own.

The good news is that formal policies — with recommendations for storage — are beginning to emerge from federal agencies. The bad news is that if you don’t comply with the new policies, you might be prohibited from receiving additional grant money.

Researchers have long been reluctant to share data, fearing its misuse or their own loss of a competitive edge in the scientific workplace. But in 2003, in an effort to maximize the use of data collected with public dollars, the National Institutes of Health required that every proposal requesting $500,000 or more in direct costs in a single year include a data-sharing plan. The National Science Foundation followed suit in 2011, requiring data-management plans for all proposals. The big push began in 2013, when the White House Office of Science and Technology Policy directed all federal agencies with more than $100 million a year in research expenditures to develop plans to make the resulting data publicly accessible.

Few would argue with the idea of making data collected on the taxpayer’s dime widely available for scientific analysis. But plenty of investigators gripe about the need to prepare data-management plans, which add yet another administrative component to an already bureaucratic proposal process.

Fortunately, the White House directive outlines the components that all public-access policies must include, and it urges the various federal agencies to make those policies compatible with one another. The policies apply to digitally formatted scientific data, not to laboratory specimens or other physical objects. Also excluded are lab notebooks, preliminary analyses, and similar working documents.

Federal agencies are making the quality of a project’s data-management plan part of the merit review of a grant proposal, which is a powerful motivator for a scientist to put serious effort into drafting that plan. Federal policies permit investigators to budget grant money for data management and storage.

Not all agency policies specify when the research data resulting from a grant must be made available. Several link the timing to publication of the resulting scholarly article, so that investigators aren’t scooped on their own data.

Program officers are responsible for keeping track of whether grant recipients are fulfilling their data-sharing obligations. To that end, program officers are giving the requirement some teeth: Funding can be withheld from researchers who don’t comply. Principal investigators are urged to place their data in existing publicly accessible repositories. The NIH has a list of repositories, which it plans to expand, to assist researchers with this task. The NSF directs researchers to specific repositories for oceanographic and climate data.

Some agencies are preparing templates to help applicants write a suitable data-management plan. Others have checklists of what that plan must include. Investigators beware: Despite mandated similarities, plan requirements differ from agency to agency, and sometimes among departments within the same agency, so be sure you are using the correct guidelines. For example, some divisions of the NSF require access only to digital data, while the ocean-sciences division of the NSF requires the sharing of sediment, core, and dredge samples as well.

Of course, some data are not subject to public release. The mandate applies to unclassified research, and it allows agencies to take into account confidentiality and personal privacy, as in biomedical research, as well as proprietary interests and intellectual-property rights.

While agencies finish their templates, support is available to frustrated investigators who can’t find one. The "DMP Tool," hosted by the University of California, provides a free, interactive form that walks you through the preparation of a data-management plan for more than a dozen organizations. You will need to create an account, but you can use it even if your university is not a partner institution. The tool prompts you to provide the information necessary for a complete plan. What it doesn’t provide is advice on how to format, store, disseminate, and preserve the data. That is where your university library may be able to help.

Try starting with your campus library’s website. Many libraries have a page of "LibGuides," which contain research-related guidance on a variety of topics, including the preparation of data-management plans. Type "data-management plan" into your university website’s search engine and see what comes up. In my experience, those pages can be helpful but difficult to track down. If you can’t find the correct web page, ask a reference librarian.

The page probably will contain a primer on data-management plans, links to agency requirements, and links to data repositories in various disciplines. It may also have valuable information about any support your institution provides for writing a data-management plan and for formatting, storing, disseminating, and preserving the data.

Many libraries are playing a leadership role in this effort. Data storage and preparation can get complicated quickly for the nonexpert, and it’s useful to have someone knowledgeable who can guide you through the intricacies of naming, describing, and formatting data for easier access by others. Federal agencies plan to establish standards for that descriptive information, the so-called "metadata."
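For readers who want a concrete picture of what naming and describing a data set can involve, here is a minimal sketch in Python. It is an illustration only, not an agency or library standard: the file-naming pattern, the function names, and the metadata fields (title, creator, description, columns) are all hypothetical choices, and your repository or librarian may require something different.

```python
import json
from datetime import date
from pathlib import Path


def build_file_name(project: str, description: str, collected: date, extension: str) -> str:
    """Build a predictable, space-free file name (hypothetical convention)."""
    def slug(text: str) -> str:
        # Lowercase the text and join its words with hyphens.
        return "-".join(text.lower().split())

    return f"{slug(project)}_{slug(description)}_{collected.isoformat()}.{extension}"


def write_metadata(data_file: Path, title: str, creator: str,
                   description: str, columns: dict) -> Path:
    """Write a small JSON 'sidecar' file describing data_file next to it."""
    record = {
        "title": title,
        "creator": creator,
        "description": description,
        "data_file": data_file.name,
        # Map each column name to a plain-language explanation with units.
        "columns": columns,
    }
    sidecar = data_file.with_name(data_file.stem + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


if __name__ == "__main__":
    # Example values are invented for illustration.
    name = build_file_name("Coastal Survey", "water temp", date(2015, 6, 23), "csv")
    print(name)
    print(write_metadata(Path(name), "Coastal water temperatures", "A. Researcher",
                         "Hourly readings from one buoy.",
                         {"time": "UTC timestamp", "temp_c": "degrees Celsius"}))
```

The point of the sidecar file is that a stranger who downloads the data can tell what each column means without writing to you.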

Investigators also can look for data repositories themselves. Two places to start are the Open Access Directory and Re3data.org. They list hundreds of repositories in an array of fields, from art history to zoology.

If you are still resentful of the additional work that data sharing entails, consider it a way to enhance your reputation and that of your institution within the scientific community. You win prestige if people are using a data set you compiled. A data set also can be part of the broader impacts you are required to identify in your grant proposal. In future proposals, you can list the data set on your biosketch as a scholarly product.

Karen M. Markin is director of research development at the University of Rhode Island’s office of research development.