Dumped On by Data: Scientists Say a Deluge Is Drowning Research

By Josh Fischman
February 10, 2011

Scientists are wasting much of the data they are creating. Worldwide computing capacity grew at 58 percent every year from 1986 to 2007, and people sent almost two quadrillion megabytes of data to one another, according to a study published on Thursday in Science. But scientists are losing a lot of the data, say researchers in a wide range of disciplines.

In 10 new articles, also published in Science, researchers in fields as diverse as paleontology and neuroscience say the lack of data libraries, insufficient support from federal research agencies, and the lack of academic credit for sharing data sets have created a situation in which money is wasted and information that could reveal better cancer treatments or the causes of climate change goes by the wayside.

“Everyone bears a certain amount of responsibility and blame for this situation,” said Timothy B. Rowe, a professor of geological sciences at the University of Texas at Austin, who wrote one of the articles.

A big problem is the many forms of data and the difficulty of comparing them. In neuroscience, for instance, researchers collect data on scales of time that range from nanoseconds, if they are looking at rates of neuron firing, to years, if they are looking at developmental changes. There are also differences in the kinds of data that come from optical microscopes and those that come from electron microscopes, and between data on a cellular scale and data from a whole organism.

“I have struggled to cope with this diversity of data,” said David C. Van Essen, chair of the department of anatomy and neurobiology at the Washington University School of Medicine, in St. Louis. Mr. Van Essen co-authored the Science article on the challenges data present to brain scientists. “For atmospheric scientists, they have one earth. We have billions of individual brains. How do we represent that? It’s precisely this diversity that we want to explore.”

He added that he was limited by how data are published. “When I see a figure in a paper, it’s just the tip of the iceberg to me. I want to see it in a different form in order to do a different kind of analysis.” But the data are not available in a public, searchable format.

Ecologists also struggle with data diversity. “Some measurements, like temperature, can be taken in many places and in many ways,” said O.J. Reichman, a researcher at the National Center for Ecological Analysis and Synthesis, at the University of California at Santa Barbara. “It can be done with a thermometer, and also by how fast an organ grows in a crayfish” because growth is temperature-sensitive, said Mr. Reichman, a co-author of another of the Science articles.

A Big Success Story

The situation criticized in the Science articles contrasts with the big success story in scientific data libraries, GenBank, the gene-sequence repository, said Mr. Reichman and several other scientists. GenBank created a common format for data storage and made it easy for researchers to access it. But Mr. Reichman added that GenBank did not have to deal with the diversity issue.

“GenBank basically had four molecules in different arrangements,” he said. “We have way more than four things in ecology,” he continued, echoing Mr. Van Essen’s lament.

But even gene scientists today say they are struggling with the many permutations of those four molecules. In another Science article, Scott D. Kahn, chief information officer at Illumina, a leading maker of DNA-analysis equipment, notes that output from a single gene-sequencing machine has grown from 10 megabytes to 10 gigabytes per day, and 10 to 20 major labs now use 10 of those machines each. One solution being contemplated, he writes, is to store just one copy of a standard “reference genome” plus mutations that differ from the standard. That amounts to only 0.1 percent of the available data, possibly making it easier for researchers to store the information and analyze it.
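
The reference-plus-differences idea can be illustrated with a toy sketch. This is not Illumina's method or any real genomics format; the sequences, the function names, and the substitution-only restriction are all assumptions made for illustration. It keeps one shared reference string and, for each sample, only the positions where that sample differs.

```python
# Toy stand-in for a reference genome (hypothetical data, not a real sequence).
REFERENCE = "ACGTACGTACGTACGTACGT"


def encode_variants(reference, sample):
    """Return (position, base) pairs where the sample differs from the reference."""
    if len(reference) != len(sample):
        # Insertions and deletions are out of scope for this sketch.
        raise ValueError("this toy sketch handles substitutions only")
    return [(i, b) for i, (a, b) in enumerate(zip(reference, sample)) if a != b]


def decode_variants(reference, variants):
    """Rebuild the full sample from the reference plus its stored differences."""
    bases = list(reference)
    for position, base in variants:
        bases[position] = base
    return "".join(bases)


# A sample that differs from the reference at a single position.
sample = "ACGTACGTACGAACGTACGT"
variants = encode_variants(REFERENCE, sample)
restored = decode_variants(REFERENCE, variants)
```

Only the short list of differences has to be stored for each sample, and the full sequence can still be rebuilt on demand from the shared reference.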

To cope with data diversity, Mr. Reichman said scientists should develop a common language for tagging their data. “If you record data from a particular location, the tags about that location—latitude and longitude, for instance—need to be consistent from researcher to researcher,” he said. Ecology has grown into a relatively idiosyncratic science, and all researchers have their own methods, so a common language will require a culture shift. “It’s become more urgent to do this because of the pressing environmental questions, like the effects of climate change, that we are being called on to answer,” he said. “And the ability to access more than one set of measurement or interactions will make the science better.”
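
A minimal sketch of what consistent location tags might look like in practice follows. The field aliases, the `LocationTag` type, and the `normalize_location` helper are hypothetical and not drawn from any existing ecological standard. The sketch maps each researcher's own field names onto one shared pair of latitude and longitude tags in decimal degrees.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocationTag:
    latitude: float   # decimal degrees, north positive
    longitude: float  # decimal degrees, east positive


# Hypothetical aliases that different researchers might use for the same field.
LAT_KEYS = ("latitude", "lat", "Lat")
LON_KEYS = ("longitude", "lon", "long", "Lng")


def _pick(record, keys):
    """Return the first matching field in the record as a float."""
    for key in keys:
        if key in record:
            return float(record[key])
    raise KeyError(f"none of {keys} found in record")


def normalize_location(record):
    """Map a researcher's own field names onto the shared LocationTag."""
    lat = _pick(record, LAT_KEYS)
    lon = _pick(record, LON_KEYS)
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("coordinates out of range")
    return LocationTag(latitude=lat, longitude=lon)


# Two records with different conventions resolve to the same tag.
tag_a = normalize_location({"lat": "34.41", "long": -119.85})
tag_b = normalize_location({"latitude": 34.41, "longitude": "-119.85"})
```

Once every record carries the same tags, data sets gathered by different researchers can be matched on location without per-project translation.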

Another factor that makes developing shared-data libraries urgent is that many scientists now store their own data. “And when they retire or die, their data goes with them,” said Mr. Rowe. In his field, using three-dimensional-imaging machines like CT scanners to analyze fossils, the first people to do that have already left the field, so there has already been a tremendous loss of data.

There is a financial cost to this, he added. “It costs money to do a CT scan, and the National Science Foundation pays for that with a grant. But if that scan isn’t curated, and disappears when the scientist retires or forgets about it, then the next scientist asks the NSF for money to do it again. That’s just a waste,” he said.

In all of the papers, scientists cited examples of small libraries of shared data that could be scaled up. Mr. Rowe helped to develop a project called DigiMorph, which contains three-dimensional scans of about 1,000 biological specimens and fossils. Those data sets have been viewed by about 80,000 visitors, he said, and have been used in 100 scientific papers. Sharing the data, he said, brings the cost to researchers, and their grant-giving agencies, way down. Another project, the Neuroscience Information Framework, contains many more data sets and has been used by even more scientists.

Mr. Rowe thinks agencies like the NSF and the National Institutes of Health should get behind efforts like this to a much greater extent than they have done. “Right now they are financing data generation, but not the release of that data, or the ability of other scientists to analyze it. I think, with all respect, that they are really missing the boat.”
