New Program to Help Scientists Share Large Data Sets

Andrea Chiba, a researcher at the University of California at San Diego, loves sharing data with colleagues in Australia. They can do the theoretical computer simulations; she can do the actual experiments on physical subjects.

The problem is that the collaboration often involves actual flights halfway around the world to load up hard drives with data, because there is too much of it to send over conventional Internet connections.

That’s exactly the kind of issue a new $8-million program from the National Science Foundation hopes to solve. The program, the DataNet Federation Consortium, involves six different research centers in an effort to make it easier and faster to access and share large and complex data sets.

It is often said that 80 percent of a researcher’s work these days goes into “managing and manipulating the data,” said Reagan Moore, the DataNet Federation Consortium’s principal investigator and a professor at the School of Information and Library Science at the University of North Carolina at Chapel Hill. In February, the journal Science published articles on studies that found that scientists often waste large swaths of data, in part because of a lack of support from federal research agencies.

So the consortium will build an “infrastructure” aimed at reversing that trend. It will allow researchers to access data remotely, which is often impossible now when projects require large amounts of storage space. The new system will also create programs that automatically convert data from different fields of research—which are often in different formats—into a workable form. And it will create a set of rules and procedures for each organization to follow, to make the sharing process smoother.

The grant money from the NSF will come in over a span of five years, and it will benefit scientists from hundreds of universities working in biology, hydrology, oceanography, social science, and learning behavior, which is what Ms. Chiba studies.

“It can save a lot of air miles,” said Ms. Chiba, who is science director at San Diego’s Temporal Dynamics of Learning Center.

Ms. Chiba particularly hopes the system helps her work with data from researchers in other disciplines. Behavior, after all, can be examined using a number of different research approaches—theoretical, mathematical, and neural (the physical study of the brain), to name a few.

So having quick access to other types of research pays off. For example, scholars in Australia and at the Salk Institute, in California, came up with computer models that predicted how new neurons made in one part of the brain might help humans figure out where they are and what time it is. Ms. Chiba built on that research by testing the prediction in rodents.

So “there’s a huge need to share data,” she said, but it’s often not done effectively, especially since “the data sets are enormous.”

Ms. Chiba will have to wait a couple of years before working with the new consortium, but even during that time, she thinks she will be able to use tools created by the other research centers. Such borrowing across fields happened before the DataNet grant: some algorithms that were created to study the human brain are used by plant biologists, for example.

Several of the consortium’s main projects will focus on facilitating that kind of research, Mr. Moore said. One of the organizations involved—the Consortium of Universities for the Advancement of Hydrologic Science—wants to be able to predict the impact that rainfall will have in a number of different places, and assess the risk of wildfires and landslides, he said.

That requires data from several different agencies, such as NASA and the U.S. Geological Survey. If all goes as planned, all that data will be accessible and readable within a few clicks.
