"Big data" is changing the sciences as well as the humanities (The Chronicle, June 4). We asked three experts to comment on the phenomenon. Here are their responses:
'In our case the data is not made publicly available'
Large data sets handled by large numbers of physicists have been the hallmark of particle physics. On my current experiment, ATLAS at the Large Hadron Collider, we have about 3,000 physicists analyzing petabytes of data, together with comparable amounts of simulated data, which will be stored and analyzed on a grid of computing sites across the globe.
However, in our case the data is not made publicly available, in part because a number of corrections and calibrations must be applied to the raw data before analysis. The corrections can be subtle, such as accounting for which detector elements are fully operational at any given time (this "conditions" information is stored in large databases).
Therefore the raw data would not only be hard to analyze, but might lead one to incorrect scientific conclusions. To avoid these types of potential errors, the data are kept private within the collaboration, and only fully analyzed and cross-checked results are published. It is not clear whether these types of complex data will ever be amenable to "crowd science."
Professor of Experimental High-Energy Physics
'I'm not wild about the term "crowdsourcing"'
I'm not wild about the term "crowdsourcing," and I think it's actually important to disentangle the developments.
There are several really important things happening simultaneously:
One is the harnessing of massive citizen or crowd observational capabilities through distributed IT, sensors, and networking technology, keeping in mind that the human sensory system and the brain are incredibly powerful and versatile data collectors.
A second is the renaissance of what might be called amateur science (science purely for the intellectual love of it, as distinct from the highly professionalized and career-oriented science of universities and industry). That renaissance is made possible by the deluge of highly accessible data; by powerful, low-cost observational, computational, and experimental gear, itself enabled in part by digital technologies; and by the ability to collaborate and disseminate results globally using the web.
Citizen science is a helpful term used to characterize at least some activities in each category; often it has an implication of being orchestrated in some fashion (by professional scientists). Relationships between amateurs and "professionals" are delicate and, I think, evolving; this will be an important issue for society going forward.
Recently I've also been giving a lot of thought to parallel developments in humanistic disciplines (history, genealogy, archaeology, classics) that might reasonably be termed "citizen humanities." This is clearly happening, though with much less academic recognition than for citizen or amateur science, and it's going to have big implications for the way society views and connects with the humanities in the future.
Clifford A. Lynch
Coalition for Networked Information
'Citizens-as-data-analysts benefit ... all of us'
Alex Szalay gave a keynote address at the Joint Conference on Digital Libraries (JCDL) in 2008 entitled "Scientific Publishing in the Era of Petabyte Data." He noted that there were probably not enough astronomers on the planet to analyze the 100 terabytes of data generated by the Sloan Digital Sky Survey (SDSS), which aimed to "map the universe" beginning in the late 1990s.
At the same time, he and others have pointed to a large and curious public increasingly interested in scientific observation and discovery as a type of social networking activity. As this article notes in reference to the Galaxy Zoo, where anyone can volunteer to classify images of far-flung galaxies from the SDSS, "The server caught fire a couple of hours after we opened it. ... More than 270,000 people have signed up to classify galaxies so far."
Citizens-as-data-analysts benefit science and education and all of us because they can find answers buried deep in data that would otherwise stay buried. As more scientists put their data "out there" for citizen scientists to work with, we may need to rename the "data deluge." How does "data renaissance" sound?
Carol Minton Morris
Director of Marketing and Communications