Why Researchers Shouldn’t Share All Their Data

By  Nathan Schneider
April 8, 2018
Roland Sárkány for The Chronicle

In 1925, the year Gertrude Stein published her 1,000-page book The Making of Americans, she felt the need to explain, in lectures at the Universities of Cambridge and Oxford, why it was so long. Her aim, she said, was to depict a tense she described as the “continuous present.” Her method for doing so — and the culprit for her verbosity — was “using everything.”

The thought of “using everything” should send a chill down the spine of any researcher. Producing publishable results from an investigation typically requires managing far more material than can fit into the publication format. A secret realm of dark data resides in the notebooks and hard drives of the data-gatherers; they judge that data to be excess, but who knows? What if it is not? In this excess, in this “everything,” surely there are the ingredients of unrealized cures and upheavals. And in this excess, every researcher knows, are parts of our process we would rather not share. There, we are vulnerable.

During the years when I worked primarily as a reporter, this excess haunted me. I have hours-long interview transcripts from which only a few words, if that, appeared in an article — not because those words were the only ones of value, but because of the needs and constraints of that particular article.

Eventually I started to collect my reporting notes into public notebooks, including one that is the basis of my next book and of this article. Doing so has become an easy way to share what I gather with people who want more than what the published work can hold. It has also inclined me to take better notes, and to notice more threads of connection among disparate projects. But I have also found myself holding back. I hide my detailed reading notes behind a password. To protect my sources, interview recordings and transcripts remain offline altogether. Field notes stay in paper notebooks.

Predictably, the hard sciences have charged ahead on this curve. Far-flung research teams frequently collaborate in examining common data sets. Some government grants come with the requirement of publishing open data as well. The resulting demand has given rise to open-notebook tools like Jupyter and Observable and to data repositories like Zenodo. Researchers frequently post their own code on platforms like GitHub or GitLab. These are based on Git, a tool designed for large groups collaborating on open-source software. Among other features, Git keeps a meticulous record of a given project’s version history. It remembers every change and every bug. Likewise, open-source software communities tend to regard maximal transparency as an intrinsic good.

Some humanists have followed suit. The Rice University historian W. Caleb McDaniel, for instance, has developed a system that feeds his research notes into a public wiki, thanks to a mix of open-source tools and scripts he had to code for himself. Scholars across many fields share their bibliographies online using tools like Zotero, which was developed through an academic collaboration. Hypothesis, a nonprofit platform, enables users to make, collect, and share annotations on nearly any website. Requiring my students to use it, I’ve found, is a handy way of checking that they’re doing their reading assignments and getting them to debate their interpretations.

Among journalists, there has been talk at times of “open journalism” as a new paradigm for reportage that extends beyond just the polished report. In 2011, as an executive in residence at the University of Southern California, the former Sacramento Bee editor Melanie Sill published a report called “The Case for Open Journalism Now: A New Framework for Informing Communities.” Yet her call has not been widely answered, and it remains the definitive work on the subject. As editor in chief of the British daily The Guardian, Alan Rusbridger adopted open journalism as his strategy for the newspaper, but he left the job in 2015. Organizations such as BuzzFeed and ProPublica, at least, publish code and data sets on GitHub.

The emerging opportunities for self-exposure extend from research to the writing process. Kathleen Fitzpatrick, now a professor of English and director of digital humanities at Michigan State University, undertook a widely publicized “open review” process for her 2011 book, Planned Obsolescence: Publishing, Technology, and the Future of the Academy. She waited until she had finished a full draft, but one need not do so. I version-tracked the entire drafting of my latest book in Git, which means nearly my whole process of writing and revision could become immediately public if I simply pushed it to GitHub.

I don’t think I will do that.

The “blockchain” technology underlying Bitcoin, which makes possible secure databases with no centralized authority, could open the doors of transparency still further. Every Bitcoin transaction is recorded in the open, and the same mechanisms could record acts of scholarly research, writing, and certification. Natalie Smolenski, an anthropologist who works for the blockchain start-up Learning Machine, wants to use such tools to transform how we register academic achievements. Yet in her paper “Academic Decentralization in an Era of Digital Decentralization,” Smolenski reserves some of her most arresting words for transparency.

“Transparency,” she writes, “is socially pornographic and facilitates violence.” It can mean revealing data about ourselves without the context we might otherwise provide. It can objectify the researcher and the process, inviting viewers to feel a false sense of intimacy, of inside knowledge.

This is a sentiment I’ve sometimes come across as a minority opinion in hacker communities I’ve studied. It’s expressed most often by participants representing vulnerable identity groups, people for whom more self-exposure can mean more vulnerability. In the academy, I’ve heard it from those on “watch lists,” whose every move is scrutinized for political reasons, in search of what might be construed as a misstep. Graduate students are often taught to be careful what they publish, for fear of being pigeonholed too early. Too much self-exposure might compromise a career. It might also muddle one’s message.

“Meaning is not transparent,” Smolenski told me in an email; rather, she stresses that meaningful communication happens through context and time. She contrasts the exposure of radical transparency to what the more careful, intentional cultivation of intimacy allows: “provisionality.” Without the requirement of transparency, one can try on ideas, see how they look and work, then take them off.

Feminist techies, while sympathetic to calls for open-sourcing everything, have also recoiled at the most extreme demands to be transparent. As Ellen Marie Dash, a software developer, wrote in the magazine Model View Culture, for those accustomed to harassment online, the call for openness feels like a call to invite more harassment. “The only way to handle this sort of problem properly,” Dash contends, “is by explicitly placing consent and safety over openness and transparency.”

Dash also questions whether dumping vast amounts of information online counts as transparency in the first place: “What you wind up with is a company that produces so much unorganized, uninteresting and irrelevant data that you can’t find meaningful information.”

It’s the old paradox of Jorge Luis Borges’s Library of Babel, which contains such multitudes that little of use can be found. And this is the trouble with reading Gertrude Stein, as soon as you’re ready to leave her bewildering “continuous present.” The tools that afford us new opportunities for openness and collaboration also come at the risk of obfuscation and danger.

Nathan Schneider is an assistant professor of media studies at the University of Colorado at Boulder. His forthcoming book, Everything for Everyone: The Radical Tradition That Is Shaping the Next Economy, will be published by Nation Books.

A version of this article appeared in the April 13, 2018, issue.
This article is part of The Digital Campus: The Robot Has Arrived package.
© 2023 The Chronicle of Higher Education