Scientists Don’t View Reproducibility as ‘Risky Business’

To the Editor:

Years ago I participated in a personality test administered to staff at my research institution. The test was intended to aid managers in building well-functioning teams that included scientists, engineers, operations professionals, and outreach experts. Among other insights, the test revealed that each of these groups had different values. Significantly, scientists value getting the answer right to the near exclusion of anything else, over staying on schedule, remaining within budget, following process, and especially caring what anyone else thinks. Scientists will not cover up problems for fear of external perception; they are truth-seekers at their core.

Those values were on full display during a three-day colloquium on reproducibility in research held last month at the National Academy of Sciences, in Washington, D.C. So I was surprised to read a recent article in your paper that implied that attendees shied away from exposing thorny issues for fear that detractors of science would use instances of irreproducible research as an excuse to withdraw funding or question the validity of scientific consensus (“In the Age of Trump, Scientists See Reproducibility as Risky Business,” The Chronicle, March 21). Your story led me to ask: “Were we at the same meeting?” What I witnessed was three days’ worth of enthusiastic and strong support for getting to the root of irreproducible research and developing solutions to make science a stronger and more robust enterprise.

One hardly need look any further than the opening direction for the workshop provided by Dr. David Allison from the University of Alabama at Birmingham. Dr. Allison, a member of the organizing committee, reminded everyone that the scientific method is the best approach ever devised for achieving objective knowledge. The validity of the scientific approach is derived from its carefully crafted procedures, which are constantly being improved and updated as better tools and methods are invented and cross-calibrated with established methods. The culture of science demonstrates regular self-reflection and self-correction on ways to strengthen and sometimes radically change those procedures when shown to be lacking. I note that this culture is also firmly embedded in the reward system for individual scientists. A scientist never achieves recognition and advancement by going along with what someone else did before, but rather by improving upon it or, better yet, by taking the science to a new level.

Victoria Stodden from the University of Illinois at Urbana-Champaign, another member of the organizing committee, provided an overview of irreproducibility that helped to frame the problems and the search for solutions. For example, with empirical reproducibility, the challenge is for independent labs to guarantee that they have indeed conducted an experiment using the same methodology. She gave an example where two excellent labs intent upon reproducing the same experiment concluded after two years that the reason for differing outcomes was that, at a key point in the experiment, one lab used a centrifuge to mix and another stirred a beaker. Both had thought their technique the only reasonable approach. With statistical reproducibility, the problems involve poor experimental design, how data are handled (e.g., treatment of outliers), false discovery rate, etc. These issues result in irreproducible results even when an experiment is exactly repeated in the empirical sense above. Finally, Dr. Stodden mentioned computational reproducibility, with respect to the Google flu example. The internet search giant created an algorithm to predict how many cases of flu would materialize as the flu season came on, and its prediction was double that of the Centers for Disease Control. Because Google’s algorithm was not publicly available, it was not possible to understand the discrepancy between the two predictions.

To suggest that attendees might have “pulled their punches” for fear of retribution ignores the presentations on the meta-analyses that illustrate the difficulty in defining a priori when a study has been reproduced and in achieving empirical, statistical, and/or computational reproducibility. For example, Joachim Vandekerckhove from the University of California, Irvine, reviewed results from the Open Science Collaboration’s Reproducibility Project: Psychology. In general, those studies with higher statistical power were more likely to have been reproduced. Randy Schekman from the University of California, Berkeley, presented the first results from the cancer research reproducibility study. At this point only a few landmark studies have been examined, and some have successfully replicated, but confounding problems such as evolution of key cell lines could prevent re-establishing identical experimental conditions (empirical reproducibility). Brian Nosek from the Center for Open Science summarized where we are as a community on identifying the problems, enforcing solutions, and changing cultural norms.

Your article focused entirely on the brief closing statement by Dr. Richard Shiffrin of Indiana University, Bloomington, which merely pointed out the obvious to the assembled scientists. Indeed, there is always the possibility that those who would harm science would punish us for ferreting out our own weaknesses and correcting them. If so, shame on them.

Marcia McNutt
National Academy of Sciences
