A decade ago, John P.A. Ioannidis published a provocative and much-discussed paper arguing that most published research findings are false. It’s starting to look like he was right.
The results of the Reproducibility Project are in, and the news is not good. The project set out to replicate the findings of 100 studies published in 2008 in three leading psychology journals. The ambitious endeavor, led by Brian Nosek, a professor of psychology at the University of Virginia and executive director of the Center for Open Science, brought together more than 270 researchers who tried to follow the same methods as the original researchers — in essence, double-checking their work by painstakingly re-creating it.
Turns out, only 39 percent of the studies withstood that scrutiny.
Even Mr. Nosek, a self-described congenital optimist, doesn’t try to put a happy spin on that number. He’s pleased that the replicators were able to pull off the project, which began in 2011 and involved innumerable software issues, language differences, logistical challenges, and other assorted headaches. Now it’s done! That’s the upside.
The downside is what they actually found.
“Of course the results are disappointing in some ways,” says Mr. Nosek. “I would have loved for the reproducibility rate to have been higher, especially in my own field — social psychology, as opposed to cognitive. It does have a sense of, ‘Wow, we could really do better.’”
Back when the project began, a colleague warned Mr. Nosek not to take it on because “this could make us look bad.” And it certainly doesn’t make the field look terrific. These were studies published in respected journals like Psychological Science. As much as possible, and with assistance from the original authors in many cases, the replicators tried to mirror the original methods. Yet in most cases they could not reproduce the effects.
Here’s an example: The Journal of Personality and Social Psychology published a study that found that expressing feelings of insecurity to your partner made you feel more insecure. Fascinating, no?
The replicators followed the same procedures and couldn’t get the same primary result.
Meanwhile Psychological Science published a study indicating that people with dominant personalities were more vertically oriented — that is, they responded more quickly to stimuli presented along a vertical axis. Weird but interesting, right?
The replicators found no evidence for that conclusion.
There are caveats. Just because a finding couldn’t be reproduced doesn’t mean it’s false. (Likewise, a finding that is confirmed isn’t necessarily true.) Researchers often protest that it’s difficult to precisely re-create the conditions of an experiment, though for this project the replicating teams went to great lengths to conduct direct replications and to obtain supporting materials from the original researchers.
So why didn’t they get the same results?
It’s not because of outright fraud. Or at least the project’s researchers didn’t report any instances of faked experiments.
Instead it’s probably the usual suspects, like the so-called file-drawer effect, in which researchers run an experiment multiple times but publish only the attempt that succeeds, leaving the failures unreported. Or it could be that a study used too few subjects, which makes it more likely that statistical noise will masquerade as a positive finding.
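(A rough illustration, not from the project: the short Python simulation below, with the sample size, number of attempts, and a two-sample t-test chosen purely as assumptions, sketches how those two mechanisms can fill a literature with positive results even when the true effect is zero.)

import numpy as np
from scipy import stats

# Toy simulation (illustrative assumptions, not the project's data):
# a nonexistent effect, small samples, and labs that quietly rerun the
# experiment until one attempt clears the p < .05 bar.
rng = np.random.default_rng(0)

N_PER_GROUP = 15   # "too few subjects"
ATTEMPTS = 5       # quiet reruns before giving up
LABS = 10_000      # hypothetical labs studying a zero effect

published_positives = 0
for _ in range(LABS):
    for _ in range(ATTEMPTS):
        control = rng.normal(0, 1, N_PER_GROUP)
        treated = rng.normal(0, 1, N_PER_GROUP)  # true effect is exactly zero
        _, p = stats.ttest_ind(treated, control)
        if p < 0.05:  # only the "successful" attempt leaves the file drawer
            published_positives += 1
            break

print(f"Labs publishing a 'significant' effect: {published_positives / LABS:.1%}")
# Roughly 1 - 0.95**5, or about 23 percent, end up publishing a spurious finding.

In that toy world every published result looks legitimate on its own, yet none of it would be expected to replicate, which is one way a literature can end up full of findings that don’t hold up.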
Or it could be more subtle forms of bias. You want your experiment to find something important and intriguing — indeed, research careers depend on good stuff happening in the lab — and it’s tempting to tweak conditions to bring that about.
‘It’s Pretty Bad’
For researchers whose studies were among those chosen for replication, the project was likely a bit unnerving.
Among them was Kathleen D. Vohs. The original study by Ms. Vohs, a psychologist at the University of Minnesota, found that subjects who believed that human behavior is predetermined, rather than a function of free will, were more likely to cheat. The replication’s results were “in the same direction as the original result,” but, unlike in Ms. Vohs’s study, the effect was not statistically significant.
Via email, Ms. Vohs writes that the result “makes sense if the hypothesis is in fact true.” Over all, she considers the experience a good one. “I can imagine that a lot of the original scientists’ experiences depended on who was replicating their work,” she writes.
Harold Pashler had a good experience as well. Mr. Pashler, a professor of psychology at the University of California at San Diego, has been an outspoken critic of research he considers dubious. As it turns out, one of his studies was chosen for the project — and was among the 39 percent that passed the test. “It makes me feel pleased but not surprised,” Mr. Pashler writes via email. He added a happy-face emoji for emphasis.
His feelings about the results of the project in general are less sanguine. “It is pretty bad!” he writes. “Not as bad as what the biotech companies report for biomedical studies, but unacceptably bad.”
So what about Mr. Ioannidis, who long ago foresaw these results? Like Mr. Pashler, he is full of praise for the project itself, which he says provided some hard results about an issue that was until now “mostly anecdotal and very emotional.” And he thinks the field deserves credit for aggressively examining a condition that almost certainly infects other disciplines.
But it’s still a bummer.
“It is bad news,” says Mr. Ioannidis. “I would have wished to be proven wrong.”
Tom Bartlett is a senior writer who covers science and other things. Follow him on Twitter @tebartl.