The publication process in social science is broken. Articles in prestigious journals use flawed data, employ questionable research practices, and reach illogical conclusions. Sometimes doubts over research become public, such as in the case of honesty scholar Francesca Gino, but most of the time research malpractice goes unacknowledged and uncorrected. Yet scholars know it is there, hiding below the surface, leading to frustration and cynicism. Research “has become a game of publication and not science,” as one professor wrote in response to a survey on research practices.
The current focus on the “game” of publishing encourages authors and outlets to search for surprising and interesting results rather than those that are scientifically justified. Journals have published outlandish studies (a 2007 paper claimed that attractive parents are 26 percent more likely to have girls, a 2011 study found evidence for extrasensory perception, etc.), as well as costly and even dangerous studies (a 1998 paper linking vaccines to autism, a 2022 meta-analysis of “nudges” drastically overestimating their effects, etc.). These papers gained wide publicity and influence, partly via the credibility provided by peer review. Fortunately, all became so well known that they were eventually rebutted or corrected. More insidious are those cases of flawed research that remain hidden from popular outlets and thus require correction by the journals themselves.
Consider the example of a 2010 article in the Administrative Science Quarterly, a journal many consider the most prestigious outlet for scholars of business management. It advanced a plausible hypothesis (a firm protects the environment near where its owners live), and it employed a classic empirical approach (regression analysis), but it also included a statistical slip that led the authors to mistakenly find evidence for one of their hypotheses. Fifteen years later, it remains uncorrected. In the publishing game, the article is a home run, with more than 1,000 citations, according to Web of Science. Admitting now that it was actually foul would raise awkward questions.
Problems with the credibility of empirical research have been discussed for decades. In the early 1980s, Edward E. Leamer proposed it was time to “take the con out of econometrics.” In 1992, J. Bradford De Long and Kevin Lang warned that because false findings often seem interesting, they are more often published. In 2005, John P. A. Ioannidis cautioned that low statistical power and selection could cause a situation where most published results are false. In 2012, a survey of 2,000 experimental psychologists found that “questionable research practices” were so common that their use could be “the prevailing research norm.” Scandals at Cornell and Harvard suggest that not even the most elite universities are immune.
The most common prescriptions for improving credibility — better review, public critique, and replication — have merit, but they fail to direct scarce resources where they can do the greatest good: those few publications with the greatest impact. We propose an alternative, using review resources more efficiently and effectively by borrowing the idea of “replay review” from professional sports. The current peer-review system would continue to judge research articles in real time when they are submitted, but the publications that go on to have an outsized impact would be evaluated again, and in more detail, to confirm or refine the initial assessment. As in sports, this process could be highly effective without undue disruption or cost.
The main drawback of our proposed replay review is that it does not correct the primary review process, and thus it tacitly acknowledges that some flawed studies will be published. Why not correct the primary review instead? In short, fixes to primary review are unlikely to be effective. Current proposals include financial incentives to reviewers, expert support, automated test software, and greater transparency. Some journals now pay their reviewers, others have added additional review by statisticians, several journals in the Wiley Publishing group now use software to evaluate images for signs of manipulation, and many journals have increased data-disclosure requirements.
Such approaches can be helpful, but they are resource-intensive and only partly effective. Financial incentives improve review only if editors can assess quality, which requires them to be universal experts or to sometimes duplicate reviewer effort. Support from experts in statistics can increase labor efficiency, but specialist time is limited. Automated tests catch some types of errors, but they rarely find flaws in design or execution. Transparency only works if someone reviews the posted information.
All proposals for strengthening primary review face an economic challenge: Most of the resources spent on strengthening it are wasted because, for most submissions, the existing review process is already strong enough. At top journals, 90 percent or more of the submissions are rejected for apparent flaws, and thus, a strengthened review process will not change the outcome. Of the 10 percent that are accepted and published, most are lightly read and cited and thus have little influence. Directing strengthened review to the most important publications requires, as in sports, time to assess the importance of a given play and determine if it should be reviewed again.
Another possible objection to our proposal is that it’s made unnecessary by existing systems of critique and replication. We support these systems, but evidence suggests their influence is minimal. A 2022 paper found that only a third of the journals in social science provide any means of critique, and those that do severely restrict its use both in time and length, allowing a median of only four weeks to submit comments of (usually) fewer than 400 words. Across a sample of 2,066 articles, the study found that “only two post-publication critiques prompted publication of a correction.”
Like critique, replication is an oft-prescribed corrective to flaws in the published scientific record. But we don’t believe replications can solve social science’s credibility crisis: Despite the appearance of replication-friendly journals such as Sociological Science and Econ Journal Watch, replications are seldom submitted and even more seldom published. According to a 2018 study of accounting scholars, most authors don’t even try to publish replication analyses because “it is more harmful to one’s career to point out the fraud than to be the one committing it.” Or, as another wrote: “Replication studies don’t get cited, and journals don’t publish them … nor do people get promoted for replication studies.”
To be accepted for publication, replications are often transmogrified into extensions, rather than refutations, of previous work. A prominent journal editor expressed the problem concisely: We say “we welcome replications and then … impose relatively significant hurdles for those replications.” Those hurdles often force authors to water down critical findings and recast their replication as building on, rather than contradicting, the original study. As a result, the influence of replications is muted: The original often remains better known and better cited, even after a failed replication. After reviewing 10 years of replications in the American Economic Review, a 2024 study concluded “the economics literature does not self-correct.”
We contend that replay review provides an effective and efficient way to correct the published record. Here’s how it would work: Once a publication accumulates a specified number of citations, it would receive an independent review. These reviews would then be published in full, along with author responses, so that readers have additional guidance on how to interpret the initial publication. As with “booth review” in sports, replay review in science should include analysis from multiple angles but also a clear call on the “play”: whether or not the original conclusions were justified.
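The trigger mechanism is simple enough to sketch in code. In the illustrative Python below, the 250-citation threshold anticipates the analysis that follows; the Article structure and the function name are ours, purely for illustration, not part of the proposal itself.

```python
from dataclasses import dataclass

@dataclass
class Article:
    title: str
    citations: int
    replay_reviewed: bool = False

# Illustrative threshold; each journal or discipline would set its own.
CITATION_THRESHOLD = 250

def due_for_replay(articles: list[Article]) -> list[Article]:
    """Articles that have crossed the citation threshold but not yet been re-reviewed."""
    return [a for a in articles
            if a.citations >= CITATION_THRESHOLD and not a.replay_reviewed]
```

An editor could run such a check against a journal’s citation records each year to generate the docket of articles due for replay review.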
To assess the practicality of our proposal, we evaluated the submission and citation history of the 2014 cohort of articles published in the peer-reviewed empirical journals that the Financial Times uses to determine the research rank of business schools. This list includes highly technical publications (Econometrica and Operations Research) and more practical publications (Marketing Science and Human Resource Management). Collectively, these 47 journals received more than 31,000 submissions in 2014 (excluding book reviews and editorial essays), of which about 10 percent (3,174) were accepted for publication. Over the following decade, the median article in these journals received 50 citations, but the distribution of citations is highly skewed, with the top article cited nearly 2,000 times.
Skewness in citation rates means that a large proportion of the citation impact can be checked at relatively low cost. Replay review of just those articles that received more than 250 citations would cover publications accounting for 28 percent of all citations. This would require review of only 162 articles: 5 percent of those published and 0.5 percent of those submitted. Even if each replay review took twice the effort of the average pre-publication review, our system would add only about 1 percent to the total reviewing effort while providing important perspectives on papers representing more than one-quarter of the citations received by these influential journals.
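For readers who want to check the arithmetic, the short sketch below reproduces these proportions from the figures above. Counting baseline effort as one average review-effort unit per submission is our simplifying assumption for illustration, not a figure from the journals.

```python
# Back-of-the-envelope check of the replay-review workload, using the
# figures reported above for the 2014 cohort of Financial Times journals.
submissions = 31_000        # total submissions across the 47 journals in 2014
published = 3_174           # articles accepted for publication (about 10 percent)
flagged = 162               # articles that drew more than 250 citations within a decade
replay_effort = 2           # assume each replay review costs twice an average review

share_of_published = flagged / published    # ~5 percent of published articles
share_of_submitted = flagged / submissions  # ~0.5 percent of submissions

# Baseline effort is counted as one average review-effort unit per submission
# (our simplifying assumption), so the added burden is:
added_effort = flagged * replay_effort / submissions  # ~1 percent

print(f"{share_of_published:.1%} of published articles re-reviewed")
print(f"{share_of_submitted:.1%} of submissions re-reviewed")
print(f"roughly {added_effort:.1%} added to total reviewing effort")
```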
The detailed implementation of our proposal should be defined by each discipline or even each journal, and the number of citations that triggers a review should vary with the citation patterns of journals and disciplines. Whatever the final implementation, the main aspects of our proposal should be maintained: To avoid the perception of bias, the rules for triggering additional review should be clear and measurable; to encourage objectivity, reviews should be anonymous; and to advance scientific progress, the reviews should appear in the same outlets where the original papers were published.
Social science is in crisis because the publication process is broken, and thus, readers cannot trust what they read in even the most prestigious journals. Many proposals have been made to correct the situation, but all are costly, and none have proven effective. It’s time to try something new.