Littered throughout the troves of scholarly research are a few peculiar phrases. Allusions to “my last knowledge update.” Sentences beginning with “Certainly,” as in, “Certainly, here are some potential highlights for your paper.”
And occasionally, the ultra-frank admission: “I am an AI language model.”
Many publishers now require authors to disclose when they use large language models like ChatGPT to help write their papers. But a substantial number of authors seemingly aren’t doing so, according to Alex Glynn, a research literacy and communications instructor at the University of Louisville.
Since March, he’s been compiling Academ-AI, a database of articles with likely, but undisclosed, chatbot-generated language. In a preprint last month, Glynn analyzed 500 of those papers and found that about 20 percent of them appeared in venues whose publishers explicitly require AI use to be disclosed.
In one typical policy, the Institute of Electrical and Electronics Engineers says that “the use of content generated by artificial intelligence in an article … shall be disclosed in the acknowledgments section of any article submitted to an IEEE publication.” But Glynn found IEEE to be the single biggest presence in his dataset, with more than 40 suspicious examples submitted to its journals and conferences. (Monika Stickel, a spokesperson, said that IEEE “has clear policies requiring authors to declare and disclose the use of AI in articles and to specify where in the content AI was used. When we detect the use of AI we take action accordingly.”)
Two years after ChatGPT took the world by storm, the findings, which have not been peer-reviewed, indicate that academic publishers are not universally enforcing their policies on AI-generated writing, whether because they are unwilling to, unable to, or both.
Glynn argues that publishers need to do a better job of policing in order to preserve research integrity. “In certain cases, it’s just astounding that these things make it through editors, reviewers, copy editors, typesetters, the authors reviewing their own work, and no one catches these things,” he said. “If stuff like this can sneak through the net, what else can sneak through the net?”
Glynn found questionable papers by scouring Google Scholar; Retraction Watch, an outlet that covers research misconduct; and PubPeer, a forum where people comment on research. He looked for what he considered to be telltale phrases, the most common of which invoked the first-person singular (“I recommend,” “I apologize”), started with “Certainly, here …”, and referred to cutoffs in time for acquiring information or to the need for newer sources. As one such study, quoted in Glynn’s paper, stated: “It’s important to note that advancements in HIV research occur regularly, and new findings may have emerged since my last update in January 2022. For the most current information, it’s recommended to consult recent scientific literature or speak with healthcare professionals specializing in HIV care.”
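In practice, the screening Glynn describes amounts to simple pattern matching: searching article text for a short list of giveaway strings. The sketch below is only an illustration of that idea, written in Python; the phrase list, the sentence splitting, and the function name are assumptions made for the example, not Glynn’s actual methods or code.

```python
import re

# Illustrative only: a few of the telltale phrases described above,
# not Glynn's actual search terms or tooling.
TELLTALE_PATTERNS = [
    r"\ban AI language model\b",
    r"\bmy last (knowledge )?update\b",
    r"^certainly, here\b",
    r"\bregenerate response\b",
]

def flag_suspicious_sentences(text: str) -> list[str]:
    """Return sentences containing any telltale phrase (case-insensitive)."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [
        s for s in sentences
        if any(re.search(p, s, re.IGNORECASE) for p in TELLTALE_PATTERNS)
    ]

# The HIV passage quoted above, for example, would be flagged because it
# refers to "my last update in January 2022."
```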
Other manuscripts addressed readers in the second person (“If you have additional details or context about SMO or CSMO, please provide them so that I can offer more specific insights”). Glynn also spotted several dozen instances of “Regenerate response,” the label of a button in earlier versions of ChatGPT, suggesting that it was accidentally copied and pasted along with the program’s output.
For his analysis, Glynn excluded papers published before the year of ChatGPT’s release, as well as suspicious phrases that he deemed “justified by context.” Guillaume Cabanac, a computer science professor at the University of Toulouse who also roots out seemingly AI-written papers along with other problematic research, praised Glynn’s systematic collection of examples as well as his analysis.
“The AI is yet another [piece of] evidence that peer review is dysfunctional,” said Cabanac, who has collaborated with Glynn on research. “The cases that we found with Alex, with ChatGPT-generated text, if we didn’t see the fingerprints — the markers like ‘regenerate response’ and all — perhaps nobody would suspect.”
Other studies have indicated that apparent AI writing is cropping up in research, and in some fields more than others. In computer science, 17.5 percent of papers between late 2022 and early 2024 contained signs of such usage — namely, suddenly popular words like “pivotal” and “intricate” — versus 6 percent of math and science papers, according to a preprint from Stanford University researchers.
Generally, Glynn said, “an odd bit of phrasing in an otherwise sound article is no more consequential than a typo or a misspelling.” But if seemingly obvious signs of ChatGPT are slipping by editors, Glynn says, then more serious problems — namely false, chatbot-hallucinated claims — could be too. This will happen more often, he predicts, as scholars get savvier about cleaning up their trail.
Glynn identified several studies with seemingly undisclosed AI writing. Publishers of those articles told The Chronicle that they were investigating the allegations but that, by default, they rely on researchers to be honest. “We require authors to disclose the use of AI and AI-assisted technologies in the manuscript and that statement will appear in the published paper,” said a spokesperson for Elsevier. “We expect authors to comply with our policies,” said a Wiley spokesperson.
But a representative for Springer Nature, which publishes Nature, said it believed it was proactively screening out most AI writing. Glynn’s study identified fewer than 20 questionable articles from the publisher. Those represent “less than 0.005 percent of our publications from a single year,” Chris Graf, Springer Nature’s director of research integrity, said in a statement. This “low proportion would suggest that our policies, and the work by editors, reviewers, publishers, and authors to uphold them, are successful.” The company credits a program called Geppetto with weeding out hundreds of fake papers.
The publisher PLOS says that “contributions by artificial intelligence tools and technologies to a study or to an article’s contents must be clearly reported.” That policy was put to the test in an episode that went viral this spring.
When PLOS ONE editors investigated concerns that AI had been used to write a study, they were unable to find 18 of its 76 cited references. Some of the authors told the journal that they’d only used the AI tool Grammarly to improve their writing. They also provided replacement references, but several “did not appear to support the corresponding statements in the article,” the editors wrote when they retracted the study. Not all of the authors agreed with that decision.
Renee Hoch, PLOS’s managing editor for publication ethics, pointed to the retraction as evidence that editors “follow up as needed when concerns about undisclosed AI usage come to our attention.”
But she said that PLOS is mostly hands-off. “We do not routinely screen submissions for AI usage; our enforcement of our AI policy is largely responsive,” she said in a statement. Mass screening would require a reliable, automated detection tool, she said, and “we do not yet have such a tool.”
Most journals are in this boat, said Mohammad Hosseini, an assistant professor of preventive medicine at Northwestern University who specializes in research ethics. In January 2023, he co-wrote an editorial about how to use large language models in scholarly writing. This essay, which Glynn cited in his study, said that “researchers should disclose their use” and indicate which text came from those tools.
But in an interview, Hosseini said he now believed that guidance — which they started writing right after ChatGPT’s release — underestimated the difficulty of the situation. “I think disclosure policies at the moment are mostly like a suggestion, because they are not enforceable,” he said.
When serving as an editor, as he does at the journal Accountability in Research, Hosseini said that he has no real way of forcing someone to disclose. “What if I say, ‘Oh, it looks like you’ve written this with AI. You have not disclosed it. Do you mind disclosing?’” Hosseini said. “At that point, what if they lie and they push on and say, ‘Oh, no, I have not, but thank you for your comments’? I’ve just made a fool of myself.” And he questioned some of the criteria Glynn used to flag papers: Using the first or second person may sound computer-generated to some, but it may also be how a non-native English speaker writes.
Hosseini also said that Glynn’s paper assumes that violations should be caught by peer review, which is widely considered to be broken. “A lot of people are just fatigued with how many papers and articles they read and review every year without any compensation, without any sense of appreciation,” he said.
Glynn acknowledged that some of the examples in his database are judgment calls, and there may be false positives. He also doesn’t believe that disclosure policies can be perfectly enforced. But “I do think that journals, at least the ones that I’ve found these instances in, could do a better job,” he said. Same goes for peer reviewers. “I’m sympathetic to burnout,” he said, “but if reviewers are burned out, then they need to say to the journal, ‘Look, I can’t do this, I’m sorry,’ rather than not doing a proper job of it.”
Hosseini said that he and his collaborators from last year’s editorial are now updating their recommendations to be more nuanced. Disclosure should be treated as a continuum, he said, where, depending on the context, it may be mandatory, optional, or not required.
Some publishers are already adjusting their policies. From 2023 to 2024, as ChatGPT became more sophisticated and ubiquitous, the publishing arm of the Institute of Physics went from requiring authors to declare AI use to encouraging them to do so. “Recognizing that AI can be used in many legitimate ways, we are focusing on ensuring the accuracy and robustness of the content through a combination of automated and human checks, rather than prohibiting AI completely,” said Kim Eggleton, head of peer review and research integrity for IOP Publishing, in a statement.
These rapid changes could complicate Glynn’s analysis, Cabanac noted, because an AI policy that’s currently in place may not have existed at the time an author submitted a study.
Another complicating factor, Hosseini said, is that the disclosure policies named in Glynn’s study are not identical. Taylor & Francis says that whenever “AI tools are used in content generation, they must be acknowledged and documented appropriately.” But Springer Nature says that using an AI chatbot for copy editing does not need to be declared. That’s also the policy of MDPI, an open-access journal publisher. A 2023 study in an MDPI journal containing the phrase “regenerate response,” which Glynn’s analysis highlighted, was considered acceptable since generative AI had been used for copy editing, according to a spokesperson.
Glynn noted that the publishing community agreed soon after ChatGPT’s release that AI use should generally be disclosed — the Committee on Publication Ethics was saying so by early 2023, for instance — but added that he planned to address nuances between journals’ policies when revising his paper.
Glynn did find evidence of troubling behavior across publications. He estimated that about 2 percent of the examples in his dataset were formally corrected — a “vanishingly small number,” he said. He also reported that 1 percent underwent “stealth” corrections: instances in which journals removed phrases like “regenerate response” from articles without issuing a formal correction or acknowledging that AI had been used, as their policies said they should.
Three of those instances were in Elsevier journals. A spokesperson for the publisher did not directly address the allegation, but said that “we are conducting an investigation to determine whether there has been any misconduct, and cannot provide further detail until that investigation concludes.” A Taylor & Francis spokesperson said it would investigate another alleged stealth correction.
Hosseini said that these episodes were “well-spotted.” “We are talking about science, which is a context where transparency and integrity are constantly being promoted, and we cannot have parties that can flout those ethical norms with impunity,” he said. “But at the moment, we do have such parties, i.e. publishers, that seem to see themselves above and beyond the established norms.”
Glynn said his goal was not to shame specific authors or editors, or to assume that all ChatGPT-using scientists are trying to cheat the system. “There may be instances where there is a clear party at fault, but the general trend of it all is just ‘something is wrong, we need to fix it,’” he said. “I would rather focus on ‘How do we address this and make sure the policies are followed?’ rather than trying to take individual people to account for what could be mistakes.”