Amy Chatfield, an information-services librarian for the Norris Medical Library at the University of Southern California, can hunt down and deliver to researchers just about any article, book, or journal, no matter how obscure the topic or far-flung the source.
So she was stumped when she couldn’t locate any of the 35 sources a researcher had asked her colleague to deliver.
Each source included an author, journal, date, and page numbers, and had a seemingly legitimate title, such as “Loan-out corporations for entertainers and athletes: A closer look,” published in the Journal of Legal Tax Research.
Then she started noticing oddities about the sources. Each title used the term “loan-out corporation.” Many of the sources were supposedly published in state journals in 2018 or 2019, even though the term was popularized by a federal tax law that went into effect in 2018 and academic articles typically take a year or more to reach publication.
Chatfield and her colleague returned to the original researcher to ask where she had found her sources. ChatGPT produced them, the researcher said.
“They tricked her and they tricked one of my colleagues,” Chatfield said. “And they tricked me for a good 10 minutes.”
Since ChatGPT emerged last November, stories like Chatfield’s have swirled around the collegiate-library world. Librarians receive citations with all the details they need, only to discover that they were fabricated by ChatGPT. In response, library staff have started publishing web pages and hosting workshops, all with the same message: ChatGPT can do a lot of things. But it cannot find sources.
With public confidence in research already low, experts worry that the use of ChatGPT could further erode faith in academic writing. Students and other novice researchers could also lose essential research skills and run into trouble in the classroom if they don’t understand ChatGPT’s many flaws, they said.
A source from ChatGPT “seems like a good citation, but it’s really just a word salad of a citation put together,” said Hannah Rozear, a librarian for biological sciences and global health at Duke University. “That’s something we’ve never really seen in any kind of tool we’ve used.”
Experts in the tech industry refer to errors generated by artificial intelligence as “hallucinations.” They’ve been causing problems in many fields, and made national news when ChatGPT tripped up a lawyer, who ran into legal trouble after referring to several fake, AI-generated legal proceedings during a hearing.
Scientists aren’t sure why ChatGPT sends users false information. One hypothesis is that the errors come from the way these models are trained, said Hannaneh Hajishirzi, a computer-science professor at the University of Washington.
Generative AI can only use information from the resources it was developed with, she said. ChatGPT, for example, cannot access the internet and was trained only on information available online up until September 2021; it doesn’t know about anything that happened after that. Importantly, it also can’t access paywalled articles; OpenAI, the company that produced ChatGPT, disabled a feature that allowed such access only a week after rolling it out. Most scholarly journals are behind a paywall.
ChatGPT creates its responses based on natural language patterns to sound fluent, Hajishirzi said. Even if the system doesn’t have an answer to a question, it will combine the information it has based on the patterns it knows, and sometimes that information is false, she said.
OpenAI declined an interview request from The Chronicle.
Even before the introduction of ChatGPT, researchers had become careless with citing their sources, said Mohammad Hosseini, a postdoctoral researcher at Northwestern University’s Feinberg School of Medicine.
Much of that was in reaction to the advent of the internet, he said, which shortened the time it took to find a source from sometimes months to minutes, and made long waits for interlibrary loans a rarity. Gone were the days when researchers might read a source cover to cover just because it had taken so long to track it down, said Hosseini, who is also an associate editor of Accountability in Research, an academic journal that investigates ethics in biomedical research.
Now, with the internet making sources so readily available, scholars tend to cut corners, sometimes reading only an abstract before citing a paper, he said. Many citations also have inaccurate dates or even misattributed information, he said.
Carelessness in research reflects the broader pressure many scholars feel to publish often and gain prestige, Hosseini said. To him, ChatGPT only exacerbates the problem.
As researchers hustle to complete their next study and report their latest findings, he said, AI systems can help, “but they don’t help us get rid of the problem. They help us become better in a flawed game of publish or perish.”
Citing ChatGPT-generated sources also violates an understanding with readers that scholars conducted their work ethically, said Sam Bruton, the director of research integrity at the University of Southern Mississippi.
Although the peer-review process provides some accountability on papers before they’re published, reviewers rarely check every single source, Bruton said. When researchers cite something, they’re generally assumed to have read and understood it.
Hosseini worries that the use of false citations could spiral out of control if researchers find they can use ChatGPT and get away with it. And other scholars may continue to use those fake citations in their own papers.
“It becomes a complicated web of lies,” he said.
Some scholars have started using other source-finding tools, such as ResearchRabbit or CitationGecko, that are trained specifically for academic research and so may not produce as many fake citations as ChatGPT. These tools often have access to articles from several databases, even those behind a paywall. GPT-4, OpenAI’s newest model, is advertised as more accurate, but it is available only to subscribers, for $20 a month.
Bruton predicts that using AI tools trained for academic research will continue to gain popularity. But even if they are more accurate, users should still be extra cautious, he said.
For example, he worries that AI may offer only frequently cited articles from acclaimed journals, which could deter readers from consulting other researchers’ work. AI programs also might not be able to catch when an article has been retracted for issues like fraudulent data, causing researchers to read and cite it unknowingly, he said.
“The more you can fine-tune one of these things, the more useful it can be, but also the more dangerous it can be in the sense that it can be harder to figure out when it’s hallucinating,” Bruton said. “There’s no easy way to separate the beneficial uses from the less-beneficial uses.”
The best way to use these AI tools responsibly is to constantly check sources, Bruton said. Researchers should also be transparent in their papers about when they’ve used AI, so readers can determine for themselves if they trust the information, he added.
Some publishers and journals have started requiring researchers to disclose when they’ve used AI or are prohibiting it altogether.
The JAMA Network, which publishes the journals of the American Medical Association, prohibits researchers from listing AI tools as authors and requires them to disclose when they’ve used such tools to edit or create content. PLOS One, a “mega journal” that covers science and medicine, requires researchers to disclose any AI tool they’ve used, how they used it, and how they verified the information. Other publications, like Science and Nature, have enacted similar rules.
“If people are trying to publish studies and they don’t have a solid grasp of the underlying literature on the shoulders on which they’re trying to stand,” Bruton said, “then the whole thing becomes increasingly rickety and untrustworthy.”
What’s so tricky about ChatGPT is that many of the sources it produces seem entirely plausible. They list well-known authors, come from real scientific journals or websites, and even their titles seem accurate.
When The Chronicle asked ChatGPT to provide three academic articles on great white sharks, this is what it came up with, arranged here into MLA format:
Cullum, J. and Meyer, C. G. “A Review of Shark Aggression Studies: Implications for Shark Conservation and Human-Shark Conflict.” Aquatic Conservation: Marine and Freshwater Ecosystems, vol. 30, no. 3, 2020, pp. 483-98. DOI: 10.1002/aqc3291
Domeier, M. L., and Nasby-Lucas, N. “Great White Sharks: The Biology of Carcharodon carcharias.” Academic Press: Marine and Freshwater Research, vol. 59, no. 7, 2008, pp. 594-605. DOI: 10.1071/MF07159
Rasch, L. J., Martin, K. J. et al. “Biomechanics of Shark Teeth: A Pathway to Tooth Regeneration?” Journal of Anatomy, vol. 234, no. 5, 2019, pp. 539-50. DOI: 10.1111/joa.12939
Each of these journals is a real peer-reviewed publication and many of the authors listed are experts. So far, so good.
A Google search for Meyer C.G., one of the authors in the first citation, leads to an article by Carl G. Meyer on how sharks can detect magnetic fields. Nasby-Lucas N., who supposedly wrote “Great White Sharks: The Biology of Carcharodon carcharias,” probably refers to Nicole Nasby Lucas, a research biologist with several publications on sharks. And looking up Rasch L.J. leads to Liam J. Rasch, who published an article in 2016 with Kyle J. Martin on a gene that causes sharks to regenerate teeth. Still encouraging, right?
But “Great White Sharks: The Biology of Carcharodon carcharias” is not even an academic article; it’s a book published by a company called Academic Press in 1998. Nasby Lucas didn’t write it, either. And the other two articles cited by ChatGPT don’t seem to exist at all.
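Checks like this can also be partly automated. As a minimal sketch, assuming Python and the public Crossref REST API (a tool choice for this illustration, not something the librarians mention), one can test whether the DOIs in ChatGPT’s citations are even registered; Crossref answers with an HTTP 404 for DOIs it has no record of:

    # Minimal sketch: check whether a citation's DOI is registered.
    # Queries the public Crossref REST API, which returns HTTP 404 for
    # DOIs it has no record of. (Crossref covers most, not all, journals.)
    import urllib.error
    import urllib.request

    def doi_is_registered(doi: str) -> bool:
        """Return True if Crossref has a record for this DOI."""
        url = f"https://api.crossref.org/works/{doi}"
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.status == 200
        except urllib.error.HTTPError:
            # Unregistered DOIs come back as 404 errors.
            return False

    # The DOIs from the three citations ChatGPT produced above.
    for doi in ("10.1002/aqc3291", "10.1071/MF07159", "10.1111/joa.12939"):
        print(doi, "->", "registered" if doi_is_registered(doi) else "not found")

Even a DOI that resolves is no guarantee: a fabricated citation can borrow a real DOI from an unrelated paper, so the title and authors Crossref returns should be compared against the citation, too.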
Spreading awareness about using AI responsibly starts with educating high-school and college students, Hosseini said.
The plausible appearance of ChatGPT’s sources can trick students in particular. If a professor asks for a list of 10 citations on a research paper, a student who is rushing to finish the night before might turn to the AI tool to produce a few additional ones, said Chatfield, the librarian from USC.
But using AI for research prevents students from learning how to gather and present information, said Sarah Park, a librarian for computer science and engineering at Duke.
“Students who use fake citations are essentially failing to do their assignment,” Park said.
Moreover, said Duke’s Rozear, if they’re reading only a summary of a source, they might miss some of its essential conclusions and be unable to analyze the article themselves.
After Park and Rozear first encountered fake citations, they published a blog post on Duke’s library website advising readers not to ask ChatGPT for sources. Instead, they suggested asking it to recommend databases in which to look for articles, to offer writing tips, or to generate ideas related to a particular topic.
Chatfield held a workshop for faculty in the spring to explain false citations and ways to deter students from using ChatGPT for sources. Professors can have students do in-class writing assignments or encourage them to consult sources that were covered in class, she said.
Students should be learning about both the benefits and limitations of AI, Rozear said. AI may help them navigate roadblocks they face as novice researchers, she said. Rather than spending hours searching for multiple sources, students can use ChatGPT to get a basic understanding of a subject, then use their own research skills to study it more in depth.
Park recalls one professor who asked his students to produce 10 citations with ChatGPT and then identified the inaccurate parts of each one.
“This is a proactive approach — not only educating students about the potential problems, but also guiding them toward responsible, ethical academic conduct when they are using AI,” Park said.
No matter how powerful the technology becomes, she said, it has limited uses for students and scholars.
“Ultimately research is about the expression of our intellectual activities. And that cannot be replaced by AI.”