Wondering how much you should trust AI-detection tools? Consider this cautionary and true tale:
One Thursday morning, a student at a large research university received the following alert: “You may have violated the university’s honesty policy.” Her professor — suspicious after reading one of the student’s discussion-board comments that had seemed too insightful — decided to check three of her posts using two AI-detection tools. The post that had triggered the professor’s suspicions had a 50-percent probability of being generated by AI, according to GPTZero, while the student’s other two posts stood at 35 percent and 5 percent. Copyleaks, however, found that AI had generated 100 percent of the content in all three posts.
The student knew that the work was her own and was beyond distraught, but she also wondered why there were such large discrepancies between the findings. Then she had an idea. She located the professor’s most recent, single-authored, peer-reviewed article, copied its abstract into the free GPTZero tool and, lo and behold, it reported that there was a 36-percent probability that the professor’s text was written by AI. The student then paid the subscription fee for Copyleaks, and it found that the professor’s abstract had “100-percent AI content” (despite having been published 10 months prior to the release of ChatGPT).
As easy as it is for students to cheat with AI, it’s just as easy for faculty members to use AI tools to build a case accusing a student of cheating. Understanding how students cheat with these new tools and how AI detectors work is now essential for professors, students, and institutions.
There has always been cheating. Covid made it worse, and AI has made it easier. In our new book, Teaching With AI: A Practical Guide to a New Era of Human Learning, we explore this complicated subject from a variety of angles, including cheating and detection. Faculty members will need training in AI and detection, and all of us need to think deeply about some thorny questions:
- How much use of AI in a student’s work is too much? What is your threshold as an instructor? Is it based on the probability of cheating (scored by a detection tool), the percentage of content that’s copied, the context of the assignment (plagiarizing a discussion post versus a major paper), or some combination of those? (For one way such a combination might be made explicit, see the sketch after this list.) And will you share that threshold with students?
- Are we in higher ed OK with adopting detection software that works best on students who can’t afford to purchase access to better AI tools?
- How will we protect students from false positives?
- If students are using AI to help them brainstorm ideas, outline, analyze, summarize, draft, and even think, should we be happy about that? Will AI-assisted writing become the norm?
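To make the first question concrete, here is a minimal sketch in Python of what “some combination of those” might look like as an explicit, shareable policy. Every function name, threshold, and weight below is a hypothetical placeholder for illustration, not a recommendation:

```python
# A hypothetical rubric for turning detector signals into a transparent policy.
# All names and thresholds here are illustrative placeholders, not recommendations.

def flag_for_review(ai_probability: float,
                    share_flagged: float,
                    stakes: str) -> bool:
    """Return True if a submission warrants a follow-up conversation.

    ai_probability -- detector's probability score, 0.0 to 1.0
    share_flagged  -- fraction of the text the detector marks as AI-written
    stakes         -- "low" (e.g., a discussion post) or "high" (e.g., a major paper)
    """
    # Hypothetical choice: tolerate more uncertainty on low-stakes work.
    threshold = 0.90 if stakes == "low" else 0.75
    return ai_probability >= threshold and share_flagged >= 0.50

# The 50-percent GPTZero score from the opening anecdote, on a low-stakes post,
# would not cross this (hypothetical) bar:
print(flag_for_review(ai_probability=0.50, share_flagged=1.00, stakes="low"))  # False
```

Whatever the numbers, writing the rule down and sharing it turns a gut feeling into a policy a student can understand and contest.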
New policies need to be put in place, and they will need regular revision as they are tested in the wild.
How much are they cheating? Surveys of self-reported cheating with AI vary widely. Students seem to recognize the ethical considerations and risks of using AI for their writing assignments, but most plan to keep using it anyway. Why? Because it seems to work: 12 percent of student ChatGPT users say it increased their grade-point average.
The race to create and refine AI detectors continues, but efficacy studies don’t always consider the length of the text, the quality of the prompt, or the use of paraphrasing tools. Further, given the rate of technological advancement on this front, and the time-consuming process of peer review, a study published now will report findings for versions of AI tools that no longer exist or have changed dramatically, even when the authors rush to share preprints.
Ultimately, students who can afford the latest tech will stay ahead in the AI-detection race. We should all question who in our student population is most likely to be caught by AI detection, and why.
Detectors vary wildly in performance. One 2023 study tested 14 different AI detectors against six varieties of text and found substantial differences both in how the tools performed and in how well students’ evasion tactics worked. Three of the detectors (including Turnitin) missed only one of the 18 AI-written samples they were fed and had no false positives. However, five of the detectors were only 50 percent accurate, or worse.
Other studies confirm the enormous variability among detectors, and GPTZero (yes, that same free software from our opening anecdote) did poorly in all of them. A March 2024 study tested six major AI detectors against text produced by GPT-4 (the better, paid version), applying the same basic techniques that students use to avoid detection (like asking the AI to include spelling mistakes), and found that those techniques reduced the detectors’ accuracy. That study concluded that “the available detection tools are neither accurate nor reliable.”
Detectors are not all equal, but the best ones are better at separating human from AI writing than faculty members tend to be. For example, in a 2023 study, professors identified only 54.5 percent of the AI-generated content compared with 91 percent correctly identified by Turnitin.
Still, how much collateral damage are you and your institution willing to tolerate?
A false-positive rate of 1 percent is acceptable to some. But in the fall of 2023, when Turnitin reported that its false-positive rate had increased from 1 percent to 4 percent, many institutions (including Vanderbilt and Michigan State Universities and the University of Texas at Austin) turned off AI detection. The ethical calculus of falsely accusing students of cheating — and potentially exacerbating the student mental-health crisis — had shifted for those institutions. Still, others stayed the course.
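The arithmetic behind that calculus is worth spelling out. Here is a back-of-the-envelope sketch in Python; the submission count is an illustrative assumption, not data from Turnitin or any particular university:

```python
# Back-of-the-envelope estimate of how many honest students get flagged.
# The submission count is a hypothetical assumption for illustration.

honest_submissions = 50_000  # assumed human-written submissions per term at a large university

for false_positive_rate in (0.01, 0.04):  # the 1-percent and 4-percent rates cited above
    falsely_flagged = honest_submissions * false_positive_rate
    print(f"At a {false_positive_rate:.0%} false-positive rate: "
          f"about {falsely_flagged:,.0f} honest students flagged per term")
```

Under those assumptions, the jump from 1 percent to 4 percent is the difference between roughly 500 and 2,000 wrongly flagged submissions per term, which is why a change that sounds small shifted the calculus for entire institutions.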
Students and other users are constantly discovering ways to improve AI output: better prompts, iteration, and even simply asking the tool to “slow down and think” all sharpen results and can help fool detectors. AI is also a terrific mimic and loves being asked to write in a particular style, or to paraphrase a professor’s favorite ideas from their lecture notes.
Cheating has long been big business, and it is quickly adapting to AI. Paper mills like MyEssayWriter (“We make graduating easy”) and Killer Papers (“Zero AI. Zero plagiarism. Guaranteed”) reported that “business” started to decline after GPT-4 was introduced. So-called “student-aid providers” charge students for homework “help,” but the rise of ChatGPT sent their stock prices plummeting. Many of these outfits have countered with new “humanizer” products that promise to beat AI detectors. Some preview what detectors might find and offer to rewrite your homework, all while selling the reverse services to institutions. Conflicts of interest abound.
Here, too, financially secure students have another advantage: they can pay not only for better AI tools but also for “humanizing” and paraphrasing software.
Where does all of this leave faculty members? It is a fair complaint that some professors and institutions have misused AI-detection tools. After all, these detectors do not accuse students of cheating; they only provide a probability score or an estimate of how much of a text includes AI-generated content. When a smoke alarm goes off, most of us check for a fire before running out of the house.
Proctoring, oral exams, and blue books are making a comeback, but homework and writing assignments are essential to learning and thinking. Some things you can do to deter cheating:
- State more clearly on your syllabus and in class the purpose and relevance of writing assignments.
- Divide a writing assignment into the components that demonstrate the value of each step of the process. (AI writing tutors might be able to provide students with feedback at each stage, increasing the guidance they receive without increasing faculty workload.)
- Discuss the value of integrity. Attribution is an ancient value that remains important in society and in work as well as in academic situations.
- Have your students use one of the many online writing tools (some still free) that allow a faculty member to see the version history of a document.
But AI also provides us with an opportunity to rethink what we teach and why. Frequent users already understand that AI is best used as a partner: Thinking with AI can make you more creative. One recent study found that AI-assisted writing is better, faster, and even more fun. The world of work and our lives are all about to be transformed, and part of our mission is to prepare students for the life that awaits them beyond our institutions. What we call cheating, business calls progress.
If, indeed, AI collaborations are emerging or probable in the future of your discipline, then it’s time to adjust your curriculum and your pedagogy in ways that will prepare your students for a changing world. The American Association of Colleges and Universities is creating an Institute on AI, Pedagogy, and the Curriculum to specifically assist departments, programs, and colleges with that challenge.
So how did our opening story end? Not happily for either side. The professor’s university had a subscription to Turnitin’s AI-detection tool (one of the most reliable), but when it found no evidence of AI usage in the student’s work, the professor opted to ignore that finding. He turned to other (less accurate) tools to confirm his suspicions. After a contentious meeting, he withdrew the complaint yet pledged to go back and review all of the student’s posts for evidence of plagiarism.
Clearly, before we put our faith in a new industry of AI detection, we all have a lot of thinking to do about how we prepare our students for work, citizenship, and 21st-century life.
This essay is adapted from a new book, Teaching With AI: A Practical Guide to a New Era of Human Learning, published this spring by the Johns Hopkins University Press.