For the entirety of my career, I have harbored the suspicion that a significant part of my job as an English professor is, in the late David Graeber’s technical term, bullshit. I refer to the assigning, collecting, commenting on, and grading of merely competent student essays. About a third of the essays I have received over the 18 years I’ve been teaching belong to this category. These are essays that meet all of the official criteria for student writing: They have a thesis; they are polished, coherent, and well-argued; they support their points with evidence. They also lack any trace of surprise or originality, make no new connections, and are devoid of any striking use of language or evidence of individual human sensibility.
Every time I have to grade one of these essays, I die a little. By my second year as a professor, I concluded that such papers have no conceivable educational or intellectual value — for myself, the student, the college, or the world. Getting an essay that is above the level of mere competence is, of course, a joy, while getting an essay that falls below the level of mere competence represents an opportunity for real teaching — in working with the student to make it better — or for simple joy in the encounter with another mind. Receiving a merely competent essay is like meeting a toaster. We don’t have much to say to each other.
Not only am I not doing any good by eliciting such essays, I’ve often thought, but perhaps I’m even doing real harm, by encouraging human beings to imitate a machine. And I do encourage this machine-like behavior. Because I will often give a merely competent essay, in this era of grade inflation, a B. And even sometimes, though I can hardly admit it to myself, an A-minus.
To give you a sense of the genre, here’s a good example of the first paragraph of a merely competent essay:
The Fluidity of Identity in Walt Whitman’s “Song of Myself”
Walt Whitman’s “Song of Myself” is a celebrated American poem that explores the multifaceted nature of identity. Published in 1855 as part of his collection “Leaves of Grass,” Whitman’s poem delves into the idea that identity is not a fixed or static construct but a dynamic and ever-evolving concept. Through the use of vivid imagery, symbolism, and a free verse style, Whitman invites readers to contemplate the fluidity of identity in both individual and collective contexts.
The only significant difference between this essay and the dozen or so I received in a poetry course I taught a decade ago is that “The Fluidity of Identity” was produced by ChatGPT, based on a prompt (“analyze Walt Whitman’s ‘Song of Myself’”) entered by a graduate student, Ryan Pfeiffer (whose own writing, I hasten to say, is far above the level of mere competence).
Earlier this year, when I began to read reports of the success of AI at generating competent writing, I rejoiced. This is exactly what technology is supposed to do, I thought. Like a dishwasher or a vacuum cleaner, ChatGPT automates drudgery so we can focus on something more important.
Merely competent writing has always had the look and feel of the automatic anyway. Now, I thought, instead of students having to churn out machine-like prose, we’ve got actual machines that can write all the boring essays on Whitman anyone could desire, if in fact there’s anyone in the world so perverse as to desire such a thing.
But it turns out my reaction was anomalous. The more common response among instructors, as we begin the first academic year of AI writing’s ubiquity, has been to see in this advance in automated writing the destruction of liberal-arts education. Many are even doing what the political scientist Corey Robin has described in these pages: eliminating take-home essays entirely in favor of in-class examinations.
Something has gone very wrong when the advent of a machine that can produce merely competent essays is causing intelligent and committed educators to give up on assigning substantial student papers, which, as Robin acknowledges, are central to the educational enterprise as we have long conceived it. Whether or not one agrees with the analysis I offer here, everyone should be able to see that a technical innovation on the level of natural language processing should not have the power to destroy an entire sphere of civilized life. Imagine a restaurant, upon discovering the existence of a mechanical dishwasher, deciding to close up shop. One would think that it’s not the dishwasher that’s the problem.
After giving the matter some thought, I believe that two related pre-existing problems in higher education have made a technology that ought to be a useful tool appear to many instructors as an existential threat. The first is the phenomenon of grade inflation. The second is a lack of clarity about what we want from student writing.
In a just world, merely competent essays would not receive a grade above C. If such essays were regularly given the grades they deserve, the incentive for cheating would be vastly reduced, and ChatGPT would represent only a minor addition to the existing resources for cheaters. But if an essay like “The Fluidity of Identity in Walt Whitman’s ‘Song of Myself’” gets a B or even an A — as, to my shame, it might very well have gotten had it been submitted by a student in one of my own courses — then the incentive for cheating will indeed become so strong as to justify draconian measures. A college where everyone has both incentive and opportunity to cheat — and in which no reliable means of detecting the cheating exists — is a college whose degrees have no reliable value.
ChatGPT has transformed the problem of grade inflation — which professors have been moaning about for decades — from a minor corruption to an enterprise-destroying blight. It has to stop, and stopping it will require commitment from administrations and national accreditors, so that individual professors or schools that combat grade inflation aren’t penalized.
But combating grade inflation, however difficult in practice, is actually the easier of the two existing problems ChatGPT has exacerbated. More difficult is coming to agreement on what exactly we are evaluating in student papers. It seems to me that every discipline will have to engage in its own process of discerning what it wants from student writing, and what — in a world where most low-level cognitive tasks may soon be automated — it wants to teach students. Perhaps in some fields professors will decide that writing really isn’t at the core of what they do, that it functions mainly as a means of ascertaining that students have synthesized certain information and can order it in a coherent manner.
Corey Robin, for instance, says that for him writing is fundamentally about “ordering one’s world, taking the confusion that confronts us and turning it into something intelligible.” Given that turning a mass of data into a coherent order seems eminently automatable, and could even serve as the definition of “mere competence” in writing, perhaps his decision to ban out-of-class writing assignments is ultimately the best one for him, or even for his discipline — political science — as a whole. Fields might follow this general principle: If what we value in writing is automatable, if, even after eliminating grade inflation, we must give an A to a merely — or even fantastically — competent paper, then there is no way to prevent cheating, and we should ban out-of-class writing.
What I most value in student writing, as a professor of literature, is the capacity to notice striking features of literary works, and to connect these features to aspects of life, history, or literature in surprising ways. Readers will immediately notice that my criteria are circular. What is a striking feature? Something that strikes a person. What is surprising? Something that surprises.
Such criteria cannot, even in principle, be reduced to a set of rules, or formalized, or generated — so far as I understand the principles underlying natural language processing — by processing thousands of essays on literature. These criteria depend, as the philosopher of science Michael Polanyi argued, on an embodied background, a set of experiences, values, relationships, and perceptions that constitute what he called “tacit” knowledge, and that enable, for example, a student to notice something important in a poem that has never before been noticed in quite that way.
Another reason I think that existing models of natural language processing will struggle to meet my criteria is the reason that leads the philosopher David Shoemaker to doubt whether AI can be funny. Humor, he argues, depends on a sense of the “unexpected,” which in turn depends on the capacity to inhabit another human’s perspective. One accesses a quality of aesthetic or humorous surprise by imagining what will surprise another person. A student’s ability to judge that a given feature of a poem is powerful, insightful, weird, uncanny, or any other of myriad qualities people find in art, depends on a similar capacity.
The criteria for good student writing in literary studies are human criteria. The literary humanities, as I understand and practice them, don’t primarily exist for accumulating, ordering, and expressing knowledge — though certainly that’s part of what they do. Their main value lies in enhancing, intensifying, and expanding human life: refining and enriching our capacity to think, read, perceive, and feel. The promise of AI is that, by freeing us from the values of mere competence, it can let us focus more intentionally on cultivating these distinctively human values.
But, one might ask, once the student has had a surprising insight or made an interesting connection, couldn’t they simply rely on AI to arrange their incoherent thoughts into a coherent piece of writing?
Well, my experience, in both working with student writers and as a writer myself, has been that the development of an insight is often inextricable from the process of expressing it. Writing is the laboratory in which the basic structure of an insight or connection is worked out. Before that process, I personally wouldn’t really know how to instruct the AI to put my idea into prose. Perhaps others are different. Insofar as a student might find an AI tool helpful in arranging her thoughts and polishing her expressions, I’m not sure I’d have a problem with that. It seems exactly the kind of thing such a tool is good for.
Some instructors, misled by the false egalitarianism that has persuaded so many to reject the whole idea of “good” writing, grading instead on “labor,” may ask: What about the students who aren’t able to write papers that meet my criteria? My answer is that, in my experience, about 20 to 25 percent of my students are able to do this, and that’s a pretty good proportion of A’s for a class. More importantly, I believe every student can learn to meet these criteria better — which is, after all, the point of education.
Finally, some readers may believe that the criteria I have advanced for good student writing in literary studies are not, in fact, immune to automation. To this I have two replies. First, I haven’t actually seen any example of AI-generated writing that shows even a trace of any of the qualities I most value in writing. This leads me to believe, with writers like Shoemaker, that the current models lack something important that human beings possess.
But, second, it seems entirely possible to me that one day new models of artificial intelligence will emerge, or current models will be refined in such a way as to attain these capacities. Perhaps one day a machine will, without human assistance, be able to write an essay perceptive and surprising enough to get an A in my class. I don’t look on such an event as a disaster — quite the contrary. I will be very interested to learn what surprises an entirely new kind of critical mind about Emily Dickinson’s “I Heard a Fly Buzz — When I Died.” But it seems to me that such an evolved intelligence will expect, and deserve, the right to think its own thoughts and do its own work; it won’t be content to help us with ours.
Michael Clune is a professor of English at Case Western Reserve University. His most recent book, A Defense of Judgment, was published by the University of Chicago Press.