Why Well-Formed Nonsense Doesn’t Matter

I’d like to add one more point on the topic of my post “Computer Says B-Plus.” My modest suggestion was this: If a computer program trained on essays graded by humans could learn enough about the superficial form of academic prose to reliably assign suitable grades to newly presented essays (where “suitable” means “close to what a qualified human grader would have assigned”), that could put a useful tool in the hands of a diligent student who wanted to get anonymous, private, and patient assessments of essay drafts before handing in the final product.

A commenter named Maureen Greenbaum made a similar suggestion earlier, back in April, when discussing an article on Les Perelman’s skepticism about computer grading:

Students respond to feedback when it is immediate and actionable. A computer-graded essay can be corrected and resubmitted for more feedback right away. Community-college students desperately need more writing practice. There is no way for a human alone to correct and, more importantly, re-correct and re-re-correct the number of essays that could be assigned if faculty accepted the help of the computer. Students need to write more in English 101 and developmental English and history and anywhere that writing can be practiced.

That’s what I had in mind: a chance for keen students to write, correct, submit, re-edit, re-submit, and so on, getting confidential and neutral feedback, until they had achieved (at least to the satisfaction of the computer program, with its limited abilities) something like properly spelled and punctuated English with appropriate sentence lengths, paragraphing, capitalization, and so on.

That leaves us with questions like this one (from a comment on my post by eulerian_ta):

What happens when students figure out that they can get A’s simply by putting together a bunch of long, nonsensical sentences with lots of SAT words? The problem with automated essay grading … it isn't actual grading.

And this, from another commenter called aicaiel:

A computer program cannot judge the effectiveness of writing—whether an argument is convincing, or research is sufficiently comprehensive, or a personal narrative communicates the poignancy of an experience.

These are related to Perelman’s objections—though he underlined them by programming a computer to generate random junk essays that get good machine-determined grades. Do not misunderstand me: I agree that computers cannot do “proper grading” in the sense these commenters are worried about. Essay-grading programs can be fooled (at least at present) by maliciously constructed nonsense. And certainly they cannot evaluate convincingness or poignancy the way a human could.

However, I want to argue that in practice these points will hardly matter at all.

Virtually no students will try to write useless gibberish designed to fool the grading machine (except perhaps after hours, as a hobby). After all, when the time comes to turn in the final draft to the human in charge of the course (which is what I assume will happen), they won’t want to submit superficially irreproachable but meaningless word salad.

Typical students instructed to write an essay will always try to write meaningfully on a topic. Humans find it extremely hard to do otherwise. In the same way that students are drawn like moths to the flame of meaning when one is trying to teach them to look at syntax and ignore semantics, they are ineluctably drawn to content rather than form when they compose text.

Inventing grammatically well-formed sentences with arbitrary meanings just to illustrate a syntactic point is something researchers in syntax do every day, but it is extraordinarily hard for most people. (Try it: write a 100-word paragraph that contains no grammatical mistakes but displays no hint of sensible meaning.) We do not need to encourage students to tell a story or express their opinions about a topic; they will almost always be striving to do that. The problem is getting them to pay attention to orthographic, syntactic, and presentational form.

Only a human can assess cogency, relevance, balance, coherence, empathy, insight, and other such subtle properties. But how much easier the job of the human TA or writing tutor or professor would be if the essays submitted were not just compliant with length requirements and free from plagiarism (another thing that computers are good at checking), but also, to the satisfaction of our most ingenious algorithms, framed in literate prose.

As the technology of fully automatic essay grading slowly improves, I suggest that we shouldn’t just throw up our hands in horror and rejection (as Kathleen Anderson does). Despite its inherent limitations, machine grading could be, when sensibly deployed, a useful tool for our students.
