Computer Says B-Plus

The mouth-filling abuse of Kathleen Anderson’s post on automatic grading (“Betray Our Students for Publisher’s Profit?”) is such a delight to read that I’m almost sorry to confess that I disagree with her.

Anderson was approached by an educational publisher’s representative about a plan to (i) gather a corpus of several thousand student essays, (ii) hire experienced instructors to grade them, and then (iii) apply machine-learning techniques to train a computer program that will grade further essays automatically to similar standards.

Anderson recoils. Assisting “the unethical development of this unethical product” would be involvement in “the creation of Frankenstein’s latest techno-monster,” she says. The $350 stipend offered, she calls a “bribe.” Even just collecting the essay corpus would mean “violating our students’ privacy and right to ownership of their intellectual property.” She envisions with horror the human essay-evaluators “grading the thousands of essays in auto-program hatcheries while fed a steady supply of soma,” and urges that this “auto-generated wolf in surrogate-cloned-sheep’s clothing” must be killed before it gets loose.

The polemic is deliciously over-the-top. And effective: Just reading it, I almost felt like grabbing a sledgehammer to smash machines.

Yet she offers no rational argument against automated essay grading.

Figuring out how likely specific symbol sequences are to belong to a certain category is a well-understood computational problem. Take the question, “How likely is it that this new crime novel belongs to the category exemplified in the prose of the Harry Potter books?” Computers can address and even answer such questions.

In other contexts we welcome such techniques. I’ll bet Kathleen Anderson is glad that her email program has anti-spam capabilities. Modern spam filters usually exploit a theorem due to the Rev. Thomas Bayes (1701–1761), which established how the probability of truth for a hypothesis given certain observed facts relates to the probability that the observations would have looked like that if the hypothesis were true. What are now called Bayesian methods permit a program, given access to a corpus of known spam, to take an input message and answer the question “How likely is it that this message is spam?”, and dump it in the spam folder if the probability exceeds a set threshold. It’s spectacularly successful technology: I hardly ever look at my Spam folder now, and hardly ever see spam in my inbox.
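To make the idea concrete, here is a minimal sketch of that Bayesian calculation in Python. The tiny “corpora” of spam and non-spam are invented toys (a real filter trains on thousands of messages), and the word-counting model is the simplest possible one, but the arithmetic is exactly Bayes’ theorem: combine how likely the message’s words are under each hypothesis with a prior, and read off a posterior probability.

```python
import math
from collections import Counter

def train(messages):
    """Count word frequencies across a corpus of messages."""
    counts = Counter()
    for msg in messages:
        counts.update(msg.lower().split())
    return counts

def log_likelihood(message, counts, vocab_size):
    """Log P(message | class) under a unigram model with add-one smoothing."""
    total = sum(counts.values())
    return sum(
        math.log((counts[w] + 1) / (total + vocab_size))
        for w in message.lower().split()
    )

def p_spam(message, spam_counts, ham_counts, prior=0.5):
    """Bayes' theorem: posterior P(spam | message) from the two likelihoods."""
    vocab_size = len(set(spam_counts) | set(ham_counts))
    ls = log_likelihood(message, spam_counts, vocab_size)
    lh = log_likelihood(message, ham_counts, vocab_size)
    numerator = math.exp(ls) * prior
    return numerator / (numerator + math.exp(lh) * (1 - prior))

# Toy stand-ins for a corpus of known spam and known legitimate mail.
spam = ["win cash now", "cheap pills online", "win a free prize now"]
ham = ["meeting moved to noon", "draft essay attached", "see notes from class"]

spam_counts, ham_counts = train(spam), train(ham)
print(p_spam("win free cash", spam_counts, ham_counts))      # well above 0.5
print(p_spam("essay draft notes", spam_counts, ham_counts))  # well below 0.5
```

A real filter would then compare the posterior against a threshold, just as the paragraph above describes, and route the message accordingly.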

So now consider a different but analogous question: “How likely is it that this essay belongs to the category of essays that get an A grade, given the properties seen in several thousand essays known to be deserving of an A grade?”

Computer programs using Bayesian reasoning can tell with a high degree of accuracy whether a text resembles excellent, pretty good, mediocre, or illiterate essays, via a sort of statistical approximation to the look and feel of English prose. But if you don’t believe me, never mind: we can use a Gedankenexperiment instead. Just imagine that a program capable of this were available to you. Why, exactly, would you object to its use?
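The Gedankenexperiment needs nothing beyond the spam-filtering machinery, extended from two classes to several grade categories: score the essay against each graded corpus and pick the best fit. The grade labels and one-sentence “corpora” below are invented toys standing in for the thousands of human-graded essays the publisher proposed to collect.

```python
import math
from collections import Counter

# Toy stand-ins for corpora of human-graded essays, one per grade category.
corpora = {
    "A": ["the argument is developed with clear evidence and careful transitions"],
    "B": ["the essay makes a point but the evidence is thin in places"],
    "C": ["this essay good it say things about the topic"],
}

counts = {g: Counter(" ".join(texts).split()) for g, texts in corpora.items()}
vocab = {w for c in counts.values() for w in c}

def grade(essay):
    """Return the grade category whose corpus makes the essay most probable."""
    def score(g):
        c, total = counts[g], sum(counts[g].values())
        # Log-likelihood under a unigram model with add-one smoothing.
        return sum(math.log((c[w] + 1) / (total + len(vocab)))
                   for w in essay.lower().split())
    return max(counts, key=score)
```

Note what the classifier attends to: nothing but the statistics of the letter and word sequences, exactly as claimed below, with no understanding of the argument at all.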

No one is suggesting that its decisions will justify eliminating human instructors. Anderson sarcastically assures us that “surrendering one’s professional responsibilities will also be good practice for the day when professors will be entirely replaced by computers,” but that’s nonsense: Professors are not going to be superseded by Bayesian classification algorithms.

Likewise, Anderson’s concern about “violating our students’ privacy and right to ownership of their intellectual property” is just silly: You don’t lose any privacy or intellectual property just because some of your prose is anonymously stored among a set of sample essays of a certain quality level, any more than you do when Gmail uses properties of your email to decide which advertisements might interest you.

And no one imagines that machines can evaluate quality of argumentation, or even grammar in any serious sense: Programs of the relevant sort will attend only to the general characteristics of the letter sequences in the essay. Remarkably, this is sufficient for doing consistent grading that mostly agrees with human graders.

So think about this very modest proposal. Literally millions of new freshmen will enter our universities this fall. They will nearly all need more writing practice and instruction than is affordable. Suppose they could get rapid and confidential feedback on their draft essays from a computer program that was never unfair or grumpy or prejudiced or tired, and gave them carefully computed measurements of the characteristics of their essays, based on properties of typical essays that get high grades. Why would you be against this, given that no one is suggesting you should use anything other than your own judgment when awarding final grades in your courses?

If we are going to reject in advance the very idea of machine grading, we need serious arguments, not just emotional reactions and shuddering allusions to Brave New World.

Added later: See Are We Ready for Robots to Grade?, a recent Chronicle article with positive things to say about an experiment with machine grading of essays in a pharmacology MOOC.
