The best way to eliminate grade inflation is to take professors out of the grading process: Replace them with professional evaluators who never meet the students, and who don’t worry that students will punish harsh grades with poor reviews. That’s the argument made by leaders of Western Governors University, which has hired 300 adjunct professors who do nothing but grade student work.
“They think like assessors, not professors,” says Diane Johnson, who is in charge of the university’s cadre of graders. “The evaluators have no contact with the students at all. They don’t know them. They don’t know what color they are, what they look like, or where they live. Because of that, there is no temptation to skew results in any way other than to judge the students’ work.”
Western Governors is not the only institution reassessing grading. A few others, including the University of Central Florida, now outsource the scoring of some essay tests to computers. Their software can grade essays thanks to improvements in artificial-intelligence techniques. Software has no emotional biases, either, and one Florida instructor says machines have proved more fair and balanced in grading than humans have.
These efforts raise the question: What if professors aren’t that good at grading? What if the model of giving instructors full control over grades is fundamentally flawed? As more observers call for evidence of college value in an era of ever-rising tuition costs, game-changing models like these are getting serious consideration.
Professors do score poorly when it comes to fair grading, according to a study published in July in the journal Teachers College Record. After crunching the numbers on decades’ worth of grade reports from about 135 colleges, the researchers found that average grades have risen for 30 years, and that A is now the most common grade given at most colleges. The authors, Stuart Rojstaczer and Christopher Healy, argue that a “consumer-based approach” to higher education has created subtle incentives for professors to give higher marks than deserved. “The standard practice of allowing professors free rein in grading has resulted in grades that bear little relation to actual performance,” the two professors concluded.
Naturally, the standard grading model has plenty of defenders, including some who argue that claims of grade inflation are exaggerated: students could, after all, really be earning those higher grades. The current system, this argument goes, forges a nurturing relationship between instructor and student and provides individualized attention that no robot or stranger could match.
But the efforts at Western Governors and Central Florida could change that relationship, and point to ways to pop any grade-inflation bubble.
An Army of Graders
To understand Western Governors’ approach, it helps to remember that the entire institution is an experiment that turns the typical university structure on its head. Western Governors is entirely online, for one thing. Technically it doesn’t offer courses; instead it provides mentors who help students prepare for a series of high-stakes homework assignments. Those assignments are designed by a team of professional test-makers to prove competence in various subject areas.
The idea is that as long as students can leap all of those hurdles, they deserve degrees, whether or not they’ve ever entered a classroom, watched a lecture video, or participated in any other traditional teaching experience. The model is called “competency-based education.”
Designers of Western Governors do not intend to compete with Harvard or any other traditional institution. The online university throws a lifeline to nontraditional students who can’t make it to those campuses.
Ms. Johnson explains that Western Governors essentially splits the role of the traditional professor into two jobs. Instructional duties fall to a group the university calls “course mentors,” who help students master material. The graders, or evaluators, step in once the homework is filed, with the mind-set of, “OK, the teaching’s done, now our job is to find out how much you know,” says Ms. Johnson. They log on to a Web site called TaskStream and pluck the first assignment they see. The institution promises that every assignment will be graded within two days of submission.
Emily L. Child is one of the evaluators. She’s a stay-at-home mother of three who lives near Salt Lake City. Her kitchen table is her faculty office. She grades 10 to 15 assignments per day, six days a week, working early in the morning, before her kids are up, or in the afternoon, while they nap. She estimates that she has graded 14,400 assignments in the six years she has worked for the university.
Western Governors requires all evaluators to hold at least a master’s degree in the subject they’re grading. Ms. Child, a former teacher, grades assignments only in the education major. A typical assignment (the university calls it a “task”), she says, involves a student’s submitting a sample lesson plan or classroom strategies. “I enjoy that it allows me to stay current as a teacher,” she says.
Evaluators are required to write extensive comments on each task, explaining why the student passed or failed to prove competence in the requisite skill. No letter grades are given—students either pass or fail each task. Officials say a pass in a Western Governors course amounts to a B at a traditional university.
All evaluators initially receive a month of training, conducted online, about how to follow each task’s grading guidelines, which lay out characteristics of a passing score.
The identities of the evaluators are kept hidden from students, and even from the mentors. The goal is to protect the graders from students nagging them about grades, or from mentors who might lobby to pass a borderline student to better reflect on their teaching.
The graders must regularly participate in “calibration exercises,” in which they grade a simulated assignment to make sure they are all scoring consistently. As the phrase suggests, the process is designed to run like a well-oiled machine.
Some evaluators object to the system at first, says Ms. Johnson, especially professors who come from traditional higher education. Some insist that they don’t need to justify each grade they give, arguing that they know a passing assignment when they see it. “That’s hogwash,” she says. “If you know it when you see it, then tell us what it is you see.”
Other evaluators want to push talented students to do more than the university’s requirements for a task, or to allow a struggling student to pass if he or she is just under the bar. “Some people just can’t acclimate to a competency-based environment,” says Ms. Johnson. “I tell them, If they don’t buy this, they need to not be here.”
Even Ms. Johnson had to be convinced when she started out at Western Governors, after having taught school and helped to develop instructional standards for the Utah State Office of Education. “I was an academic snob,” she says, noting that she took a position at the university because she needed a job. “As I was going through their training, I began to think, Oh, my gosh, I think they have something here.”
Besides aiming for validity in grading, the Western Governors infrastructure is designed to stretch or compress like an accordion: part-time graders can be added if enrollment spikes, or graders can work fewer hours if enrollment drops. In the past two years, the student population has grown from 14,200 to more than 25,000, and the evaluators have graded more than 1.1 million assignments.
“Tuition at Western Governors hasn’t gone up in four years,” Janet W. Schnitz, the interim provost, told me proudly.
Meet the Robots
Pam Thomas, an instructor of biology at the University of Central Florida, decided to try robot grading because she loves to teach large classes—the more students in the lecture hall, the better. She had more than 1,000 in her “General Biology” course last spring, and she wanted to give them assignments more challenging than “punching buttons on multiple-choice tests.”
When she announced to her class that software would automatically grade the essay tests, many students were wary. “The students said, I’m being graded by a robot?” she remembers. “I said, Anybody who doesn’t get a 100, I will look at it, and I will see if the machine made a mistake.”
Some students did challenge their scores, but in most cases the computer was proved correct.
Then Ms. Thomas performed an experiment that she hopes her students never find out about. She and some teaching assistants scored the tests by hand and compared their performance with the computer’s. “The TA’s were way far off on inter-rater reliability,” she says, referring to whether different people score the same test consistently. The graduate students became fatigued and made mistakes after grading several tests in a row, she told me, “but the machine was right on every time.”
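What “inter-rater reliability” means can be made concrete with a little arithmetic. Below is a minimal sketch, in Python, of Cohen’s kappa, a standard chance-corrected agreement statistic for comparisons like the one Ms. Thomas describes; the pass/fail scores are invented, and this illustrates the general concept rather than the analysis she actually ran.

```python
# Illustration only: Cohen's kappa for two raters scoring the same
# essays. All scores below are invented for the example.
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters."""
    n = len(rater_a)
    # Fraction of essays where the two raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Probability the raters would agree by chance alone.
    expected = sum((counts_a[l] / n) * (counts_b[l] / n)
                   for l in set(counts_a) | set(counts_b))
    return (observed - expected) / (1 - expected)

# Hypothetical pass/fail scores on ten essays:
ta_scores      = ["pass", "pass", "fail", "pass", "fail",
                  "pass", "fail", "pass", "pass", "fail"]
machine_scores = ["pass", "pass", "fail", "pass", "pass",
                  "pass", "fail", "pass", "pass", "fail"]

print(f"kappa = {cohens_kappa(ta_scores, machine_scores):.2f}")
# -> kappa = 0.78 (raw agreement is 90%, but kappa discounts the
#    agreement two raters would reach just by guessing).
```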
Setting up the software takes some doing, she says. She had to carefully lay out what constituted a correct response, so the machine knew what to look for.
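Neither Ms. Thomas nor her vendor spells out how that setup looks under the hood, but the idea, enumerating the concepts a correct answer must contain and checking each one, can be sketched in miniature. Everything below, from the rubric to the sample essay, is hypothetical; real systems such as SAGrader use far more sophisticated matching than simple phrase lookup.

```python
# A toy sketch of concept-based essay scoring: the instructor lists
# the ideas a correct answer must contain, and the grader checks for
# each. Every rubric entry and pattern here is invented.
import re

# Each required concept maps to phrases that count as expressing it.
RUBRIC = {
    "inputs":        [r"carbon dioxide", r"\bCO2\b", r"\bwater\b"],
    "energy source": [r"sunlight", r"light energy"],
    "products":      [r"glucose", r"\bsugar\b", r"oxygen"],
}

def score_essay(text, rubric=RUBRIC):
    """Return (points earned, feedback notes) for one essay."""
    points, feedback = 0, []
    for concept, patterns in rubric.items():
        if any(re.search(p, text, re.IGNORECASE) for p in patterns):
            points += 1
        else:
            feedback.append(f"Missing: {concept}")
    return points, feedback

essay = "Plants take in carbon dioxide and water and release oxygen."
points, notes = score_essay(essay)
print(f"{points} of {len(RUBRIC)} concepts found; {notes}")
# -> 2 of 3 concepts found; ['Missing: energy source']
```

A rule-based matcher like this also hints at where the software’s speed comes from: once the rubric exists, checking a new essay, and naming the concepts it lacks, costs almost nothing.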
I called up Idea Works, the company that makes the software, which is called SAGrader. Apparently the program is a tough sell, even after six years on the market. “People are a little skeptical that what we’re doing is possible,” says Colin Monaghan, product manager for SAGrader. Only a couple of colleges are testing it, including Park University and the University of Missouri at Columbia. “It’s pretty slow going.”
The program was developed by a Missouri professor, Edward E. Brent, who has a joint appointment in the sociology and computer-science departments. He argues that students like the idea that their tests are being evaluated in a consistent way. “We have a number of courses at my institution that have multiple sections, and different graders are grading the same assignment,” he says. “With the SAGrader program, the students can be assured that it’s consistent across sections, so it’s not like their TA likes them better or something like that.”
Another selling point is the software’s fast response rate. It can grade a batch of 1,000 essay tests in minutes. Professors can set the software to return the grade immediately and can give students the option of making revisions and resubmitting their work on the spot.
“Usually in school you turn something in and you never see it for two weeks, and then you’re like, Oh, I remember that,” says Mr. Monaghan. He said once students get essays back instantly, they start to view essay tests differently. “It’s almost like a big math problem. You don’t expect to get everything right the first time, but you work through it.”
Even if adoption is sluggish, robot grading is the hottest trend in testing circles, says Jacqueline Leighton, a professor of educational psychology at the University of Alberta who edits the journal Educational Measurement: Issues and Practice. Companies building essay-grading robots include the Educational Testing Service, which sells e-rater, and Pearson Education, which makes Intelligent Essay Assessor. “The research is promising, but they’re still very much in their infancy,” Ms. Leighton says.
Skepticism is still the most common response, though.
“I cannot imagine outsourcing the grading,” says Angela Linse, executive director of the Schreyer Institute for Teaching Excellence, at Pennsylvania State University. “The public already thinks that faculty have cushy jobs. I cannot imagine what would happen if we outsourced grading.”
She is among those who feel that grade inflation is not as bad as critics describe it. “From my experience, faculty are very diligent about grading fairly and consistently,” she says. And she argues that most professors don’t give out high grades just to win positive student evaluations. “I ask, Have you ever seen student ratings that are high even though you know that faculty member is not at the top of their game? I’ve never had anyone say they have.”
Ms. Linse does encourage professors to take some advice given to the graders at Western Governors: Try to write out rubrics for assignments and show them to students, to be clear about what is expected.
“Do I think faculty members need to be psychometricians? No,” she says, referring to professional test-makers. “But our workshops on designing effective tests are always overflowing.”