Why It Makes Sense for Students to Grade One Another’s Papers

By the time this post appears, the first peer-graded assignment in Cathy Davidson’s Coursera MOOC, “History and Future of (Mostly) Higher Education,” will have come and gone, and students will be well into the second. Programming projects, algebra exercises, and multiple-choice questions can all be reliably graded by a computer; essays cannot, so Coursera offloads the task of evaluating them to students. After the deadline for an assignment has passed, students have a week to evaluate five of their classmates’ essays using a rubric developed by the teaching staff. A student who fails to evaluate his or her classmates’ essays does not get a grade for the assignment, and in our course will not be able to earn the statement of accomplishment “with distinction.” Whether students see that as a chore, a duty, or an opportunity, the necessary assessment is eventually done, for better or for worse.

Peer grading can be a controversial proposition. When students’ scholarships and internships are riding on their grades, it isn’t surprising that they hesitate to allow their classmates, who know only as much as they do about the course material, to have any effect on their final assessment. Instructors scoff at the idea that students can be left to evaluate one another, certain that they will collude so that everyone receives an A without doing any of the work. In its worst incarnation, peer grading can be a scheme for lazy professors to offload the boring work of assessment onto students.

With all of those concerns, one might wonder why we would ever want to try peer grading, but from a logistical point of view, it makes plenty of sense. The major benefit is that it provides quick feedback. Feedback that isn’t timely is next to useless, and even in a traditional classroom, the time it takes an instructor to produce and return feedback to students can vary widely, depending on the instructor’s workload. When it takes one or two weeks to return feedback, a system in which students go through multiple rounds of drafts and revisions simply isn’t feasible.

Peer grading has been used in software systems like SWoRD and Expertiza (developed by my Ph.D. co-adviser), so that students can go through multiple rounds of revisions interleaved with rapid feedback cycles. That process encourages higher-quality final submissions, and it scales with the size of the class. Students get more reviews, more rapidly, from more points of view. Not only do the reviewees get extra feedback; the act of reviewing itself has metacognitive benefits, because students get to see other submissions of varying quality and have to articulate in their feedback what the other students are doing wrong.

Critics have questioned whether student-assigned grades can be consistent or valid, but numerous studies have found that such concerns are unlikely to be an issue in practice. While it is true that students lack the nuance that an expert grader can provide, most student-assigned scores hover around the scores given by experts on the same papers. Some students may be harsher or easier graders than others, but they often apply those biases consistently enough for them to be normalized (of course, such biases apply to instructors as well).
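To make the idea of normalizing a consistent bias concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not how Coursera, SWoRD, or Expertiza actually compute grades: the function name `normalize_peer_scores` and the `(grader, essay, score)` input format are hypothetical, and the method assumes each grader sees a roughly random sample of essays, so that a grader's average differs from the overall average mainly because of his or her own harshness or leniency.

```python
from collections import defaultdict
from statistics import mean


def normalize_peer_scores(reviews):
    """Adjust peer scores for each grader's consistent harshness or leniency.

    `reviews` is a list of (grader, essay, score) tuples (a hypothetical
    format chosen for this sketch). Returns {essay: mean adjusted score}.
    """
    # Average of every score handed out by anyone.
    overall = mean(score for _, _, score in reviews)

    # Collect the scores each grader gave.
    by_grader = defaultdict(list)
    for grader, _, score in reviews:
        by_grader[grader].append(score)

    # A grader's offset is how far his or her average sits from the
    # overall average: negative for harsh graders, positive for easy ones.
    offset = {g: mean(scores) - overall for g, scores in by_grader.items()}

    # Remove each grader's offset, then average the adjusted scores per essay.
    by_essay = defaultdict(list)
    for grader, essay, score in reviews:
        by_essay[essay].append(score - offset[grader])
    return {essay: mean(scores) for essay, scores in by_essay.items()}


if __name__ == "__main__":
    reviews = [
        ("harsh", "A", 6), ("harsh", "B", 4),
        ("lenient", "A", 10), ("lenient", "B", 8),
    ]
    print(normalize_peer_scores(reviews))
```

In this toy example the two graders disagree by four points on every essay but agree that essay A is two points better than essay B; once each grader's offset is removed, their adjusted scores coincide. With only a handful of reviews per grader, real offsets would be much noisier than this.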

Daphne Koller, a co-founder of Coursera, presented some of the exploratory results of the first peer-reviewed Coursera assignments when she spoke at Duke in the fall of 2012. At a glance, the findings appeared to replicate those earlier studies, hinting that the results are consistent even on a massive scale (though I haven’t seen anything published on it yet).

Peer grading isn’t a silver bullet and doesn’t work by magic. Research shows that successful peer grading arises only from a well-articulated grading philosophy, training for the would-be reviewers, and high-quality rubrics that very clearly show what’s right, what’s wrong, and why that is. Without careful planning and scaffolding, it comes across as a half-hearted attempt to reduce the tedium of grading. Peer grading already starts at a disadvantage from having to compete with the internalized expectations of how authority in the classroom should be distributed.

In the Coursera courses I’ve taken, peer-graded essays have always been extra credit because so few students and teachers have taken the system seriously. How do we build trust in a peer-grading culture? How do we get students and teachers to unlearn the paradigm of the instructor as the sole authority on what’s right and wrong? As a computer scientist, I can build the tools and run the numbers, but until attitudes toward crowdsourced learning change, peer-graded essays will remain an extra requirement for a “with distinction” label.
