Under the mandate of a recently enacted state law, the Web sites of public colleges and universities in Texas will soon include student-evaluation ratings for each and every undergraduate course. Bored and curious people around the planet—steelworkers in Ukraine, lawyers in Peru, clerical workers in India—will be able, if they’re so inclined, to learn how students feel about Geology 3430 at Texas State University at San Marcos.
But how should the public interpret those ratings? Are student-course evaluations a reasonable gauge of quality? Are they correlated with genuine measures of learning? And what about students who choose not to fill out the forms—does their absence skew the data? Two recent studies shed new light on those old questions.
In one, three economists at the University of California at Riverside looked at a pool of more than 1,100 students who took a remedial-mathematics course at a large university in the West (presumably Riverside) between 2007 and 2009. According to a working paper describing the study, the course was taught by 33 instructors across 97 sections during that period. The instructors had a good deal of freedom in their teaching and grading practices, but every student in every section had to pass a common high-stakes final exam, which they took after filling out their course evaluations.
That high-stakes end-of-the-semester test allowed the Riverside economists to directly measure student learning. The researchers also had access to the students’ pretest scores from the beginning of the semester, so they were able to track each student’s gains.
Most studies of course evaluations have lacked such clean measures of learning. Grades are an imperfect stand-in, because students’ course ratings are usually strongly correlated with the grades they receive, which makes it hard to tell whether high ratings reflect learning or merely lenient grading. That powerful correlation has led some studies to suggest that course evaluations fuel a vicious cycle of grade inflation. (Patrick Moore, an associate professor of English at the University of Arkansas at Little Rock, argued in a 2009 essay that higher education has been corrupted by a Law of Reciprocal Grading: “If you give me a high grade, I will give you a high course evaluation.”)
But the remedial-math course analyzed in the Riverside paper largely avoids that syndrome. Whether a student takes the course with a lenient grader or a strict one, the same high-stakes exam awaits at the end of the semester.
What Seems to Matter
The Riverside economists discovered a small but statistically significant positive relationship between students’ course-evaluation ratings and their learning gains from the pretest to the final exam. The effect size was far from earthshaking: a one-standard-deviation increase in student learning was associated with an increase of only 0.05 to 0.07 points in course-evaluation scores, on a five-point scale. But it suggests that student course evaluations may sometimes contain meaningful signals about the quality of teaching and learning.
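To put that effect size in concrete terms, here is a minimal sketch of the kind of pretest-to-final-exam regression involved, using simulated numbers rather than the working paper’s data; the variable names, sample values, and model specification are illustrative assumptions, not the authors’ own.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1100  # roughly the number of students in the study

# Simulated data (illustrative only): standardized learning gains and
# evaluation scores on a 1-to-5 scale, with a weak positive relationship.
learning_gain_sd = rng.normal(0.0, 1.0, n)  # post-test minus pre-test, in SD units
eval_score = np.clip(
    4.0 + 0.06 * learning_gain_sd + rng.normal(0.0, 0.8, n),  # ~0.06 points per SD
    1.0, 5.0,
)

# Ordinary least squares: eval_score = intercept + slope * learning_gain_sd
X = np.column_stack([np.ones(n), learning_gain_sd])
coef, *_ = np.linalg.lstsq(X, eval_score, rcond=None)
intercept, slope = coef

print(f"estimated slope: {slope:.3f} evaluation points per SD of learning gain")
# A slope in the 0.05-to-0.07 range means that a one-SD jump in measured
# learning moves the average rating by well under a tenth of a point.
```

On simulated data the recovered slope hovers near the 0.06 built into the example, which is the sense in which an effect of that size is real but small.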
The Riverside economists also broke down the individual components of the course-evaluation form to see which scores were most significantly associated with student learning. The three most powerful predictors of learning were students’ levels of agreement with these statements: “The instructor was clear and understandable,” “The supplementary materials (e.g., films, slides, videos, guest lectures, Web pages, etc.) were informative,” and “The course over all as a learning experience was excellent.”
The least predictive evaluation questions, by contrast, were “The syllabus clearly explained the structure of the course,” “The exams reflected the material covered during the course,” and “The instructor respected the students.”
Even though the Riverside scholars found a modest positive correlation between course evaluations and learning, their study is unlikely to quiet the concerns of people like Mr. Moore, at Arkansas. The students’ grades were more strongly correlated with their course ratings than their learning was, so if the instructors in the remedial-math course are rewarded or punished for their course-evaluation scores, they may indeed feel pressure to inflate students’ grades in ways that do not reflect learning. A much-discussed study this year, which also benefited from an unusually clean data set, found that student course ratings were negatively correlated with “deep learning,” that is, with students’ performance in subsequent courses in the same department.
But the Riverside study does offer hope that well-designed evaluation forms might actually capture important information about student learning. (Back in April, The Chronicle looked at several innovative systems that aspire to ask wiser questions than traditional course-evaluation forms do.)
The Riverside working paper is titled “Do Course Evaluations Reflect Student Learning? Evidence from a Pre-Test/Post-Test Setting.” Its authors are Mindy S. Marks, an assistant professor of economics at Riverside; David H. Fairris, a professor of economics and vice provost for undergraduate education there; and Trinidad Beleche, who completed a doctorate in economics at Riverside this year.
Patterns of Nonparticipation
The other recent paper, which was presented last month at the annual meeting of the Association for the Study of Higher Education, examines which students choose not to fill out course evaluations, and why.
Those are longstanding questions in the study of course evaluations. If a significant number of students never fill out the forms (and the nonresponse rate is growing as universities move toward online surveys), does their absence skew the average scores for a given course? Or are the responders and the nonresponders essentially similar groups?
The authors, who are based at North Carolina State University, looked at the experiences of more than 20,000 students at the end of a recent semester “at a four-year research university in the Southeast.”
The overall response rate for student course evaluations was slightly more than 50 percent, and the patterns of nonresponse were not random. The researchers discovered the following:
- Students who earn D’s and F’s in a course are 23 percent less likely than others to fill out the course’s evaluation forms.
- Students’ response rates are six percentage points higher for courses in their majors than for those outside their majors.
- Response rates varied significantly by major. Students in so-called “realistic majors,” which include biology and computer science, were much more likely to fill out evaluation forms than were students in “social majors,” which include communications and psychology.
- Students appeared to exhibit “survey fatigue.” If they received 11 or more online surveys (including course evaluations) from the university in a semester, their response rates tended to decline.
All of those factors, the authors write, mean that it is important to be cautious about interpreting average evaluation scores for a particular course—the kind of scores that will soon be appearing by the thousand on Texas universities’ Web sites.
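As a rough illustration of why such averages deserve caution, consider a hypothetical course (the group sizes, ratings, and response rates below are invented, not taken from the North Carolina State paper) in which the students most likely to rate the course poorly are also the least likely to respond.

```python
# Hypothetical nonresponse-bias example; all numbers are invented, though the
# D/F response rate is set about 23 percent below the others, echoing the
# pattern the study reports.
groups = [
    # (label, number of students, average rating they would give, response rate)
    ("A/B students", 60, 4.2, 0.55),
    ("C students",   30, 3.6, 0.55),
    ("D/F students", 10, 2.8, 0.42),
]

full_class_avg = (
    sum(n * rating for _, n, rating, _ in groups)
    / sum(n for _, n, _, _ in groups)
)
responder_avg = (
    sum(n * rate * rating for _, n, rating, rate in groups)
    / sum(n * rate for _, n, _, rate in groups)
)

print(f"average if everyone responded: {full_class_avg:.2f}")
print(f"average among responders only: {responder_avg:.2f}")
# The responders-only average comes out a bit higher, because the students
# most inclined to rate the course poorly are underrepresented in the sample.
```

The gap in this toy example is small, but it runs in one direction, and it grows as response rates fall or as the responding and nonresponding students diverge more sharply.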
The paper’s title is “Who Doesn’t Respond and Why? An Analysis of Nonresponse to Online Student Evaluations of Teaching.” The authors are Meredith J.D. Adams, a teaching assistant professor of education at North Carolina State, and Paul D. Umbach, an associate professor of higher education there.