In the fall of 2019, while compiling my files for promotion, I spent some time looking over my teaching evaluations from the previous three years. I was a bit shaken. One student complained that I came into class wearing a jacket and that, by taking the jacket off in front of the students, it was as if I were undressing in front of them. Other narrative assessments, while less bizarre, were no more on topic. Although I received mostly nines out of 10 for a range of standards, the narrative portions of the evaluations were largely personal, free of any discussion of the course and its materials. Talking this over with my colleagues, I learned that what I was seeing had been much worse for my Black and Asian colleagues, particularly if they were women.
So I raised the matter in a faculty meeting. “How can we ethically, much less legally, use student evaluations as a basis for our merit reviews?” As anyone who raises an issue in a faculty meeting might predict, I was asked to come back with research on the matter.
I found almost 80 peer-reviewed papers demonstrating the gender and racial bias afflicting teaching evaluations, going back to 1979. Study after study showed increasingly disturbing statistics: Women were routinely rated lower than men, younger women were evaluated as less professional than their older female or male counterparts, women of color were rated as less effective than white women, and so on.
A few particularly stunned me. In a meta-analysis of 126 data-rich studies, Rebecca Kreitzer and Jennie Sweet-Cushman (2021) were able to find that — surprise! — evaluations were higher for courses with less work, for electives, and for classes where cookies or chocolate were provided to students. Bob Uttl, Carmela A. White, and Daniela Wong Gonzalez (2017) reviewed previous studies and reanalyzed their data to show “no significant correlations between … ratings and learning.” These findings, that the tools are biased and ineffective, have been affirmed by professional organizations and highly ranked universities.
In 2019, the American Sociological Association issued a formal recommendation to cease using student evaluations for merit and promotion decisions unless part of a much broader (“holistic”) assessment. In reaching its conclusion, the ASA identified bias based not only on gender and race, but also on such seemingly innocent factors as the time of day a course was taught.
At least a few colleges have taken steps to mitigate these problems. The University of Southern California no longer includes student evaluations as an element of tenure and promotion. If we assume that other professionals and experts in our respective fields should help articulate the value of our scholarly work, why then defer to students, who have no training in pedagogy, much less knowledge of what constitutes effective teaching within our chosen areas, on the evaluation of our teaching? The University of Oregon, the University of Nebraska at Lincoln, and others have already established that student evaluations may be useful but should be combined with other forms of assessment in personnel decisions. If you are considering raising this issue on your campus, consider that the ASA statement has received nearly two dozen endorsements from professional organizations.
If colleges continue to use student evaluations of teaching as a basis for merit and promotion, they may be headed toward legal trouble. In 2009, the faculty association at Ryerson University (now Toronto Metropolitan University) disputed the value of student evaluations as the basis for promotion. After a full review of the evidence, an arbitrator, William Kaplan, saw “serious and inherent limitations” in evaluations by students. He wrote that such evaluations are “imperfect at best and downright biased and unreliable at worst.”
When I presented my research to my own department, it took about 20 minutes for us to decide that we did not want to put our junior faculty in the position of depending on student evaluations for their job security. We also did not want to put ourselves in a legal quandary. Although we cannot prevent the university from soliciting students’ evaluations — and some of us find them quite useful in rethinking our assignments, lecture style, and course content — we now enable faculty members to use peer assessment and self-evaluations, including documented revisions to pedagogical statements. We are fortunate at the University of California at Los Angeles to have a formal system for pedagogical evaluation by our colleagues (we call it “peer-assisted reflections on student learning”) to help our instructors build summative teaching portfolios. We have resolved to keep student evaluations firmly in their place.