Student evaluations of my spring English composition course arrived the other day, and I read them with the usual stew of satisfaction, frustration, and puzzlement. The numbers looked good, and I heard plenty of praise. But, as always seems to happen, the same teaching technique inspired completely contradictory reactions.
For example, while I don’t show too many PowerPoints in English composition, every time I introduce a new writing strategy I’ll throw up a few slides with definitions and examples. One student praised the slides as the most helpful tool for her learning — another called them useless and recommended I eliminate them from the course.
Contradictory statements about my methods I can handle; I’ve seen them all before. But this round of reviews also included a comment from a student who said I didn’t “connect well” with the class. This was a community-engaged learning course in which we took a field trip together to a homeless shelter, spent lots of time in group activities in class, and shared personal perspectives on our understanding of poverty. I also followed my own advice and made a point to arrive in the classroom a few minutes early and engage in informal conversations with students.
So that student’s comment — although the only one of its kind in this crop of evaluations — is very likely to spend the entire summer sticking in my craw.
I’m a full professor at a college without merit pay, which means that negative comments like that one don’t have many practical consequences for me, beyond making me wonder whether I will ever have a single semester in which I get teaching right. I’m also a white male in my late 40s, which means that I am usually spared the pointed comments about wardrobe, voice, or persona that, as plenty of research has documented, routinely pepper the course evaluations of female and nonwhite faculty members.
Over the years, a number of factors — contradictory criticisms, bias against vulnerable instructors, inconsistent response rates — have all been adduced as evidence for why academe should reduce the outsized role that student ratings and comments play in the evaluation of teaching.
This past year, I served on the promotion-and-tenure committee at my college and spent hours flipping through student evaluations of my colleagues’ teaching. I saw clear evidence of all the well-documented problems with this form of evaluation, including bias against certain groups of faculty members.
As a teaching-oriented college, we have a fairly comprehensive system for assessing a professor’s classroom performance. We consider a mix of evidence and don’t rely too heavily on course evaluations. And yet, in every case we evaluated, the committee read and discussed the candidate’s student evaluations, and our report mentioned them — usually to add some extra evidence to a positive point we were making about the candidate’s teaching.
My time on the tenure committee and my own experience with course evaluations have left me plenty skeptical about their prominent role in the evaluation of teaching. But in May I started questioning another common method of evaluation, when I attended a thought-provoking session at the University of Missouri’s annual Celebration of Teaching, now in its 10th year on the Columbia campus. I gave a talk at the event, which includes lectures, workshops, panels, and poster sessions on all aspects of pedagogy.
One of the workshops I attended, on defining and evaluating teaching effectiveness, kicked off with a presentation on the importance of using “multiple measures” to assess faculty performance. It made me question a practice that many institutions seem to have adopted in response to the problems with student evaluations — supplementing them with peer observations of teaching.
In her talk, Emily Miller, associate vice president for policy at the Association of American Universities, highlighted student evaluations and peer observations as two of the most commonly used measures of teaching effectiveness, and outlined the problems with relying too heavily on either of them.
Serving on the tenure committee, I made plenty of visits to my colleagues’ classrooms, where I perched in the back row taking notes on what I observed. But according to Miller, peer observations are problematic, too: They are subject to the same potential biases as course evaluations and represent only a very thin slice of someone’s teaching.
To illustrate that point, she walked us through a thought-provoking exercise that demonstrated in sharp terms why student evaluations and peer observations should be considered within the context of a host of other measures. “I want you to make a list,” Miller said, “of all of the different things that you do each week in support of your teaching. Don’t just think about being in class. Think of all of the other activities you do each week that relate to your teaching.”
Here is what I jotted down:
- Read for class (for the composition course I just taught, I had to read four assigned books and some additional online essays).
- Prepare my lesson plan.
- Arrange a class visit to the homeless shelter.
- Do background research on the subject we were discussing.
- Grade writing exercises.
- Create assignment sheets.
- Meet with students.
- Grade papers.
- Respond to their emails.
As the list slowly grew under my pen, the point of the exercise became abundantly clear: Much of the work of teaching — perhaps most of it — takes place outside of the classroom.
Scanning my own list, I realized that few of those items would be visible to a colleague who observed my teaching. Some of that behind-the-scenes work would be visible to my students — after all, they see my assignment sheets and my feedback on their work — but by no means all of it, or even most of it.
Further, as Miller’s talk made clear, much of the work that we put into our teaching cannot be evaluated, or even accessed, via the two most common strategies that institutions use to assess the teaching effectiveness of their faculty members: student evaluations and peer observations.
Whether you are reading this as a faculty member or as an administrator, I would encourage you to repeat Miller’s exercise for yourself, or to encourage your faculty to do so. Affirm on your campus the need to measure teaching effectiveness in multiple ways. Make sure that message is clear to the people who most need to hear it — that is, those with the authority to hire and fire, promote or deny, reward or punish.
Ideally, your institution (or department) will respond to this exercise with a desire to create a comprehensive and fair means of evaluating teaching. In her talk, Miller offered three models to consider:
- Three research institutions — the Universities of Massachusetts, Colorado, and Kansas — have created the TEval project to explore ways to evaluate teaching at the sort of campus where it can get short shrift, compared with research. The group’s website includes links to research on teaching evaluation, as well as rubrics and models that could be adapted at other campuses.
- In Britain, the Royal Academy of Engineering has developed the “Career Framework for University Teaching.” It includes a category called “Spheres of Influence,” to assess the impact of good teaching practices in a way that parallels how we evaluate the reach of our scholarship. Excellent teachers, in addition to making an impact on the lives of their students, can influence their colleagues across academe, and contribute to the advancement of teaching. The framework provides guidance in acknowledging that important work.
- If your campus uses peer observation of teaching, the Faculty Innovation Center at the University of Texas at Austin offers a robust set of resources for understanding how to do it well.
Of course, any research university looking to improve on this front could learn from a teaching-intensive college like my own. Our sector of academe offers plenty of models for evaluating teaching in a comprehensive way.
My stint on the tenure committee involved more than just flipping through course evaluations and observing my colleagues in their classrooms. I spent far more time reviewing their syllabi and other course documents, reading carefully the long self-evaluations they wrote about their teaching practices, paging through letters of reference and testimonials from current and former students, and more.
But even at teaching-intensive colleges like mine, piling more documentation onto the process doesn’t resolve all of the challenges of evaluating teaching effectiveness. Evidence doesn’t speak for itself, after all — it needs informed experts who can analyze it and understand what it means. What story does the evidence tell about the teacher’s work? What does it show about how much students have learned?
Understanding how to gather and evaluate evidence of good teaching strikes me as a fundamental and ongoing challenge for all of higher education. Very few academic administrators or tenure-committee members will bring to those roles professional training or scholarly backgrounds in the evaluation of teaching — or in the practice of teaching, for that matter.
Part of the process ought to include training people in how to assess teaching fairly, or we risk basing promotion decisions on the classroom preferences or gut instincts of the evaluators.
It takes time to evaluate teaching well — and time usually requires financial investment. Those are significant obstacles, and they won’t be overcome unless academe is willing to set aside its reliance on easy but dubious methods and take the evaluation of good teaching seriously.
James M. Lang is a professor of English and director of the D’Amour Center for Teaching Excellence at Assumption College in Worcester, Mass. He is the author of Small Teaching: Everyday Lessons From the Science of Learning. Follow him on Twitter at @LangOnCourse.