When Phoebe Young began working at the University of Colorado at Boulder as an assistant professor of history in 2009, her annual teaching reviews were fairly perfunctory. Everyone knew, she says, that student course evaluations were essentially popularity contests. But they were also the only measure used to determine merit raises. You got more money if you were above the average, and less money if you were below it. Professors would often strategize on how to increase their scores — say, bringing in donuts while students filled out the forms.
Several of the questions were problematic, she recalls, including a “super weird” one that asked students to rate the intellectual challenge of the course. The most impenetrable professors might score the highest because that’s how some students interpreted the question. Young scored low on that question, she believes, because she always tried to make her courses accessible.
When a faculty member came up for tenure, the process was marginally better. The department would scramble to find a colleague or two to sit in on a class. The letters they wrote were all roughly the same, she recalls. A long exegesis on the content of the class. A summary of the person’s teaching style that might boil down to, “They’re great!” (Nobody wanted to get a colleague in trouble.) And a comment or two along the lines of, “Maybe reconsider the font on those PowerPoint presentations.”
“Everybody’s like, ‘Well, whatever. You can’t really measure teaching anyway,’” Young says of the fatalistic view she shared with her colleagues. “And so we just do what’s required by the system.”
Today Young is a full professor, and her early classroom experience is not unusual. A core part of a professor’s job — and arguably the most central role of higher education — is teaching. Yet no matter how challenging the subject, how invested the professor, or how varied the students, teaching capability is often reduced to a number on a point scale. Congratulations: You are a 6.3 out of 7. Or, try harder, you’re a point below average. The additions of peer review and a written reflection on your own teaching may appear to give the process more depth, but on many campuses professors say that these tools still come with little guidance or forethought.
Some colleges and universities, including Boulder, are trying to change that equation. They are investing in more thoughtfully designed course evaluations, preparing faculty members to substantively critique their colleagues, and fostering discussions of teaching in departmental meetings, which is often where change happens. There is even a national effort underway, led by several higher-education groups and research universities, to transform the way teaching is evaluated.
But the movement is still young, and the work, according to many reformers, is difficult. They face the familiar obstacles of entrenched norms, disagreement about what it means to be a good teacher, and limited time. This methodical, collaborative approach is also a relatively new way of looking at teaching, which has traditionally been considered the purview of the faculty member, particularly at research universities, where stellar teaching has often operated in the shadow of high-profile research.
“Part of the reason that this is slow here,” says Noah Finkelstein, a physics professor who has been working for about a decade to reform teaching evaluations at Boulder, “is that we had to invent how to do this on our campus. There were no guidebooks.”
As a result, the ways in which many departments and colleges across the country assess teaching skills remain more ad hoc than deliberative, more superficial than substantive. Considering that most instructors enter their first classroom having received little guidance in graduate school on how to teach, getting minimal — and often useless — feedback only compounds the problem.
The irony, of course, is that colleges frequently tout their commitment to dynamic and engaged teaching. They offer faculty workshops on active learning and inclusive teaching. They might even encourage instructors to redesign their courses. But what message does it send when those investments of energy aren’t meaningfully measured when it comes time for raises and promotions?
“Human beings are going to try to be efficient in how they use their time. And if something is clearly not being valued or evaluated or rewarded, then they’re going to put their time where they are being rewarded,” says Gabriela Weaver, assistant dean for student-success analytics in the College of Natural Sciences at the University of Massachusetts at Amherst. She is leading a project there to reform teaching evaluation on her campus.
This isn’t just a matter of equitable pay, or a debate over whether colleges should value teaching as much as, or more than, research or service. A system that fails to evaluate teaching effectively, reformers say, shortchanges students. A growing body of research shows that effective teaching is hugely influential in determining whether students succeed in college, and that it is a key lever in supporting students who may have come into college with fewer educational advantages than their classmates. A slapdash evaluation of teaching, in other words, undermines higher education’s ability to deliver on its promise.
It wasn’t that long ago that teaching was seen as more of an art than a science. Examining one’s own teaching or the teaching strategies of others seemed to serve little purpose. The excellent teacher was the charismatic or engaging professor. Questions on course evaluations still reflect that. Two common, and highly subjective, questions ask students to rate their course and rate their instructor over all. Sometimes answers to those two questions have been the only ones used to determine merit raises.
A great deal of scholarship challenges this narrative and offers alternative scripts. The effective instructor, teaching experts tell us, is one who is well organized, whose course material and class discussions align with what students are tested on, who sparks students’ curiosity and fosters their confidence as learners, who is willing to adapt as the student body changes, and who stays on top of the latest teaching innovations.
Course evaluations, too, have come under the microscope: Dozens of studies have shown they are subject to racial and gender bias. A meta-analysis found little to no correlation between how highly students rate their instructor and how well they have learned the subject. In 2019 more than a dozen scholarly organizations endorsed a statement that describes the current use of student evaluations as “problematic” and recommends a holistic approach to merit and promotion reviews.
So why haven’t evaluations of teaching kept pace with these developments? Weaver says it’s a chicken-and-egg problem. Professors don’t want to put in the work of developing and undergoing a more rigorous review of teaching until that work is more highly valued. But many administrators won’t understand that good teaching is a complex endeavor worthy of more attention until the old measurements are scrapped and replaced with something better.
Another dilemma, particularly at research-intensive universities, is that administrators still tout scholarly contributions in ways that make them appear more valuable than teaching, such as announcing how much money their researchers attract. “I personally would counter that good teaching, and therefore teaching that would lead to higher retention and higher success rates, would also lead to higher dollars coming in,” Weaver says. “But that is much more difficult to track than it is to track contracts and grants coming in.”
The current system is also easy — and cheap.
“Student evaluations require no effort on the faculty or the school’s part. It’s all done through our Office of Institutional Research,” says Ginger Clark, associate vice provost for academic and faculty affairs at the University of Southern California, which is undergoing its own reforms. “Everything’s sort of automatic.” Substantive evaluations, by contrast, might require faculty members to sit in on one another’s courses and review their syllabi and grading practices. “And that takes time. Faculty, already, their workloads are tremendous.”
Qualitative measures also appear more subjective, which makes those in charge of ranking and measuring teaching quality inclined to stick with a numerical system. Giving a lower raise to a professor who scored a 4.8 than to one who received a 6.0 feels fairer to many people.
“Numbers are seductive, and numbers give the appearance of objectivity, but — with this particular measure — none of the substance of objectivity,” says Lindsay Masland, a director at the Center for Excellence in Teaching and Learning for Student Success at Appalachian State University.
But perhaps the biggest challenge is that before asking how you can better evaluate teaching, you need to ask: What is good teaching? Put another way: What is it you’re measuring? And that’s where things get complicated. If you think creating a positive classroom environment is important, for example, what kinds of teaching practices would you look for? Some benchmarks and inventories encompass dozens of metrics, and a potentially prescriptive approach to evaluation makes professors uncomfortable. Yet there has to be some way to identify good teaching that goes beyond the subjective.
“There are still very many people for whom good teaching seems a little bit magical and mystical,” says Eugene Korsunskiy, an associate professor of engineering at Dartmouth College, who is helping develop a new evaluation model for the School of Engineering. “Our thesis here is that, in fact, teaching is a lot more about a series of learnable skills than any sort of magic talent or trait that some people have and some people don’t. But it’s hard to specifically articulate exactly what are those skills.”
Why? One national effort sheds some light on that. Since 2017, Transforming Higher Education — Multidimensional Evaluation of Teaching, or TEval, has involved hundreds of faculty members and administrators at Boulder, UMass-Amherst, and the University of Kansas in the work of creating new evaluation processes on their campuses. It is funded by the National Science Foundation and has the backing of the Association of American Universities and other higher-education organizations. The goal of the project is to advance the use of evidence-based teaching methods by changing the way teaching is evaluated.
Finkelstein has been the lead champion of TEval at Boulder, where he is also one of the directors of the campus Center for STEM Learning. Although multiple kinds of evidence had long been factored into teaching evaluations, he says, the tools used and the data collected “were at best incomplete, often ad hoc, and underspecified and ill determined.”
Building support for change has required the commitment of an evangelist going door to door — or in this case, department to department — to convince people of the value of the project.
Some professors, Finkelstein says, have had to be moved away from the “I know it when I see it” approach toward good teaching. Many of them came of age in a different era, one in which they weren’t held accountable for what they did in the classroom, he says. “That doesn’t work anymore.”
A lot of faculty members also need time to get up to speed on the scholarship about effective teaching practices. They may never have read any of that research, even within their own discipline. And no department can simply adopt wholesale another’s work. Boulder has created what it calls the Teaching Quality Framework Toolkit to start the process, but it’s up to the chair and other academic leaders to decide how it applies to their instructors.
That means transformation is happening one department at a time. Since the project began seven years ago, Finkelstein says, more than 50 departments across the university — well over half — have engaged in evaluation reform, and a few have completed the process.
One of the early adopters was Young’s department, where the TEval project coincided with an effort to revamp the history curriculum. Young and her colleagues decided to focus their evaluation-reform efforts on improving peer review.
They tossed out those seat-of-the-pants class observations Young remembered from her early career and instituted a formal plan. Now, at the beginning of the semester the chair asks specific faculty members to serve as course evaluators for other instructors. That gives them time to plan what they want to do over the semester, she says, “as opposed to scrambling at the end.”
The evaluator and instructor decide what to focus on, with an emphasis on teaching techniques rather than course content. The evaluator reviews the professor’s syllabi and course materials, has access to the course in the learning-management site, conducts one or two classroom visits using a standardized form, and meets with the instructor during the semester to discuss what they’ve observed. The aim is to help the instructor progress rather than issue a summary judgment.
It wasn’t entirely smooth sailing, Young notes. Some professors felt that the increased oversight was an attack on their academic freedom and feared these new protocols would be used to tell people how they should teach. Others worried that the process would become reductionist. The TEval project initially focused on STEM departments, she notes, and some professors believed that good teaching in the humanities was less easily measured.
Young and her colleagues tried to steer conversations toward common ground: finding markers that everyone could agree represented effective teaching. Does the professor engage students intellectually, for example? Run a well-organized lecture? And while faculty members could decide what criteria they wanted their evaluator to focus on, “no criteria” was not a defensible position. “That is not the same thing as academic freedom,” Young says. “We are scholars. We do peer review as part of our research. And we have criteria upon which we judge whether your research is valid or not.” Teaching, in short, should be no different.
If some senior professors were more hesitant, she says, the newer hires were all in. They wanted the built-in mentoring and more explicit framework. “It was the junior faculty who were like, No, please, could we please have this structure?” she recalls. “Which ended up swaying a lot of tenured folks.”
Project leaders on other campuses say they’ve seen similar reactions among faculty members, many of whom had been looking for more guidance on their teaching. “It gives them the language to talk about something that sometimes seems a little nebulous or unclear,” says Christopher Young, assistant vice chancellor for academic affairs at Indiana University Northwest. The university’s new pathways plan for documenting teaching excellence, he says, provides that vocabulary.
What sparks this kind of change? Growing concern about the inequity of student course evaluations has inspired some campuses to start there, either rewriting them in ways that make them more useful or reducing their weight in determining raises and promotions. That work frequently opens the door to deeper conversations in departments and across campus about how to create a culture of teaching excellence.
University of Oregon leaders took this approach, scrapping the traditional course evaluations in favor of a new instrument called the Student Experience Survey. They created new teaching-evaluation standards, grouping them into four categories — professional, inclusive, engaged, and research-informed — and made sure the questions on the student survey aligned with those categories. And they created new tools for peer review and self-reflection.
Lee Rumbarger, associate vice provost for teaching engagement, notes that this was a multi-year process starting with the Office of the Provost and the University Senate, then moving out into colleges and departments.
It has been hard work. “It’s certainly less straightforward than looking at a 4.8. So people have to value teaching enough to read through the accumulation of evidence across sources,” she says. “And they also need to feel confident enough that they can recognize our criteria.”
Having a president or provost champion the work helps. Two years before Boulder signed onto the TEval project, a student-success committee convened by the provost recommended that teaching receive the same scrutiny as research. The University of Georgia’s work to revamp teaching evaluations similarly began in 2017 with a recommendation from a presidential task force. That same year at Indiana University, Michael McRobbie, who was president then, asked the faculty to work with leadership to develop a new path to tenure and promotion based on excellence in teaching.
Faculty members’ concerns can initiate change as well.
At the University of Kansas — another TEval site — a movement to reform the way teaching is assessed accelerated after faculty members realized there was no way for them to be recognized for the many hours they had spent on a major curricular-transformation project spanning hundreds of courses and dozens of departments.
“We had so many faculty that were investing time and energy into transforming their courses to bring in evidence-based, inclusive methods, active learning, collaborative learning focused on assessing student learning,” says Andrea Follmer, who leads the TEval effort and heads the university’s Center for Teaching Excellence. “And then they were finding that in the annual evaluation or P&T evaluation process, that work was largely invisible. That was really disturbing.”
That realization led to a feedback loop that continues today, she says. The course reforms and changed teaching practices led to a desire among faculty to better document their work, which has reinforced the importance of the work itself.
Bias in student course evaluations has spurred individual faculty members to champion reform within their departments. Prajna Dhar, a professor at the University of Kansas in the department of chemical and petroleum engineering, works in a discipline where women and faculty of color are in the minority.
“I have seen male faculty members who talk about their kids in class, and the perception is: Ohhh, they are such a family man!” she recalls. “And if you are a female professor and talk about your kids, it’s, ‘Oh!’” — with an exasperated gasp — “‘they talk about their kids!’” When students treat professors differently based on their identities, how can administrators count on their course evaluations to be fair?
“I’m not saying for all white male professors [the system] works,” she adds. “But the stakes are higher for the minority professors who already face a lot of microaggressions.”
Her department is one of five nationwide — including Dartmouth’s engineering program — participating in a project of the Association of American Universities that focuses on changing the evaluation process in STEM disciplines. Dhar has been helping her department develop “peer triads” to foster a more holistic evaluation process. (Kansas has also revamped student surveys of teaching.) Faculty members who teach similar, or sequential, courses meet regularly throughout the semester to share teaching strategies and give one another advice and feedback.
A key part of reducing unconscious bias in the process is moving away from strictly quantitative metrics. Interest among other faculty members in a more formative approach varies, Dhar says, but “there are enough colleagues to say, hey, this is great. I can now discuss this with another person. And I can showcase the little things I do” more easily than when evaluations were just focused on a number.
How will colleges know if these new systems are working? Rewriting student course evaluations to remove ambiguity and focus on what students can truly measure, reformers say, immediately makes them more useful and accurate measures of how students experience a course.
Beyond that, project leaders typically speak about how the process itself has been beneficial. By designing and using more substantive ways to assess teaching, they say, faculty members will inevitably devote more attention to their teaching. That may come through regular discussions with departmental colleagues, heightened scrutiny of their course design and teaching styles, and a willingness to experiment in the classroom knowing that they will no longer be penalized if students give them lower marks.
“Are people talking about teaching more? The answer is definitely yes,” says Oregon’s Rumbarger.
Yet even on those campuses where teaching-evaluation reforms are well underway, some of the work is running up against structural barriers that are not likely to go away anytime soon.
Samantha Hopkins, head of the department of earth sciences at Oregon, sees those barriers on her campus.
Faculty members are largely happy with the reforms to the course evaluations and other changes that have made evaluations more substantive, she says. But administrators accustomed to the numbers-driven systems are finding the new process challenging. “I’ve heard a lot of people expressing a feeling that they miss the student evaluations, and they don’t like the student-experience surveys as much because it’s so much harder to pull an assessment of someone’s teaching out of it,” she says.
Hopkins doesn’t miss the old system but understands the feeling: It is so much easier to compare numbers to numbers. “It’s something I’m struggling with right now,” during annual evaluation time, she says. “It’s the challenge of looking at what people are doing and saying: Is this good enough? What is good enough?”
And then there are the old, hard-to-budge hierarchies. “You hear a lot of lip service given to the importance of teaching,” she says. “But really when it comes down to it, so much of university culture is really centered around the importance of research.”
She recalls a conversation she had with a senior administrator who objected to the idea that a senior instructional faculty member should make as much as an assistant professor. It’s a view widely held across campus, she says. “This idea that someone who does only teaching and not research can’t make as much as someone who does research, even the most junior member of the research faculty, tells you where they are actually putting their money.”
The idea that great research is more difficult than great teaching still holds sway. “They are both quite hard,” says Hopkins. “But if you don’t put them on the same footing you’re just never going to get as much effort, as much mental bandwidth, devoted to teaching as you do to research.”
Kathryn Mills, an associate professor of psychology at Oregon, also believes that research will continue to be valued more highly at research-intensive universities. But while the broader system might not change, she thinks these new teaching-evaluation protocols will at least help faculty members know what to focus on. “It’s going to be clearer to folks that we care about engagement, inclusiveness, professionalism, and research-informed teaching,” she says. “And that will make a difference in terms of a collective culture around caring about teaching.”
Meanwhile, the initial group involved in TEval has been branching out and connecting with other colleges interested in this work. Last summer, more than two dozen institutions got together to discuss how best to foster a national movement. Follmer, of Kansas, says there’s also a particular interest in helping under-resourced institutions, such as regional publics, better understand the kind of infrastructure needed to sustain the work. She’s hopeful that by sharing what places like Kansas, UMass, Boulder, and others have done, people won’t feel like they need to start from scratch.
These conditions have made it more likely that colleges will consider scrapping their old evaluation systems in favor of a process that is more thoughtful, coherent, and grounded in research. The question now is whether that will be enough to propel them through the skepticism and uncertainty that have kept a flawed system in place this long.
Beth McMurtrie is a senior writer for The Chronicle of Higher Education, where she focuses on the future of learning and technology’s influence on teaching. In addition to her reported stories, she is a co-author of the weekly Teaching newsletter about what works in and around the classroom. Email her at beth.mcmurtrie@chronicle.com and follow her on LinkedIn.