In the coming weeks, students will participate in a ritual as familiar as it is reviled: evaluating their instructors.
One of the latest and most visible critiques of these assessments came this year from Carl E. Wieman, a Nobel Prize-winning physicist and professor at Stanford University’s Graduate School of Education. He cast doubt on their validity and reliability, proposing instead that professors complete an inventory of the research-based teaching practices they use. That approach, Mr. Wieman wrote in a recent issue of the magazine Change, would be more likely to promote learning than garden-variety evaluations do. “Current methods,” he said, “fail to encourage, guide, or document teaching that leads to improved student learning outcomes.”
Is there a better tool out there? If student input matters, how can it be made meaningful?
The IDEA Center, a 40-year-old nonprofit that spun off from Kansas State University, thinks it has a student-ratings system that overcomes two chief critiques of most surveys: poorly designed questions and misused results. Its course-evaluation tool, which has been steadily gaining traction on campuses, is designed to help professors judge how well they’re meeting their own course goals. “It’s all about the improvement of teaching and learning,” says Ken Ryalls, the center’s president.
Still, IDEA says it’s a mistake to rely too much on any one factor to evaluate teaching. That should involve multiple measures: student feedback, peer observation, and instructors’ self-reflection. “We’re the first ones to say that student ratings are overemphasized,” says Mr. Ryalls.
Most of what’s wrong with typical evaluations, he says, is that administrators often take their results as numerical gospel. The difference between scores of, say, 4.3 and 4.4 gets treated as objective and meaningful. That’s like judging a researcher on a single measure, the center says, such as number of publications or grant money. “Neither by itself would signal quality research,” the center’s staff wrote in response to Mr. Wieman, “any more than an average student ratings score should be used as the only measure of teaching effectiveness.”
Nuanced Findings
However they’re used, a lot of course evaluations simply aren’t very good, Mr. Ryalls says.
But as flawed as they are, faculty members still turn to them as some gauge of effectiveness in the classroom. About three-quarters of instructors use formal evaluations and informal feedback “quite a bit” or “very much” when altering their courses, according to the Faculty Survey of Student Engagement.
One limitation of many tools is that they ask students things they don’t really know. A frequent example: Was your instructor knowledgeable about course content?
Azusa Pacific University used to ask those kinds of questions on a homegrown form. In 2002, the university started using IDEA’s tool. A key selling point, says Stephanie L. Juillerat, an associate provost, was that faculty members could identify learning objectives important to them and ask students to what extent they had been achieved.
IDEA draws on about 25 million surveys, collected in rolling five-year increments, to compare faculty members’ scores across institutions. Science professors, for example, whose evaluations might suffer because they’re tough graders, can’t complain that they’re up against colleagues teaching gut courses.
Beyond asking students about their progress on learning objectives, the IDEA survey queries them on their attitudes and work habits, and they are often surprisingly honest, says Mr. Ryalls. Those measures factor into a professor’s results, as do the size of the course and when it meets. A score should be modified, the thinking goes, for an 8 a.m. course that enrolled 400 students, many of whom may have been there to fulfill a requirement.
IDEA’s ratings offer nuanced information that faculty members can use, says Will Miller, an adjunct professor of political science at Flagler College, in Florida. His results on IDEA’s evaluation showed him he wasn’t pushing students much past rudimentary levels of understanding and information retention.
Mr. Miller, who is also Flagler’s director of institutional research, wanted his students to do the more cognitively demanding work of creating and applying knowledge. For a seminar on campaigns and elections, he changed the group work he assigned. He used to ask students to do case studies on real-life candidates, but now his students get more open-ended assignments on fictionalized candidates. The data and comments from his evaluations motivated him to emphasize fewer right-and-wrong answers, he says.
Choice of Tools
The IDEA Center now handles course evaluations for 380 institutions, charging each about $7,000, with the exact price depending on enrollment. Nearly all of the center’s $3 million in annual income comes from the student ratings, according to its most recent financial disclosure form. And demand is on the rise: The number of surveys it administers annually has tripled since 2003.
IDEA’s tool is one of several on the market, apart from the many homegrown forms. The Student Assessment of their Learning Gains asks students whether specific aspects of their courses have contributed to their learning. More than 13,000 instructors have administered it to nearly 290,000 students since 1997.
The Teaching and Learning Quality survey, created by Theodore W. Frick, who is now an emeritus professor in Indiana University at Bloomington’s School of Education, attracted interest from dozens of institutions about five years ago. Its questions focus on students’ perceptions of effective educational practices (prompts include “I was able to connect my past experience to new ideas and skills I was learning” and “My instructor demonstrated skills I was expected to learn”).
To study the instrument, instructors assessed student work in 12 courses one month after the courses had ended. Researchers compared those assessments with the results of Mr. Frick’s survey and found a clear relationship: Students who said they frequently saw effective practices in use also showed high levels of mastery.
For critics, the problems with student evaluations are too fundamental to be fixed.
It doesn’t much matter what the questions are, says Linda B. Nilson, director of the Office of Teaching Effectiveness and Innovation at Clemson University. While IDEA’s prompts are cleanly written and thoroughly tested, most instruments still may not be valid, she says.
“What they really measure is ‘student satisfaction,’” Ms. Nilson wrote in an email to The Chronicle. “They bear no relationship at all to learning.”
Or consider this case, which Ms. Nilson cited in “Time to Raise Questions About Student Ratings,” a chapter in the 2012 edition of To Improve the Academy. Robert A. Sproule, an economist at Bishop’s University, in Quebec, returned students’ midterm examinations during the following class. But on end-of-semester evaluations, just half of his students said he “always” returned their work “reasonably promptly.” Nearly a quarter gave him a three on a five-point scale.
“If such self-reported measures of this objective metric are inaccurate,” Mr. Sproule wrote, “how can one be expected to trust the validity of subjective measures like ‘teaching effectiveness’?”
For Mr. Ryalls, of IDEA, the problems with students’ evaluations shouldn’t scuttle their use altogether. “What drives me crazy,” he says, “is this notion that students don’t know what the hell they’re talking about.” They spend more time than anyone else watching faculty members teach, he says. “Student voice matters.”
Dan Berrett writes about teaching, learning, the curriculum, and educational quality. Follow him on Twitter @danberrett, or write to him at dan.berrett@chronicle.com.