According to one widely circulated grading template, an A should signify that a student is “unquestionably prepared for subsequent courses in the field.”
But if a History 101 professor hands out buckets of A’s to students who really aren’t prepared for intermediate courses, it is possible that no one (other than the intermediate-course instructors) will notice the problem. Some departments informally keep tabs on students’ preparedness, but almost no colleges systematically analyze students’ performance across course sequences.
That may be a lost opportunity. If colleges looked carefully at students’ performance in (for example) Calculus II courses, some scholars say, they could harvest vital information about the Calculus I sections where the students were originally trained. Which Calculus I instructors are strongest? Which kinds of homework and classroom design are most effective? Are some professors inflating grades?
Analyzing subsequent-course preparedness “is going to give you a much, much more-reliable signal of quality than traditional course-evaluation forms,” says Bruce A. Weinberg, an associate professor of economics at Ohio State University who recently scrutinized more than 14,000 students’ performance across course sequences in his department.
Other scholars, however, contend that it is not so easy to play this game. In practice, they say, course-sequence data are almost impossible to analyze. Dozens of confounding variables can cloud the picture. If the best-prepared students in a Spanish II course come from the Spanish I section that met at 8 a.m., is that because that section had the best instructor, or is it because the kind of student who is willing to wake up at dawn is also the kind of student who is likely to be academically strong?
Performance Patterns
To appreciate the potential power of course-sequence analysis—and the statistical challenges involved in the work—consider a study whose findings were published last year in the Journal of Political Economy. Two economists analyzed more than 10,000 students’ performance over a seven-year period at the U.S. Air Force Academy.
The scholars found several remarkable patterns in the data—patterns, they say, that might never have been noticed without this kind of analysis.
For one thing, students’ grades in intermediate calculus courses were better (all else equal) if they had taken Calculus I in a section taught by a senior, permanent faculty member, as opposed to a short-term instructor drawn from the Air Force’s officer corps. The “hard” introductory sections, where students tended to struggle, yielded stronger performances down the road.
One reason for that, the authors speculate, might be that novice instructors of Calculus I taught to the test—that is, they focused narrowly on preparing their students to pass the common final exam that all Calculus I students must take.
“It may be that certain faculty members guide their students more toward direct memorization, rather than thinking more deeply and broadly,” says James E. West, a professor of economics at the Air Force Academy, who was one of the study’s authors. “The only way to really get at this would be direct classroom observation. We’re economists, so that’s outside our area of expertise.”
A second discovery was that when students took Calculus I from permanent faculty members, they were more likely to go on to take elective upper-level mathematics courses in their junior and senior years.
“Even though associate and full professors produce students who do significantly worse in the introductory course, their students do better in the follow-on course,” says the paper’s second author, Scott E. Carrell, an assistant professor of economics at the University of California at Davis. “They’re motivating these students to actually learn mathematics.”
Finally, Mr. Carrell and Mr. West looked at student course evaluations. They found that students’ Calculus I course evaluations were positively correlated with their grades in that course but negatively correlated with their grades in subsequent calculus courses. The more students liked their Calculus I section, the less likely they were (all else equal) to earn strong grades in the follow-up courses.
The same pattern held even when the scholars looked only at the single question on the course-evaluation form that asked students how much they had learned in Calculus I.
Students, this study suggests, are not always accurate judges of how much progress they have made.
Mr. Carrell and Mr. West can say all of this with a great deal of confidence because the Air Force Academy is not like most places. Course sequences there are vastly easier to follow than at the average civilian college.
All students at the academy are required to take a common core of 30 credits. No matter how much they might hate Calculus I, they still have to take Calculus II. Most course sections are small—about 20 students—and students have no discretion in choosing their sections or instructors. Finally, every Calculus I section uses the same common tests, which are graded by a pool of instructors. (One instructor grades Question 1 for every section, another instructor grades Question 2, and so on.)
All those factors make the Air Force Academy a beautifully sterile environment for studying course sequences.
Mr. West and Mr. Carrell didn’t have to worry that their data would be contaminated by students self-selecting into sections taught by supposedly easy instructors, or by male instructors, or along any other line of bias. They didn’t have to worry about how to account for students who never took the follow-up courses, because every student takes the same core sequence. And they didn’t have to worry about some instructors subtly grading the tests more leniently than others.
“These data,” Mr. West says, “are really an order of magnitude better than what you could get at a typical college.”
Other Courses, Other Colleges
It wouldn’t be worth the effort, Mr. Carrell says, to try to crunch such numbers from his own campus, Davis. “If the good students select the good teachers or the lazy students select the easy teachers,” he says, “then it’s really hard to disentangle those selection effects from the causal effect of the teacher. You just can’t measure motivation and that sort of thing.”
But other scholars disagree. Course-sequence studies, they say, can yield valuable information even if they aren’t as statistically pristine as the Air Force Academy’s.
“Every university registrar has access to this kind of data,” says Valen E. Johnson, a professor of biostatistics at the University of Texas’s M.D. Anderson Cancer Center. “And at every university, there are quite a few courses that are taught in sequence. So there are a lot of opportunities to study the factors that predict subsequent success in a field.”
All that is required, Mr. Johnson says, is to statistically control for the students’ abilities and dispositions, using proxies such as their standardized-test scores and their high-school class rank. “Even just using their raw college GPA isn’t too bad,” Mr. Johnson says.
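To make the idea concrete, here is a minimal sketch of the kind of adjustment Mr. Johnson describes: regress students’ grades in a follow-on course on indicators for which introductory section (or instructor) they took, while controlling for ability proxies such as standardized-test scores and high-school class rank. The file name and column names are hypothetical placeholders, not drawn from any of the studies discussed here.

```python
# Sketch of a proxy-control regression for course-sequence analysis.
# Hypothetical columns: intro_instructor, followon_grade, sat_score,
# hs_class_rank_pct.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("course_sequence.csv")  # hypothetical registrar extract

# OLS with instructor indicators: each instructor's coefficient estimates
# how that instructor's former students fare in the follow-on course,
# relative to a baseline instructor, after adjusting for ability proxies.
model = smf.ols(
    "followon_grade ~ C(intro_instructor) + sat_score + hs_class_rank_pct",
    data=df,
).fit(cov_type="HC1")  # heteroskedasticity-robust standard errors

print(model.summary())
```

A real analysis would also have to confront the selection problems the skeptics raise—who takes the follow-on course at all, and who chooses which section—but the sketch shows how little raw machinery the basic approach requires.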
A decade ago, when Mr. Johnson was on the faculty of Duke University, he analyzed a huge cache of data from that institution. In that project—which he summarized in Grade Inflation: A Crisis in College Education (Springer-Verlag, 2003)—he looked at 62 spring-1999 courses whose prerequisites had been taught in multiple sections in the fall of 1998.
He found that—all else equal—students did better in the spring-1999 courses if they had taken their prerequisites with instructors who graded relatively stringently. In particular, students did well in their subsequent courses if they described the writing assignments in their prerequisite courses as difficult.
So the lessons from Duke are essentially the same as the lessons from the Air Force Academy: the “hard” instructors seemed to do a better job of preparing students for upper-level work, even though their students’ grades were lower and their student-evaluation scores were weaker than those of their more-lenient colleagues.
At Ohio State, where Mr. Weinberg and two colleagues studied course sequences in the economics department, the pattern seems to be broadly similar. Some introductory-level instructors were consistently stronger than others, as measured by students’ performance in intermediate courses. Student-evaluation forms were a poor predictor of later performance.
In fact, Mr. Weinberg says, student evaluation scores are so weakly related to learning that it is a serious error for colleges to use those evaluations as the primary tool for judging faculty members.
It would be much better, where feasible, for colleges to use course-sequence analyses in tenure-and-promotion decisions, Mr. Weinberg says. (He concedes that this would be easier at huge institutions like his own, where larger numbers of students and instructors make it possible to harvest statistically meaningful pools of data.)
The statistical concerns about the data being contaminated by student self-selection and other biases are overblown, Mr. Weinberg says. “It’s true that we had a less-clean design than Carrell and West,” he says. “But when we looked at our data, we found that for all practical purposes, it’s as if students are randomly assigned to courses at Ohio State.”
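A claim like Mr. Weinberg’s can be checked, at least roughly, with the registrar’s own records: if students really are assigned to introductory sections as if at random, their observable characteristics should not differ systematically across sections. The sketch below is only an illustration of that logic, not Mr. Weinberg’s actual procedure, and it again uses hypothetical column names.

```python
# Rough balance check on one observable (SAT scores) across introductory
# sections. If assignment is effectively random, a one-way ANOVA should
# find no more variation across sections than chance would predict.
import pandas as pd
from scipy import stats

df = pd.read_csv("course_sequence.csv")  # hypothetical registrar extract

groups = [g["sat_score"].dropna() for _, g in df.groupby("intro_instructor")]
f_stat, p_value = stats.f_oneway(*groups)

print(f"ANOVA on SAT scores across sections: F = {f_stat:.2f}, p = {p_value:.3f}")
# A large p-value is consistent with (though it does not prove)
# quasi-random assignment on this one observable.
```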
Stephen R. Porter, an associate professor of educational leadership and policy studies at Iowa State University, says he is deeply intrigued by the Air Force Academy paper. But he says he doubts he could find a statistically valid way to do a similar analysis at Iowa State. Simply controlling for students’ standardized test scores and high-school class rank, as Mr. Johnson and Mr. Weinberg did in their projects, is probably not good enough.
“I think you’d probably also need to control for student conscientiousness,” Mr. Porter says. “If colleges collected data about students’ personalities when they entered college, then maybe, just maybe, it would be feasible to do some valid analysis.”
Mr. Weinberg says that there is no good reason to be that fastidious about the statistics. Building systems to analyze students’ performance across course sequences would be difficult and time-consuming, he admits. But he says that such systems are perfectly possible, and that they would yield information vastly better than anything that can be learned through traditional student course evaluations.
“Seeing how people do in downstream courses provides the most convincing measure of what happened upstream,” Mr. Weinberg says. “Unless colleges want to turn to large-scale testing, this is really just the way we’re going to have to go.”
For Further Reading
Scott E. Carrell and James E. West, “Does Professor Quality Matter? Evidence From Random Assignment of Students to Professors,” Journal of Political Economy, June 2010.
Maria de Paola, “Does Teacher Quality Affect Student Performance? Evidence From an Italian University,” Bulletin of Economic Research, October 2009.
Valen E. Johnson, Grade Inflation: A Crisis in College Education (Springer-Verlag, 2003).
Judy Shoemaker, “Instructional Quality of Summer Courses at UCI [the University of California at Irvine]: A Report Prepared for the Council on Educational Policy,” 2009.
Bruce A. Weinberg, Masanori Hashimoto, and Belton M. Fleisher, “Evaluating Teaching in Higher Education,” The Journal of Economic Education, Summer 2009.