A new study that examined thousands of examples of student work in nine states may give professors, administrators, policy makers, and the public better tools to systematically understand what students are actually learning in college.
At least that’s what supporters of the research effort hope; its results were released on Thursday.
“Proof of concept is what it is,” said Julie M. Carnahan, a vice president at the State Higher Education Executive Officers, an association that led the project, called the Multi-State Collaborative to Advance Learning Outcomes Assessment, with the Association of American Colleges and Universities. “We have proved that this is an alternative to standardized tests.”
That alternative is a set of rubrics, or grids, that stake out common standards for faculty members to use to evaluate student assignments. The project seeks to unify two ideals: preserving professorial authority over the assigning and grading of student work, and tying such work to norms that can be judged externally and consistently across courses, institutions, and states.
The project began in response to two concerns that have preoccupied leaders of state systems of higher education in recent years, Ms. Carnahan said. Employers and policy makers have complained that newly hired college graduates often lack the problem-solving skills needed in today’s workplace. At the same time, many states have been basing university funding in part on a set of performance measures, but those formulas have used metrics of academic quality, like graduation rates, that faculty members and college administrators have seen as too blunt and too subject to external forces.
Some states use students’ scores on standardized tests, like the Collegiate Learning Assessment and the ETS Proficiency Profile, in their funding formulas. Those tests can provide external standards that allow students and institutions to be compared according to common criteria, but such assessments are unconnected to the curriculum and their results are seen as flawed because students have little incentive to try to score well on the tests. Course grades are often authentic indicators of what students do, but they are also subject to inflation, the whims of instructors, and the differing norms of institutions.
The new project is seen as a potential breakthrough because it uses as its raw material the actual work that students produce and gives faculty members a common language and rating system to evaluate that work.
Some 126 instructors at 59 institutions attended day-and-a-half workshops to use faculty-developed rubrics to evaluate 7,215 examples of student work. Slightly more than a third of the assignments were judged twice to establish consistency between raters. The scorers didn’t judge work from their own institutions.
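The project’s technical methods are not spelled out here, so the following Python sketch is purely illustrative: it shows one simple way a team might gauge consistency between two raters who scored the same pieces of work on the zero-to-four scale, using exact and within-one-point agreement. The function name and the scores are invented for this example, not drawn from the study.

```python
# Hypothetical sketch: checking consistency between two raters who scored
# the same student artifacts on a 0-4 rubric scale. All data are invented.

def agreement_rates(scores_a, scores_b):
    """Return exact and adjacent (within one point) agreement rates
    for two raters' scores on the same set of artifacts."""
    if len(scores_a) != len(scores_b):
        raise ValueError("Both raters must score the same artifacts")
    pairs = list(zip(scores_a, scores_b))
    exact = sum(a == b for a, b in pairs) / len(pairs)
    adjacent = sum(abs(a - b) <= 1 for a, b in pairs) / len(pairs)
    return exact, adjacent

# Invented example: two raters scoring five artifacts on the 0-4 scale.
rater_one = [3, 2, 4, 1, 3]
rater_two = [3, 3, 4, 2, 2]

exact, adjacent = agreement_rates(rater_one, rater_two)
print(f"Exact agreement: {exact:.0%}")      # 40%
print(f"Within one point: {adjacent:.0%}")  # 100%
```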
The colleges represented both two-year and four-year public institutions in Connecticut, Indiana, Kentucky, Massachusetts, Minnesota, Missouri, Oregon, Rhode Island, and Utah. The results, however, were not statistically representative of the states, the nation, or even each institution. Only those students who had completed three-quarters of their credits toward a degree participated.
Useful Feedback
Still, the results painted a picture of student achievement in both broad strokes and minute detail. Students at four-year institutions tended to score better than their peers at two-year colleges, more often earning a three or four on a scale of zero to four. A three or four signals a “high” or “very high” level of achievement.
Scorers used rubrics in three skill areas: critical thinking, quantitative literacy, and written communication, each of which was dissected into four or five parts. Critical-thinking skills, which professors and administrators often invoke but seldom define, were divided and scored according to how well students explained an issue or problem; how well they selected and used evidence; how skillfully they examined the larger context influencing the issue; and the quality of their thesis and conclusions.
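To make that structure concrete, here is a small, hypothetical Python sketch of how scores on one rubric and its subcategories might be organized and summarized. The dimension labels paraphrase the article’s description of the critical-thinking rubric, and the scores are invented; the collaborative’s actual data and categories may differ.

```python
# Hypothetical sketch: organizing and summarizing subcategory scores on a
# single rubric. Dimension names paraphrase the article; scores are invented.
from statistics import mean

critical_thinking_scores = {
    "explanation of the issue":      [3, 2, 4, 3],
    "selection and use of evidence": [2, 2, 3, 3],
    "analysis of context":           [1, 2, 2, 3],
    "thesis and conclusions":        [2, 3, 3, 2],
}

# Average each subcategory separately, since faculty members found the
# subscores more revealing than a single overall number.
for dimension, scores in critical_thinking_scores.items():
    print(f"{dimension}: {mean(scores):.2f} on a 0-4 scale")
```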
Overall scores for critical thinking were lower than those for quantitative reasoning and writing: Nearly a third of students at four-year institutions scored a three or four over all in critical thinking, while just 19 percent of students at two-year colleges did so.
But it was the subcategories within each broad skill area that were often more revealing, several faculty members said. In quantitative reasoning, for example, students could often perform calculations but had difficulty describing the assumptions underpinning the numbers in their work. One example of such an assumption comes from an economics assignment: students might be asked to imagine that they are developing a city budget based on tax revenues. If the economy plunges, what happens to taxes? Some faculty members realized, after looking at the rubrics, that their assignments often required students to do calculations but not to consider how those calculations related to a broader context.
Such detailed feedback is particularly useful because it directly relates to actual course work, said Jeanne P. Mullaney, assessment coordinator for the Community College of Rhode Island. The results can help faculty members change their assignments, guided by a shared conception of a particular skill area. “The great thing with rubrics,” she said, “is the strengths and weaknesses are readily apparent.”
Another benefit of the project: when those strengths and weaknesses are shared across institutions, states, and beyond, faculty members can exchange ideas about how to bolster areas in which many students struggle.
But the rubrics have proved challenging for faculty members, too, Terrel Rhodes, vice president of the Office of Quality, Curriculum, and Assessment for the AAC&U, wrote in an email to The Chronicle. “The biggest hurdle we find is for faculty to get out of the grading mode based on getting the right answer to assessing the underlying skill and ability.”
Sticks and Carrots
Assessment is often used in two ways. Sometimes it’s a tool for making judgments and holding people and institutions accountable. At other times, results help those people and institutions to improve.
The collaborative project is achieving the latter purpose, according to several people involved with it, but Ms. Carnahan is concerned that the former may happen, too. The project came about, in part, in response to inadequate measures of educational quality in state-funding decisions.
Does she worry that the project’s results will be tied to money and performance funding? “That does give me some pause,” Ms. Carnahan said. “It’s not where we want to go with this at all.”
The subcategories for each area — students’ ability to calculate or to provide context for their numbers, for instance — are more revealing than an overall score, she said. “The whole point of this is to improve student learning.”
Other risks might surface as well, said John D. Hathcoat, an assistant professor at the Center for Assessment and Research Studies at James Madison University, who has been watching the project with interest. He applauded the effort, adding that the data it produces will be extremely useful to researchers.
But Mr. Hathcoat also worried about the validity of the study’s conclusions, and warned that using different assignments could skew efforts to measure a common standard. Standardization has drawn a backlash in education, he said, but it shouldn’t be equated with multiple-choice tests.
Some standardization, he said, is good. It would be absurd, he wrote in an email to The Chronicle, to compare two students when one has been asked to write a mathematical proof and the other to complete addition problems. “Why would we consider doing this with institutions of higher education?” he asked.
But using different assignments as the basis of such a widespread analysis has also improved the exercises themselves, and the larger design of courses, said Christopher K. Cratsley. The results of the analyses of student work at Fitchburg State University, where Mr. Cratsley is director of assessment, suggested that students who don’t major in the sciences are seldom asked to engage in quantitative reasoning, while everyone is assigned work that develops their critical-thinking and writing skills.
“We’ve seen that some assignments are sometimes not as good at soliciting these skills as other assignments,” he said. “That helps us think about how we create a balance in the instruction we give.”
Dan Berrett writes about teaching, learning, the curriculum, and educational quality. Follow him on Twitter @danberrett, or write to him at dan.berrett@chronicle.com.