New Doctoral-Program Rankings: Frequently Asked Questions

September 28, 2010

Q. So what's in this report?

A. The National Research Council collected data from slightly more than 5,000 doctoral programs in 62 academic fields at 212 universities. The report includes two "ranges of rankings" for each program, and it also includes a huge amount of data about faculty research productivity and student experiences.

Q. But in some places I see programs in 59 fields, not 62. What happened to the other three?

A. The NRC did not produce overall rankings for these three.

In two cases, computer engineering and "engineering science and materials (not elsewhere classified)," the field has fewer than 25 programs, which puts it below the NRC's threshold for producing statistically reliable rankings.

In the third case—"languages, societies, and cultures"—the NRC concluded that the category is such a hodgepodge that it made no sense to compare programs to one another. (The category includes programs in Chinese languages and literature, Middle Eastern studies, Renaissance studies, and several other subfields.)

The NRC report does include extensive data about programs in those three fields, but they are not ranked relative to other programs.

Q. How do the rankings work? The other 59 fields have overall ranges of rankings called "R-rankings" and "S-rankings." I've never seen anything like those before. Why couldn't they just give us simple numbered lists, the way they did in their 1982 and 1995 reports?

A. The NRC felt that simple numbers implied an unrealistic level of precision. So they elected to try a different approach, involving ranges.

Imagine that there are 40 doctoral programs in art history. The NRC's first step was to gather data about 21 different characteristics of those programs, including faculty citation rates, student time-to-degree, and so on.

For S-rankings, they surveyed faculty members (in our simplified model, let's say they surveyed 100 art historians) about which characteristics they believed were most important in doctoral-program quality. They weren't asking about specific programs here. They just asked: In general, are citation rates more important than student time-to-degree? And so on.

Q. So they averaged the 100 faculty members' opinions about which characteristics are important, assigned weights to the variables, and graded the 40 programs that way?

A. That's the basic idea. But rather than computing one set of weights from all the responses, they used a resampling technique called "random halves."

Here's how the process works: They randomly select half of the survey responses (in our scenario, 50 responses). From those responses they calculate a weight for each characteristic, apply the weights to every program's data, and come up with a rank-order list: Columbia University is No. 1, New York University is No. 2, and so on.
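A single pass of that step might look something like the sketch below, which averages the importance weights in the sampled responses, puts each characteristic on a common scale, and sorts programs by weighted score. It is only an illustration under stated assumptions: the three responses stand in for the 50 sampled ones, and the program names, the three characteristics, the z-score standardization, and every number are invented here rather than taken from the NRC's methodology.

```python
from statistics import mean, pstdev

# Hypothetical sampled survey responses: each respondent's importance weights
# for three program characteristics (invented numbers, not NRC data).
SURVEYS = [
    {"citations": 0.5, "time_to_degree": 0.2, "funding": 0.3},
    {"citations": 0.4, "time_to_degree": 0.4, "funding": 0.2},
    {"citations": 0.6, "time_to_degree": 0.1, "funding": 0.3},
]

# Hypothetical program data. For time_to_degree, a lower value is better.
PROGRAMS = {
    "Program A": {"citations": 9.0, "time_to_degree": 6.5, "funding": 0.8},
    "Program B": {"citations": 7.0, "time_to_degree": 5.0, "funding": 0.9},
    "Program C": {"citations": 4.0, "time_to_degree": 7.0, "funding": 0.6},
}
LOWER_IS_BETTER = {"time_to_degree"}


def average_weights(surveys):
    """Average each characteristic's weight across the sampled responses."""
    keys = surveys[0].keys()
    return {k: mean(s[k] for s in surveys) for k in keys}


def standardize(programs):
    """Convert each characteristic to z-scores so they share a common scale."""
    keys = next(iter(programs.values())).keys()
    z = {name: {} for name in programs}
    for k in keys:
        values = [p[k] for p in programs.values()]
        mu, sigma = mean(values), pstdev(values)
        sign = -1.0 if k in LOWER_IS_BETTER else 1.0  # flip so higher is better
        for name, p in programs.items():
            z[name][k] = sign * (p[k] - mu) / sigma
    return z


def rank_programs(programs, weights):
    """Return program names ordered from best (No. 1) to worst."""
    z = standardize(programs)
    scores = {name: sum(weights[k] * z[name][k] for k in weights) for name in z}
    return sorted(scores, key=scores.get, reverse=True)


print(rank_programs(PROGRAMS, average_weights(SURVEYS)))
```

A different sample of responses yields different averaged weights, and therefore possibly a different order, which is what the repetition exploits.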

Then they randomly select another batch of 50 surveys, and repeat the process. This time, the weights assigned to each program characteristic will be slightly different because the group of surveys is different. So, in turn, they'll produce a slightly different rank-order list of programs.

Then they go through those steps again and again. They repeat the random-halves process 500 times.
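The repetition can be sketched in the same spirit. In this hypothetical version, each round draws 50 of 100 simulated survey responses, averages their weights, scores five made-up programs whose characteristics are assumed to be already standardized, and records each program's rank; after 500 rounds every program has a list of 500 ranks. The sketch resamples only the survey responses, as described above, and is not the NRC's own code.

```python
import random
from statistics import mean

KEYS = ("citations", "completion", "funding")

# Hypothetical inputs: 100 survey responses with random importance weights,
# and five programs whose characteristics are already standardized.
_rng = random.Random(2010)
SURVEYS = [{k: _rng.random() for k in KEYS} for _ in range(100)]
PROGRAMS = {
    "Program A": {"citations": 1.1, "completion": -0.4, "funding": 0.3},
    "Program B": {"citations": 0.2, "completion": 0.9, "funding": 0.1},
    "Program C": {"citations": 0.5, "completion": 0.3, "funding": -0.2},
    "Program D": {"citations": -0.6, "completion": 0.1, "funding": 0.8},
    "Program E": {"citations": -1.2, "completion": -0.9, "funding": -1.0},
}


def rank_once(surveys, programs):
    """Average the sampled weights, score each program, return {name: rank}."""
    weights = {k: mean(s[k] for s in surveys) for k in KEYS}
    scores = {
        name: sum(weights[k] * p[k] for k in KEYS) for name, p in programs.items()
    }
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def random_halves(surveys, programs, rounds=500, seed=0):
    """Repeat rank_once on random halves; collect every rank each program gets."""
    rng = random.Random(seed)
    half = len(surveys) // 2
    history = {name: [] for name in programs}
    for _ in range(rounds):
        sample = rng.sample(surveys, half)  # 50 of the 100 responses
        for name, rank in rank_once(sample, programs).items():
            history[name].append(rank)
    return history


history = random_halves(SURVEYS, PROGRAMS)
for name, ranks in history.items():
    print(name, "best:", min(ranks), "worst:", max(ranks))
```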

Q. So how do they then create a range of S-rankings for each program?

A. Let's say we're concerned with the University of Illinois's art-history program. The NRC's last step was to discard Illinois's best 25 rankings (the top 5 percent) and its worst 25 rankings (the bottom 5 percent). That leaves 450 rankings, the middle 90 percent of the 500 that Illinois received in the random-halves process. Its best remaining ranking is its 5th-percentile S-ranking, and its worst remaining ranking is its 95th-percentile S-ranking.

So when the NRC report says that Illinois's 5th-percentile S-ranking is 24 and its 95th-percentile S-ranking is 38, that means something like the following: We can say with 90-percent confidence that Illinois's "true" rank in the NRC's analysis is somewhere between 24 and 38. 
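The trimming arithmetic is simple: 5 percent of 500 is 25, so dropping 25 rankings from each end leaves 450. A minimal sketch, assuming rank 1 is best and using simulated rankings for a hypothetical program among 40 (invented numbers, not NRC data):

```python
import random


def percentile_range(ranks, trim_percent=5):
    """Drop the best and worst trim_percent of rankings; return (best, worst) kept.

    Rank 1 is best, so sorting ascending puts the best rankings first.
    """
    ordered = sorted(ranks)
    k = len(ordered) * trim_percent // 100  # 25 when there are 500 rankings
    kept = ordered[k:len(ordered) - k]      # the middle 90 percent (450 of 500)
    return kept[0], kept[-1]


# Hypothetical example: 500 simulated rankings for one program, roughly
# centered on 31st place among 40 programs.
rng = random.Random(1)
simulated = [min(40, max(1, round(rng.gauss(31, 4)))) for _ in range(500)]
low, high = percentile_range(simulated)
print(f"5th-percentile rank: {low}, 95th-percentile rank: {high}")
```

With real output, the list passed in would be the 500 ranks a program received across the random-halves rounds.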

Q. There are also "dimensional" rankings related to these S-rankings, right?

A. Those are subcategories. They cover faculty research activity, student support and outcomes, and diversity of the program environment. They're presented in the same 5th-to-95th-percentile range as the other rankings.

Q. OK. So what about the R-rankings?

A. Those rankings are based indirectly on surveys of faculty members' opinions of specific programs. Faculty members were asked to grade a sample of programs in their field on a scale of 1 to 6.

Q. So they used those surveys to create reputational rankings?

A. It's not that simple. Using the same random-halves method, the NRC used regression analysis to relate those ratings to the measured characteristics of the rated programs, inferring how much weight faculty members implicitly gave each characteristic. It then applied those weights to every program in the field. So if faculty members collectively said that they admired Columbia University's art-history program, then programs with characteristics very similar to Columbia's were ranked similarly high.
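The idea of inferring weights from ratings can be illustrated with an ordinary least-squares regression. In the hypothetical sketch below, four programs have average 1-to-6 ratings, the fit recovers how much each characteristic appears to matter to the raters, and those implied weights are then used to score all six programs, including two nobody rated. The data are invented (the ratings were built from known weights so the fit can be checked), the regression is a plain textbook one rather than the NRC's exact procedure, and it shows a single pass rather than the repeated random halves.

```python
# Hypothetical standardized characteristics [citations, completion] per program.
PROGRAMS = {
    "Program A": [1.5, 0.2],
    "Program B": [0.8, -0.5],
    "Program C": [0.1, 1.0],
    "Program D": [-0.4, 0.3],
    "Program E": [-1.0, -1.2],
    "Program F": [-0.9, 0.9],
}
# Hypothetical average ratings (1 to 6) for the programs raters were shown.
RATINGS = {"Program A": 5.1, "Program B": 4.05, "Program C": 4.1, "Program D": 3.25}


def solve(a, b):
    """Solve the square linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivoting
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_weights(programs, ratings):
    """Ordinary least squares via the normal equations, with an intercept term."""
    rows = [[1.0] + programs[name] for name in ratings]
    y = [ratings[name] for name in ratings]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)  # [intercept, weight_1, weight_2]


def rank_all(programs, coef):
    """Score every program with the implied weights and sort best first."""
    scores = {
        name: coef[0] + sum(w * x for w, x in zip(coef[1:], feats))
        for name, feats in programs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


coef = fit_weights(PROGRAMS, RATINGS)
print("implied weights:", [round(c, 3) for c in coef])
print(rank_all(PROGRAMS, coef))
```

Programs E and F receive scores only because their measured characteristics can be plugged into the fitted weights, which is the sense in which programs resembling a highly rated one end up ranked similarly.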

The basic idea was to discover discrepancies between the characteristics that faculty members said were important and the characteristics that they (perhaps unconsciously) seemed to value when they rated actual programs. (You might tell a surveyor that nutrition is the most important quality of a restaurant, but when you're asked to rate specific restaurant chains, it might become apparent that you actually value low prices.)

The NRC says that the R-rankings and S-rankings should be regarded as equally valid.