Consider two hypothetical situations. In the first, a man waiting for a bus cannot help but notice the young mother next to him, beaming as she coos to her baby. The man leans over for a peek. “What a beautiful little girl!” he exclaims. “Oh, this is nothing,” answers the mother, “you should see her photographs.”
The second situation takes place in the same city. A woman applies to graduate school at the local university, having spent the two years since graduating from college working for a major researcher in her field. She submits glowing letters of recommendation from four top scholars; all of them say her research is original, innovative, and creative. She also supplies copies of two articles that she has published, which clearly show her scholarly promise and maturity as a researcher. However, the university does not admit her. Why? The admissions committee cannot see past her uninspiring scores on the Graduate Record Examination.
Whether they realize it or not, many academics and administrators share the young mother’s confusion between reality and surrogates for it. The mother believes that photographs are more important than how her baby looks in real life; universities -- and our society in general -- believe that test scores are more important than tangible evidence of meaningful performance in the very areas in which the tests are alleged to predict success.
Yet the Graduate Record Examination, for example, does not predict success in graduate school particularly well. A study that I published with Robert Sternberg, a professor of psychology at Yale University, in the June issue of American Psychologist showed that the G.R.E. did predict first-year grades in graduate school, but only weakly. The G.R.E. did not predict second-year grades or, indeed, any of the other indicators of success in graduate school: the ability to think analytically, creatively, and practically; the capacity to teach and conduct research; and the quality of one’s dissertation. The single exception that we found was the analytic subtest of the G.R.E., which measures logical and analytical reasoning ability, and which predicted dissertation quality -- for men only.
Those findings accord well with data published by the Educational Testing Service, the creator and marketer of the G.R.E., as well as with data published by a diverse group of researchers. The bottom line is basically the same for the G.R.E., the SAT, and other related tests: Those tests weakly predict course grades during the first year of a multiyear program, but they predict little or nothing else.
This fact should compel us to ask whether predicting first-year course grades is really very important. What about longer-term success in an academic program? And what about performance in the real world after graduation? The G.R.E. and its relatives do not tell us who will go on after training to revolutionize a scholarly discipline or a profession. They only predict who will do well in first-year course work, and even there the magnitude of the prediction is modest: Correlations hover around 0.2 or 0.3, which means that the tests account for less than 10 percent of the variation in first-year grades.
The E.T.S. makes data on its tests’ limited predictive ability available to everyone who uses them. There is no conspiracy of silence. Instead, we have a conspiracy of lethargy and frustration on the part of universities, which receive far too many applicants for the slots available. In many cases, virtually all the applicants have high grade-point averages and stellar letters of recommendation. It is not feasible to interview all the applicants, so administrators turn to the only quantitative source of information they have that allows them to make some distinctions among applicants: scores on tests such as the SAT and the G.R.E.
How exactly are these tests used in making admissions decisions? Policies vary across institutions, but it is common for applicants whose scores fall below a certain cutoff never even to be considered for admission, particularly at highly competitive universities. In addition, scholarships are often based partly on applicants’ scores. The scores provide a false sense of security because they seem so scientific: Faculty members and administrators alike are attracted by the apparent quantitative distinctions among applicants’ scores, which suggest differences in competence that often do not exist. The pressure to rely on a precise number, rather than on an overall subjective impression of an applicant’s worth, is enormous.
Consider the dilemma that results: If a student with high scores is admitted and succeeds, admissions officers and others have their pro-test bias confirmed. If a student with high scores is admitted and fails, people throw up their hands and say, “Who would have guessed, with such high scores?” They consider this student merely an exception to the rule that high scores predict success. If a student with low scores is admitted and fails, everyone says the admissions staff should have known better, and, once again, the pro-test bias is reinforced. But if a low-scoring student is admitted and succeeds, admissions officers and others may come to see that their pro-test bias is inappropriate. Unfortunately, however, because relatively few low-scoring students are admitted to selective institutions, few data exist to measure their persistence and success, and the institutions’ pro-test bias is rarely challenged.
Additional pressures to continue using tests come from outside the college or university: Average scores for students entering each institution are published and used to rank programs and institutions, a practice that encourages admitting only students with high scores. What we have are entrenched policies for admissions decision making that are far from optimal -- a fact that has been recognized by the few institutions that have broadened their criteria for admission and abandoned the use of admissions tests.
Recently, the use of admissions tests has been at the center of a vitriolic debate. In line with a policy enacted by the regents of the University of California barring admissions preferences based on race or sex, the U.C. graduate schools reduced the number of minority-group students to whom they offered admission this fall. For example, 196 minority-group students applied to enter medical school at the University of California at San Diego; none was admitted. The law school at the University of California at Los Angeles broadened its admissions criteria to try to keep its student body diversified (although the school is not giving admissions preference based on race or sex); still, the number of black students entering law school there this fall is the lowest since 1967. Next year, U.C.'s undergraduate programs also will stop using race and gender preferences, and we can expect similar admissions trends elsewhere.
The debate over the fairness of affirmative action affects the lives of many Americans. Yet, despite the bitterness of the debate, in which advocates on both sides pelt each other with statistics, the key issue of the actual predictive value of standardized test scores is often left uninvestigated. The test scores sit quietly at the center of the war over affirmative action, swollen with significance for both sides. Affirmative-action advocates shout that test scores are lower on average for blacks than for whites because of racial discrimination and inequality of educational opportunity. Opponents shout that the tests are fair and objective and that everyone has an equal opportunity to prepare for the tests and score well.
But few consider what the tests actually predict. The bottom line is that if the tests do not meaningfully predict success, their use should be limited, and they should be supplemented by better tests that do predict success. Although the issue of who scores better and why has been at the heart of the affirmative-action debate, psychologists have a responsibility to make sure that the debate also includes an examination of the entrenched use of tests that fail to predict success in any meaningful way.
What, for example, might an alternative to the G.R.E. look like? My vision is a test that assesses, at an introductory level, the types of skills necessary in a particular discipline or profession. We are developing such a test for the social sciences in the department of human development at Cornell University. The test, now being used in pilot studies, provides background information and general rules for completing tasks, to reduce the advantage of students who already have been exposed to the conventions of the discipline or profession. Competent students who lack direct experience in a field but who can think in the ways important in graduate school are thus able to perform well by using the information given in the test. For example, the test asks students to review an article, and it lists the six attributes of a review that experts in the field agree are important; students can use this information in preparing their responses.
Besides evaluating applicants’ ability to review a flawed article critically, the test under development also assesses their ability to pose and defend an interesting research question; to devise studies to answer specific research questions; to prepare a plan for an introductory lecture; to organize a brief talk for a professional conference; and to interpret sensibly a confusing set of research findings.
Each major area -- natural sciences, social sciences, and humanities -- would require a different test, of course, and my vision is just one of many ways to approach the development of an alternative test. Regardless of what it looked like, though, a better test would mean that debates about the use of scores would make more sense, because the scores would be more relevant and defensible. The future of our educational institutions -- and of our society at large -- depends on the admissions decisions that we make, and the size of our nation means that decisions probably must continue to be based on some kind of test scores. It behooves us all to understand what our tests do and do not predict, lest we, too, confuse the baby with her photograph.
Wendy M. Williams is an associate professor in the department of human development at Cornell University.