For thousands of high school students, the recent news that there were scoring errors in thousands of this year’s Scholastic Assessment Test scores is the stuff that nightmares are made of — particularly in this nervous time of envelopes arriving with colleges’ rejections and acceptances. And the number of students affected isn’t minimal. The College Board now admits that about 4,600 students — or almost 1 percent of the 495,000 who took the October 2005 test — had received erroneous scores as a result of answer sheets expanded by moisture, as well as other problems. While not trying to minimize the anxiety and pain of these college aspirants, I must confess that for a social scientist such as myself, these scoring errors — particularly ones from years past, which we must assume also exist in significant numbers — are a dream come true because they afford an opportunity to finally subject the controversial test to an important experiment.
I have long fantasized about persuading New York University (where I work) or some other selective college to let me into the admissions office in the middle of the night. I wouldn’t steal anything. I would merely swap about 50 random applications in the acceptance pile with 50 in the rejection pile — altering the educational trajectories of my poor research subjects forever (assuming NYU was their top choice). Then I would secretly follow these 100 students to see how they turned out several years later. Was not getting into NYU really so bad for those who “deserved” it? Did it hurt their long-term earning power? Likewise, I might ask: How good was our admissions office at picking winners? Did the “losers” secretly swapped into the acceptance pile fare worse than those who “truly” merited admission? This, by extension, would reveal a lot about how good the SAT — on which NYU and most other selective schools still rely heavily — is at predicting academic success.
Of course, I’d probably have an easier time getting approval to shock these undergraduates in some basement psychology lab than to mess with their (and NYU’s) futures by altering their admission results. And for good reason.
Yet the latest SAT scoring errors present a wonderful natural experiment. By going back to previous years and rescoring exams to detect any scoring errors, one could essentially perform the same experiment with no intentional harm done. Did the kids who had artificially low scores, and who therefore got into their second or third choice for college rather than their first, do worse in their later outcomes? Or did they perform better at these schools than the (false) test results would have predicted? If they outperformed those predictions, that suggests that the “true” score has some validity. If they did not, then we are left to wonder if more schools should follow the route of some universities in California, Texas and other states and de-emphasize the importance of the SAT in admissions.
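For readers who like to see the logic spelled out, here is a minimal sketch of that comparison in Python, run on simulated data because no real rescored files are available to me. Everything in it is an assumption made for illustration: the function names, the score scale, the sizes of the errors, the use of a grade-point-like outcome, and above all the premise that a wrong score changes nothing about a student's later path. It fits a prediction line on students whose scores were correct, then asks whether students with wrong scores beat or missed that line in proportion to the size of the error.

```python
import random
from statistics import mean


def ols(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def simulate(n=20000, error_rate=0.05, validity=0.002, seed=0):
    """Hypothetical data: rows of (reported_score, gap, outcome).

    gap = true score minus reported score, as a rescoring would reveal it.
    validity = how much the outcome rises per point of TRUE score (assumed).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        true_score = rng.gauss(1000, 200)  # arbitrary, made-up scale
        # A small share of students get a mis-scored result.
        error = rng.choice([-100, -50, 50, 100]) if rng.random() < error_rate else 0
        outcome = 1.0 + validity * true_score + rng.gauss(0, 0.3)
        rows.append((true_score + error, -error, outcome))
    return rows


def error_effect(rows):
    """Slope of 'surprise' on gap among mis-scored students.

    surprise = actual outcome minus the outcome predicted from the reported
    (false) score, using a line fitted on correctly scored students only.
    A clearly positive slope means under-scored students beat the false
    prediction, which is what a valid "true" score would produce.
    """
    clean = [(r, y) for r, gap, y in rows if gap == 0]
    slope, intercept = ols([r for r, _ in clean], [y for _, y in clean])
    hit = [(r, gap, y) for r, gap, y in rows if gap != 0]
    gaps = [gap for _, gap, _ in hit]
    surprises = [y - (slope * r + intercept) for r, _, y in hit]
    return ols(gaps, surprises)[0]


if __name__ == "__main__":
    print("valid test:  ", error_effect(simulate(validity=0.002)))
    print("useless test:", error_effect(simulate(validity=0.0)))
```

In this toy setup the slope comes out near the assumed validity when the true score matters and near zero when it does not. Real data would be messier, since a wrong score can also change which college a student attends, and that is precisely the effect the experiment would need to untangle.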
And surely there must be plenty of students who benefited from errors in years past by receiving higher scores than they should have. These lucky ones make for experimental fodder as well: Did they end up performing just as well as their “fake score” predicted? If so, then either the test is broken, or it has a large Pygmalion effect, the name given to the phenomenon revealed by a devious experiment in which faux test scores, when communicated to grade school teachers, affected their perceptions of students’ abilities and thereby ended up affecting their “true” abilities. Either way, the results would offer another argument for getting rid of the SAT.
Then there is the perennial issue of racial, class or gender biases in the test, which evidence from bread-and-butter social science points to, though by no means proves. There is, for example, a substantial economic gradient, with each additional $10,000 of family income predictive of an increased SAT score of between seven and 44 points (depending on where you are on the income curve). And there are well-documented racial differences. For instance, in research published in the Harvard Educational Review, Roy O. Freedle compared race differences in responses to easy vs. hard questions and found that the differences were much greater among the easier items. His interpretation was that easier items (those using more common words that can have multiple interpretations across cultural subgroups) are more reflective of bias than hard items would be. Likewise, SAT questions measuring reading comprehension showed less “bias” than vocabulary-based questions.
On the other hand, math items showed the same easy-hard dichotomy, which raises the possibility that the differences are a result of subpopulations within racial groups performing differentially well. That is, it could be that the gap between low-performing whites and low-performing blacks is greater all around than the gap between high-performing whites and high-performing blacks. This would suggest stratification within the black community rather than test bias per se.
And then there is the issue of “stereotype threat.” When psychologists Claude Steele and Joshua Aronson primed college students with negative racial stereotypes before administering achievement tests, they found that minority students scored significantly worse than their SAT scores, incidentally, would have predicted. This suggests that perhaps it is not the tests that are biased, but the entire culture.
But the data are far from conclusive on the subject. In the book “The Shape of the River: Long-Term Consequences of Considering Race in College and University Admissions,” authors William G. Bowen and Derek Bok (former presidents of Princeton and Harvard, respectively) show that minority students admitted to elite colleges with lower SAT scores than their white counterparts go on to perform equally well post-graduation on a variety of measures, including professional achievement and community service. On the other hand, these students do not fare as well in other important measures, such as income.
The proposed experiment with SAT scoring errors would certainly shed light on the issue of race, class or gender bias as well. If the error-induced score gaps were equally predictive of later success across groups, then they would suggest little bias in the SAT. If they were more predictive for nonwhites (that is, if the “true” scores were less valid), then they would suggest significant bias. And, of course, we must be open to the counterintuitive possibility that such results would show that the test in its current form actually helps minority students. (In fact, this was the original, lofty dream of the SAT: to counteract the old-boy, networked admissions game for elite schools with a universal, meritocratic standard.)
Oh how I would love to get my hands on these data!
And before you call me completely selfish, I must say that, just maybe, the news for this year’s seniors isn’t all bad. If you get a rejection letter from your dream school, you can always blame it on a scoring mistake — you might have been Yale-bound, if it weren’t for the darn College Board tests. At least, that’s what you can now tell your parents as they pack you off to state college. Sometimes, even in the world of education, ignorance is bliss.
This story has been changed since it was first published.