Roy O. Freedle's recent article in Harvard Educational Review, entitled "Correcting the SAT's Ethnic and Social-Class Bias: A Method for Reestimating SAT Scores," is based on small differences between white students' responses and the responses of students from other ethnic groups to test items that were discussed by a number of researchers... Although any study that purports to reduce group differences must be looked at seriously, Freedle's study is so flawed that its conclusions are misleading.
There are myriad technical problems with the report, including misuse of regression and differential item functioning (DIF), and even a misunderstanding of how scores on the SAT are calculated. But one need not be a psychometrician to understand the fundamental problem with the study. The reduction in group differences is not the result of more sensitive or appropriate measurement, but rather, it is because the proposed measure relies mostly on students' guessing the answers to test questions.
To probe a little deeper, let us examine Freedle's argument about DIF more closely. Researchers have found that, on average, African-American, Hispanic, and Asian-American students tend to choose the correct response on easy test questions slightly less often than white students with an equal total test score. In contrast, they choose the correct response on difficult test questions slightly more often than white students with an equal total test score. Noting that this phenomenon occurs with SAT vocabulary questions but not with critical reading questions, Freedle suggests that the College Board should dispense with SAT critical reading questions, as well as the easier half of all vocabulary questions, to improve the scores of ethnic minority test-takers.
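The comparison described above ("students with an equal total test score") is the core idea of DIF: match examinees on total score, then compare how often each group answers a particular item correctly. A minimal sketch of one standard DIF index, the standardization approach (STD P-DIF), follows. The group labels, score levels, and response data are hypothetical, and this is an illustration of the general technique, not a reconstruction of Freedle's or the College Board's actual analysis.

```python
from collections import defaultdict

def std_p_dif(responses):
    """Standardization DIF index (STD P-DIF) for a single item.

    responses: iterable of (group, total_score, item_correct) tuples, where
    group is "focal" or "reference" and item_correct is 0 or 1.
    Returns the focal-weighted average of (p_focal - p_reference) over the
    total-score levels where both groups appear. A negative value means the
    focal group answers the item correctly less often than reference-group
    examinees with the same total score.
    """
    # counts[score][group] = [number correct, number of examinees]
    counts = defaultdict(lambda: {"focal": [0, 0], "reference": [0, 0]})
    for group, score, correct in responses:
        cell = counts[score][group]
        cell[0] += correct
        cell[1] += 1

    weighted_diff = 0.0
    total_weight = 0
    for cells in counts.values():
        f_right, f_n = cells["focal"]
        r_right, r_n = cells["reference"]
        if f_n == 0 or r_n == 0:
            continue  # no matched comparison is possible at this score level
        weighted_diff += f_n * (f_right / f_n - r_right / r_n)
        total_weight += f_n
    return weighted_diff / total_weight if total_weight else 0.0

# Hypothetical "easy" item: at total score 10 the focal group does slightly
# worse than matched reference examinees; at total score 20 the groups tie.
easy_item = (
    [("focal", 10, 1)] * 3 + [("focal", 10, 0)]
    + [("reference", 10, 1)] * 4
    + [("focal", 20, 1)] * 2
    + [("reference", 20, 1)] * 2
)
print(round(std_p_dif(easy_item), 3))  # → -0.167
```

Weighting by focal-group counts means the index reflects score levels where the focal group actually sits; a small negative value like this one is the "slightly less often" pattern on easy items that the passage describes.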
The suggestion that critical reading be dropped or de-emphasized on the SAT, given its importance for success in college, would not be educationally or psychometrically sound even if it were based on a credible analysis. Freedle himself notes that the critical reading items lack what he calls "the familiar pattern of bias."
To summarize so far: Mr. Freedle is suggesting dropping items that show no bias, according to his own results. The College Board alleges that he doesn't even correctly grasp the scoring method of the SAT, much less calculate DIF in the proper fashion. Doesn't look good for Mr. Freedle, does it?
Let us look briefly at the data for the so-called SAT-R section that Freedle recommends. On the difficult items that are included in the SAT-R, African-American candidates receive an average score of 22 percent out of a perfect score of 100 percent. Since there are five answer options for each question, 22 percent is only slightly above what would be expected from random guessing, namely 20 percent. White candidates do somewhat better, achieving an average score of 31 percent. [I'm assuming this gap is smaller than for the SAT overall.] The results indicate that this test is too hard for both groups and would be a frustrating experience for most students. There are simply too many questions that are geared to those with a much higher level of knowledge and skill than is required of college freshmen. Extending Freedle's argument, we could substantially reduce all group differences if the test were made significantly more difficult so that all examinees would have to guess the answers to nearly all of the questions. We could then predict that each subgroup would have an average of 20 percent of their answers correct, based on chance...
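The arithmetic behind the 20 percent chance baseline, and the claim that a harder test shrinks group differences, can be sketched in a few lines. This is a minimal illustration under two stated assumptions: a simple knowledge-or-random-guess model (the classical correction for guessing), and a three-parameter logistic item response curve with a 20 percent guessing floor. The group ability values are hypothetical, and neither model is claimed to be what Freedle or the College Board used.

```python
import math

OPTIONS = 5  # five answer choices per question, so chance = 1/5 = 20%

def implied_known(observed, options=OPTIONS):
    """Share of items 'known' under a knowledge-or-random-guess model.

    Assumes each response is either known (always correct) or a uniform
    random guess, so observed = known + (1 - known) / options.
    """
    chance = 1 / options
    return (observed - chance) / (1 - chance)

def p_correct(theta, b, c=1 / OPTIONS, a=1.0):
    """Three-parameter logistic item response curve with guessing floor c."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# Average scores on the SAT-R difficult items, as quoted above.
for label, observed in [("African-American", 0.22), ("White", 0.31)]:
    print(f"{label}: implied share known = {implied_known(observed):.3f}")

# Hypothetical group mean abilities on an arbitrary logit scale, evaluated
# at the group means only (a simplification).
higher, lower = 0.0, -0.5
for b in [0, 2, 4, 6]:  # item difficulty, increasing
    gap = p_correct(higher, b) - p_correct(lower, b)
    print(f"difficulty {b}: gap in proportion correct = {gap:.3f}")
```

Under the guessing model, 22 percent correct implies the examinee "knew" only about 2.5 percent of the items ((0.22 - 0.20) / 0.80), and 31 percent implies about 14 percent ((0.31 - 0.20) / 0.80). The second loop shows the other half of the argument: as items get harder, both curves flatten toward the 20 percent floor, so the gap shrinks even though the underlying ability difference is unchanged.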
In brief, Freedle's suggestions boil down to capitalizing on chance performance. This kind of performance may represent either random guesses or unconnected bits of knowledge that are not sufficiently organized to be of any use in college studies.
Very interesting. I hadn't even considered the guessing argument, but then, I wasn't aware of just how difficult the difficult items were. The College Board is claiming that the proposed revised SAT would not be a true measure of anyone's ability, because it would be so difficult that most test-takers would be guessing the answers. If black students at high ability levels guess better than white students, that is most certainly not a valid measure of ability.
As the College Board puts it, "Freedle's suggestions boil down to capitalizing on chance performance." For those of you not in the field of psychometric research, saying that someone is "capitalizing on chance" amounts to saying, "You started with the end result in mind, and now you're trying to prove that the data show more than they actually do; if you collect another set of data, you'll get a different answer, because your results aren't going to generalize." It's an important and fundamental criticism to make against a research study.
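One way to see why chance performance does not generalize is to simulate two parallel halves of a test and check whether examinees' scores agree across them. The sketch below assumes a simple model in which each examinee either knows an item or guesses among five options; the abilities, test length, sample size, and the "difficulty scale" that makes items harder to know are all hypothetical. It illustrates the general point about chance-driven scores, not the actual SAT-R data.

```python
import math
import random

OPTIONS = 5          # five answer choices, as on the items discussed
ITEMS_PER_HALF = 20  # hypothetical length of each half-test
N_EXAMINEES = 2000   # hypothetical sample size

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def half_score(rng, know_prob):
    """Number correct on one half-test: on each item the examinee knows the
    answer with probability know_prob, otherwise guesses at random."""
    score = 0
    for _ in range(ITEMS_PER_HALF):
        if rng.random() < know_prob or rng.random() < 1 / OPTIONS:
            score += 1
    return score

def split_half_r(difficulty_scale, seed=0):
    """Correlation between two parallel halves. Smaller difficulty_scale
    means examinees know fewer items and must guess more often."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(N_EXAMINEES):
        ability = rng.uniform(0.0, 0.9)  # hypothetical share of items solvable
        know = ability * difficulty_scale
        first.append(half_score(rng, know))
        second.append(half_score(rng, know))
    return pearson(first, second)

print("moderate test:", round(split_half_r(1.0), 2))
print("very hard test:", round(split_half_r(0.05), 2))
```

On the moderate test, examinees who do well on one half tend to do well on the other. On the very hard test, where nearly every answer is a guess, the correlation between halves falls close to zero: a score earned mostly by chance on one set of items says little about performance on the next set.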
The rebuttal also emphatically denies that the mathematics questions measure any sort of secondary vocabulary dimension, which removes any justification whatsoever for creating a revised SAT for difficult math items. Overall, the rebuttal feels pretty definitive to me, but it won't surprise me if reporters pick up on Mr. Freedle's article without mentioning the rebuttal. The buzzwords of "racial bias" and "SAT" will be just too tempting for some to ignore, and chances are they won't look further to assess the validity of Mr. Freedle's claims. (emphasis added)