But tucked into the reams of data the College Board included with the new scores was some wonderful news: I was wrong. In 2003 I spent six months tracking the development of the new SAT. I sat through hours of test-development sessions and long debates (sometimes fiery, sometimes soul-crushingly boring) over new questions. I even learned how to grade SAT essays. TIME ran my resulting story on its cover that October.
The story did make some predictions that turned out right. For instance, the new test favors girls more than the old test did. It is a long-standing tenet of testmaking that girls outperform boys on writing exams. For reasons I am not foolish enough to speculate about in print, girls are better than boys at fixing grammar and constructing essays, so the addition of a third SAT section, on writing, was almost certain to shrink the male-female score gap. It did. Girls trounced boys on the new writing section, 502 to 491. Boys still outscored girls overall, thanks largely to boys’ 536 average on the math section, compared with girls’ 502. But boys now lead on the reading section by just three points, 505 to 502; that gap was eight points last year. What changed? The new test has no analogies (“bird is to nest” as “dog is to doghouse”), and boys usually clobbered girls on analogies.
My story also predicted that the addition of the writing section would damage the SAT’s reliability. Reliability is a measure of how similar a test’s results are from one sitting to the next. Theoretically, if a test had a standard error of measurement of 0 points, you would score exactly the same each time you took it. But no test is that good. The pre-2005 SAT had a standard error of measurement of about 30 points for each section. In other words, if you got a 500 on the math section, your “true” score most likely fell somewhere between 470 and 530. But the new writing section, which includes not only a multiple-choice grammar segment but the subjective essay, has a standard error of measurement of 40 points, meaning a kid who gets a 760 on the section may actually be a perfect 800 or a clever-but-no-genius 720. In short, the College Board sacrificed some testing reliability in order to include writing.
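For readers who want the arithmetic spelled out, here is a minimal sketch of how a standard error of measurement turns one score into a range. It assumes the range is simply the observed score plus or minus one standard error, capped at the 200-to-800 section scale. The function and constant names are made up for illustration and are not College Board terms.

```python
# Illustrative sketch only: score_band, SCALE_MIN and SCALE_MAX are
# hypothetical names, not anything published by the College Board.
SCALE_MIN, SCALE_MAX = 200, 800  # each SAT section is scored from 200 to 800


def score_band(observed, sem):
    """Return (low, high): the observed score plus or minus one standard
    error of measurement, clamped to the section's score scale."""
    low = max(SCALE_MIN, observed - sem)
    high = min(SCALE_MAX, observed + sem)
    return low, high


# The two examples from the text above.
print(score_band(500, 30))  # old math section -> (470, 530)
print(score_band(760, 40))  # new writing section -> (720, 800)
```

The wider the standard error, the wider the band around any one score, which is the reliability cost of adding the writing section.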