Which isn't a bad thing if you want to encourage students to read great books (though some may object to a private group like the College Board deciding which books). But now you're measuring not just reading ability but also the achievement of having plowed through As I Lay Dying. At ETS, measuring anything beyond developed ability used to be considered noise that disrupted the clear sound of a score. Psychometricians try to screen out all kinds of noise--questions that ask about subways, for instance, could be excluded because rural kids may not be familiar with them. Questions showing even the vaguest bias are excised; you will never find a woman measuring cups of flour in an SAT question. The concern is that girls who read such a question will be distracted by the implicit sexism, and so their answer will reflect not their ability but their distraction--that's noise.
But other kinds of noise are now to be allowed. Take the writing section, which will be divided between multiple-choice questions on grammar and style, and an essay students must write on an assigned topic (see chart for an example). Historically, the SAT has had only multiple-choice items. As Lemann writes of the early rationale for the SAT, "Tests that require a student to write essays ... are highly susceptible to the subjective judgment of the grader and to the mood of the taker on the day of the test, so they have low reliability."
Reliability is a measure of a test's precision from one administration to the next--a gauge of how much noise, or measurement error, it has eliminated. The standard error of measurement for a typical SAT is about 30 points on the math section and another 30 on the verbal. That's why the College Board tries to get students and admissions officers to think of scores not as pinpoints but as ranges: if you get 510 on the SAT's math section, your "true" score is anywhere between 480 and 540.
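The arithmetic behind those ranges is simple enough to sketch. Assuming a band of plus or minus one standard error around the observed score (the function below is an illustration, not anything the College Board actually publishes):

```python
# Illustrative sketch: how a standard error of measurement (SEM) turns a
# single score into a range. The ~30-point SEM comes from the article;
# the function name and the one-SEM band are assumptions for illustration.

def score_band(observed_score: int, sem: int) -> tuple[int, int]:
    """Return the (low, high) band of plus/minus one SEM around a score."""
    return observed_score - sem, observed_score + sem

# The article's example: a 510 on math, with an SEM of about 30 points.
print(score_band(510, 30))  # (480, 540)
```

The same arithmetic applied to the writing section's larger error is what makes its scores fuzzier: a wider band around every observed score.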
Thirty points in either direction is a pretty big swing, but scores on the writing section will be even less reliable: field trials of the New SAT estimate a standard error of measurement of 41 points. That means a kid who gets a 670 may "really" be in the elite reaches of the 700s--or in the more average environs of the low 600s. There are two reasons for the writing test's imprecision. First, the multiple-choice component of the test will be just 20 to 30 minutes, compared with 70 minutes each for math and reading. Less time means fewer questions, and it's harder to wring out measurement error with a small number of items. (Think about it this way: if you taste only one dish served by a chef, you can't judge him with as much precision as if you eat everything on the menu.) Second, and worse, each test will feature just one essay topic; if you retake the test and get a topic you really love, your score could shoot up--a clear example of low reliability.