Last night, however, I was pleasantly surprised to find a link via her blog to an extensive and rigorous discussion among NYC educators and journalists on the cleverly disguised lowering of standards on New York state standardized exams. Diana Senechal, in particular, had an intriguing idea: one could simply conduct an "experiment" to see if it is indeed possible to pass the test by mere guessing.
You can see where this is going. She "guessed" her way through both the 6th-grade English exam and the 7th-grade Math exam, passing both times, albeit barely. I decided to repeat this little experiment of hers thousands of times, using the iterative power of my computer, to see if it was possible for many or even most 6th-grade students to pass their standardized, required-by-federal-law Language Arts Exam merely by guessing.
Last week I read a thought-provoking column by Diane Ravitch in the New York Post, in which she discusses the lowering of the bar on New York State math and ELA tests. She points out that to reach level 2, which is sufficient for promotion in New York City, a student needs a significantly lower percentage of points than he or she would have needed three years ago. Ravitch comments, “Ending social promotion, as the city rightly wants to do, is thus meaningless, because students can reach Level 2 by just guessing.”
Likewise, Meredith Kolodner writes in the Daily News, “The number of correct answers needed to score a Level 2 to get promoted has sunk so low that a student can guess on the multiple choice section and leave the rest of the test blank.”
This is disturbing. Surely it isn’t possible to get a 2—and thus a promotion to the next grade—by just guessing! Or is it?
...
To find out, I conducted a little experiment. First, some background facts:
(a) Each question on each of the tests is worth a certain number of points. The total number of points earned on a given test is the raw score.
(b) Each test has its own conversion table for converting the raw score to a scale score.
(c) The conversion from scale score to proficiency level is different for each grade and subject (though 650 is the minimum for a level 3 across the board).
(d) Thus, to find out if a student got a 2 on a test, you have to (1) correct the test, (2) calculate the raw score, (3) convert it to scale score, and then (4) convert the scale score to proficiency level.
The New York State Education Department website has all the tests, scoring keys, and tables you need.
Now follow along with me as I reproduce the experiment. My question was: is it possible to get a 2 by just guessing?
I wrote a program to replicate and then repeat Senechal's experiment on command. Like Senechal, I made sure that every question was answered randomly, without even looking at the questions or reading passages. There are a total of 26 multiple-choice questions, each with four possible answers (A, B, C, D), and a few essay questions at the end. Omitting any answer for the essays (and earning a zero in that section), I coded it so that the computer behaves exactly as Senechal would:
- All multiple choice questions are answered randomly. Python uses the Mersenne Twister random-number generator, and I'm pretty damn confident in the "randomness" of its answers, given that it has a period (2^19937 − 1, or about 4.3 × 10^6001) greater than the estimated number of particles in the observable universe (10^87).
- The exam is scored according to the state-issued answer key, giving what they've decided to call a "raw score."
- The raw score is converted to a "scale score" based on this chart.
- That scale score is then converted yet again to a "Performance Level" score, which ranges from 1 to 4.
- Steps 1-4 are repeated 10000 times. The scores generated are sorted according to their "Performance Level" score, then converted to percentages.
Level 1: 51.29%
Level 2: 48.71%
Level 3: 0%
Level 4: 0%
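For anyone who wants to try something similar, here is a minimal sketch of the steps above. It is not the program that produced these numbers. The answer key below is a made-up placeholder, and the two conversion steps (raw score to scale score to performance level) are collapsed into a single hypothetical cutoff, `LEVEL_2_MIN_RAW = 7`, on the assumption that each multiple-choice question is worth one point. The real key and conversion chart live on the state's website and would need to be substituted in.

```python
import random

NUM_QUESTIONS = 26   # multiple-choice questions on the exam
CHOICES = "ABCD"     # four possible answers per question

# ASSUMPTION: placeholder answer key, NOT the state-issued one.
_key_rng = random.Random(0)
ANSWER_KEY = [_key_rng.choice(CHOICES) for _ in range(NUM_QUESTIONS)]

# ASSUMPTION: hypothetical raw-score cutoff standing in for the
# raw -> scale -> performance-level conversion charts.
LEVEL_2_MIN_RAW = 7


def guess_sheet(rng):
    """Answer every multiple-choice question at random; essays stay blank."""
    return [rng.choice(CHOICES) for _ in range(NUM_QUESTIONS)]


def raw_score(sheet, key=ANSWER_KEY):
    """Count answers that match the key (one point each, by assumption)."""
    return sum(answer == correct for answer, correct in zip(sheet, key))


def simulate(n_students, seed=None):
    """Return the fraction of random guessers who reach the cutoff."""
    rng = random.Random(seed)
    passed = sum(
        raw_score(guess_sheet(rng)) >= LEVEL_2_MIN_RAW
        for _ in range(n_students)
    )
    return passed / n_students


if __name__ == "__main__":
    print(f"Reached the cutoff: {simulate(10_000):.2%}")
```

Because every guess is uniform over four choices, the particular answer key does not change the pass rate; only the cutoff does.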
While I'm relieved to see that it is impossible for, say, a monkey (or perhaps an infant, making random stabs at a keyboard) to score a 3 or 4 on this exam, it is slightly disconcerting to me that in a group of 10000 students (or monkeys or infants), even though they have all applied Senechal's faux-"method" on their exams, nearly 50% of them will pass anyway.
I had a bit of difficulty finding the exact number of sixth graders in the NYC public school system (where it is possible for a student to score a 2 and still be promoted to the next grade-level), but let's suppose that NYC's standards were applied across the entire state of NY, and, just for ghits and shiggles, that its 214,819 sixth-graders (source) made completely random stabs at their answer sheets just like above.
Results:
Level 1: 51.48%
Level 2: 48.52%
Level 3: 0%
Level 4: 0%
This took a bit longer to run, in part because of my mediocre coding skills, but the results were almost exactly the same.
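That the two runs agree is what one would expect: both are estimating the same underlying probability, and the larger sample just pins it down more tightly. Under the same assumptions as a simple guessing model (26 independent questions, a 1-in-4 chance on each, one point per question), the chance of reaching a given raw score can be computed exactly from the binomial distribution, with no simulation at all. The cutoff of 7 used here is a hypothetical value, not one taken from the state's chart; it is chosen only because it yields a figure close to the percentages above.

```python
from math import comb


def prob_at_least(k, n=26, p=0.25):
    """Exact probability of k or more correct out of n random guesses."""
    return sum(
        comb(n, i) * p**i * (1 - p) ** (n - i)
        for i in range(k, n + 1)
    )


if __name__ == "__main__":
    # ASSUMPTION: 7 is a hypothetical raw-score cutoff for Level 2.
    print(f"P(raw score >= 7) = {prob_at_least(7):.4f}")
```

With that cutoff the exact answer comes out close to 48.5%, which sits between the two simulated figures.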
I realize that in doing all of this I'm beating a dead horse, because one a posteriori demonstration (like Senechal's) is enough to show that the test is worthless as a measure of student proficiency. I'm also relieved to admit that "only" 20% of 6th-graders actually earned a 2 in 2009. Since only 0.1% of students earned a score of 1 in 2009, however, the bar is just barely low enough to ensure that everyone is passed along to the next grade-level, regardless of proficiency.
Even that isn't my main point though. What really got my goat about this whole issue was the way that NY state educators actually take the results of these exams seriously, and compare statistics from year to year in order to plan their policies. They break down the data relentlessly, in PowerPoint after PowerPoint, comparing groups of passing/failing students across all possible demographics and categories. They don't appear to take the scale scores seriously either, which in my opinion are the closest thing in this test to an actual measure of proficiency, instead going on endlessly about differences in "Performance Levels."
The problem with these tests (or at least the one broken down here) is that because of the nature of the scoring system, guessing plays such an important role in how one fares on them. The "guesses" made by real students obviously aren't random, and their real "answers" (and therefore their proficiency in Language Arts) are being obscured behind a wall of scores (raw score, scaled score, performance level). Take away the last two steps in the scoring algorithm and you've probably got an OK test, or at least one that could be made to be okay.