In which I fix everything (part 3 of 3)


For the last two days I’ve been griping about standardized tests, brought on by this article and Diane Ravitch’s reaction to it.  I hope I’ve adequately demonstrated that relying on a pass-no-pass model for determining effectiveness in schools is at best pointless and at worst actively destructive.  I’ve also talked about the alternative to a pass model, which is a growth model, and offered some criticisms of how growth models, at least as they’re practiced in Indiana, tend to work.

Here, I’ll present an outline of how a growth model for standardized testing ought to work.

  • First, and most importantly:  remove any notion of “passing” and “failing” completely from the testing process.  The two best-known standardized tests in America right now are the SAT and the ACT, the two tests for college readiness, taken by nearly every high school student at some point or another.  Even kids who don’t necessarily plan on going to college take at least one of those two tests, and many take both.  Have you ever heard of someone “failing” the SAT?  No.  Because it can’t be done.  You can get a terrible score on it, yes, but you can’t fail.  Your score is your score.  As it stands right now, creating a cutscore for “pass” and “fail” does the following:  1) it makes the test scores easier to manipulate (just change the cutscore and it looks like more kids passed– or that you’ve demonstrated “higher expectations”); 2) it puts an artificial, pointless barrier between kids who barely passed and kids who barely failed (there is no difference between a kid who got a 450 and a kid who got a 460; that’s a question or two.  It’s I-had-breakfast-and-eight-hours-of-sleep versus I-sorta-have-a-cold-today.  But put the pass cutscore at 455 and it looks like a huge difference); 3) it embeds a shaming mechanism into the test that has no good reason for being there; 4) it creates an incentive for teachers to focus solely on the “bubble kids”; 5) it provides no useful information to anyone that the actual scores did not already provide.  There’s no reason for these tests to have a passing score. It is an entirely useless piece of information.  I can think of only one exception, which is when districts use test scores as part (PART!!!) of a decision on whether to pass a student from one grade to another.  Most districts don’t do that, though, since frequently scores aren’t available until very late in the year– it’s the second week of July already and I don’t know my kids’ scores yet.
  • Removing the notion of pass/fail from the equation makes it easier to focus on growth as the metric.  As I’ve demonstrated already, this means that you can’t exclude any of your kids as “unimportant” to your school’s or your classroom’s end-of-year scores.  How a student’s score changes from year to year becomes vastly more important than what their score actually is, which is as it should be.  There’s a bunch of ways to do this; Indiana’s model has some good points but is unnecessarily complicated.  Here’s my suggestion:
  • Pick a start year; any start year.   Divide that year’s test-takers into groups based on percentile scores on the test.  I like using decile groups (in other words, ten of them) but you can use quintiles or quartiles or whatever.  In Year Two, determine how much those kids moved in their test scores from Year One to Year Two.  There are a bunch of ways to quantify this depending on how mathy and technical you want to be about it; the simplest is to sort each decile’s kids into thirds by how much they moved.  In other words, let’s say the lower third of decile A ranged from a drop of 140 points to a gain of 10 points, the middle third ranged from a gain of 11 points to a gain of 90 points, and the top third ranged from a gain of 91 points to a gain of a million points.  You could use standard deviations from the average or something else if you wanted, but the point is there’s a different standard for each decile.  This means that the kids in the top decile (who don’t have a lot of room left to move up) can gain only a few points, or possibly even lose one or two, and still be “high growth,” and kids who start in a low decile and drop anyway would probably be “low growth” kids.  This allows some recognition of where the kids started from without looking as random as Indiana’s model, where the growth expectations for a kid who got a 525 can look wildly different from those for a kid who got a 526; it should be a bit more predictable as well.
  • In Year Three, you determine how much they moved from Year Two, and so on.
  • Kids who transfer into a district aren’t a problem because they should have some sort of score from their previous district, and even if they were taking a different test in their previous district a percentile score on that test should be trivial to establish.  They then join whatever decile their percentile score belongs to.  If they literally took no standardized tests in their previous district because of their age or their district’s policy on standardized tests, well, the world doesn’t end.
  • Teachers and schools are evaluated by how many kids they have in the “average growth” and “high growth” categories.  Those kids should have been enrolled in the district for a certain minimum number of days (I’d say no less than 75% of the school days up to the test week) and– and this may be controversial– should have been present for a certain minimum number of days as well, and I’d say the attendance requirement should be more stringent than the enrollment requirement.  I can’t teach a kid who isn’t in school, and I also can’t control whether a kid’s in my classroom or not.  Individual districts or states can determine on their own what their cutoffs for average growth and high growth should be.
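
The bullets above are concrete enough to sketch in code.  Here’s a rough Python version of the decile-and-thirds bookkeeping; the function names and the exact thirds cutoffs are my own illustration, not anything any state actually uses, and a real implementation would want to handle ties and tiny deciles more carefully.

```python
def assign_deciles(scores):
    """Map each student to a decile (0 = lowest 10%, 9 = highest 10%)
    based on their rank within the year's scores."""
    ranked = sorted(scores, key=scores.get)
    return {student: rank * 10 // len(ranked)
            for rank, student in enumerate(ranked)}

def growth_categories(year1, year2):
    """Label each student low/average/high growth relative to the
    other kids who started in the same decile."""
    deciles = assign_deciles(year1)
    # Year-over-year movement for kids tested in both years.
    gains = {s: year2[s] - year1[s] for s in year1 if s in year2}
    labels = {}
    for d in set(deciles.values()):
        # Sort this decile's kids by how much they moved, then
        # split them into bottom, middle, and top thirds.
        peers = sorted((s for s in gains if deciles[s] == d),
                       key=lambda s: gains[s])
        n = len(peers)
        for i, s in enumerate(peers):
            if i < n / 3:
                labels[s] = "low"
            elif i < 2 * n / 3:
                labels[s] = "average"
            else:
                labels[s] = "high"
    return labels
```

Note that the “high growth” bar is whatever the top third of your own decile did, so a top-decile kid who holds steady can still land in “high growth” while a bottom-decile kid who drops lands in “low growth”– which is exactly the behavior described above.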

One disadvantage of this is that it makes it more difficult to present school data to the public in an easy-to-understand, useful format.  One big advantage of the pass rate is that parents understand it; moving from 50% pass to 52% pass has a clear meaning, while we’d have to present averages and medians and all sorts of other data to make the new model understandable when we’re comparing schools.  That said, if you want a “one number” comparison, the sum of the “high growth” and “average growth” kids would do nicely; giving all three growth categories, combined with averages and medians of actual scores, would provide sufficient information, and anybody who wants to dig deeper (numbers per decile, maybe) is welcome to.
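
If it helps, that “one number” is a trivial computation once you have the growth labels.  This little Python sketch (my own naming, nothing official) turns a set of labels into the headline percentage plus the per-category breakdown for anyone who wants the detail:

```python
from collections import Counter

def growth_summary(labels):
    """labels: student -> 'low' / 'average' / 'high'.
    Returns the headline percentage (average + high growth combined)
    and the raw count for each category."""
    counts = Counter(labels.values())
    total = sum(counts.values())
    headline = 100.0 * (counts["average"] + counts["high"]) / total
    return headline, dict(counts)
```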

It’s not great– we’re still paying too much attention to standardized test scores– but it’s certainly better than what we’re doing now.  Feel free to comment (Please!  Comment!) with suggestions and questions.

Be prepared, by the way, for me to find something utterly irrelevant to gripe about tomorrow.

Published by

Luther M. Siler

Teacher, writer of words, and local curmudgeon. Enthusiastically profane. Occasionally hostile.