Deeply Trivial: On Standardized Testing, Educational Policy, and Statistics

Earlier today, I encountered the following article on Facebook: I Can’t Answer These Texas Standardized Test Questions About My Own Poems. The article is written by Sara Holbrook, a poet and educator who learned that two of her poems were being used in the Texas STAAR assessments and who took issue with the questions written to accompany them. I understand and agree with many of the concerns she raises, but I also had some pretty strong reactions to claims in the article that I, as someone who works in test development, found to be completely inaccurate.

But first, a caveat: the Texas STAAR assessments are developed and administered by Pearson. I do not work for Pearson, I don't know anything about how they do things, and Holbrook's criticisms of Pearson may be completely valid. All I know is my experience working for another company that sells similar assessments, a company that could be considered a competitor to Pearson. I also need to stress that I can't divulge anything that would be considered privileged company knowledge.

First off, Holbrook discusses the "lack of experience" among test scorers, who are, as she says, "routinely hired from ads on (where else?), Craigslist, [and] also receive scant training." She links to an article written by one such test scorer. The thing is, she's discussing the closed-ended questions about her own poems while also citing the inexperience of test scorers who (and this is clearly stated in the article she linked to) score the open-ended written responses. These are two different tests. The questions she takes issue with are answered on good old bubble sheets and scored electronically, not by human scorers.

Second, she discusses research that found test scores could be predicted using demographic data, and links to another HuffPo article which states:

Tienken et al. have demonstrated that we do not need to actually give the Common Core-linked Big Standardized Test in order to generate the “student achievement” data, because we can generate the same data by looking at demographic information all by itself.

Tienken and his team used just three pieces of demographic data—

1) percentage of families in the community with income over $200K
2) percentage of people in the community in poverty
3) percentage of people in community with bachelor’s degrees

Using that data alone, Tienken was able to predict school district test results accurately in most cases. In New Jersey's 300 or so middle schools, the team could predict middle school math and language arts test scores for well over two thirds of the schools.
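To make the kind of analysis described above a bit more concrete, here is a minimal Python sketch: predict school-level scores from three demographic percentages with ordinary least squares, then count how many schools land within some tolerance of their actual score. This is not Tienken's actual model or data. The schools are randomly generated, the coefficients inside the hypothetical `simulate_schools` function and the 5-point tolerance are made up, and "accurate" here simply means an absolute prediction error at or below that tolerance.

```python
import random


def fit_ols(X, y):
    """Ordinary least squares: solve the normal equations (X'X) b = X'y.

    X is a list of predictor tuples; an intercept column is added here.
    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    # Gaussian elimination with partial pivoting on the augmented matrix [X'X | X'y]
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, k):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[row][c] -= factor * aug[col][c]
    # Back substitution
    b = [0.0] * k
    for i in range(k - 1, -1, -1):
        b[i] = (aug[i][k] - sum(aug[i][j] * b[j] for j in range(i + 1, k))) / aug[i][i]
    return b


def predict(b, x):
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))


def simulate_schools(n, seed=0):
    """HYPOTHETICAL data: three community percentages plus noise -> mean scale score."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        high_income = rng.uniform(0, 40)  # % of families with income over $200K
        poverty = rng.uniform(0, 30)      # % of people in poverty
        bachelors = rng.uniform(10, 70)   # % of people with bachelor's degrees
        score = 200 + 0.4 * high_income - 0.8 * poverty + 0.5 * bachelors + rng.gauss(0, 6)
        X.append((high_income, poverty, bachelors))
        y.append(score)
    return X, y


def share_within(b, X, y, tolerance):
    """Fraction of schools whose predicted score is within `tolerance` of the actual score."""
    hits = sum(abs(predict(b, x) - yi) <= tolerance for x, yi in zip(X, y))
    return hits / len(y)


X, y = simulate_schools(300)
b = fit_ols(X, y)
print(f"share of schools predicted within 5 points: {share_within(b, X, y, 5.0):.2f}")
```

Even when the demographic signal is strong, the noise and the chosen tolerance decide what share of schools counts as "accurately" predicted, and the schools outside that band are the ones a demographics-only approach would get wrong.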

Assessment serves many purposes, but we can simplify them to two:

1. How well is a school doing overall and/or is a program effective overall?
2. How well is a particular student doing?

Tienken and colleagues' study looked at predicting overall performance at the school level, and it looks like they were able to do so fairly accurately for over two-thirds of the schools. But that's not all of them, and when current educational policies base things like funding on school performance, a prediction that is accurate for two-thirds of schools is probably not going to be acceptable to the remaining third, especially if the prediction equation underestimates how well they are doing.
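To put a rough number on "the remaining schools": at exactly two-thirds accuracy on roughly 300 schools, about 100 are not predicted well, and the direction of each miss matters. The Python sketch below splits missed schools into underestimates and overestimates. The 5-point tolerance and the five predicted/actual score pairs are hypothetical, not figures from Tienken's study.

```python
def classify_misses(predicted, actual, tolerance):
    """Split the schools a model misses into under- and overestimates (by index)."""
    under, over = [], []
    for i, (p, a) in enumerate(zip(predicted, actual)):
        error = a - p          # positive: the school did better than predicted
        if error > tolerance:
            under.append(i)    # the prediction underestimates this school
        elif error < -tolerance:
            over.append(i)     # the prediction overestimates this school
    return under, over


# Back-of-the-envelope count at exactly two-thirds accuracy
n_schools = 300
share_accurate = 2 / 3
missed_at_two_thirds = round(n_schools * (1 - share_accurate))
print(missed_at_two_thirds)

# HYPOTHETICAL predicted vs. actual scores for five schools, 5-point tolerance
predicted = [240.0, 251.0, 233.0, 260.0, 245.0]
actual = [242.0, 262.0, 231.0, 249.0, 246.0]
under, over = classify_misses(predicted, actual, 5.0)
print(under, over)
```

In a funding formula driven by predicted performance, the schools in `under` are the ones that would be shortchanged: they are doing better than their demographics suggest, and the prediction hides it.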

But even more importantly, this study highlights the difference between a nomothetic approach (understanding and predicting for people in general) and an idiographic approach (understanding and predicting for one particular person). If you want to understand the performance of a single student and use that information to, say, decide whether they need an individualized education program (either because they are well below or well above grade level in their abilities), you need that student's actual data. That means conducting rigorous measurement of that particular case, because what is true in general may not be true for one particular person. In statistics, which generally focuses on "people in general," we call the gap between a person's actual score and the score predicted for them error. But for understanding an individual, that gap may be error or it may be something real about that student.
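A small Python sketch of why a group-level fit says little about any one student: fit a least-squares line to a handful of students and look at the residuals, the gaps between each student's actual score and the score the line predicts for them. The predictor, the scores, and the student indices below are all invented for illustration.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares line y = a + b*x fit to the whole group; returns (a, b)."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


# HYPOTHETICAL data: a group-level predictor and test scores for six students
predictor = [10, 20, 30, 40, 50, 60]
scores = [250, 244, 236, 231, 262, 215]  # student 4 sits far above the trend

a, b = fit_line(predictor, scores)
residuals = [s - (a + b * p) for p, s in zip(predictor, scores)]

# With an intercept in the model, least-squares residuals average to zero
print(f"mean residual across the group: {mean(residuals):.6f}")
for i, r in enumerate(residuals):
    print(f"student {i}: residual {r:+.1f}")
```

Across the group the residuals cancel out, which is what makes the line a reasonable nomothetic summary. Student 4's large positive residual is "error" from the model's point of view, but it is exactly the kind of case an idiographic look at that student's own data would flag, for example as a candidate for further individual testing.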

Third, and closely related to the second point, Holbrook insists that test developers are not going to let go of Common Core because it makes them money:

When I heard the campaign promises to eliminate the Common Core made by Donald Trump, I thought, yeah, right. Wait until someone educates him on how much money is being made making kids miserable with these useless tests. Talk is cheap. School testing is big bucks, and those testers are not going down without a fight.

Yes, these assessments are conducted to examine the effectiveness of Common Core, to examine school performance and make sure schools are abiding by Common Core, and to inform decisions about funding and so on. But Common Core is not the only educational policy, nor is it the only reason these assessments are conducted. If Common Core goes away, it will be replaced with some other educational policy that is going to require, you guessed it, assessment. In fact, any program will require assessment, because we need to measure performance to demonstrate that a program is working (or not working). This is the nature of program evaluation. We can't just ask people how well they think they know something, or simulate data based on demographics, which is essentially what Holbrook and Peter Greene (who wrote the other article linked above) are arguing for. We need real data.

When I worked for Chicago Public Schools, we did a lot of work with assessment data. BTW, this was pre-Common Core. The assessments we worked with then were for other policies, like No Child Left Behind. And when I was in public school in the 1980s and 1990s, we regularly took standardized assessments, like the Iowa Assessments. In my case, my scores on those Iowa Assessments were what led to my being tested for the gifted program using a different kind of standardized test (an individually administered cognitive ability test, specifically the Wechsler Intelligence Scale for Children). Assessment is a big part of education and educational policy, and in addition to group-administered standardized tests, schools will keep using individually administered tests. Test development companies are going to do just fine, even without Common Core. So if Common Core needs to go away (and personally, I think it at least needs to be changed, perhaps dropped altogether), it will. It will just be replaced with something else.

And in fact, even if the federal Department of Education were completely gutted, as some fear may happen under the new administration, we'd still have assessments required by state and local boards of education. That's right: some of the assessments your students take are required at the state or local level, not the federal level.

I can understand Holbrook's arguments, especially from her perspective as an educator who sees these assessments taking away valuable classroom time. And I agree with that perspective. Yes, I work in test development, and I think we may be over-testing our students. But some of her conclusions are based more on inaccuracies and myths than on fact.


Friday, January 6, 2017
