In a past post, I talked about content validation studies, a big part of my job. Today, I'm going to give a quick overview of standard setting, another big part of my job, and an important step in many testing applications.
In any kind of ability testing application, items will be written with identified correct and incorrect answers. This means you can generate overall scores for your examinees, whether the score is simply the number of correct answers (the raw score) or an ability estimate generated with some kind of item response theory/Rasch model. But what isn't necessarily obvious is how to use those scores to categorize candidates and, in credentialing and similar applications, how to decide who should pass and who should fail.
This is the purpose of standard setting: to identify cut scores for different categories, such as pass/fail, basic/proficient/advanced, and so on.
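To make the idea of a cut score concrete, here is a minimal sketch of how cut scores turn a score into a category once they have been set. The cut values, the category labels, and the `classify` helper are all made up for illustration; in this sketch a score exactly at a cut falls into the higher category, which is a common convention but still an assumption.

```python
from bisect import bisect_right

# Hypothetical cut scores on a 0-100 scale: the minimum score needed
# to reach "proficient" and "advanced", respectively.
cut_scores = [60, 80]
labels = ["basic", "proficient", "advanced"]

def classify(score, cut_scores, labels):
    """Return the category label for a score, given ascending cut scores."""
    if len(labels) != len(cut_scores) + 1:
        raise ValueError("need exactly one more label than cut scores")
    # bisect_right counts how many cut scores are <= score, so a score
    # exactly at a cut lands in the higher category.
    return labels[bisect_right(cut_scores, score)]

for score in (59, 60, 79, 80):
    print(score, classify(score, cut_scores, labels))
```

The hard part, of course, is not applying the cut scores but deciding where they should go, which is what the methods below are for.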
There are many different methods for conducting standard setting. Overall, approaches can be thought of as item-based or holistic/test-based.
For item-based methods, standard setting committee members go through each item and categorize it in some way (the precise way depends on which method is being used). For instance, they may categorize it as basic, proficient, or advanced, or they may generate the likelihood that a minimally qualified candidate (i.e., the person who should pass) would get it right.
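The likelihood-rating approach described above is usually associated with the Angoff method. As a rough sketch of the arithmetic, assuming each committee member rates every item and the ratings are simply averaged, the recommended raw cut score is the sum of the mean ratings across items. The ratings and the `angoff_cut_score` helper below are hypothetical, and real studies typically add steps (multiple rounds, discussion, feedback data) that this leaves out.

```python
from statistics import mean

# Hypothetical ratings: one row per committee member, one column per item.
# Each value is that member's estimate of the probability that a minimally
# qualified candidate answers the item correctly.
ratings = [
    [0.60, 0.75, 0.40, 0.90, 0.55],
    [0.65, 0.70, 0.50, 0.85, 0.60],
    [0.55, 0.80, 0.45, 0.95, 0.50],
]

def angoff_cut_score(ratings):
    """Sum the per-item mean ratings to get an expected raw score."""
    n_items = len(ratings[0])
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every committee member must rate every item")
    # zip(*ratings) regroups the ratings by item instead of by member.
    item_means = [mean(item) for item in zip(*ratings)]
    return sum(item_means)

cut = angoff_cut_score(ratings)
print(f"Recommended raw cut score: {cut:.2f} out of {len(ratings[0])}")
```

With these made-up ratings the item means are 0.60, 0.75, 0.45, 0.90, and 0.55, so the recommended cut is 3.25 out of 5 raw score points.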
For holistic/test-based methods, committee members make decisions about cut scores within the context of the whole test. Holistic/test-based methods still require review of the entire exam, but don't require individual judgments about each item. For instance, committee members may have a booklet containing all items in order of difficulty (based on pretest data), and place a bookmark at the item that reflects the transition from proficient to advanced or from fail to pass.
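The bookmark example can also be sketched in code. This is only one way the calculation might go, and it rests on several assumptions that are not part of the description above: item difficulties come from a Rasch model, the committee's recommendation is the (low) median of the individual bookmark placements, each placement is recorded as the number of items the minimally qualified candidate is judged to have mastered, and mastery means a 0.67 probability of a correct answer. Conventions differ across programs, and all data and names here are hypothetical.

```python
import math
from statistics import median_low

# Hypothetical Rasch item difficulties (in logits) from pretest data.
item_difficulties = [0.8, -1.2, 0.1, 1.5, -0.4, 2.1, -2.0, 0.5]

# The ordered item booklet: items sorted from easiest to hardest.
booklet = sorted(item_difficulties)

# Hypothetical placements, one per committee member: the number of items,
# counting from the easiest, judged to be mastered by a minimally
# qualified candidate.
bookmarks = [4, 5, 5, 6, 5]

def bookmark_cut(booklet, bookmarks, rp=0.67):
    """Return (median placement, ability cut in logits) under a Rasch model."""
    placement = median_low(bookmarks)  # always an actual placement value
    if not 1 <= placement <= len(booklet):
        raise ValueError("bookmark placement is outside the booklet")
    last_mastered = booklet[placement - 1]
    # Under the Rasch model, P(correct) = rp when theta - b = ln(rp / (1 - rp)).
    theta_cut = last_mastered + math.log(rp / (1 - rp))
    return placement, theta_cut

placement, theta_cut = bookmark_cut(booklet, bookmarks)
print(f"Median bookmark after item {placement}; ability cut = {theta_cut:.2f} logits")
```

The resulting ability cut would still need to be translated to whatever score scale is reported to candidates.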
The importance of standard setting comes down to defensibility. In licensure, for instance, failing a test may mean being unable to work in one's field at all. For this reason, definitions of who should pass and who should fail (in terms of knowledge, skills, and abilities) should be explicit, well supported, and clearly tied to exam scores. Licensure and credentialing organizations are frequently required to prove, in a court of law, that their standards are fair, rigorously derived, and meaningful.
For my friends and readers in academic settings, this step may seem unnecessary. After all, you can easily categorize students into A, B, C, D, and F using the percentage of items correct. But that scale is itself simply a standard (for example, a pass/fail cut score of 60%) that was set at some point in the past and has been applied throughout academia.
I'm currently working on a chapter on standard setting with my boss and a coworker. For anyone wanting to learn more about standard setting, two great books are Cizek and Bunch's Standard Setting and Zieky, Perie, and Livingston's Cutscores.