Tuesday, April 2, 2019

B is for Bank

As I alluded to yesterday, in Rasch, every item gets a difficulty and every person taking (some set of) those items gets an ability. They're both on the same scale, so you can compare the two: you can determine which items are right for a person based on their ability, and estimate a person's ability based on how they perform on the items. Rasch uses a form of maximum likelihood estimation to create these values, going back and forth with different values until it arrives at a set of item difficulties and person abilities that fit the data.
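To make that concrete, here's a small sketch (my own illustration, not any real Rasch package like Winsteps) of the dichotomous Rasch model and a maximum-likelihood ability estimate when the item difficulties are already known:

```python
import math

def rasch_p(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def estimate_ability(responses, difficulties, iters=50):
    """Maximum-likelihood ability estimate given fixed item difficulties.

    responses: list of 0/1 scores; difficulties: matching Rasch difficulties.
    Uses Newton-Raphson on the log-likelihood. Assumes the raw score is not
    perfect or zero (those have no finite maximum-likelihood estimate).
    """
    theta = 0.0
    for _ in range(iters):
        ps = [rasch_p(theta, b) for b in difficulties]
        grad = sum(x - p for x, p in zip(responses, ps))  # observed minus expected score
        info = sum(p * (1 - p) for p in ps)               # Fisher information
        theta += grad / info
        if abs(grad) < 1e-8:
            break
    return theta
```

When ability equals difficulty, the probability of a correct answer is exactly one half, which is what makes the shared scale interpretable: an item is "right" for a person when it sits near their ability.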

Once you have a set of items that have been thoroughly tested and have item difficulties, you can begin building an item bank. A bank could be all the items on a given test (so the measure itself is the bank), but usually, when we refer to a bank, we're talking about a large pool of items from which to draw for a particular administration. This is how computer adaptive tests work. No person is going to see every single item in the bank.

Maintaining an item bank is a little like being a project manager. You want to be very organized and make sure you include all of the important information about the items. Item difficulty is one of the statistics you'll want to save about the items in your bank. When you administer a computer adaptive test, the difficulty of the item the person receives next is based on whether they got the previous item correct or not. If they got the item right, they get a harder item. If they got it wrong, they get an easier item. This back-and-forth continues until enough items have been administered to estimate that person's ability.
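A toy version of that selection loop might look like the sketch below. The dict layout and fixed step size are my own illustration; real CATs typically update the ability estimate with maximum likelihood and pick the unseen item that is most informative, which under the Rasch model is the one whose difficulty is closest to the current estimate:

```python
def next_item(bank, ability_estimate, administered):
    """Pick the unseen item whose difficulty is closest to the current
    ability estimate (the most informative item under the Rasch model)."""
    candidates = [i for i in bank if i["id"] not in administered]
    return min(candidates, key=lambda i: abs(i["difficulty"] - ability_estimate))

def step_estimate(estimate, correct, step=0.5):
    """Crude up/down rule from the post: a harder item after a right
    answer, an easier one after a wrong answer."""
    return estimate + step if correct else estimate - step

# A tiny illustrative bank: three calibrated items.
sample_bank = [
    {"id": 1, "difficulty": -1.0},
    {"id": 2, "difficulty": 0.0},
    {"id": 3, "difficulty": 1.0},
]
```

So a person estimated near 0 would get item 2 first; answering it correctly bumps the estimate up and steers them toward item 3.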

You want to maintain the text of the item: the stem, the response options, and which option is correct (the key). The bank should also include the topics each item covers. You'll have different items covering different topics. On an ability test, the topics are the different content areas people are expected to know. An algebra test might have topics like linear equations, quadratic equations, word problems, and so on.
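One way to picture an item record (field names are my own; real banking software has far more metadata) is a simple structure that holds exactly those pieces:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class BankItem:
    item_id: str                        # unique identifier in the bank
    stem: str                           # the question text
    options: List[str]                  # the response options
    key: str                            # the correct option
    topic: str                          # e.g. "linear equations"
    difficulty: Optional[float] = None  # None until the item is calibrated
```

A newly written item would start with no difficulty, which only gets filled in once the item has been pretested and calibrated.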

With adaptive tests, you'll want to note which items are enemies of each other. These are items that are so similar (they cover the same topic and may even have the same response options) that you'd never want to administer both in the same test. Not only do enemy items give you no new information on that person's ability to answer that type of question, they may even provide clues about each other that lead someone to the correct answer. This is bad, because the ability estimate based on this performance won't be valid - the ability you get becomes a measure of how savvy the test taker is, rather than their actual ability on the topic of the test.
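In selection terms, enemy relationships just shrink the pool of eligible items. A minimal sketch, assuming a dict that maps each item id to the set of its enemies (an illustrative layout; real banks store enemy pairs in their own tables):

```python
def eligible_items(bank, administered, enemies):
    """Return items still eligible for administration: not yet seen, and
    not an enemy of anything already seen.

    enemies: dict mapping item id -> set of enemy item ids. Enemy pairs
    should be stored symmetrically (if 1 is an enemy of 2, 2 is an enemy
    of 1).
    """
    blocked = set(administered)
    for item_id in administered:
        blocked |= enemies.get(item_id, set())
    return [item for item in bank if item["id"] not in blocked]
```

Once an examinee has seen one member of an enemy pair, the other is simply never a candidate for the rest of that test.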

On the flip side, there might be items that go together, such as a set of items that all deal with the same reading passage. So if an examinee randomly receives a certain passage of text, you'd want to make sure they then receive all items associated with that passage.
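That grouping is easy to express if each item record carries its passage id (again, my own illustrative layout):

```python
def items_for_passage(bank, passage_id):
    """Once a passage is drawn, pull every item tied to it so the
    examinee sees the whole set together."""
    return [item for item in bank if item.get("passage") == passage_id]

# Illustrative reading bank: two items on passage P1, one on P2.
reading_bank = [
    {"id": 10, "passage": "P1"},
    {"id": 11, "passage": "P1"},
    {"id": 12, "passage": "P2"},
]
```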

Your bank might also include items you're testing out. While some test developers will pretest items through a special event, where examinees receive only new items with no established difficulties, once a test has been developed, new items are usually mixed in with old items to gather item difficulty data and to calibrate the difficulties of the new items against the difficulties of known items. (I'll talk more about this when I talk about equating.) The new items are pretest items and considered unscored - you don't want performance on them to affect the person's score. The old items are operational and scored. Some tests put all pretest items in a single section; this is how the GRE does it. Others will mix them in throughout the test.
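The scoring rule that follows from this is simple: count only operational items, even though the examinee answered everything. A sketch, assuming each response record carries the item's status:

```python
def raw_score(responses):
    """Sum correct answers over operational (scored) items only; pretest
    items are administered but never counted toward the score."""
    return sum(1 for r in responses if r["status"] == "operational" and r["correct"])

# Illustrative session: the pretest item is answered but not scored.
session = [
    {"item": "A", "status": "operational", "correct": True},
    {"item": "B", "status": "pretest", "correct": True},
    {"item": "C", "status": "operational", "correct": False},
]
```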

Over time, items are seen more and more. You might reach a point where an item has been seen so much that you have to assume it's been compromised - shared between examinees. You'll want to track how long an item has been in the bank and how many times it's been seen by examinees. In this case, you'll retire an item. You'll often keep retired items in the bank, just in case you want to reuse or revisit them later. Of course, if the item tests on something that is no longer relevant to practice, it may be removed from the bank completely.
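Tracking that comes down to two numbers per item: when it entered the bank and how many times it's been seen. A retirement check might look like this sketch (the thresholds are illustrative; a real program sets exposure limits from its own security policy):

```python
from datetime import date

def flag_for_retirement(item, today, max_exposures=500, max_years=3.0):
    """Flag an item as a retirement candidate once it has been seen too
    often or has been live too long. Retired items can stay in the bank
    for possible reuse; only obsolete content is removed entirely."""
    age_years = (today - item["entered_bank"]).days / 365.25
    return item["exposure_count"] >= max_exposures or age_years >= max_years
```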

Basically, your bank holds all items that could or should appear on a particular test. If you've ever been a teacher or professor and received supplementary materials with a textbook, you may have gotten items that go along with the text. These are item banks. As you look through them, you might notice similarly worded items (enemy items), which textbook topic each item covers, and so on. Sadly, these types of banks don't tend to have item difficulties attached to them. (Wouldn't that be nice, though? Even if you're not going to Rasch grade your students, you could make sure you have a mix of easy, medium, and hard items. But these banks don't tend to have been pretested, a fact I learned when I applied for a job to write many of these supplementary materials for textbooks.)

If you're in the licensing or credentialing world, you probably also need to track references for each item. And this will usually need to be a very specific reference, down to the page of a textbook or industry document. You may be called upon - possibly in a deposition or court - to prove that a particular item is part of practice, meaning you have to demonstrate where this item comes from and how it is required knowledge for the field. Believe me, people do challenge particular items. In the comments section at the end of a computer adaptive test, people will often reference specific items. It's rare to have to defend an item to someone, but when you do, you want to make certain you have all the information you need in one place.

Most of the examples I've used in this post have been for ability tests, and that's traditionally where we've seen item banks. But there have been some movements in testing to use banking for trait and similar tests. If I'm administering a measure of mobility to a person with a spinal cord injury, I may want to select certain items based on where that person falls on mobility. Spinal cord injury can range in severity, so while one person with a spinal cord injury might be wheelchair-bound, another might be able to walk short distances with the help of a walker (especially if they have a partial injury, meaning some signals are still getting through). You could have an item bank so that these two patients get different items; the person who is entirely wheelchair-bound wouldn't get any items about their ability to walk, while the person with the partial injury would. The computer adaptive test would just need a starting item to figure out approximately where the person falls on the continuum of mobility, then direct them to the questions most relevant to their area of the continuum.

Tomorrow, we'll talk more about the rating scale model of Rasch, and how we look at category function!
