Tuesday, April 9, 2019

H is for How to Set Up Your Data File

The exact way you set up your data depends, of course, on the software you use. But my focus today is to give you some things to think about when setting up your data for Rasch analysis.

First, know how your software needs you to format missing values. Many programs will let you simply leave a blank space or cell. Winsteps is fine with a blank space to denote a missing value or skipped question. Facets, on the other hand, will flip out at a blank space and needs an explicit missing value code (I usually use 9).
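As a rough illustration of that recoding step, here is a minimal Python sketch that swaps blank responses for an explicit code. The function name recode_missing and the sample row are made up for the example, and it assumes 9 is never a valid response on your items. It only prepares the data; how you tell your software which code means "missing" depends on the program.

```python
def recode_missing(responses, missing_code="9"):
    """Replace blank or whitespace-only responses with an explicit missing code.

    Assumes missing_code is never a legitimate response value.
    """
    return [r if r.strip() else missing_code for r in responses]


# One person's responses, with two skipped items (hypothetical data)
row = ["1", "0", " ", "1", ""]
recoded = recode_missing(row)
print(recoded)  # ['1', '0', '9', '1', '9']
```

If your items are rated on a scale that actually includes 9, pick a different code (99, for instance) so real responses aren't mistaken for missing ones.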

Second, the ordering of the file is very important, especially if you're working with data from a computer adaptive test, which means missing values are also important. When someone takes a computer adaptive test, their first item is drawn at random from a set of moderately difficult items. The difficulty of the next item depends on how they did on the first item, but even so, the item is randomly drawn from a set or range of items. So when you set up your data file, you need to be certain that all people who responded to a specific item have that response in the same column, regardless of where in the exam that item was administered.

This is why you need to be meticulously organized with your item bank and give each item an identifier. When you assemble responses for computer adaptive tests, you'll need to reorder people's responses. That is, you'll set up an order for every item in the bank by identifier. When data are compiled, each person's responses are put in that order, and if a particular item in the bank wasn't administered to them, there is a space or missing value in its place.
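To make the reordering concrete, here is a small Python sketch under some assumptions: the item identifiers, the ITEM_BANK list, and the to_bank_order function are all hypothetical, and each person's record is assumed to arrive as (item identifier, response) pairs in the order the items were administered.

```python
# Fixed column order for the whole bank (hypothetical identifiers)
ITEM_BANK = ["A101", "A102", "B201", "B202", "C301"]


def to_bank_order(administered, bank=ITEM_BANK, missing=""):
    """Return one row of responses in bank order.

    administered: list of (item_id, response) pairs in administration order.
    Items in the bank that this person never saw get the missing value.
    """
    answered = dict(administered)
    unknown = set(answered) - set(bank)
    if unknown:
        # An identifier that isn't in the bank signals a bookkeeping problem
        raise ValueError(f"Items not in bank: {sorted(unknown)}")
    return [answered.get(item_id, missing) for item_id in bank]


# Two people who saw different items in different orders
person_1 = [("B201", "1"), ("C301", "0"), ("B202", "1")]
person_2 = [("B202", "0"), ("A102", "1"), ("A101", "1")]

rows = [to_bank_order(p) for p in (person_1, person_2)]
for r in rows:
    print(r)
# ['', '', '1', '1', '0']
# ['1', '1', '', '0', '']
```

Both people answered B202, and in both rows that response lands in the fourth column, even though it was the third item one person saw and the first item the other saw.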

Third, be sure you differentiate between item variables and other variables, like person identifiers, demographics, and so on. Once again, know your software. You may find that a piece of software just runs an entire dataset as though all variables are items, meaning you'll get weird results if you have a demographic variable mixed in. Others might let you select certain variables for the analysis and/or categorize variables as items and non-items.
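If your software won't do that separation for you, you can do it yourself before the analysis. The sketch below uses Python's built-in csv module on a tiny made-up dataset; the column names and the NON_ITEM_COLUMNS list are assumptions for the example, so swap in whatever your own file uses.

```python
import csv
import io

# A tiny made-up dataset: two non-item columns followed by three items
raw = """id,gender,item1,item2,item3
001,F,1,0,1
002,M,0,,1
"""

# Columns that should never be analyzed as items (hypothetical names)
NON_ITEM_COLUMNS = ["id", "gender"]

reader = csv.DictReader(io.StringIO(raw))
item_columns = [c for c in reader.fieldnames if c not in NON_ITEM_COLUMNS]
records = list(reader)

# Person-level information, kept separate from the responses
person_info = [{c: rec[c] for c in NON_ITEM_COLUMNS} for rec in records]

# Item responses only, one row per person, in file order
item_matrix = [[rec[c] for c in item_columns] for rec in records]

print(item_columns)  # ['item1', 'item2', 'item3']
print(item_matrix)   # [['1', '0', '1'], ['0', '', '1']]
```

Keeping person_info in the same row order as item_matrix means you can match the two back up later for group comparisons.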

I tend to keep a version of my item set in Excel, with a single variable at the beginning for participant ID number. Excel is really easy to import into most software, and I can simply delete the first column if a particular program doesn't allow non-item variables. If I drop any items (which I'll talk more about tomorrow), I do it from this dataset. A larger dataset, with all items, demographic variables, and so on, is usually kept in SPSS, since that's the preferred software at my company, in case I ever need to pull in any additional variables for group comparisons. (I'm primarily an R user, but I'm the only one, and R can read SPSS files directly.) This dataset is essentially the master, and any smaller files I need are built from it.

