Monday, October 9, 2017

Complex Models and Control Files: From the Desk of a Psychometrician

We're getting ready to send out a job analysis survey, part of our content validation study. In the meantime, I'm preparing control files to analyze the data when we get it back. I won't be running the analysis for a couple of weeks, but the model I'll be using is complex enough (in part because I added in some nerdy research questions to help determine best practices for these types of surveys) that I decided to start thinking about it now.

I realize there's a lot of information to unpack in that first paragraph. Without going into too much detail, here's a bit of background. We analyze survey data using the Rasch model. This model assumes that an individual's response to an item is a function of their ability level and the difficulty level of the item itself. For this kind of measure, where we're asking people to rate items on a scale, we're not measuring ability; rather, we're measuring a trait - an individual's proclivity toward a job task. In this arrangement, items are not difficult/easy but more common/less common, or more important/less important, and so on. The analysis gives us probabilities that people at different ability (trait) levels will respond to an item in a certain way.
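To make those probabilities a little more concrete, here's a minimal sketch in Python. It assumes an Andrich-style rating scale formulation of the Rasch model, which is one common choice for rated items but isn't necessarily the exact model for this survey. The function name, the parameter names, and the threshold values are all made up for illustration.

```python
import math


def rating_scale_probs(theta, delta, thresholds):
    """Category probabilities under a rating-scale Rasch model (illustrative).

    theta      -- respondent's trait level, in logits
    delta      -- item location (how "hard" the item is to endorse), in logits
    thresholds -- step thresholds between adjacent categories, in logits
    Returns one probability per category (len(thresholds) + 1 of them).
    """
    # Each category's logit is the running sum of (theta - delta - tau_j);
    # the lowest category has an empty sum, i.e. 0.
    logits = [0.0]
    for tau in thresholds:
        logits.append(logits[-1] + (theta - delta - tau))

    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(x - top) for x in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical values: a 4-category scale and three respondents.
for theta in (-2.0, 0.0, 2.0):
    probs = rating_scale_probs(theta, delta=0.0, thresholds=[-1.0, 0.0, 1.0])
    print(theta, [round(p, 3) for p in probs])
```

As the trait level rises relative to the item's location, the probability mass shifts toward the higher rating categories.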

It's common for job analysis surveys to use multiple rating scales on the same set of items, such as having respondents go through and rate items on how frequently they perform them, and then go through again and rate how important it is to complete a task correctly. For this kind of model, we use a Rasch Facets model. A facet is something that affects responses to an item. Technically, any Rasch model is a facets model; in a basic Rasch model, there are two facets: respondents (and their ability/trait level) and items. When you're using multiple rating scales, scale is a facet.
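For readers who like notation, a facets model with respondents, items, and rating scale as facets is often written in log-odds form. This is a sketch in commonly used many-facet Rasch notation, not the exact specification for this study; the symbols are assumptions introduced here.

```latex
\log\left(\frac{P_{nisk}}{P_{nis(k-1)}}\right) = \theta_n - \delta_i - \sigma_s - \tau_k
```

Here \theta_n is respondent n's trait level, \delta_i is the location of item i, \sigma_s is the effect of rating scale s, and \tau_k is the threshold between categories k-1 and k. Dropping \sigma_s leaves the basic two-facet rating scale model.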

And because I'm a nerd, I decided to add another facet: rating scale order. The reason we have people rate with one scale then go through and rate with the second (instead of seeing both at once) is so that people are less likely to anchor responses on one scale to responses on another scale. That is, if I rate an item as very frequent, I might also view it as more important when viewing both scales than I would have had I used the scales in isolation. But I wonder if there still might be some anchoring effects. So I decided to counterbalance. Half of respondents will get one scale first, and the other half will get the other scale first. I can analyze this facet to see if it affected responses.
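The counterbalancing itself is simple to set up. Here's a minimal sketch, assuming respondents are randomly split into two equal-sized groups; the function name, the order labels, and the respondent IDs are hypothetical, and the actual survey platform may handle assignment differently.

```python
import random


def assign_scale_order(respondent_ids, seed=None):
    """Randomly assign half of respondents to each scale order (illustrative)."""
    rng = random.Random(seed)  # seeded so the assignment can be reproduced
    ids = list(respondent_ids)
    rng.shuffle(ids)

    half = len(ids) // 2
    orders = {}
    for position, rid in enumerate(ids):
        # First half of the shuffled list sees frequency first; the rest
        # (including the extra person when the count is odd) see importance first.
        orders[rid] = "frequency_first" if position < half else "importance_first"
    return orders


# Hypothetical respondent IDs.
orders = assign_scale_order(range(1, 11), seed=2017)
print(orders)
```

The resulting order label for each respondent is what would get coded as the fourth facet in the data.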

This means we have 4 facets, right? Items, respondents, rating scale, and order. Well, here's the more complex part. We have two different versions of the frequency scale: one for tasks that happen a lot (which assesses daily frequency) and one for less common tasks (which assesses weekly/monthly frequency). All items use the same importance scale. The two frequency scales have the same number of categories, but because we may need to collapse categories during the analysis phase, it's possible that we'll end up with two very different scales. So I need to factor in that, for the frequency scale, half of the items share one common response structure and the other half share another, but for the importance scale, all items share a single response structure.
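One way to keep that bookkeeping straight before writing the control file is to spell out which item-by-scale combinations share a response structure. The sketch below is plain Python, not Rasch control file syntax, and the item IDs, type labels, and structure names are all hypothetical.

```python
from collections import defaultdict


def response_structure(scale, item_type):
    """Name the response structure used by one item on one scale (illustrative)."""
    if scale == "importance":
        # Every item shares the same importance scale.
        return "importance"
    if scale == "frequency":
        # Frequency splits by whether the task is common or less common.
        if item_type == "common":
            return "frequency_daily"
        if item_type == "less_common":
            return "frequency_weekly_monthly"
        raise ValueError(f"unknown item type: {item_type!r}")
    raise ValueError(f"unknown scale: {scale!r}")


def group_by_structure(items):
    """Group (item, scale) pairs by the response structure they share."""
    groups = defaultdict(list)
    for item_id, item_type in items.items():
        for scale in ("frequency", "importance"):
            groups[response_structure(scale, item_type)].append((item_id, scale))
    return dict(groups)


# Hypothetical items: two common tasks and two less common ones.
items = {
    "task_01": "common",
    "task_02": "common",
    "task_03": "less_common",
    "task_04": "less_common",
}
for structure, members in group_by_structure(items).items():
    print(structure, members)
```

Each of the three groups would then get its own rating scale structure in the analysis, so collapsing categories in one frequency scale doesn't force the same change on the other.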

I'm working on figuring out how to express that in the control file, which is a text file used by Rasch software to describe all the aspects of the model and analysis. It's similar to any syntax file for statistics software: there's a specific format needed for the program to read the file and run the specified analysis. I've spent the morning digging through help files and articles, and I think I'm getting closer to having a complete control file that should run the analysis I need.


  1. This seems very clever to me. But I have to admit I'll need to study it more to fully understand it. This reminds me of ROC signal detection...yes?

    One thought I had was whether analyses done with this level of sophistication produce results different from a more basic analysis.

    1. It's a great question. Sometimes complex isn't better; it's just complex. Using two scales is pretty standard for our job analysis surveys. We want to make sure we're testing on concepts that are essential for the job, and that could be measured by frequency of that task (and hence how often one draws upon the knowledge needed to perform the task), importance (and hence importance of knowing the right way to do it), or a combination of both. In healthcare, we know there are some tasks that are frequent but where the price of doing them wrong isn't very high (for instance, choosing between two different materials used for a tooth cleaning). On the other hand, some tasks are infrequent, but it's very important to approach them correctly (adverse reactions, for example). Facets lets us combine those scales, which we then use to generate weights that tell us how much of the test should be on certain concepts. Rasch gives us item-level statistics that help generate these weights.

      As for the additional counterbalancing variable I added, it probably isn't necessary. It's really more curiosity on my part: does order affect how people respond to the items? My hope is it does not, and then we can go back to the less complex model.

  2. I did a counterbalancing of scales here, to great effect.

    I remember being taught in the pre-desktop computer age to always counterbalance. That seems a rare methodological control these days.

    Scotti, J. R., Slack, B. S., Bowman, R. A. and Morris, T. L. 1996. College students' attitudes concerning the sexuality of persons with mental retardation: Development of the Perceptions of Sexuality Scale. Sexuality and Disability, 14: 249–63.