Sunday, April 9, 2017

G is for Goodness of Fit

Statistics is all about probability. We choose appropriate cut-off values and make decisions based on probability. Usually, when we conduct a statistical test, we're seeing whether two values are different from each other - so different, in fact, that the difference is unlikely to have occurred just by chance.

But other times, we have expectations about how the data are supposed to look. Our goal then is to test whether the data look that way - or at least come close enough, within a small margin of error. In that case, you would use statistical analysis to tell you how well your data fit what you expected them to look like. This is called assessing "goodness of fit."

Goodness of fit statistics are frequently used in model-based analyses. For my job, I use a measurement model called Rasch. When Georg Rasch developed his model, he figured out the mathematical characteristics a good measure should have. Part of our analysis is to test how well the data we collected fit the model. If our data fit the model well, we know our measure has all those good characteristics Rasch outlined.

I also use goodness of fit in structural equation modeling (SEM), where I develop a model for how I think my variables fit together. SEMs are great for studying complex relationships, like causal chains or whether your survey questions measure one underlying concept. For instance, I could build on my caffeine study by examining different variables that explain (mediate) the effect of caffeine on test performance. Maybe caffeine makes you feel more focused, so you pay more attention to the test questions and get more correct. Maybe caffeine improves performance for certain kinds of test questions but not others. Maybe these effects differ depending on your gender or age (we call this moderation - when a variable changes the strength or direction of the relationship between your independent and dependent variables). I could test all of this at the same time with an SEM. And my overall goodness of fit could tell me how well the model I specified fits the data I collected.

Goodness of fit testing is one of those times statisticians do not want to see significant results. A significant test here would mean the data don't fit the model. That is, the data are more different from the model you created than you would expect by chance alone, which we take to mean we have created a poor (incorrect) model. (Obviously, Type I and II errors are at play here.) I've been to multiple conferences where I've seen presenters cheer about significant results on their SEM, not realizing that they want the opposite from a goodness of fit test.
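
To make the "you don't want significance" idea concrete, here is a minimal sketch of a chi-square goodness of fit test, which is one common fit test but not the only one. Everything in it is an assumption for illustration: the "model" is a fair six-sided die, the roll counts are made up, and the function name chi_square_statistic is hypothetical. The cut-off of 11.070 is the standard chi-square critical value for 5 degrees of freedom at alpha = .05.

```python
# Chi-square goodness of fit, written out by hand (standard library only).
# Hypothetical example: do 120 rolls of a die match a fair-die model?

def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected across categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Critical value of the chi-square distribution for df = 5 at alpha = .05.
CRITICAL_VALUE_DF5 = 11.070

expected = [20] * 6                        # fair die: 120 rolls / 6 faces
fair_looking = [18, 22, 21, 19, 23, 17]    # made-up counts close to the model
loaded_looking = [8, 12, 15, 20, 25, 40]   # made-up counts far from the model

for label, observed in [("fair-looking", fair_looking),
                        ("loaded-looking", loaded_looking)]:
    stat = chi_square_statistic(observed, expected)
    # A statistic above the critical value is "significant": the data differ
    # from the model more than we would expect by chance alone.
    significant = stat > CRITICAL_VALUE_DF5
    verdict = "poor fit (significant)" if significant else "acceptable fit (not significant)"
    print(f"{label}: chi-square = {stat:.2f}, {verdict}")
```

The first set of counts gives a small chi-square (1.40), so the test is not significant and the fair-die model fits. The second gives a large one (32.90), so the test is significant and the model fits poorly. That is the opposite of what we usually hope for from a significance test.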

Side note - our G post is directly related to the F post. The default estimation method in most SEM software is maximum likelihood estimation, which we have thanks to Fisher!