## Sunday, July 30, 2017

### Statistics Sunday: Fixed versus Random Effects

As I've said many times, statistics is about explaining variance. You'll never be able to explain every ounce of variance (unless you, say, create a regression model with as many parameters as there are cases; then you explain all variance, but fail to make any generalizable inferences). Some variance just can't be explained except as measurement or sampling error. But not all unexplained variance has to be written off as noise.

It's possible to split that variance into two components and try to separate variance that appears random from variance that appears systematic, meaning it has some cause that is simply an unmeasured variable (or set of variables). This is where random effects models come into play.
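The extreme case from the first paragraph is easy to demonstrate. Below is a minimal sketch, assuming NumPy is available, that fits an outcome made of pure noise using an intercept plus n - 1 random predictors, so there are n parameters for n cases. The data are simulated and the names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10                                   # number of cases
y = rng.normal(size=n)                   # outcome that is pure noise
# Intercept column plus n - 1 random "predictors": n parameters for n cases
X = np.column_stack([np.ones(n), rng.normal(size=(n, n - 1))])

# Least-squares fit; with a square, full-rank X this is an exact solution
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta
r_squared = 1 - (residuals @ residuals) / np.sum((y - y.mean()) ** 2)
print(f"R^2 = {r_squared:.6f}")          # all variance "explained" by noise predictors
```

The coefficients describe these ten cases perfectly and mean nothing beyond them, which is exactly the failure to generalize described above.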

You may not have heard these terms in statistics classes, but you've likely done fixed effects analysis without even realizing it. Fixed effects models deal specifically with testing the effect of one or more independent variables: variables you've operationally defined and manipulated or measured. When you conduct a simple linear regression, you have your constant (the predicted value of Y when X = 0), your (fixed effect) slope, and your error term. The variable whose effect you're testing is known, because you specified it.
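As a rough illustration of those three pieces, here is a sketch on simulated data using only Python's standard library and the closed-form least-squares formulas. The true intercept (2.0), slope (0.5), and error SD (1.0) are arbitrary assumptions chosen for the example.

```python
import random
import statistics

random.seed(42)

# Hypothetical data: true intercept 2.0, true slope 0.5, error SD 1.0
x = [random.uniform(0, 10) for _ in range(200)]
y = [2.0 + 0.5 * xi + random.gauss(0, 1.0) for xi in x]

mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sxy / sxx                        # fixed effect of X on Y
intercept = mean_y - slope * mean_x      # predicted Y when X = 0

# Residual error: what the fixed effects leave unexplained
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
residual_sd = (sum(r ** 2 for r in residuals) / (len(y) - 2)) ** 0.5

print(f"intercept {intercept:.2f}, slope {slope:.2f}, residual SD {residual_sd:.2f}")
```

The estimates land near the values used to simulate the data; everything the model doesn't attribute to X ends up in the residual error.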

But there are many other variables out there that you may not have measured. A random effects model attempts to partition the remaining variance by examining which residual variance across cases appears to be meaningful (that is, it follows common patterns, such as cases from the same group resembling each other) and which appears to be just noise.
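One classic way to do this partitioning is a one-way random effects ANOVA, which splits variance into a between-group component (the systematic part, caused by something about the group that wasn't measured) and a within-group component (noise). The sketch below simulates a hypothetical balanced design of 200 groups with 20 cases each, using arbitrary true variance components of 4 and 9, and recovers them with the method-of-moments estimators.

```python
import random
import statistics

random.seed(1)

n_groups, n_per = 200, 20
true_between_var, true_within_var = 4.0, 9.0   # hypothetical values

data = []
for _ in range(n_groups):
    # Each group's shared shift stands in for an unmeasured cause
    shift = random.gauss(0, true_between_var ** 0.5)
    data.append([shift + random.gauss(0, true_within_var ** 0.5) for _ in range(n_per)])

group_means = [statistics.fmean(g) for g in data]
grand_mean = statistics.fmean(group_means)     # equals the overall mean (balanced design)

# One-way ANOVA mean squares
ms_between = n_per * sum((m - grand_mean) ** 2 for m in group_means) / (n_groups - 1)
ms_within = sum((v - m) ** 2 for g, m in zip(data, group_means) for v in g) / (n_groups * (n_per - 1))

# Method-of-moments variance components (between truncated at 0)
within_var = ms_within
between_var = max((ms_between - ms_within) / n_per, 0.0)
icc = between_var / (between_var + within_var)  # share of variance that is systematic

print(f"between {between_var:.2f}, within {within_var:.2f}, ICC {icc:.2f}")
```

The intraclass correlation (ICC) summarizes the split: the proportion of total variance that comes from differences between groups rather than noise within them.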

Often, we use a combination of the two, called a mixed effects model. This means we include predictors to explain as much variance as we can, then add in the random effects component, which generates an additional variance term. It has the added bonus of making your results more generalizable to cases beyond the ones you included in your study.

In fact, I mostly work with mixed and random effects models in meta-analysis, where they add a between-study variance component when generating the average effect size. In meta-analysis, a random or mixed effects model is used when you have strong justification that there isn't actually one true effect size, but a family or range of effect sizes that depend on characteristics of the study. The results then include not just an average effect size and a confidence interval for that point estimate, but also a prediction interval, which gives the range within which the true effect size of a new study is likely to fall. And this is actually a pretty easy justification to make.
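For the meta-analysis case, a common estimator of the between-study variance (tau²) is the DerSimonian-Laird method of moments. The sketch below is one way to compute the pooled random-effects estimate, its confidence interval, and a prediction interval. The effect sizes and variances are made up for illustration, and the prediction interval uses a normal critical value for simplicity; a t distribution with k - 2 degrees of freedom is often recommended instead, which gives a somewhat wider interval.

```python
import math
from statistics import NormalDist

def random_effects_meta(effects, variances, level=0.95):
    """DerSimonian-Laird random-effects meta-analysis (sketch).

    Returns the pooled effect, its confidence interval, tau^2, and an
    approximate (normal-based) prediction interval.
    """
    k = len(effects)
    z = NormalDist().inv_cdf(0.5 + level / 2)

    # Fixed-effect (inverse-variance) weights and Cochran's Q
    w = [1 / v for v in variances]
    fe_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fe_mean) ** 2 for wi, yi in zip(w, effects))

    # Between-study variance tau^2 (method of moments, truncated at 0)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max((q - (k - 1)) / c, 0.0)

    # Random-effects weights add tau^2 to every study's sampling variance
    w_re = [1 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    ci = (mu - z * se, mu + z * se)
    half_pi = z * math.sqrt(tau2 + se ** 2)     # spread of true effects plus estimation error
    pi = (mu - half_pi, mu + half_pi)
    return mu, ci, tau2, pi

# Hypothetical standardized mean differences and their sampling variances
effects = [0.10, 0.45, 0.30, 0.80, 0.05, 0.55]
variances = [0.02, 0.03, 0.015, 0.04, 0.025, 0.03]
mu, ci, tau2, pi = random_effects_meta(effects, variances)
print(f"pooled {mu:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}], "
      f"tau^2 {tau2:.3f}, 95% PI [{pi[0]:.2f}, {pi[1]:.2f}]")
```

Whenever tau² is above zero, the prediction interval is wider than the confidence interval, because it describes where individual true effects fall rather than where the average falls.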

Why wouldn't you use random effects all the time? Because it isn't always indicated, and it comes with some big drawbacks. First, the random effects can't be correlated with any predictors you have in the model. If they are, you don't really have a good case for including the random effects component: the variance you're attributing to unknown causes is actually related to your known predictors, and the estimated effects of those predictors will be biased. You're better off using a fixed effects model. And while random effects models can be easily justified, fixed effects models are easier to explain and interpret.
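A small simulation makes the problem concrete. A random effects estimate of a slope blends within-group and between-group information, so if the unmeasured group effect is correlated with the predictor, the between-group part is biased. In the hypothetical setup below (all values are arbitrary assumptions), each group's unmeasured effect `u` shifts both X and Y. The within-group slope, which is what a fixed effects model estimates, centers each group on its own mean and so removes `u`; the between-group slope, computed on group means, does not.

```python
import random
import statistics

random.seed(7)

def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

n_groups, n_per, true_slope = 500, 10, 1.0
within_x, within_y, mean_x, mean_y = [], [], [], []

for _ in range(n_groups):
    u = random.gauss(0, 1)                                # unmeasured group effect
    x = [u + random.gauss(0, 1) for _ in range(n_per)]    # predictor correlated with u
    y = [true_slope * xi + 2.0 * u + random.gauss(0, 1) for xi in x]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    # Fixed effects (within) transformation: subtracting group means removes u
    within_x += [xi - mx for xi in x]
    within_y += [yi - my for yi in y]
    mean_x.append(mx)
    mean_y.append(my)

within_slope = ols_slope(within_x, within_y)    # close to the true slope of 1.0
between_slope = ols_slope(mean_x, mean_y)       # pulled well above 1.0 by u

print(f"within (fixed effects) slope {within_slope:.2f}, between slope {between_slope:.2f}")
```

Any estimator that leans on the between-group information inherits some of that bias, which is why the fixed effects model is the safer choice in this situation.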

Additionally, random (and mixed) effects models are more generalizable in part because they generate wider confidence intervals whenever the random effects variance is greater than zero. Of course, the wider the confidence interval, the more likely it is to include the actual population value you're estimating. But the wider the confidence interval, the less useful it is. There's a balance between being exhaustive and being informative: a confidence interval that includes the entire range of possible values will certainly include the actual population value, but it tells you very little.
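To see how much wider the intervals get, here is a sketch comparing the half-width of a 95% confidence interval for an inverse-variance pooled mean under a fixed effect model (tau² = 0) and under random effects with increasing between-study variance. The study variances are hypothetical, and tau² is treated as known rather than estimated to keep the example short.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)   # about 1.96

def ci_half_width(variances, tau2=0.0):
    """Half-width of a 95% CI for an inverse-variance pooled mean.

    tau2 = 0 gives the fixed-effect interval; tau2 > 0 adds the between-study
    variance to every study's variance, as a random-effects model does.
    """
    weights = [1 / (v + tau2) for v in variances]
    return Z95 * math.sqrt(1 / sum(weights))

variances = [0.02, 0.03, 0.015, 0.04, 0.025, 0.03]   # hypothetical study variances
for tau2 in (0.0, 0.01, 0.05, 0.10):
    print(f"tau^2 = {tau2:.2f}: estimate +/- {ci_half_width(variances, tau2):.3f}")
```

With the same six studies, the interval grows steadily as tau² grows; the data haven't changed, only the assumption about how much true effects vary.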

Finally, a random effects model can reduce your power (and by the way, you need lots of cases to make this analysis work). Adding more cases, which increases power in fixed effects models, may have little or no effect on power, or may even decrease it, because it can add more variance and widen confidence intervals. This can make it more difficult to show that a value is significantly different from 0, even if the actual population value is. But as is always the case in statistics, you're estimating an unknown value with data that may be seriously flawed, and you have little way of knowing how flawed.
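One way to see the "more cases may not help" problem is to hold the number of studies fixed and keep increasing the sample size within each study. The sketch below approximates the power of a two-sided z-test of a random-effects pooled mean under several simplifying assumptions, all hypothetical: ten equally sized studies, a true mean effect of 0.2, tau² = 0.04 treated as known, and a sampling variance of roughly 4/N for a standardized mean difference from two equal groups of total size N.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def re_power(true_mean, tau2, k, n_per_study, alpha=0.05):
    """Approximate power of a two-sided z-test of the pooled random-effects mean.

    Assumes k equally sized studies, each with sampling variance 4 / n_per_study
    (a rough approximation for a standardized mean difference), and a known tau2.
    """
    v = 4 / n_per_study
    se = math.sqrt((v + tau2) / k)       # cannot drop below sqrt(tau2 / k)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(true_mean) / se
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)

for n in (50, 200, 1000, 100000):
    print(f"N per study = {n:>6}: power = {re_power(0.2, 0.04, 10, n):.2f}")
```

Because the standard error is sqrt((v + tau²)/k), shrinking the within-study variance v can never push it below sqrt(tau²/k), so under these assumptions power levels off below 0.9 no matter how large each study gets. Adding studies (increasing k) is what lowers that floor, provided the new studies don't also increase tau².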

Hmm, that may need to be the subtitle of the statistics book I'm writing: Statistics: Estimating the Unknown with the Seriously Flawed.