Monday, April 3, 2017

B is for Beta

So we're moving forward with Blogging A to Z. Today's topic follows up on Saturday's post, where we talked about alpha. Perhaps you were wondering, "If alpha is the Type I error rate, the chance of saying there's something there when there really isn't, does that mean there's also a chance you'll say there's nothing there when there really is?" Yes, and that's what we're talking about today: beta, also known as your Type II error rate.

Continuing with the horror movie analogy, this is when you walk into the room and walk right by the killer hiding in the corner without even seeing him. He was right there and you missed him! Once again, you never get the feedback of ending up dead later on to tell you that you missed him. So it won't necessarily kill you to commit a Type II error, but at the very least, you're missing out on information that might be helpful.

Unlike alpha, which we set ahead of time, we can only approximate beta. We can also do things to minimize beta: use really strong interventions (dosage), minimize any diffusion of the treatment into our control group, and select good measures, to name a few. In fact, if you want to learn more, check out this article on internal validity, which identifies some of the various threats to internal validity (the ability of our study to show that our independent variable causes our dependent variable).

To use my caffeine study example from Saturday, I would want to use a strong dose of caffeine with my experimental group. I would also make sure they haven't had any caffeine before they came in to do the study, and if I can't control that, I would at least want to measure it. I would also probably keep my experimental and control groups separate from each other, to keep them from getting wise to the differences in the coffee. And I would want to use a test that my participants have not taken before.

There's also a way you can minimize beta directly, by maximizing its complement: 1-β, or power. Power is the probability that you will find an effect if it is there. We usually want that value to be at least 0.8, meaning an 80% probability that you will find an effect if there's one to be found (and so a beta of no more than 0.2). If you know something about the thing you're studying - that is, other studies have already been performed - you can use the results of those studies to estimate the size of the effect you'll probably find in your study. In my caffeine study, the effect I'm looking for is the difference between the experimental and control group, in this case a difference between two means (averages). There are different metrics (that I'll hopefully get to this month) that reflect the magnitude of the difference between two groups, metrics that take into account not only the difference between the two means but also how spread out the scores are in the groups.
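
To make the link between beta and power concrete, here is a rough sketch in Python. It assumes a two-group comparison of means with equal group sizes, a two-sided test, and a normal approximation (a real t-test would give slightly lower power). The function name `approx_power` and the numbers plugged in are made up for illustration; they are not from the caffeine study.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power for a two-sided comparison of two group means.

    effect_size is the difference between the means divided by the
    spread (standard deviation) of the scores. This is a normal
    approximation, and it ignores the tiny chance of a significant
    result in the wrong direction.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)              # cutoff set by alpha
    shift = effect_size * sqrt(n_per_group / 2)    # how far the true effect moves the test statistic
    return z.cdf(shift - z_crit)


# Hypothetical inputs: a medium-sized effect and 64 people per group.
power = approx_power(effect_size=0.5, n_per_group=64)
beta = 1 - power
print(f"power = {power:.2f}, beta = {beta:.2f}")
```

With these made-up inputs, power comes out right around the 0.8 benchmark. Shrink the effect size or the group size and you can watch beta climb.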

Using that information from previous studies, I can then do what's called a power analysis. If you're doing a power analysis before you conduct a study (what we would call a priori), you'll probably use that power analysis to tell you how many people you should have in your study. Other things being equal, having more people in your study is better, because more people will get you closer to the population value you're trying to estimate (don't worry, I'll go into more detail about this aspect later). But you can't get everyone in your study, nor would you want to spend the time and money to keep collecting data - studies would never end! So an a priori power analysis helps you figure out what resources you need while also helping you feel confident that, if there's an effect to be found, you'll find it.
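
As a minimal sketch of what an a priori power analysis does, the snippet below turns an expected effect size, alpha, and target power into a number of people per group. It rests on the same assumptions as before: two equal-sized groups, a two-sided test, and a normal approximation, so dedicated power software based on the t distribution may ask for one or two more people per group. The function name `n_per_group` and the example effect sizes are illustrative only.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two means.

    effect_size is the expected difference between the means divided
    by the standard deviation of the scores (normal approximation).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # guards against Type I error
    z_power = z.inv_cdf(power)           # guards against Type II error
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical expected effects, e.g. taken from earlier studies.
for d in (0.8, 0.5, 0.2):
    print(f"effect size {d}: about {n_per_group(d)} people per group")
```

The pattern to notice is that smaller expected effects need many more people, which is why the estimate from previous studies matters so much.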

Of course, you might be studying something completely new. For instance, when I studied Facebook use, rumination, and health outcomes, there was very little research on these different relationships - there's more now, of course. What I did in those cases was to pick the smallest effect I was interested in seeing. Basically, what is the smallest effect that is just big enough to be meaningful or important? In this case, we're not only using statistical information to make a decision; we're also potentially using clinical judgment.

For instance, one of my health measures was depression: how big of a difference in depression is enough for us to say, "This is important"? If we see that using Facebook increases depression scores by that much, then we have a legitimate reason to be concerned. That's what I did for each of the outcomes, when I didn't have any information from previous studies to guide me. Then I used that information to help me figure out how many people I needed for my study.
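
Here is one way that reasoning could be sketched in code: start from the smallest difference on the raw scale that would count as important, divide by the spread of the scores to get a standardized effect, and then ask how many people that effect requires. Everything specific here is hypothetical: the function name, the 10-point standard deviation, and the candidate differences are placeholders, not values from my Facebook study. It also assumes two equal groups, a two-sided test, and a normal approximation.

```python
from math import ceil
from statistics import NormalDist


def n_for_raw_difference(smallest_difference, sd, alpha=0.05, power=0.80):
    """Return (standardized effect, approximate people per group).

    smallest_difference: smallest raw-score difference judged meaningful.
    sd: assumed standard deviation of the scores (hypothetical here).
    """
    effect_size = smallest_difference / sd
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return effect_size, ceil(2 * (z_sum / effect_size) ** 2)


# Hypothetical depression scale with a standard deviation of 10 points.
for points in (5, 3, 2):
    effect_size, n = n_for_raw_difference(points, sd=10)
    print(f"{points}-point difference (effect size {effect_size:.1f}): "
          f"about {n} people per group")
```

The judgment call about what counts as "important" has a direct price tag: the smaller the difference you care about, the more people you need.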

Power analysis sounds intimidating, but it actually isn't. A really simple power analysis can be done using a few tables. (Yep, other people have done the math for you.) More complex power analyses (when you're using more complex statistical analysis) can be conducted with software. And there are lots of people out there willing to help you. Besides, isn't doing something slightly intimidating better than running a study without knowing whether you have a decent chance of finding anything?
