Saturday, July 15, 2017

Statistics Sunday Prelude: What's Bayes' Got to Do With It?

Note: This is the first of a two-part post. Check out part 2 here.

As I mentioned in Wednesday's Statistical Sins post, I'm working on a post about how Bayes' theorem can demonstrate that the Type I error rate is actually much higher than alpha. We know that increasing the number of tests or comparisons we run on our data increases our chances of committing a Type I error. But Bayes' theorem can show us that even if we're conducting only one test, the probability of a Type I error can be high - really high.

In fact, the post is already written in the form of copious notes in my favorite notebook (which I'm using for writing toward my planned statistics book):

[photo of the notebook]

and an R markdown file. That's right - I have code and figures. But as I started looking over everything I've written, I realized the post will be pretty long. So today, I'm writing an explanation to set up for tomorrow's demonstration.

As a reminder, the concept of alpha (and beta, for that matter) is key to a statistical approach called Null Hypothesis Significance Testing (NHST). When we set our alpha at 0.05, we're accepting a 5% chance that we will conclude there is a real effect in our data when the null hypothesis, which says there is no effect, is actually true. This is Type I error. We conduct power analyses to give ourselves a good chance of finding a significant effect when one exists - usually we set power to 0.80, meaning an 80% chance that, if there's an effect to find, we'll find it. But that means there's a 20% chance that we'll fail to reject the null hypothesis when it is actually false. This is Type II error.
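To put some quick numbers on that, here's a small sketch in R. The effect size (a medium effect, delta = 0.5 standard deviations) and the resulting sample size are purely illustrative values for this example, not anything from the post I'm working on:

```r
# Illustrative values only
alpha <- 0.05              # chance of a significant result when the null is true
target_power <- 0.80       # chance of a significant result when the effect is real
beta <- 1 - target_power   # chance of missing a real effect (Type II error) = 0.20

# A standard power analysis with base R's power.t.test():
# how many people per group to detect a medium effect (delta = 0.5 SDs)
# at alpha = 0.05 with 80% power?
power.t.test(delta = 0.5, sd = 1, sig.level = alpha,
             power = target_power, type = "two.sample",
             alternative = "two.sided")
# Gives n of roughly 64 per group
```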

We don't always know if we've committed a Type I or Type II error. We don't have a game show host to buzz us if we got the wrong answer. We just have to keep studying something in different ways, and over time we can build up results to determine once and for all which is true: the null hypothesis or the alternative hypothesis. After all, if we conduct all of our studies with an alpha of 0.05, we'll know that a body of literature is wrong if only 5% of the studies find significant results, right?

Now's the time I pull the rug out from under you. Because the Type I error rate can be much higher than 0.05.

Why? Because Bayes' theorem.

Type I error rate is the probability that something is not true given it is significant (in probability terms: P(Tc|S), where T = true, Tc = not true, and S = significant) - a false positive. This is different from alpha, which is the probability that something is significant given it is not true, or P(S|Tc).

(Yes, I know I'm contradicting a lot of statistical teaching, because everyone always says Type I error and alpha are the same thing. I said it too. I've since changed my mind, and I'm arguing that they're related but not quite the same thing. After all, NHST doesn't care about conditional probabilities in the same ways Bayesian approaches do. I accept that I could be completely off-base with using the terms in these ways, but I think my understanding of the conditional probabilities involved is sound.)

Beta, on the other hand, is the probability that something is not significant given it is true, or P(Sc|T), where Sc = not significant.

When you plug these values into Bayes' theorem, you'll find that the Type I error rate can skyrocket under the right conditions.
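Just to show the machinery, here's a minimal sketch in R with completely made-up numbers - in particular, the 10% prior probability that the effect is real is purely illustrative, not a value from tomorrow's post:

```r
# Bayes' theorem applied to the quantities above:
# P(Tc | S) = P(S | Tc) * P(Tc) / [ P(S | Tc) * P(Tc) + P(S | T) * P(T) ]
alpha <- 0.05        # P(S | Tc): significant given no real effect
power <- 0.80        # P(S | T) = 1 - beta: significant given a real effect
prior_true <- 0.10   # P(T): assumed (made-up) probability the effect is real

type1_error <- (alpha * (1 - prior_true)) /
  (alpha * (1 - prior_true) + power * prior_true)
type1_error
# About 0.36 - in this made-up scenario, more than a third of
# significant results would be false positives, despite alpha = 0.05
```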

What are those conditions? Check back tomorrow to find out!
