Alpha is a really important concept in statistics. Statistics is all about probability. I know that sounds kind of obvious. If you've taken a stats class, or even a section on statistics in another math class, they probably spent a lot of time talking to you about probability. In the past, they did that with poker hands. The downfall of teaching it that way is that you have to teach people poker so they understand which hands beat which. I guess that makes dorm parties a little more interesting. More recently, I've seen them use combinations of dice or coin flips. But I don't think people understand the connection: why we keep hammering on this concept of probability, on the probability of different combinations, on the fact that some outcomes are more or less likely than others.
That's because any statistical inference we make - and by statistical inference I mean any decision we make based on the results of statistical tests - is based entirely on probability. Specifically, on how unlikely the outcome we saw would be if only chance were operating - so unlikely that we conclude it had to come from a real difference rather than just luck.
So let's go with a concrete example. Let's say I do an experiment, and I want to test the effect of caffeine consumption on test performance. My hypothesis is that having caffeine before you take a test will get you a better grade than no caffeine. This exact study has probably been done hundreds or thousands of times. I've got my experimental group that I have drink caffeinated coffee, and my control group that I force to drink decaf, because I'm a mean person - but sometimes it's necessary for science. Then I give them a test and see how well they did. I expect that, on average, the experimental group will do better. Now, I can't just look and go, "Hey, the experimental group has a higher score!" It would be really, really unlikely for the groups to have exactly the same average, so how different do the averages need to be before we say there's a real difference? We set a cut-off, a critical value: if the difference is at least this big, we conclude there is a real difference between my two groups and not just random chance.
And the way that we set that critical value is with probability. We go with a difference that would have only a small chance of happening by random luck alone. That small chance is alpha, and we set it ahead of time. The most common alpha - the convention - is 0.05, which means the critical value is based on a difference we would have only a 5% chance of seeing if random luck alone were operating - if there wasn't a true difference between caffeine consumers and non-caffeine consumers.
But that's 5% - that's not 0! There's still a chance that we'll find a difference that isn't real; it's not because of the caffeine, it's just luck. We accept that. We know that's a possibility. We can't really know from a single study whether there is a real effect of caffeine, or whether we've fallen in that 5%. In fact, when we fall in that 5%, we've made what's called a Type I error. We're saying, "Hey, there's something here!" when really there's nothing.
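That whole setup - a critical value picked so that pure luck only clears it 5% of the time - is something you can actually simulate. Here's a minimal sketch in Python (the score numbers and group sizes are made up for illustration): we pretend caffeine does nothing at all, run the "experiment" ten thousand times, and find the cut-off that only 5% of pure-luck runs exceed.

```python
import random
import statistics

random.seed(1)

# Simulate the "no real effect" world: both groups draw test scores
# from the same distribution (mean 75, sd 10 - made-up numbers).
def null_experiment(n=30):
    caffeine = [random.gauss(75, 10) for _ in range(n)]
    decaf = [random.gauss(75, 10) for _ in range(n)]
    return statistics.mean(caffeine) - statistics.mean(decaf)

diffs = sorted(abs(null_experiment()) for _ in range(10_000))

# The critical value for alpha = .05: the difference that only 5%
# of pure-luck experiments manage to exceed.
critical = diffs[int(0.95 * len(diffs))]
print(f"critical difference: {critical:.2f}")

# Sanity check: by construction, ~5% of null experiments "find" a
# difference anyway. Those are the Type I errors.
false_alarms = sum(d > critical for d in diffs) / len(diffs)
print(f"Type I error rate: {false_alarms:.3f}")
```

The exact cut-off depends on the group sizes and how spread out the scores are; real analyses get it from a t distribution instead of brute-force simulation, but the logic is the same.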
You know in horror movies, which I blog about a lot, someone walks in a room and we think the murderer is going to jump out at them, and instead it's just a cat? They jump, we jump, everyone freaks out. It's like that. We reacted like it was the murderer but it was just the cat.
That's Type I error. Except in this case, we don't have the immediate feedback of seeing the cat - or of not being dead - to know we didn't actually find something real. All we know is we found something and it made us jump. We don't know if we should have jumped or not. This is probably why I hate jump scares in horror movies. I'm traumatized by the possibility of Type I error.
So does that mean there are a bunch of studies out there that have found results that are just bogus, that are just Type I error? Yes, it does. Because with an alpha of 0.05, if there is no real effect and you run that study 100 times, about 5 of them will probably come out significant. And considering that we have this thing called publication bias, where studies that find significant results are more likely to be published, there's a whole lot of Type I error floating around out there. This is why replication is so important and needs to be encouraged. And why we need to stop publication bias. And we also need to stop something I've blogged about before called p-hacking.
P-hacking is directly related to alpha. When you have a huge dataset, tons of variables, and you just run analyses willy-nilly, looking for a significant result, you're dramatically increasing the chance that you'll commit a Type I error - that at least one of those results will be significant just by chance. If you have an alpha of 0.05 each time you run a test, those probabilities pile up. Because even if there's no relationship between two variables, there's a 5% chance you'll find one anyway. And that's just for 1 test. Run 2 tests and it's roughly 10%, 3 tests and it's roughly 15%, and so on and so on (the chances don't literally add, but adding is a good approximation when the number of tests is small). If you run 20 tests, 1 of those is probably going to be significant just by luck. If you run 100 tests, 5 probably will be.
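If you want to check that arithmetic: the exact chance of at least one false positive across k independent tests at alpha = .05 is 1 - (1 - .05)^k. That's close to just adding 5% per test when k is small, but unlike the add-them-up version, it can never pass 100%. A quick sketch:

```python
# How fast does Type I error pile up across multiple tests?
# Rough rule: add alpha per test. Exact (assuming independent tests):
# P(at least one false positive) = 1 - (1 - alpha)^k.
alpha = 0.05
for k in (1, 2, 3, 10, 20, 100):
    additive = min(k * alpha, 1.0)   # the rough "just add them" estimate
    exact = 1 - (1 - alpha) ** k     # exact, for independent tests
    print(f"{k:>3} tests: additive ~ {additive:.2f}, exact = {exact:.2f}")
```

For 20 tests the exact chance of at least one fluke is about 64%, and for 100 tests it's over 99% - so "probably significant just by luck" is, if anything, an understatement.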
And if you don't tell people, "Oh, by the way, we just ran a shit-ton of tests, and only reported the few that were significant," they might not realize how much you inflated your Type I error rate. So this is why you shouldn't run a bunch of tests, and if you are going to run a whole bunch of tests, you should plan them ahead of time and apply a correction to your alpha. It's 5% each shot, so if you're doing 10 tests, you've inflated that to roughly 50%. So you should instead take that .05 and divide it by the number of tests you're going to do.
If you're doing 10 tests, your alpha for each test is .05/10 = .005. If you want to learn a new term to impress your friends, that is called a Bonferroni correction. It's named after Carlo Emilio Bonferroni, who came up with it. I can kind of understand wanting to name something after yourself, especially with a name like Bonferroni, because I bet he got made fun of for that name (people still make fun of it), and the best way to get back at the haters is to make them say your name with a little bit of respect. I can get behind that.
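In code, the correction is a one-liner. Here's a minimal sketch (the function name and the p-values are my own made-up illustration, not from any real study): divide alpha by the number of planned tests, then hold every p-value to that stricter threshold.

```python
# Sketch of a Bonferroni correction: divide alpha by the number of
# planned tests and compare each p-value to the corrected threshold.
def bonferroni_significant(p_values, alpha=0.05):
    corrected = alpha / len(p_values)   # e.g. .05 / 10 = .005
    return [p < corrected for p in p_values]

# Hypothetical p-values from 10 planned tests (made-up numbers):
p_values = [0.003, 0.021, 0.048, 0.30, 0.62,
            0.04, 0.001, 0.09, 0.51, 0.77]
flags = bonferroni_significant(p_values)
print(flags)  # only the two p-values below .005 survive the correction
```

Notice that .021 and .048 would have counted as significant on their own, but don't survive the correction - which is exactly the point.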