Here's a more concrete example. Imagine you're going to a movie with three friends. You buy your tickets, get your popcorn and sodas, and go into the theatre. You turn to your friends to ask where they'd like to sit.
The first says, "The back. You don't have anyone behind you kicking your seat and you can see the whole screen no matter what."
"No," the second friend says, "that's way too far away. I want to sit in the front row. No one tall in front of you to block your view, and you can look up and see the actors larger-than-life."
"You're kidding, right?" asks the third friend. "And deal with the pain in neck from looking up the whole time? No, thank you. I want to sit farther back, but not all the way in the back row. The middle is the best place to sit."
How do you solve this dilemma? With research, of course! (How do you solve arguments between friends?) You could pass out a survey to moviegoers to see who has the best experience of the movie based on where they sit - front, middle, or back. Then you want to see which group, on average, has the best experience. You know a t-test will let you compare two groups, but how do you compare three?
Yes, you could do three t-tests: front v. middle, front v. back, and middle v. back. But remember that you inflate your Type I error with each statistical test you conduct. You could correct your alpha for multiple comparisons, but you also increase your probability of Type II error doing that. As with so many issues in statistics, there's a better way.
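A quick back-of-the-envelope sketch makes the inflation concrete. Assuming three independent tests at the usual alpha of .05, the chance of at least one false positive is noticeably higher than .05:

```python
# Familywise Type I error for several tests at alpha = .05.
# Illustrates why three separate t-tests inflate your error rate.
alpha = 0.05
n_tests = 3  # front v. middle, front v. back, middle v. back

# Probability of at least one false positive across all three tests
familywise_error = 1 - (1 - alpha) ** n_tests
print(round(familywise_error, 4))  # 0.1426 - nearly triple the nominal .05

# Bonferroni correction: shrink alpha per test, which in turn
# raises your risk of Type II error (missing a real effect)
bonferroni_alpha = alpha / n_tests
print(round(bonferroni_alpha, 4))  # roughly 0.0167
```

The independence assumption is a simplification (the three comparisons share data), but the direction of the problem is the same either way.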
Enter the analysis of variance, also known as ANOVA. It lets you test more than two means, and it does it, much like the t-test, by examining deviation from the mean. In any statistical situation, the expected value is the mean - in this case, what we call the grand mean, the mean across all of the groups. If seating location makes no difference, we would expect all three groups to share the same mean; that is, the grand mean would be the best descriptor for everyone. That's the null hypothesis; we're testing whether the grand mean fails to describe everyone. So we need to see how far the group means are from the grand mean and whether that's more than we'd expect by chance alone.
But the mean is a balancing point: some groups will be above the grand mean and some below it. If I took the grand mean, subtracted each group mean from it, then added those deviations together, they would add up to 0 (or very close to it). What do we do when we want to add deviations without having them cancel each other out? We square them! Remember - this is how we get variance: the average squared deviation from the mean. So, to conduct an ANOVA, we look at the squared deviations from the grand mean. Analysis of variance - get it? Good.
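Here's that balancing act in miniature, using made-up enjoyment ratings (the numbers are purely illustrative):

```python
# Hypothetical enjoyment ratings (1-10) for three seating groups.
front  = [6, 7, 5, 6]
middle = [8, 9, 8, 9]
back   = [7, 6, 7, 6]

groups = [front, middle, back]
all_scores = front + middle + back
grand_mean = sum(all_scores) / len(all_scores)

# Raw deviations of each group mean from the grand mean cancel out...
group_means = [sum(g) / len(g) for g in groups]
raw = sum(m - grand_mean for m in group_means)

# ...so we square them (weighted by group size) to get the
# between-group sum of squares instead.
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
print(raw, ss_between)  # the raw deviations sum to 0; the squares don't
```

For these numbers the group means are 6, 8.5, and 6.5 against a grand mean of 7, so the raw deviations (-1, +1.5, -0.5) sum to 0 while the squared, weighted deviations sum to 14.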
Once you have your squared deviations from the grand mean - your between-group variance - you compare those values to the pooled variance across the three groups - your within-group variance, or how much variance you expect by chance alone. If your between-group variance is a lot larger than your within-group variance, the result will be significant. Just like the t-test, there's a table of critical values, based on degrees of freedom (which depend on your sample size and the number of groups you're comparing); if your ANOVA statistic (also known as an F test - here's why) is that large or larger, you conclude that at least two of the group means are different from each other.
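That ratio can be computed by hand with nothing but the standard library. This sketch uses the same kind of hypothetical ratings as above; the data are invented for illustration:

```python
import statistics

# Hypothetical enjoyment ratings for three seating groups
front  = [6, 7, 5, 6]
middle = [8, 9, 8, 9]
back   = [7, 6, 7, 6]
groups = [front, middle, back]

k = len(groups)                   # number of groups
n = sum(len(g) for g in groups)   # total sample size
grand_mean = sum(sum(g) for g in groups) / n

# Between-group variance: squared deviations of group means from the
# grand mean, averaged over k - 1 degrees of freedom
ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ms_between = ss_between / (k - 1)

# Within-group variance: pooled squared deviations inside each group,
# averaged over n - k degrees of freedom - the "chance alone" baseline
ss_within = sum(sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups)
ms_within = ss_within / (n - k)

F = ms_between / ms_within
print(round(F, 2))  # a big F means between-group variance swamps within-group
```

If you have SciPy handy, `scipy.stats.f_oneway(front, middle, back)` computes the same F along with its p-value, so you don't need the critical-value table.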
You would need to probe further to find out exactly which comparison is different - it could be that only two are significantly different, or it could be all three. You have to run what are called post hoc tests to find out for certain. Except now, you're not fishing - like you would be with multiple t-tests. You know there's a significant difference in there somewhere; you're just hunting to find out which one it is. (Look for a future post about post hoc tests.)
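As a rough sketch of that hunt, here are pairwise t-tests with a Bonferroni-corrected alpha, run on the same kind of made-up ratings (one of several possible post hoc approaches, not the only one):

```python
from itertools import combinations
from scipy import stats

# Hypothetical ratings per seating group (invented for illustration)
seats = {
    "front":  [6, 7, 5, 6],
    "middle": [8, 9, 8, 9],
    "back":   [7, 6, 7, 6],
}

# Pairwise t-tests at a Bonferroni-corrected alpha: a planned follow-up
# to a significant ANOVA, not a fishing expedition
alpha = 0.05 / 3
results = {}
for (name_a, a), (name_b, b) in combinations(seats.items(), 2):
    t, p = stats.ttest_ind(a, b)
    results[(name_a, name_b)] = p
    verdict = "different" if p < alpha else "not different"
    print(f"{name_a} v. {name_b}: p = {p:.4f} ({verdict})")
```

Recent SciPy versions also ship `scipy.stats.tukey_hsd`, which implements Tukey's HSD - a more conventional post hoc choice that handles the multiple-comparison correction for you.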
The cool thing about ANOVA is you can use it with more than one variable. Remember there is a difference between levels and variables. A level is one of the "settings" of a variable. For our caffeine study, the levels are "experimental: receives caffeine" and "control: no caffeine." In the movie theatre example, the variable is seating location, and the levels are front, middle, and back. But what if you wanted to throw in another variable you think might affect the outcome? For instance, you might think gender also has an impact on movie enjoyment.
There's an ANOVA for that, called factorial ANOVA. You need to have a mean for each combination of the two variables: male gender-front seat, female gender-front seat, male gender-middle seat, female gender-middle seat, male gender-back seat, and female gender-back seat. Your ANOVA does the same kind of comparison as above, but it also looks at each variable separately (male v. female collapsed across seating location, and front v. middle v. back collapsed across gender) to tell you the effect of each (what's called a main effect). Then, it can also tell you if the combination of gender and seating location changes the relationship. That is, maybe the effect of seating location differs depending on whether you are a man or a woman. This is called an interaction effect.
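The cell means, marginal means, and interaction check can be sketched in a few lines. All the numbers here are invented for illustration:

```python
import statistics

# Hypothetical ratings for a 2 (gender) x 3 (seat) factorial design
cells = {
    ("man",   "front"):  [5, 6, 5],
    ("man",   "middle"): [8, 9, 8],
    ("man",   "back"):   [6, 7, 6],
    ("woman", "front"):  [7, 8, 7],
    ("woman", "middle"): [8, 8, 9],
    ("woman", "back"):   [6, 6, 7],
}

# One mean per combination of the two variables
cell_means = {key: statistics.mean(scores) for key, scores in cells.items()}

# Main effect of seat: collapse (pool) across gender
seat_means = {
    seat: statistics.mean(cells[("man", seat)] + cells[("woman", seat)])
    for seat in ("front", "middle", "back")
}

# Main effect of gender: collapse across seating location
gender_means = {
    g: statistics.mean(sum((cells[(g, s)] for s in ("front", "middle", "back")), []))
    for g in ("man", "woman")
}

# Interaction: does the seat effect differ by gender? Compare the
# front-to-middle gain for men versus women.
men_gain   = cell_means[("man", "middle")]   - cell_means[("man", "front")]
women_gain = cell_means[("woman", "middle")] - cell_means[("woman", "front")]
print(men_gain, women_gain)  # unequal gains hint at an interaction
```

In these made-up data, moving from front to middle boosts men's ratings by about 3 points but women's by only about 1 - the kind of pattern a factorial ANOVA would flag as an interaction effect. (A real analysis would use something like statsmodels' two-way ANOVA to get the F tests.)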
On one of these Statistics Sundays, I might have to show an ANOVA in action. Stay tuned!