I've blogged previously about standard deviation and also why the denominator for sample standard deviation is N-1. Standard deviation tells us the typical spread of individual scores, and we can use the central limit theorem and various tables to determine percentiles associated with different scores. If we want to convert an individual score to a Z-score, we subtract the population mean and divide by the population standard deviation.
But when we want to engage in hypothesis testing of means, say two sample means, we need to use a different metric. Standard deviation tells you how much scores typically vary. But when we are engaging in hypothesis testing of two means, we are interested in how much means typically vary. So if I collected a bunch of samples of a specific size and measured them on the same thing, I want a metric that tells me how much I can expect the means of those different samples to vary. We use a metric based on standard deviation that gives you credit for the size of the sample, because means are going to be more stable, as in closer to the population mean, when sample sizes are bigger.
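Here's a rough simulation of that idea, assuming a normally distributed population with a made-up mean of 100 and standard deviation of 15 (those numbers, and the function name, are mine for illustration). It draws lots of samples of one size, takes the mean of each, and then measures how much those means vary.

```python
import random
import statistics

def spread_of_sample_means(sample_size, n_samples=2000, mu=100.0, sigma=15.0, seed=0):
    """Draw many samples of one size and return the SD of their means.

    mu, sigma, n_samples and seed are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(sample_size))
        for _ in range(n_samples)
    ]
    return statistics.stdev(means)

small = spread_of_sample_means(10)
large = spread_of_sample_means(100)
print(f"SD of sample means, n = 10:  {small:.2f}")
print(f"SD of sample means, n = 100: {large:.2f}")
```

The individual scores have the same spread in both runs, but the means of the bigger samples should cluster much more tightly around the population mean.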
That metric is called the standard error of the mean, which I've sometimes seen abbreviated as SEM, but that can get really confusing when you start getting into the other thing we use SEM for: structural equation modeling. I usually just see it called SE for standard error.
When we run a t-test, we subtract one mean from the other to get our mean difference, and then we divide by standard error: how much we expect means of groups to typically vary. The bigger the sample size, the smaller the standard error.
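As a minimal sketch of that calculation: the standard error of a mean is the sample standard deviation divided by the square root of the sample size. For two independent groups, I'm assuming here the unpooled (Welch-style) standard error of the difference, which is only one of the ways to set up a t-test. The function names and the scores are made up for illustration.

```python
import math
import statistics

def standard_error(scores):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return statistics.stdev(scores) / math.sqrt(len(scores))

def t_statistic(group_a, group_b):
    """Mean difference divided by the standard error of that difference.

    Assumes independent groups and the unpooled (Welch-style) standard error.
    """
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    se_diff = math.sqrt(standard_error(group_a) ** 2 + standard_error(group_b) ** 2)
    return mean_diff / se_diff

# Hypothetical scores, invented for this example
group_a = [12, 15, 14, 10, 13, 16]
group_b = [9, 11, 10, 8, 12, 10]

print(f"SE of group A mean: {standard_error(group_a):.3f}")
print(f"SE of group B mean: {standard_error(group_b):.3f}")
print(f"t = {t_statistic(group_a, group_b):.3f}")
```

The t you get still has to be compared against the t distribution with the right degrees of freedom; this sketch stops at the test statistic.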
But don't worry: standard deviation is still very important. Because when we run a hypothesis test using standard error, we're testing the hypothesis that these means are more different from each other than they should be by chance alone. But "more different than we would expect by chance alone" is not the same thing as saying a big difference. There is a difference between statistical significance and the size of the effects.
Because standard error is the standard deviation divided by the square root of the sample size, it can get very small if you have a very large sample size. Even trivial differences between your means can become statistically significant simply because the sample size was large. But does that mean difference actually mean something?
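To put rough numbers on that, here's a sketch that holds a tiny mean difference and the standard deviation fixed and only changes the sample size. The values (a half-point difference, an SD of 15, equal group sizes) are assumptions I picked for illustration, and the formula assumes two independent groups with the same SD.

```python
import math

mean_diff = 0.5  # hypothetical, deliberately trivial difference between means
sd = 15.0        # hypothetical SD, assumed equal in both groups

def t_from_summary(mean_diff, sd, n_per_group):
    """t for two equal-sized independent groups that share one SD."""
    se_diff = sd * math.sqrt(2 / n_per_group)  # standard error of the difference
    return mean_diff / se_diff

for n in (50, 500, 5000, 50000):
    se_diff = sd * math.sqrt(2 / n)
    t = t_from_summary(mean_diff, sd, n)
    print(f"n per group = {n:>6}: SE = {se_diff:.3f}, t = {t:.2f}")
```

With very large samples, roughly 1.96 is the cutoff for significance at the .05 level (two-tailed), and this half-point difference only gets past it at the largest sample size in the loop. The difference itself never changed.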
That's why we use a measure called effect size, which tells us about the magnitude of the difference. There are many different effect sizes, depending on what statistical analysis you're using, and I'm going to start writing about some of the different effect sizes in these posts. In fact, we've already discussed one: correlation is an effect size, because it tells us the strength of the relationship between two variables.
But I wanted to start by introducing the concept, and also introducing one effect size that is really related to this concept of standard error versus standard deviation. And that would be Cohen's d.
Cohen's d tells us by how many standard deviation units two sample means differ - standard deviation, not standard error. Standard error is directly impacted by sample size; standard deviation is not (not directly anyway). Getting two sample means to differ by a certain number of standard errors isn't a difficult task when sample sizes are large - but getting them to differ by a certain number of standard deviation units is quite a feat.
The formula for Cohen's d is simply the mean difference divided by the pooled standard deviation. And to show how much of a feat it is to differ by a measure of standard deviation units, Cohen considered a difference of 0.8 of a standard deviation to be a large effect (also, he called 0.2 small and 0.5 medium). If you find that two sample means differ by a full standard deviation, you've found an extremely large effect.
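Here's a minimal sketch of that formula, assuming two independent groups and the usual pooled standard deviation (each group's variance weighted by its degrees of freedom). The function names and the scores are invented for illustration, and the labels just apply Cohen's 0.2, 0.5 and 0.8 benchmarks as cutoffs.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (N-1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

def cohen_label(d):
    """Rough label using Cohen's benchmarks as cutoffs."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "below small"

# Hypothetical scores, invented for this example
group_a = [12, 15, 14, 10, 13, 16]
group_b = [9, 11, 10, 8, 12, 10]

d = cohens_d(group_a, group_b)
print(f"Cohen's d = {d:.2f} ({cohen_label(d)})")
```

Notice that sample size only enters through the pooling weights. Doubling both groups with similar scores would leave d about where it is, which is exactly what makes it different from a t statistic.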
I'll be doing some posts on different effect sizes soon, in part because I decided to spoil myself and bought the second edition of a fantastic book, Effect Sizes for Research. I have copious notes and photocopies from a borrowed copy of the first edition, and this one has expanded to include more multivariate effect sizes. It isn't an expensive book - I simply say spoil and mention the notes & photocopies from the previous one because it came out when I was a broke-ass grad student. But I highly recommend it for anyone who uses statistics regularly; effect sizes are being demanded more and more in research - for a good reason that I'll discuss more later!