Sunday, May 14, 2017

Statistics Sunday: What's Normal Anyway?

I've already done one bonus statistics post, which was published on a Thursday. But I wanted something that alliterated well, and Statistics Sunday (with Sara!) seemed perfect and just cheesy enough. I'll try to post something about statistics every Sunday. Once again, feel free to contact me with questions and I might cover them here.

The normal distribution is very important in statistics. Statistics is about determining the probability of certain outcomes - and inferring that when an outcome (the result of a statistical test) is unlikely, it has an explanation beyond random chance. To make that inference, we need to know what the distribution of scores looks like, or should look like, in the population, which is what we are always referring and generalizing back to. We use sample data as a stand-in for the population, and to infer what the population distribution might look like if we had those data.

The normal distribution is well-understood, and we can easily determine probabilities of certain results using area under the curve.

So when we use samples - as we almost always do - to study something, we often need those data to also be normally distributed, so we can determine those probabilities. Many statistics are based on the assumption (the rule) that the data follow a known distribution, and usually that distribution is the normal. The distribution of the sample data may not look exactly normal, but how normal does it need to be? Or, more specifically, how far can we depart from normality before we can no longer use probabilities from a normal distribution?

We should first look at the distribution of scores using a histogram, but this alone doesn't tell us whether the data are normal enough. Remember, in statistics, we don't eyeball things or rely on subjective impressions of our results; we let the math do the talking. But it's still important to do this step, because looking at the histogram tells us whether there is only one most frequent score (mode), making the distribution what we call 'unimodal.'
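As a quick sketch of that first step, here's how the histogram check might look in Python. The data are simulated (there's no real dataset in this post), and `numpy` is assumed:

```python
import numpy as np

# Simulated sample standing in for real data
rng = np.random.default_rng(42)
data = rng.normal(loc=50, scale=10, size=1000)

# Bin the scores and find the tallest bin -- the neighborhood of the mode
counts, edges = np.histogram(data, bins=20)
peak = np.argmax(counts)
print(f"Most frequent scores fall between {edges[peak]:.1f} and {edges[peak + 1]:.1f}")

# A rough unimodality check: count interior bins taller than both neighbors
local_maxima = sum(
    counts[i] > counts[i - 1] and counts[i] > counts[i + 1]
    for i in range(1, len(counts) - 1)
)
print(f"Interior peaks in the histogram: {local_maxima}")
```

With real data you'd plot this (e.g., with `matplotlib`) rather than just printing bin counts; the point is simply to see whether one mode dominates.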

But then we should examine two statistics: skewness and kurtosis. Skewness has to do with where the mode (the top of the distribution) falls: it should fall in the middle, rather than off to one side or the other. The skewness statistic captures this by looking at the tail(s) of the distribution, the thin part(s) of the distribution stretching out to the side(s). A true normal distribution has two tails, negative (below the mean) and positive (above the mean), that are symmetrical. If the distribution is unskewed, the skewness statistic will equal 0; theoretically, the statistic ranges from negative infinity to positive infinity. A negative skew means there is a longer tail on the negative end of the distribution and less on the positive end; a positive skew means there is a longer tail on the positive end.
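To make this concrete, here's a small sketch using `scipy.stats.skew` on simulated data. The exponential sample is just a convenient example of a positively skewed variable:

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)
symmetric = rng.normal(size=5000)          # symmetric: skewness near 0
right_skewed = rng.exponential(size=5000)  # long tail on the positive end

# Skewness near 0 for the symmetric sample; clearly positive for the skewed one
print(f"Normal sample skewness:      {skew(symmetric):.2f}")
print(f"Exponential sample skewness: {skew(right_skewed):.2f}")
```

Reversing the sign of the skewed sample (`-right_skewed`) would give a negative skew of the same size, with the long tail now on the negative end.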

But skew isn't all; we still need to look at kurtosis, which is usually introduced as how "peaked" the distribution is - the height of the mode - though it really reflects the weight of the tails. A very peaked distribution also has heavy tails, meaning extreme scores are more common than under a normal distribution; a flattened distribution has thin tails, meaning extreme scores are rarer. There are three types of distributions with regard to kurtosis:
  • Mesokurtic (perfectly normal)
  • Leptokurtic (peaked)
  • Platykurtic (flattened peak)
Kurtosis as originally defined is always positive and has a theoretically infinite range. A truly mesokurtic distribution will have kurtosis of 3, though some statistical analysis programs will subtract 3 from kurtosis (creating a measure often referred to as "excess" kurtosis, which can be negative), so that a mesokurtic distribution will have a value of 0.
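Here's a sketch of all three cases using `scipy.stats.kurtosis`. Note that SciPy is one of the programs that subtracts 3: its default `fisher=True` returns excess kurtosis. The Laplace and uniform samples are just convenient examples of heavy- and thin-tailed distributions:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(2)
mesokurtic = rng.normal(size=10000)           # the normal benchmark
leptokurtic = rng.laplace(size=10000)         # sharper peak, heavier tails
platykurtic = rng.uniform(-1, 1, size=10000)  # flatter top, thinner tails

# fisher=True (the default) subtracts 3, giving "excess" kurtosis
print(f"Normal, ordinary kurtosis: {kurtosis(mesokurtic, fisher=False):.2f}")
print(f"Normal, excess kurtosis:   {kurtosis(mesokurtic):.2f}")
print(f"Laplace, excess kurtosis:  {kurtosis(leptokurtic):.2f}")
print(f"Uniform, excess kurtosis:  {kurtosis(platykurtic):.2f}")
```

The normal sample lands near 3 (ordinary) or 0 (excess), the Laplace sample comes out clearly positive, and the uniform sample clearly negative.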

There are conventions for both, though it can get more complicated than that. Often, programs will give you a standard error, and you can conduct statistical tests using skewness and kurtosis. You divide your skewness by its standard error (like many test statistics, the formula is essentially a signal-to-noise ratio), and the resulting metric is a Z-score. You would want its absolute value to be less than 1.96, if you're using an alpha of 0.05 for that test. The standard error for kurtosis is based on the standard error for skewness (it's the skewness standard error times 2, with a slight correction). Both standard errors are computed from sample size; remember, as sample size increases, the more closely our data should resemble the population distribution.
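Here's a minimal sketch of those tests, using the simple large-sample standard errors sqrt(6/n) for skewness and sqrt(24/n) for excess kurtosis (note the second is exactly twice the first). Packages like SPSS apply small-sample corrections to these formulas, so their reported standard errors will differ slightly:

```python
import numpy as np
from scipy.stats import skew, kurtosis

def normality_z_scores(x):
    """Z-scores for skewness and excess kurtosis via large-sample SEs."""
    n = len(x)
    se_skew = np.sqrt(6 / n)
    se_kurt = np.sqrt(24 / n)  # twice the skewness SE, before corrections
    return skew(x) / se_skew, kurtosis(x) / se_kurt

# An exponential sample: strongly non-normal, so both tests should flag it
rng = np.random.default_rng(3)
z_skew, z_kurt = normality_z_scores(rng.exponential(size=2000))

# Compare |Z| to 1.96 for a two-tailed test at alpha = .05
print(f"Skewness Z = {z_skew:.1f}, kurtosis Z = {z_kurt:.1f}")
print("Departs from normal:", abs(z_skew) > 1.96 or abs(z_kurt) > 1.96)
```

A normal sample fed to the same function would typically produce Z-scores inside the +/-1.96 band.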

But some people prefer to simply use conventions. In general, a skewness between -1.5 and +1.5 is considered acceptable. The conventions for kurtosis are more disputed, in part because some analysis programs report excess kurtosis without clearly saying so. When using conventions, many people don't even worry about kurtosis and just focus on skewness.

This could be, in part, because skewness and kurtosis really aren't emphasized in statistics courses - at least not in the ones I've seen. That data should be normally distributed is stated but then glossed over, and the data provided for student exercises are often generated so that they don't violate assumptions. Real data are far messier. So by glossing over these concepts, courses aren't preparing students for situations they will very likely encounter. But once again, I digress.

My advice? Use the provided standard errors and conduct the simple Z-tests; it really only adds one more step. If your data aren't normal, the results of your statistical tests could be wrong.

1 comment:

  1. Actually, kurtosis measures the outliers of the distribution relative to what is expected from a normal distribution. And that information can be very useful: For example, in finance, one would like to know whether an asset can produce wild up or down daily returns (high kurtosis), or whether the asset is stable in the sense that there are no wild ups or downs (negative excess kurtosis).

    One reason that people don't use kurtosis much is because they don't understand it! The "peakedness" folklore is actually 100% wrong. When interpreted correctly, namely, as a measure of outliers, it is actually a useful statistic. See here for a clear explanation of why the archaic "peakedness" definition is absolutely incorrect: