Friday, April 21, 2017

R is for r (Correlation)

You've probably heard the term "correlation" before. It's used to say that two things are related to each other. Two things can be correlated with each other but that says nothing about cause - one could cause the other OR another variable could cause both (also known as the "third variable problem" or "confound").

BTW, my favorite correlation-related cartoon:


There are different statistics that measure correlation, but the best known is Pearson's correlation coefficient, also known as r. This statistic, which is used when you have two interval or ratio variables, communicates a great deal of information:
  • Strength of the relationship: r ranges from -1 to +1; values of +/-1 indicate a perfect linear relationship, while a value of 0 indicates no linear relationship
  • Direction of the relationship: positive values indicate a positive relationship, where as one variable increases so does the other; negative values indicate a negative or inverse relationship, where as one variable increases the other decreases
Just like the t-test I hinted at in my post on p-values, r also has a p-value to let you know if the relationship is significant (stronger than we would expect by chance alone). And as with any statistic we've talked about thus far, there's the potential for Type I error. We could, just by luck, get a significant correlation between two variables that actually have nothing to do with each other. Why? Because probability, that's why.
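To make the strength and direction ideas concrete, here is a minimal sketch of computing r by hand from its definition (the sum of cross-products of deviations, divided by the square root of the product of the two sums of squares). The function name pearson_r and the hours/score numbers are made up for illustration; they don't come from any real data.

```python
import math


def pearson_r(x, y):
    """Pearson's r for two equal-length sequences of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    # Note: if either variable is constant, ss is 0 and r is undefined
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical data: hours studied and exam score for five people
hours = [1, 2, 3, 4, 5]
score = [52, 60, 63, 71, 80]

print(pearson_r(hours, score))                 # strong positive relationship
print(pearson_r(hours, [-s for s in score]))   # same strength, opposite direction
```

Flipping the sign of one variable flips the sign of r but leaves its size alone, which is the strength-versus-direction distinction in the bullets above.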

Here's a demonstration of that concept. I created 20 samples of 30 participants measured on two randomly generated continuous variables. Because these are randomly generated, they should not be significantly correlated other than by chance alone. I then computed correlation coefficients for each of these samples. If you recall from the alpha post, with an alpha of 0.05, we would expect about 1 of 20 to be significant just by chance, on average. In any one set of 20 it could be more or less, because, well, probability. It's a 5% chance each time, just like you have a 50% chance of heads each time you flip a coin - you could still get 10 heads in a row. And you could figure out the probability of getting multiple significant results just by chance in the same way as you would multiple heads in a row: with joint probability.
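If you want to try a demonstration like this yourself, here's a rough sketch. It is not necessarily how the samples for this post were generated: it assumes both variables are drawn from a standard normal distribution, and it tests each r by converting it to a t statistic with n - 2 degrees of freedom and comparing it to 2.048, the two-tailed critical t from a standard table for alpha = 0.05 and 28 degrees of freedom. The function names and the seed are arbitrary choices for illustration.

```python
import math
import random

# Two-tailed critical t for alpha = 0.05 with df = 28 (n = 30), from a t table
T_CRIT_DF28 = 2.048


def pearson_r(x, y):
    """Pearson's r for two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


def is_significant(r, n, t_crit):
    """Convert r to a t statistic (df = n - 2) and compare to the critical t."""
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    return abs(t) > t_crit


def count_significant(n_samples=20, n=30, seed=None):
    """Count how many samples of unrelated random variables reach significance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Two independent variables: any correlation is chance alone
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        if is_significant(pearson_r(x, y), n, T_CRIT_DF28):
            hits += 1
    return hits


# One run of 20 samples; change the seed to see the count bounce around
print(count_significant(seed=2017))
```

Run it with a handful of different seeds and you'll see counts of 0, 1, 2, and occasionally more. Bump n_samples up into the thousands and the proportion of significant results should settle near 5%.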

The results? 3 were significant.


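BTW, using joint probability, the chance that three particular samples would all come out significant is 0.05 x 0.05 x 0.05 = 0.0125%. The chance that exactly 3 of the 20 would be significant, counting every possible set of three, is quite a bit higher: about 6%, using the binomial formula. Small, but not 0.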
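The arithmetic here is easy to check in a few lines. This sketch assumes the 20 tests are independent and each has a 5% chance of a false positive; prob_exactly is just a name I picked for the standard binomial formula. It computes two different things: the joint probability that three particular tests are all significant, and the probability that exactly 3 out of 20 are significant, whichever three they happen to be.

```python
from math import comb


def prob_exactly(k, n, p):
    """Binomial probability of exactly k successes in n independent tries."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


alpha = 0.05

# Joint probability that three particular tests are all significant
all_three_specific = alpha ** 3

# Probability that exactly 3 of 20 tests are significant (any three)
exactly_3_of_20 = prob_exactly(3, 20, alpha)

# Probability that at least 1 of 20 tests is significant
at_least_1_of_20 = 1 - prob_exactly(0, 20, alpha)

print(f"three specific tests: {all_three_specific:.4%}")
print(f"exactly 3 of 20:      {exactly_3_of_20:.2%}")
print(f"at least 1 of 20:     {at_least_1_of_20:.2%}")
```

The gap between the first two numbers comes from the 1,140 different ways to pick which 3 of the 20 samples are the significant ones.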

Tomorrow I'll talk about how we visualize these relationships.
