Saturday, April 22, 2017

S is for Scatterplot

Visualizing your data is incredibly important. I talked previously about the importance of creating histograms of your interval/ratio variables to check the shape of your distribution. Today, I'm going to talk about another way to visualize data: the scatterplot.

Let's say you have two interval/ratio variables that you think are related to each other in some way. You might think they're simply correlated, or you might think that one causes the other one. You would first want to look at the relationship between the two variables. Why? Correlation assumes a linear relationship between variables, meaning a consistent positive (as one increases so does the other) or negative (as one increases the other decreases) relationship across all values. We wouldn't want it to be positive at first, and then flatten out before turning negative. (I mean, we might, if that's the kind of relationship we expect, but we would need to analyze our data with a different statistic - one that doesn't assume a linear relationship.)
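To see why the linearity assumption matters, here is a minimal sketch using invented numbers (not my study data). The `pearson_r` helper is a hypothetical name written just for this illustration. It computes r for a perfectly straight-line relationship and for one that rises, flattens, and then falls.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from each mean
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented illustration data: scores from -10 to 10
x = list(range(-10, 11))
linear = [2 * v + 3 for v in x]   # consistent positive relationship
curved = [-(v ** 2) for v in x]   # rises, flattens out, then turns negative

r_linear = pearson_r(x, linear)
r_curved = pearson_r(x, curved)

print(f"straight line: r = {r_linear:.3f}")
print(f"inverted U:    r = {r_curved:.3f}")
```

In the curved case, y is completely determined by x, yet r comes out at essentially zero because the rising half and the falling half cancel each other out. A scatterplot would reveal that pattern right away, which is why it's worth looking before computing.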

So we create a scatterplot, which maps out each participant's pair of scores on the two variables we're interested in. In fact, you've probably done this before in math class, on a smaller scale.

As I discussed in yesterday's bonus post, I had 257 people respond to a rather long survey about how they use Facebook, and how that use impacts health outcomes. My participants completed a variety of measures, including measures of rumination, savoring, life satisfaction, Big Five personality traits, physical health complaints, and depression. There are many potential relationships that could exist between and among these concepts. For instance, people who ruminate more (fixate on negative events and feelings) also tend to be more depressed. In fact, here's a scatterplot created with those two variables from my study data:


And sure enough, these two variables are positively correlated with each other: r = 0.568. (Remember that r ranges from -1 to +1, and that 1 would indicate a perfect relationship. So we have a strong relationship here, but there are still other variables that explain part of the variance in rumination and/or depression.)
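One rough way to put a number on "other variables explain part of the variance" is to square r, which gives the proportion of variance the two variables share. The sketch below does that arithmetic for the three correlations reported in this post; the dictionary and variable names are just illustrative.

```python
# Correlations reported in this post
correlations = {
    "Rumination and Depression": 0.568,
    "Rumination and Savoring": -0.351,
    "Extraversion and Health Complaints": -0.087,
}

# Squaring r gives the proportion of variance the two variables share
shared_variance = {pair: r ** 2 for pair, r in correlations.items()}

for pair, r_squared in shared_variance.items():
    print(f"{pair}: r^2 = {r_squared:.3f}")
```

For rumination and depression, that works out to roughly a third of the variance shared, leaving about two thirds to be accounted for by something else.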

Savoring, on the other hand, is in some ways the opposite of rumination; it involves fixating on positive events and feelings. So we would expect these two to be negatively correlated with each other. And they are:


The correlation between these two variables is -0.351, so not as strong as the relationship between rumination and depression, and in the opposite direction.

Unfortunately, I couldn't find any variables in my study that had a nonlinear (i.e., curved) relationship to show. But I could find two variables that were not correlated with each other: the Extraversion scale from the Big Five and physical health complaints. Unsurprisingly, being an extravert (or introvert) had essentially nothing to do with health problems in my sample (r = -0.087; pretty close to 0):


But if you really want to see what a nonlinear relationship might look like, check out this post on the Dunning-Kruger effect; look at the relationship between actual performance and perceived ability.

As I said yesterday, r also comes with a p-value to tell us whether the relationship is larger than we would expect by chance. We would usually report the exact p-value, but for some of these, the p-value is so small (a really small probability of occurring by chance) that the program doesn't display the whole thing. In those cases, we would choose a really small value (the convention seems to be 0.001) and say the p was less than that. Here are the r's and p-values for the 3 scatterplots above:

  1. Rumination and Depression, r = 0.568, p < 0.001
  2. Rumination and Savoring, r = -0.351, p < 0.001
  3. Extraversion and Health Complaints, r = -0.087, p = 0.164
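If you're curious where p-values like these come from, here is a rough sketch, not the exact calculation my stats program ran. It converts r to a t statistic and then approximates the two-tailed p-value with the normal distribution, which is close to the t distribution when the sample is this large. It assumes all 257 respondents contributed to every correlation, which may not be true if some skipped items; the function names are hypothetical.

```python
import math

def approx_p_value(r, n):
    """Approximate two-tailed p-value for a Pearson r from n pairs of scores.

    Converts r to a t statistic with n - 2 degrees of freedom, then uses the
    normal distribution as a large-sample stand-in for the t distribution.
    """
    t = r * math.sqrt((n - 2) / (1 - r ** 2))
    p = math.erfc(abs(t) / math.sqrt(2))  # two-tailed area under the normal curve
    return t, p

def format_p(p, floor=0.001):
    """Report the p-value to 3 decimals, or as 'p < floor' when it is tiny."""
    return f"p < {floor}" if p < floor else f"p = {p:.3f}"

n = 257  # assumption: every respondent has scores on both variables

reported = {
    "Rumination and Depression": 0.568,
    "Rumination and Savoring": -0.351,
    "Extraversion and Health Complaints": -0.087,
}

results = {pair: approx_p_value(r, n) for pair, r in reported.items()}

for pair, (t, p) in results.items():
    print(f"{pair}: r = {reported[pair]}, t = {t:.2f}, {format_p(p)}")
```

Because of the normal approximation, the last p-value may differ slightly in the third decimal place from what a stats package reports, but it should land close to the value listed above.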
