Monday, March 11, 2019

Statistics Sunday: Scatterplots and Correlations with ggpairs

While conducting some analysis for a content validation study, I wanted to quickly blog about a fun plot I discovered today: ggpairs, a function from the GGally package that displays scatterplots and correlations in a grid for a set of variables.

To demonstrate, I'll return to my Facebook dataset, which I used for some of last year's R analysis demonstrations. You can find the dataset, a minicodebook, and code on importing into R here. Then use the code from this post to compute the following variables: RRS, CESD, Extraversion, Agree, Consc, EmoSt, Openness. These correspond to measures of rumination, depression, and the Big Five personality traits. We could easily request a correlation matrix for these 7 variables. But if I want scatterplots plus correlations for all 7, I can request both at once by calling ggpairs and listing the columns from my dataset that I want included on the plot:
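For readers who don't have the dataset loaded, here is a minimal sketch of that kind of call. It is not the code from the original analysis: the data frame `demo_data` and its random values are made-up stand-ins, the sample size is arbitrary, and the columns are selected by name rather than by position. Only the variable names come from this post.

```r
# Sketch only: simulated stand-in data, NOT the Facebook dataset from the post.
library(ggplot2)
library(GGally)

set.seed(42)
n <- 200  # arbitrary sample size, chosen just for illustration

# The seven variable names used in this post.
vars <- c("RRS", "CESD", "Extraversion", "Agree", "Consc", "EmoSt", "Openness")

# Random normal scores stored under those names (placeholder values only).
demo_data <- as.data.frame(matrix(rnorm(n * length(vars)), nrow = n))
names(demo_data) <- vars

# A hypothetical extra column that is left off the plot,
# to show that `columns` picks out a subset of the data frame.
demo_data$ID <- seq_len(n)

# Default layout: scatterplots below the diagonal, density curves on it,
# and correlation coefficients above it.
p <- ggpairs(demo_data, columns = vars)
print(p)
```

With real data, swap `demo_data` for your own data frame. The `columns` argument also accepts column positions instead of names, which is handy when the variables sit next to each other in the dataset.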


(Note: I also computed the 3 RRS subscales, which is why the column numbers above skip from 112 (RRS) to 116 (CESD). You might need to adjust the column numbers when you run the analysis yourself.)

The results look like this:

Since the number of panels in the grid is the number of variables squared (49 panels for these 7 variables), I wouldn't recommend this type of plot for a large number of variables.


  1. Can you do the same with partial correlations? Ta!

  2. What, no P-values, or sample size (n), simultaneously with the graph?
    One of my pet peeves with these ggplot

  3. I like the density function this provides. However, I also like the lines of fit through the scatterplots that pairs.panels within the psych package gives. It also reports just the correlation value, rather than prepending "Corr" to it and drawing a ggplot grid through the background.

    And welcome back!