Sunday, February 11, 2018

Statistics Sunday: My Favorite R Packages

Last year, I shared a post to help you get started with R and R Studio - check it out here. As I install R on yet another computer, it occurred to me that now might be a good time to blog about the R packages I use so often that installing them is usually my first step right after installing R and R Studio.

Whenever you install R, you'll get the base package, which has many built-in statistical functions, along with some additional libraries. Libraries add functionality to R: you install and load a library to get access to its functions. These libraries (also called packages) are written and contributed by users - some by individuals, some by organizations or universities, and some by collaborations among users and/or organizations. If you navigate over to the Comprehensive R Archive Network (CRAN) website, you'll find that there are currently 12,133 packages available. There are R packages to do just about anything, and often more than one for any particular statistical approach.

You don't need all of them, of course, and may not have any need for most of them. And the packages I use for my own work are likely to be very different from the ones you would need. But my goal for today is to show you the R packages that I think are either universally useful for statisticians or just so good that I have to share them with others.

  • dplyr - Part of the "tidyverse" of R packages, this package offers a "grammar" of data manipulation, allowing you to easily filter rows, mutate (the term used for computing a new variable), and summarise (the term used for aggregating data); this package works on data both in and out of memory, so you can even use it on datasets too large to store in your own computer's memory
  • ggplot2 - Another member of the tidyverse, this one based on the grammar of graphics (the "gg" in its name); the same basic syntax creates many different kinds of charts and figures, with just a few changes to specify the type, which makes it much easier to learn and very flexible
  • psych - Described by the creator, William Revelle of Northwestern University, as a "general purpose toolkit for personality, psychometric theory, and experimental psychology," this package is great for running quick descriptives, data reduction, and psychometric analysis (mostly classical test theory); it also has its own website, filled with resources for learning R
  • lavaan - An easy-to-use package for conducting confirmatory factor analysis and structural equation modeling; I had the pleasure of attending a workshop with one of the developers of the package, Yves Rosseel, a couple of years ago
  • semPlot - Is it possible for an R package to change your life? This package is brilliant: you create your measurement or structural equation model as an R object - to analyze with lavaan or whatever package you choose - then use this package to draw that model for you, with just a line or two of code, complete with factor loadings or path coefficients if you'd like. No more hunching over PowerPoint creating figures or accepting the messy drawings produced by SEM software.
  • metafor and rmeta - Two R packages for meta-analysis, which I learned to use in a Meta-Analysis with R course I took a year or so ago. Personally, I found the metafor package more useful, but both packages are installed on my computer, and they have different enough strengths that I could definitely justify installing both
  • RPostgreSQL - Last year, I took a course on SQL, which, after teaching us some basics in PostgreSQL, showed us how to bring SQL data into R; if you, like me, know just enough SQL to be dangerous and prefer to use statistical software to analyze your data, this package will let you pull SQL data into an R data frame to be analyzed with whatever package(s) you choose
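To make the dplyr "grammar" concrete, here is a minimal sketch using the mtcars dataset that ships with R. The variable names (cars_summary, hp_per_1000lb) are my own illustration, not anything defined by dplyr. It shows the difference between mutate(), which adds a new column to every row, and summarise(), which collapses groups of rows into one row each.

```r
library(dplyr)

# mtcars ships with base R, so no extra data is needed
cars_summary <- mtcars %>%
  filter(hp > 100) %>%                     # keep only cars with more than 100 horsepower
  mutate(hp_per_1000lb = hp / wt) %>%      # compute a new variable (wt is in 1,000 lb)
  group_by(cyl) %>%                        # define groups for the aggregation step
  summarise(mean_hp_per_1000lb = mean(hp_per_1000lb),
            n_cars = n())                  # aggregate to one row per cylinder count

print(cars_summary)
```

Each step reads like a sentence ("filter, then mutate, then group, then summarise"), which is a big part of why I find dplyr code easy to come back to months later.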
You can install any of these libraries with install.packages("libraryname") and load a package for use with library("libraryname"). While it's completely fine to have multiple libraries loaded at once, remember that some libraries may use the same function name. When a function exists in more than one loaded library, R gives precedence to the most recently loaded one, and whenever you load a library, R will tell you which functions are now masked from the other loaded libraries.
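Here's a small sketch of what masking looks like in practice, assuming dplyr is already installed. Both the stats package (loaded by default) and dplyr define a function called filter(), so after library(dplyr), a plain filter() call goes to dplyr. The package::function form lets you pick the version you want no matter what order things were loaded in. The variable names are just for illustration.

```r
# Install once per computer; commented out so re-running the script doesn't reinstall
# install.packages("dplyr")

library(dplyr)  # on load, R reports objects masked from 'package:stats' and 'package:base'

# find() lists every attached package that defines filter, in search-path order,
# so the first entry is the version a plain filter() call will use
where_filter <- find("filter")
print(where_filter)

# The package::function form calls a specific version regardless of load order
moving_avg <- stats::filter(1:5, rep(1/3, 3))  # stats: centered 3-point moving average
big_hp     <- dplyr::filter(mtcars, hp > 200)   # dplyr: keep cars with more than 200 hp
```

Using the package::function form for commonly masked names like filter() is a cheap habit that saves a lot of confusing error messages.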

I tend to measure my productivity by how many R packages I install in a day, so I'm always exploring and learning new approaches and installing new packages. Hopefully I'll do another post like this in the future where I blog about new packages I'm loving.

Sound off, readers - what are your favorite R packages?


  1. OpenMx and umx - Really powerful, flexible packages for SEM, IRT, and other latent variable models. I find the grammar easier to use than lavaan's, and the developers' support forum really responsive.

    mirt - A package for fitting various uni- and multidimensional IRT models. Think the ease of use of the psych package, but for IRT.

    psychmeta - A package Jeff Dahlke and I wrote (sorry for the self-plug) for conducting meta-analyses with corrections for reliability, range restriction/enhancement, and other artifacts. I use it every day in my non-meta-analysis research for converting effect sizes, computing confidence intervals, and applying psychometric corrections (reliability, range restriction, scale coarseness, etc.).

    1. Self-plugs are highly encouraged! Thanks so much for sharing; looking forward to checking out the psychmeta package.

  2. Instead of dplyr and ggplot2, just use tidyverse itself! library(tidyverse) includes dplyr, ggplot2, tidyr, readr, tibble, and purrr. All very useful!