Tuesday, March 31, 2020

Blogging A to Z: The A to Z of tidyverse

Announcing my theme for this year's blogging A to Z!


The tidyverse is a set of R packages for data science, built around one central idea: keeping your data tidy. What does that mean?

  1. Each row is an observation
  2. Each column is a variable
  3. Each cell contains only one value

When I first learned about the tidy approach, I thought, "Why is this special? Isn't that what we should be doing?" But thinking about keeping your data tidy has really changed the way I approach my job, and has helped me solve some tricky data wrangling issues. When you really embrace this approach, merging data, creating new variables, and summarizing cases become much easier. And the syntax used in the tidyverse is much more intuitive than much of the code in R, making it easier to memorize many of the functions; they follow a predictable grammar, so you don't need to constantly look things up.
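
To make that concrete, here is a minimal sketch of merging, creating a variable, and summarizing with dplyr. The `scores` and `grades` tables, their column names, and the passing cutoff of 75 are all made up for illustration; only the functions (`tribble()`, `left_join()`, `mutate()`, `group_by()`, `summarise()`) come from the tidyverse.

```r
library(dplyr)
library(tibble)

# Hypothetical tidy data: each row is one student's result on one test,
# each column is a variable, and each cell holds a single value.
scores <- tribble(
  ~student, ~test,     ~score,
  "Ana",    "math",    90,
  "Ana",    "reading", 80,
  "Ben",    "math",    70,
  "Ben",    "reading", 85
)

# A second hypothetical table with one row per student.
grades <- tribble(
  ~student, ~grade,
  "Ana",    4,
  "Ben",    5
)

# Merging data: add each student's grade level to their test results.
combined <- left_join(scores, grades, by = "student")

# Creating a new variable: flag scores at or above an assumed cutoff of 75.
combined <- mutate(combined, passed = score >= 75)

# Summarizing cases: one row per student with a mean score and pass count.
summary_by_student <- combined %>%
  group_by(student) %>%
  summarise(mean_score = mean(score), n_passed = sum(passed))

print(summary_by_student)
```

Notice that each step reads as a verb applied to a data frame, which is the predictable grammar I mean.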

See you tomorrow for the first post - A is for arrange!
