Deeply Trivial: A is for arrange

The arrange function allows you to sort a dataset by one or more variables, in either ascending or descending order. This function is especially helpful if you plan on aggregating your data with summarize (which we'll get to later), since sorting first lets you select specific rows, such as the first or last, in that command.

It's similar to the Excel complex sort, where the order of entry determines which variable is sorted first, second, ..., last. For this example, I'll use data I put together about my reading for 2019 (which you can download here, or use your own data file of choice). First up, as with all of these posts going forward, we'll want to load the tidyverse. (If you haven't installed it yet, just add install.packages("tidyverse") before the rest of the code.)

library(tidyverse)

## -- Attaching packages ------------------------------------------- tidyverse 1.3.0 --

## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.3
## ✓ tidyr   1.0.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0

## -- Conflicts ---------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

reads2019 <- read_csv("~/Downloads/SarasReads2019.csv", col_names = TRUE)

## Parsed with column specification:
## cols(
##   Title = col_character(),
##   Pages = col_double(),
##   date_started = col_date(format = ""),
##   date_read = col_date(format = ""),
##   Book.ID = col_double(),
##   Author = col_character(),
##   AdditionalAuthors = col_character(),
##   AverageRating = col_double(),
##   OriginalPublicationYear = col_double(),
##   read_time = col_double(),
##   MyRating = col_double(),
##   Gender = col_double(),
##   Fiction = col_double(),
##   Childrens = col_double(),
##   Fantasy = col_double(),
##   SciFi = col_double(),
##   Mystery = col_double(),
##   SelfHelp = col_double()
## )

head(reads2019)

## # A tibble: 6 x 18
##   Title Page… date_started date_read  Book.ID Author
##   <chr> <dbl> <date>       <date>       <dbl> <chr>       
## 1 1Q84  925 2019-09-03   2019-09-10  1.04e7 Murakami, H…
## 2 A Diso… 256 2019-08-21   2019-08-22  5.46e4 Kalfus, Ken 
## 3 Alas, … 323 2019-12-21   2019-12-23  3.82e4 Frank, Pat  
## 4 Artemis 305 2019-04-08   2019-04-11  3.49e7 Weir, Andy  
## 5 Bird B… 262 2019-02-07   2019-02-13  1.85e7 Malerman, J…
## 6 Bounda… 314 2019-04-23   2019-04-26  9.44e5 Cloud, Henry
## # … with 12 more variables: AdditionalAuthors <chr>, AverageRating <dbl>,
## #   OriginalPublicationYear <dbl>, read_time <dbl>, MyRating <dbl>,
## #   Gender <dbl>, Fiction <dbl>, Childrens <dbl>, Fantasy <dbl>, SciFi <dbl>,
## #   Mystery <dbl>, SelfHelp <dbl>

Let's sort the data first by page length (longest to shortest), then by author (A to Z).

reads2019 <- reads2019 %>%
  arrange(desc(Pages), Author)

head(reads2019)

## # A tibble: 6 x 18
##   Title Page… date_started date_read  Book.ID Author
##   <chr> <dbl> <date>       <date>       <dbl> <chr>       
## 1 It    1156 2019-08-14   2019-08-21  2.79e7 King, Steph…
## 2 1Q84  925 2019-09-03   2019-09-10  1.04e7 Murakami, H…
## 3 Insomn… 890 2019-08-10   2019-08-13  1.06e4 King, Steph…
## 4 The In… 576 2019-12-06   2019-12-11  4.38e7 King, Steph…
## 5 The Ro… 528 2019-04-11   2019-04-20  1.76e4 Atwood, Mar…
## 6 Life o… 460 2019-07-13   2019-07-16  4.21e3 Martel, Yann
## # … with 12 more variables: AdditionalAuthors <chr>, AverageRating <dbl>,
## #   OriginalPublicationYear <dbl>, read_time <dbl>, MyRating <dbl>,
## #   Gender <dbl>, Fiction <dbl>, Childrens <dbl>, Fantasy <dbl>, SciFi <dbl>,
## #   Mystery <dbl>, SelfHelp <dbl>

The longest book I read was It by Stephen King, which was 1,156 pages, followed by 1Q84 by Haruki Murakami, 925 pages.
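
To make the "order of entry" point concrete, here is a minimal sketch on a made-up four-row dataset. The books tibble, its column names, and its values are hypothetical and are not taken from the 2019 reading data. It shows that the first variable passed to arrange is the primary sort key, and later variables only break ties.

```r
library(dplyr)

# Hypothetical toy data for illustration only
books <- tibble(
  title  = c("A", "B", "C", "D"),
  author = c("King", "Atwood", "King", "Atwood"),
  pages  = c(300, 300, 500, 200)
)

# pages (descending) is the primary key; author only breaks ties in pages
by_pages <- books %>%
  arrange(desc(pages), author)

# author is the primary key; pages (descending) orders rows within each author
by_author <- books %>%
  arrange(author, desc(pages))

by_pages$title   # "C" "B" "A" "D"
by_author$title  # "B" "D" "C" "A"
```

Swapping the two arguments changes the result: in the first call the two 300-page books are ordered by author, while in the second each author's books are kept together and ordered by length.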

Arrange becomes very helpful if you're planning on displaying a small dataset, summarizing data, or charting. We'll get to all that later.
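
As a rough preview of why sorting helps with summarizing, the sketch below sorts first and then uses first() inside summarize to pull one row's values per group. This is only one possible pattern, not necessarily the approach later posts will take, and the books tibble and the longest and max_pages column names are hypothetical.

```r
library(dplyr)

# Hypothetical toy data for illustration only
books <- tibble(
  title  = c("A", "B", "C", "D"),
  author = c("King", "Atwood", "King", "Atwood"),
  pages  = c(300, 300, 500, 200)
)

# Sort longest-first, then take the first row within each author.
# group_by() keeps the existing row order inside each group, so first()
# returns the values from each author's longest book.
longest_by_author <- books %>%
  arrange(desc(pages)) %>%
  group_by(author) %>%
  summarize(longest = first(title), max_pages = first(pages))

longest_by_author
```

Because the data are sorted before grouping, first() picks up the longest book for each author; without the arrange step it would simply return whichever row happened to come first in the file.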

Wednesday, April 1, 2020