These days, when I want descriptive statistics from a dataset, I generally use summarise, because I can specify the exact statistics I want in the exact order I want (for easy pasting of tables into a report or presentation).
Also, if you're not a fan of the UK spelling, summarize works exactly the same. The same is true of other R/tidyverse functions, like color versus colour.
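To see both points without needing the reads2019 file, here is a minimal sketch on the built-in mtcars data. The dataset and the column names Cars, AvgMPG and MaxHP are stand-ins chosen for illustration, not part of the reads2019 analysis.

```r
library(dplyr)

# mtcars ships with R, so this runs without any downloaded file (illustrative stand-in)
us <- mtcars %>% summarize(Cars = n(), AvgMPG = mean(mpg), MaxHP = max(hp))
uk <- mtcars %>% summarise(Cars = n(), AvgMPG = mean(mpg), MaxHP = max(hp))

# Columns come back in the order the statistics were requested
names(uk)

# Compare the results of the two spellings
identical(us, uk)
```

Reordering the arguments inside summarise reorders the columns of the result, which is what makes it convenient for building report tables.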
Let's load the reads2019 dataset and start summarizing!
library(tidyverse)
reads2019 <- read_csv("~/Downloads/Blogging A to Z/SaraReads2019_allrated.csv", col_names = TRUE)
reads2019 %>%
  summarise(AllPages = sum(Pages),
            AvgLength = mean(Pages),
            AvgRating = mean(MyRating),
            AvgReadTime = mean(read_time),
            ShortRT = min(read_time),
            LongRT = max(read_time),
            TotalAuthors = n_distinct(Author))
## # A tibble: 1 x 7
##   AllPages AvgLength AvgRating AvgReadTime ShortRT LongRT TotalAuthors
##      <dbl>     <dbl>     <dbl>       <dbl>   <dbl>  <dbl>        <int>
## 1    29696      341.      4.14        3.92       0     25           42
reads2019 %>% filter(is.na(OriginalPublicationYear)) %>% select(Title)
## # A tibble: 5 x 1
##   Title
##   <chr>
## 1 Empath: A Complete Guide for Developing Your Gift and Finding Your Sense of S…
## 2 Perilous Pottery (Cozy Corgi Mysteries, #11)
## 3 Precarious Pasta (Cozy Corgi Mysteries, #14)
## 4 Summerdale
## 5 Swarm Theory
# Fill in missing publication years: three set by title, the remaining NAs set to 2019
reads2019 <- reads2019 %>%
  mutate(OriginalPublicationYear = replace(OriginalPublicationYear,
                                           Title == "Empath: A Complete Guide for Developing Your Gift and Finding Your Sense of Self",
                                           2017),
         OriginalPublicationYear = replace(OriginalPublicationYear,
                                           Title == "Summerdale", 2018),
         OriginalPublicationYear = replace(OriginalPublicationYear,
                                           Title == "Swarm Theory", 2016),
         OriginalPublicationYear = replace_na(OriginalPublicationYear, 2019))

# Fiction only, sorted by year so first() and last() return the oldest and newest book in each genre group
genrestats <- reads2019 %>%
  filter(Fiction == 1) %>%
  arrange(OriginalPublicationYear) %>%
  group_by(Childrens, Fantasy, SciFi, Mystery) %>%
  summarise(Books = n(),
            WomenAuthors = sum(Gender),  # Gender is coded 0 = male, 1 = female
            AvgLength = mean(Pages),
            AvgRating = mean(MyRating),
            NewestBook = last(OriginalPublicationYear),
            OldestBook = first(OriginalPublicationYear))
genrestats <- genrestats %>%
  bind_cols(Genre = c("General Fiction", "Mystery", "Science Fiction",
                      "Fantasy", "Fantasy SciFi", "Children's Fiction",
                      "Children's Fantasy")) %>%
  ungroup() %>%
  select(Genre, everything(), -Childrens, -Fantasy, -SciFi, -Mystery)

library(expss)
as.etable(genrestats, rownames_as_row_labels = NULL)
Genre | Books | WomenAuthors | AvgLength | AvgRating | NewestBook | OldestBook |
---|---|---|---|---|---|---|
General Fiction | 15 | 10 | 320.1 | 4.1 | 2019 | 1941 |
Mystery | 9 | 8 | 316.3 | 3.8 | 2019 | 1950 |
Science Fiction | 19 | 4 | 361.4 | 4.4 | 2019 | 1959 |
Fantasy | 19 | 3 | 426.3 | 4.2 | 2019 | 1981 |
Fantasy SciFi | 2 | 0 | 687.0 | 4.5 | 2009 | 2006 |
Children's Fiction | 1 | 0 | 181.0 | 4.0 | 2016 | 2016 |
Children's Fantasy | 16 | 1 | 250.6 | 4.2 | 2008 | 1900 |
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 3.6.3
reads2019 %>%
  mutate(Gender = factor(Gender, levels = c(0,1),
                         labels = c("Male", "Female")),
         Fiction = factor(Fiction, levels = c(0,1),
                          labels = c("Non-Fiction", "Fiction"),
                          ordered = TRUE)) %>%
  group_by(Gender, Fiction) %>%
  summarise(Books = n()) %>%
  ggplot(aes(Fiction, Books)) +
  geom_col(aes(fill = reorder(Gender, desc(Gender)))) +
  scale_fill_economist() +
  xlab("Genre") +
  labs(fill = "Author Gender")
I've always appreciated the support that tidyverse packages have for UK and US spelling. It's taken more error messages than I care to admit to learn that I have to specify "color" in 'kableExtra' package functions but I can get away with "colour" in 'ggplot2' package functions.