Let's say I wanted to plot my reading over time, specifically as a cumulative sum of pages across the year. My x-axis will be a date. Since my reads2019 file initially stores my dates as character strings, I'll need to use mutate to convert them to dates and to compute my cumulative sum of pages read.
library(tidyverse)
reads2019 <- read_csv("~/Downloads/Blogging A to Z/SaraReads2019_allchanges.csv", col_names = TRUE)
reads2019 <- reads2019 %>%
  mutate(date_started = as.Date(date_started, format = '%m/%d/%Y'),
         date_read = as.Date(date_read, format = '%m/%d/%Y'),
         PagesRead = order_by(date_read, cumsum(Pages)))
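To see what that mutate step does without the real file, here's a minimal sketch on a made-up three-row table. The `toy` data frame and its values are hypothetical stand-ins for reads2019; only the column names `date_read`, `Pages`, and `PagesRead` come from the code above.

```r
library(dplyr)

# Hypothetical toy data standing in for reads2019 (not the real file);
# rows are deliberately out of date order
toy <- tibble(
  date_read = c("03/15/2019", "01/10/2019", "02/20/2019"),
  Pages     = c(300, 200, 100)
)

toy <- toy %>%
  mutate(
    # '%m/%d/%Y' matches month/day/4-digit-year strings
    date_read = as.Date(date_read, format = "%m/%d/%Y"),
    # cumsum() is computed in date order, then returned in the original row order
    PagesRead = order_by(date_read, cumsum(Pages))
  )

print(toy)
```

Because order_by() hands the running total back in the original row order, the data doesn't need to be sorted by date before plotting.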
ggplot2 did a fine job of creating this plot using default settings. Since my date_read variable is a date, the plot automatically ordered the x-axis by date_read, formatted the labels as "Month Year", and used quarters as breaks. But we can still use the scale_x functions to make this plot look even better.
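The code for that default plot isn't shown here. My assumption is that it's the same call as the customized version later in this post, just without a scale_x_date layer. A sketch on hypothetical toy data (the `toy` values are made up, not taken from reads2019):

```r
library(ggplot2)

# Hypothetical toy data; the real post plots reads2019
toy <- data.frame(
  date_read = as.Date(c("2019-01-10", "2019-02-20", "2019-03-15")),
  PagesRead = c(200, 300, 600)
)

# No scale_x_date() layer, so ggplot2 chooses the date breaks and labels itself
default_plot <- ggplot(toy, aes(date_read, PagesRead)) +
  geom_point()

default_plot
```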
One option is to show years as 2 digits instead of 4. We could also use month breaks instead of quarters.
reads2019 %>% ggplot(aes(date_read, PagesRead)) + geom_point() + scale_x_date(date_labels = "%b %y", date_breaks = "1 month")
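The date_labels argument takes the same formatting codes as base R's strftime(), so a quick way to preview a label format is to run format() on a single date. A small sketch, using an arbitrary example date that isn't from my data:

```r
d <- as.Date("2019-04-23")  # arbitrary example date

short_label   <- format(d, "%b %y")  # abbreviated month name + 2-digit year
long_label    <- format(d, "%b %Y")  # abbreviated month name + 4-digit year
numeric_label <- format(d, "%m/%y")  # 2-digit month / 2-digit year

# The month name from %b depends on your locale settings
print(c(short_label, long_label, numeric_label))
```

Any of these strings could be passed to date_labels in place of "%b %y".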
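The scale_x functions also handle categorical data. To demonstrate, I'll count my books by genre combination and give each combination a readable name.
genres <- reads2019 %>%
  group_by(Fiction, Childrens, Fantasy, SciFi, Mystery) %>%
  summarise(Books = n())
# Names are matched by position, so they must follow the row order of the summary
genres <- genres %>%
  bind_cols(Genre = c("Non-Fiction", "General Fiction", "Mystery", "Science Fiction", "Fantasy", "Fantasy Sci-Fi", "Children's Fiction", "Children's Fantasy"))
genres %>% ggplot(aes(Genre, Books)) + geom_col()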
Unfortunately, my new genre names are a bit long, and overlap each other unless I make my plot really wide. There are a few ways I can deal with that. First, I could ask ggplot2 to abbreviate the names.
genres %>% ggplot(aes(Genre, Books)) + geom_col() + scale_x_discrete(labels = abbreviate)
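Passing labels = abbreviate means ggplot2 calls base R's abbreviate() on the axis labels. To preview what it will produce, you can call it directly on the genre names. This sketch uses a subset of the names above; the minlength value of 8 is just an example I picked, not something from the plot code.

```r
genre_names <- c("Non-Fiction", "General Fiction",
                 "Science Fiction", "Children's Fantasy")

# Default behaviour: aim for at least 4 characters per abbreviation
abbr <- abbreviate(genre_names)

# A longer minimum keeps more of each name (8 is an arbitrary choice)
abbr_longer <- abbreviate(genre_names, minlength = 8)

# Both results are named vectors: names are the originals, values the abbreviations
print(abbr)
print(abbr_longer)
```

If the longer versions read better, the same idea should work inside the scale by wrapping it in a function, for example scale_x_discrete(labels = function(x) abbreviate(x, minlength = 8)).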
These abbreviations were generated automatically by R, and I'm not a huge fan. A better way might be to add line breaks to any two-word genres. This Stack Overflow post gave me a function I can add to my scale_x_discrete to do just that.
genres %>% ggplot(aes(Genre, Books)) + geom_col() + scale_x_discrete(labels=function(x){sub("\\s", "\n", x)})
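That labelling function is plain base R, so it's easy to test outside a plot. Note that sub() replaces only the first whitespace character, which is all a two-word genre needs. The three-word label below is a hypothetical example, not one of my genres, included to show how gsub() differs.

```r
labels <- c("Non-Fiction", "Science Fiction", "Children's Fantasy")

# Replace the first whitespace character in each label with a line break
wrapped <- sub("\\s", "\n", labels)
cat(wrapped, sep = "\n---\n")

# Hypothetical three-word label, to compare sub() and gsub()
three_words <- "Young Adult Fantasy"
first_only  <- sub("\\s", "\n", three_words)   # breaks after "Young" only
every_space <- gsub("\\s", "\n", three_words)  # breaks at every space
```

Labels with no whitespace, like "Non-Fiction", pass through unchanged.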
MUCH better!
As you can see, the scale_x function you use depends on the type of data you're working with. For dates, scale_x_date; for categories, scale_x_discrete. Tomorrow, we'll show some ways to format continuous data, since that's often what you see on the y-axis. See you then!
By the way, this is my 1000th post on my blog!