But as with many people who have worked in the professional research world, I still work with and have access to files built with programs like SPSS. Fortunately, I can still access those files at home on my personal laptop, thanks to haven.
Since I have access to SPSS through work, I was able to save my 2019 reads data file in SPSS format, to give you something to test the haven package with. You can download that file here; you won't need SPSS to open it, because R will do that for us.
haven is part of the tidyverse, so installing the tidyverse will give you that package. However, it isn't part of the core tidyverse, meaning the library(tidyverse) command won't load it automatically; you'll need to load it separately. I'll still load tidyverse, though, mainly because it's typically the first library I load when I start analyzing data in R, since I use the core functions so frequently in my code.
library(tidyverse)
library(haven)
spssreads <- read_spss("~/Downloads/Blogging A to Z/SaraReads2019_allrated.sav")
head(spssreads)
## # A tibble: 6 x 18
##   Title Pages date_started date_read  Book.ID Author AdditionalAutho…
##   <chr> <dbl> <date>       <date>       <dbl> <chr>  <chr>
## 1 1Q84    925 2019-09-03   2019-09-10  1.04e7 Murak… "Jay Rubin, Phi…
## 2 A Di…   256 2019-08-21   2019-08-22  5.46e4 Kalfu… ""
## 3 Alas…   323 2019-12-21   2019-12-23  3.82e4 Frank… ""
## 4 Arte…   305 2019-04-08   2019-04-11  3.49e7 Weir,… ""
## 5 Bird…   262 2019-02-07   2019-02-13  1.85e7 Maler… ""
## 6 Boun…   314 2019-04-23   2019-04-26  9.44e5 Cloud… "John Townsend"
## # … with 11 more variables: AverageRating <dbl>, OriginalPublicationYear <dbl>,
## #   read_time <dbl>, MyRating <dbl>, Gender <dbl>, Fiction <dbl>,
## #   Childrens <dbl>, Fantasy <dbl>, SciFi <dbl>, Mystery <dbl>, SelfHelp <dbl>
Under the hood, a date is just a count of time elapsed since some origin. Typically, that origin is January 1, 1970: R's Date class stores the number of days since that date, and its POSIXct date-times store the number of seconds since midnight UTC on that date (so-called UNIX time or Epoch time). But haven uses a different origin: October 15, 1582. What is the significance of this date? It's the day we switched over to the Gregorian calendar. You didn't realize you'd be getting a history lesson with this post, did you?
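To see why the origin matters, here is a minimal sketch in base R (no SPSS file needed). It shows that a bare count of days only becomes a date once you pair it with an origin, and converts a hypothetical raw value, counted in seconds from the October 15, 1582 origin described above, into an R Date. The variable names are illustrative, not part of haven.

```r
# R's Date class counts days since the UNIX epoch, 1970-01-01
read_date <- as.Date("2019-09-03")
days <- as.numeric(read_date)
days  # 18142

# The same count of days points to a very different date under another origin
as.Date(days, origin = "1970-01-01")  # "2019-09-03"
as.Date(days, origin = "1582-10-15")  # a date in the 1600s

# Distance between the two origins, in days
origin_gap <- as.numeric(as.Date("1970-01-01") - as.Date("1582-10-15"))
origin_gap  # 141427

# Hypothetical raw value: seconds elapsed since the 1582-10-15 origin
raw_seconds <- (days + origin_gap) * 86400

# Divide by 86,400 seconds per day, then anchor the days to the 1582 origin
spss_date <- as.Date(raw_seconds / 86400, origin = "1582-10-15")
spss_date  # "2019-09-03"
```

haven does this bookkeeping for you, which is why date_started and date_read already show up as `<date>` columns in the output above.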
haven can also write data in these programs' native file formats (for example, write_sav() for SPSS and write_dta() for Stata), so if you have a collaborator who wants to use SPSS or Stata to analyze the data, you can create a version in the program's native format.
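Here is a minimal sketch of going in that direction, assuming a small made-up data frame (the second row and the file names are hypothetical) written to temporary files rather than the reads file from this post:

```r
library(haven)

# Small stand-in data frame; the second row is made up for illustration
books <- data.frame(
  Title = c("1Q84", "Example Book"),
  Pages = c(925, 200),
  date_read = as.Date(c("2019-09-10", "2019-10-01")),
  stringsAsFactors = FALSE
)

# Hypothetical output paths in a temporary directory
sav_path <- file.path(tempdir(), "books.sav")
dta_path <- file.path(tempdir(), "books.dta")

# Write SPSS (.sav) and Stata (.dta) versions of the same data
write_sav(books, sav_path)
write_dta(books, dta_path)

# Read the SPSS version back to check the round trip
books_back <- read_sav(sav_path)
books_back
```

Reading the file back with read_sav() is a quick way to confirm that the values, including the dates, survived the trip before you send the file to a collaborator.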
Tomorrow, we'll talk about resources for learning more about tidyverse. And the day after that, get ready for joins!
I guess the reason why haven uses this origin is the fact that SPSS ('born' in 1968) chose this date as its reference.
Great point! SPSS predates the Epoch, so they had to pick some meaningful date to represent time that wasn't related to UNIX. Funny side-note: My dissertation director used SPSS for analysis of her dissertation data and still had the punch cards to show me. SPSS uses a lot of its lingo in relation to those punch cards (e.g., specifying the column length of the variable, etc.), so it was cool to see the physical representation and explanation for that lingo. I love learning about the history of statistics, analysis, computers, and so on, so thanks for sharing this fact!