According to the tibble overview on the tidyverse website:
Tibbles are data.frames that are lazy and surly: they do less (i.e. they don't change variable names or types, and don't do partial matching) and complain more (e.g. when a variable does not exist). This forces you to confront problems earlier, typically leading to cleaner, more expressive code.
What does this mean? Well, remember when I noted that a character variable in my measures data frame had been changed to a factor? I manually changed it back to character. But had I simply created a tibble with that information, I wouldn't have had to do anything. (Since R 4.0.0, data.frame() also leaves strings as character by default, but older versions converted them to factors.) Data frames will also do partial matching on variable names with $. If the Facebook data were stored in a data frame and I requested Facebook$gen, it would give me the gender variable, because that is the only variable name starting with "gen". An ambiguous prefix such as Facebook$R, which matches many variables, quietly returns NULL instead. If I tried either of those with a tibble, I'd get NULL and a warning about an unknown column, because it matches variable references literally.
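Here is a minimal sketch of how $ behaves on each kind of object. It uses a made-up three-column data frame rather than the real Facebook data, so the values are illustrative only. In the versions of tibble I'm aware of, the failed lookup returns NULL with a warning rather than stopping, but treat the exact message as version-dependent.

```r
library(tibble)

# Hypothetical toy data, not the real Facebook set
df <- data.frame(gender = c(1, 0, 1), Rum1 = c(3, 1, 4), Rum2 = c(1, 1, 3))
tb <- as_tibble(df)

df$gen          # unique prefix: partial matching returns the gender column
df$Ru           # ambiguous prefix (Rum1 and Rum2): returns NULL
tb$gen          # tibbles match names literally: NULL, plus a warning
tb[["gender"]]  # the full name always works
```

The practical upshot is that a typo in a variable name surfaces right away with a tibble instead of silently returning the wrong column.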
There are a couple of ways to create a tibble: one uses the tibble package and the other uses the readr package. Fortunately, you don't need to worry about that, because we're just going to load the tidyverse package, which includes those two and more.
install.packages("tidyverse")
library(tidyverse)
First, let's create a new tibble from scratch. The syntax is almost exactly the same as it was in the data frame post.
measures<-tibble(
  meas_id = c(1:6),
  name = c("Ruminative Response Scale","Savoring Beliefs Inventory",
           "Satisfaction with Life Scale","Ten-Item Personality Measure",
           "Cohen-Hoberman Inventory of Physical Symptoms",
           "Center for Epidemiologic Studies Depression Scale"),
  num_items = c(22,24,5,10,32,16),
  rev_items = c(FALSE, TRUE, FALSE, TRUE, FALSE, TRUE)
)
measures
## # A tibble: 6 x 4
##   meas_id name                                              num_items
##     <int> <chr>                                                 <dbl>
## 1       1 Ruminative Response Scale                                22
## 2       2 Savoring Beliefs Inventory                               24
## 3       3 Satisfaction with Life Scale                              5
## 4       4 Ten-Item Personality Measure                             10
## 5       5 Cohen-Hoberman Inventory of Physical Symptoms            32
## 6       6 Center for Epidemiologic Studies Depression Scale        16
## # ... with 1 more variables: rev_items <lgl>
As you can see, the name variable is character, not factor. I didn't have to do anything. Alternatively, you could convert an existing data frame, whether it's one you created or one that came with R/an R package.
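If you'd rather check the column type than read it off the printout, here is a small sketch. It uses a hypothetical two-row version of the measures table, and it sets stringsAsFactors = TRUE explicitly to reproduce the older data.frame() behavior, since R 4.0.0 and later no longer convert strings by default.

```r
library(tibble)

# Hypothetical two-row stand-in for the measures table
scale_names <- c("Ruminative Response Scale", "Savoring Beliefs Inventory")

# stringsAsFactors = TRUE mimics the data.frame() default before R 4.0.0
df_old <- data.frame(name = scale_names, stringsAsFactors = TRUE)
tb <- tibble(name = scale_names)

class(df_old$name)  # "factor"
class(tb$name)      # "character"
```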
car<-as_tibble(mtcars)
car
## # A tibble: 32 x 11
##      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
##  * <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
##  1  21.0     6 160.0   110  3.90 2.620 16.46     0     1     4     4
##  2  21.0     6 160.0   110  3.90 2.875 17.02     0     1     4     4
##  3  22.8     4 108.0    93  3.85 2.320 18.61     1     1     4     1
##  4  21.4     6 258.0   110  3.08 3.215 19.44     1     0     3     1
##  5  18.7     8 360.0   175  3.15 3.440 17.02     0     0     3     2
##  6  18.1     6 225.0   105  2.76 3.460 20.22     1     0     3     1
##  7  14.3     8 360.0   245  3.21 3.570 15.84     0     0     3     4
##  8  24.4     4 146.7    62  3.69 3.190 20.00     1     0     4     2
##  9  22.8     4 140.8    95  3.92 3.150 22.90     1     0     4     2
## 10  19.2     6 167.6   123  3.92 3.440 18.30     1     0     4     4
## # ... with 22 more rows
But chances are you'll be reading in data from an external file. The readr package can handle delimited and fixed width files. For instance, to read in the Facebook dataset I've been using, I just need the function read_tsv.
Facebook<-read_tsv("small_facebook_set.txt",col_names=TRUE)
Facebook
## # A tibble: 257 x 111
##       ID gender  Rum1  Rum2  Rum3  Rum4  Rum5  Rum6  Rum7  Rum8  Rum9
##    <int>  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
##  1     1      1     3     1     3     2     3     1     2     1     1
##  2     2      1     1     1     1     1     1     1     0     0     1
##  3     3      1     4     3     3     4     3     4     2     3     3
##  4     4      0     4     0     0     2     0     0     4     0     2
##  5     5      1     2     2     2     1     2     1     1     1     1
##  6     6      0     2     4     3     4     2     3     2     2     3
##  7     7      1     1     2     3     2     0     2     3     1     2
##  8     8      0     2     1     1     2     0     2     3     3     3
##  9     9      1     4     1     4     4     3     2     2     1     1
## 10    10      1     4     2     0     3     4     2     4     1     2
## # ... with 247 more rows, and 100 more variables: Rum10 <int>,
## #   Rum11 <int>, Rum12 <int>, Rum13 <int>, Rum14 <int>, Rum15 <int>,
## #   Rum16 <int>, Rum17 <int>, Rum18 <int>, Rum19 <int>, Rum20 <int>,
## #   Rum21 <int>, Rum22 <int>, Sav1 <int>, Sav2 <int>, Sav3 <int>,
## #   Sav4 <int>, Sav5 <int>, Sav6 <int>, Sav7 <int>, Sav8 <int>,
## #   Sav9 <int>, Sav10 <int>, Sav11 <int>, Sav12 <int>, Sav13 <int>,
## #   Sav14 <int>, Sav15 <int>, Sav16 <int>, Sav17 <int>, Sav18 <int>,
## #   Sav19 <int>, Sav20 <int>, Sav21 <int>, Sav22 <int>, Sav23 <int>,
## #   Sav24 <int>, LS1 <int>, LS2 <int>, LS3 <int>, LS4 <int>, LS5 <int>,
## #   Extraverted <int>, Critical <int>, Dependable <int>, Anxious <int>,
## #   NewExperiences <int>, Reserved <int>, Sympathetic <int>,
## #   Disorganized <int>, Calm <int>, Conventional <int>, Health1 <int>,
## #   Health2 <int>, Health3 <int>, Health4 <int>, Health5 <int>,
## #   Health6 <int>, Health7 <int>, Health8 <int>, Health9 <int>,
## #   Health10 <int>, Health11 <int>, Health12 <int>, Health13 <int>,
## #   Health14 <int>, Health15 <int>, Health16 <int>, Health17 <int>,
## #   Health18 <int>, Health19 <int>, Health20 <int>, Health21 <int>,
## #   Health22 <int>, Health23 <int>, Health24 <int>, Health25 <int>,
## #   Health26 <int>, Health27 <int>, Health28 <int>, Health29 <int>,
## #   Health30 <int>, Health31 <int>, Health32 <int>, Dep1 <int>,
## #   Dep2 <int>, Dep3 <int>, Dep4 <int>, Dep5 <int>, Dep6 <int>,
## #   Dep7 <int>, Dep8 <int>, Dep9 <int>, Dep10 <int>, Dep11 <int>,
## #   Dep12 <int>, Dep13 <int>, Dep14 <int>, Dep15 <int>, Dep16 <int>
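By default, read_tsv guesses each column's type from the data. If you'd rather state the types yourself, something like the sketch below should work. Since the Facebook file isn't available here, it writes a tiny made-up tab-delimited file to a temporary location first; the file contents and the name small are assumptions for illustration only.

```r
library(readr)

# Hypothetical three-column stand-in for small_facebook_set.txt
tmp <- tempfile(fileext = ".txt")
writeLines(c("ID\tgender\tRum1",
             "1\t1\t3",
             "2\t1\t1"), tmp)

# Declare every column as integer instead of letting readr guess
small <- read_tsv(tmp, col_types = cols(.default = col_integer()))
small
```

Supplying col_types also silences the column specification message that readr prints when it has to guess.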
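Finally, if you're working with SAS, SPSS, or Stata files, you can read those in with another tidyverse package, haven, using the functions read_sas, read_sav, and read_dta, respectively. Note that haven is installed along with the tidyverse but isn't attached by library(tidyverse), so you'll need to load it separately with library(haven).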
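As a rough sketch of what that looks like for SPSS: because I don't have a .sav file to share, the example below first writes mtcars out to a temporary SPSS file with haven's write_sav and then reads it back in. The temporary file and the name car_spss are just placeholders; with your own data you'd only need the read_sav line, pointed at your file.

```r
library(haven)

# Create a throwaway SPSS file so the example is self-contained
tmp <- tempfile(fileext = ".sav")
write_sav(mtcars, tmp)

# Read it back; the result is a tibble
car_spss <- read_sav(tmp)
class(car_spss)
```

read_sas and read_dta are called the same way, with the path to a SAS or Stata file as the first argument.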
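If for some reason you need a data frame rather than a tibble, you can convert a tibble to a data frame with as.data.frame(tibble_name). Wrapping that call in class(), as in class(as.data.frame(tibble_name)), doesn't do the conversion for you; it just confirms that the result is a plain data.frame.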
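A quick sketch of the round trip, using the mtcars data again (car_df is just a name I picked for the converted copy):

```r
library(tibble)

car <- as_tibble(mtcars)
car_df <- as.data.frame(car)  # store the converted copy

class(car)     # "tbl_df" "tbl" "data.frame"
class(car_df)  # "data.frame"
```

Note that you have to assign the result to keep it; as.data.frame doesn't change the original tibble.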
You can learn more about tibbles here and here.