Wednesday, April 11, 2018

J is for jsonlite Package

Today I'm going to introduce a new method of storing and exchanging data: JSON, or JavaScript Object Notation. Up to now, we've been working with delimited text files and R data frames. But JSON (pronounced "Jason") is another way to store data so that it can be read by different software packages. In my previous job, some of our tests arrived in Research as JSON files, which we then parsed using Python, SAS, or R. In fact, virtually any programming language can parse JSON.

JSON allows the transfer of large amounts of data organized into name and value pairs. Each name is separated from its value by a colon, and the pairs are separated by commas. Each object (case) is enclosed in curly brackets {}, and objects are also separated by commas. The full set of objects is then enclosed in square brackets [], which JSON calls an array. JSON files are human-readable, self-describing (because you get to pick the name assigned to each value), and hierarchical.
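To see how those pieces map onto R, here's a minimal sketch using jsonlite's toJSON() with made-up values: a named list becomes a single object of name/value pairs, and a data frame becomes an array of objects, one per row (case).

```r
library(jsonlite)

# A named list becomes one JSON object of name/value pairs.
# auto_unbox = TRUE writes length-1 vectors as plain values instead of [arrays].
one_post <- list(postname = "D is for Data Frame", posted = TRUE)
obj_json <- toJSON(one_post, auto_unbox = TRUE)
print(obj_json)  # {"postname":"D is for Data Frame","posted":true}

# A data frame becomes an array [...] holding one object {...} per row.
two_posts <- data.frame(postname = c("A", "B"), posted = c(TRUE, FALSE))
arr_json <- toJSON(two_posts)
print(arr_json)  # [{"postname":"A","posted":true},{"postname":"B","posted":false}]
```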

Here's what a JSON file might look like for my Blogging A to Z posts:

{"posts": [
  {"postname": "A is for (Cronbach's) Alpha", "date": "20180401", "shorturl": "a-is-for-cronbachs-alpha.html", "posted": true},
  {"postname": "B is for Betas (Standardized Regression Coefficients)", "date": "20180402", "shorturl": "b-is-for-betas-standardized-regression.html", "posted": true},
  {"postname": "C is for Cross Tabs Analysis", "date": "20180403", "shorturl": "c-is-for-cross-tabs-analysis.html", "posted": true},
  {"postname": "D is for Data Frame", "date": "20180404", "shorturl": "d-is-for-data-frame.html", "posted": true},
  {"postname": "E is for Effect Sizes", "date": "20180405", "shorturl": "e-is-for-effect-sizes.html", "posted": true},
  {"postname": "F is for (Confirmatory) Factor Analysis", "date": "20180406", "shorturl": "f-is-for-confirmatory-factor-analysis.html", "posted": true},
  {"postname": "G is for glm Function", "date": "20180407", "shorturl": "g-is-for-glm-function.html", "posted": true},
  {"postname": "H is for Help with R", "date": "20180409", "shorturl": "h-is-for-help-with-r.html", "posted": true},
  {"postname": "I is for (Classical) Item Analysis or I Must Be Flexible", "date": "20180410", "shorturl": "i-is-for-classical-item-analysis-or-i.html", "posted": true},
  {"postname": "J is for jsonlite Package", "date": "20180411", "shorturl": null, "posted": false}
]}

As you can see, the structure is readable, and you can make sense of the information it communicates. JSON supports several kinds of values, including numbers, strings, logicals (true/false), and null. Null values were one reason JSON was so useful for our test data: because some of our tests were adaptive, each examinee received only certain items, so they would have null values for most of the items in the item bank. We could read in their responses to the items they saw, along with the item IDs, and fill in null values for the items they didn't see. We could then put all examinees in a single file, with those who saw an item having values in that column and those who didn't having null values. That let us analyze all examinees together and generate item statistics and/or person ability estimates.
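Here's a minimal sketch of that situation using jsonlite's fromJSON(). The examinee IDs, item names, and responses are invented for illustration: one record has an explicit null, and the other simply leaves some items out. When fromJSON() simplifies an array of objects into a data frame, both cases come back as NA.

```r
library(jsonlite)

# Invented adaptive-test responses: examinee 2 has an explicit null for item1,
# and each examinee is missing an item the other one saw.
responses <- '[
  {"examinee": 1, "item1": 1, "item2": 0},
  {"examinee": 2, "item1": null, "item3": 1}
]'

# The data frame's columns are the union of all names across records;
# nulls and names absent from a record both become NA.
scores <- fromJSON(responses)
print(scores)
```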

JSON has no date type (and no way to store functions), though, so I created my date variable as a string, enclosed in quotes. To use that information as a date, I need an extra step once I parse it into R, but that's only necessary if you plan to do any analysis or calculations with dates. For instance, you might have a date of birth variable and want to calculate each person's exact age as of the current date. In that case, you'd want to make certain that whatever statistical package you're using knows the variable is a date so it can handle it properly in calculations.
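As a sketch of that extra step, the code below converts invented dates of birth, stored as YYYYMMDD strings like the dates in my posts file, into Date objects and computes exact age in whole years. The age_years() helper is hypothetical (not a base R function), and a fixed reference date stands in for Sys.Date() so the result is reproducible.

```r
# Invented dates of birth, stored as "YYYYMMDD" strings.
dob_strings <- c("19800615", "19951201", "20000120")

# Convert the strings to Date objects by describing their format.
dob <- as.Date(dob_strings, format = "%Y%m%d")

# Fixed reference date for reproducibility; use Sys.Date() for "today".
ref <- as.Date("2018-04-11")

# Hypothetical helper: exact age in whole years is the difference in calendar
# years, minus one if this year's birthday hasn't arrived by the reference date.
age_years <- function(dob, ref) {
  years <- as.integer(format(ref, "%Y")) - as.integer(format(dob, "%Y"))
  birthday_pending <- format(ref, "%m%d") < format(dob, "%m%d")
  years - birthday_pending
}

ages <- age_years(dob, ref)
print(ages)  # values: 37 22 18
```

Comparing month-day strings avoids the rounding problems of dividing a day count by 365.25; someone born on February 29 is treated as having their birthday on March 1 in non-leap years.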

Whitespace between elements is ignored in JSON, with the brackets dictating hierarchy and structure, so I could space this file out more if I wanted to, to make it even more readable:

{"posts": [

  {"postname": "A is for (Cronbach's) Alpha",

  "date": "20180401",

  "shorturl": "a-is-for-cronbachs-alpha.html",

  "posted": true}
]}

I saved the object created above as a .json file using a simple text editor, and I can then read it into R with the jsonlite package. Though the jsonlite package has a way of coercing an object into a data frame, I found it a bit finicky, so I just read the object in and then converted it to a data frame.
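The file name and code weren't preserved here, so this sketch writes a short, two-post sample to a temporary file (a stand-in for the real .json file), reads it back with fromJSON(), and converts the result with as.data.frame(). That conversion is what puts the posts. prefix on each column name.

```r
library(jsonlite)

# Hypothetical stand-in for the saved .json file: a temporary file with two posts.
path <- tempfile(fileext = ".json")
writeLines('{"posts": [
  {"postname": "C is for Cross Tabs Analysis", "date": "20180403",
   "shorturl": "c-is-for-cross-tabs-analysis.html", "posted": true},
  {"postname": "J is for jsonlite Package", "date": "20180411",
   "shorturl": null, "posted": false}
]}', path)

# fromJSON() returns a list with one element, "posts", which it has already
# simplified into a data frame (the null shorturl becomes NA).
json <- fromJSON(path)

# as.data.frame() on that list prefixes each column with the list name.
posts <- as.data.frame(json)
str(posts)
```

If you only need the table itself, json$posts is already a data frame with the unprefixed names postname, date, shorturl, and posted.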


Now I have a data frame called "posts", containing all of the information from my JSON file. Let's take a look at how the data were read in, in particular the data types.

## 'data.frame': 10 obs. of  4 variables:
##  $ posts.postname: chr  "A is for (Cronbach's) Alpha" "B is for Betas (Standardized Regression Coefficients)" "C is for Cross Tabs Analysis" "D is for Data Frame" ...
##  $ posts.date    : chr  "20180401" "20180402" "20180403" "20180404" ...
##  $ posts.shorturl: chr  "a-is-for-cronbachs-alpha.html" "b-is-for-betas-standardized-regression.html" "c-is-for-cross-tabs-analysis.html" "d-is-for-data-frame.html" ...
##  $ posts.posted  : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...

If I want to do any kind of date math, I need to convert my column into a date variable. I just need to tell R to turn it into a date and provide the format of the string. (Nerdy note: R's Date variables are actually stored as the number of days since January 1, 1970, known as the Unix epoch; date-times, R's POSIXct class, are stored as the number of seconds since that same moment. The number is then displayed as a date, formatted in whatever way you specify.)

posts$posts.date <- as.Date(posts$posts.date, format = "%Y%m%d")  # parse YYYYMMDD strings
str(posts$posts.date)
##  Date[1:10], format: "2018-04-01" "2018-04-02" "2018-04-03" "2018-04-04" "2018-04-05" ...
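A quick way to see what's underneath a Date is to strip the class with as.numeric(). This sketch (dates chosen for illustration) shows the day count for this post's date, the count for January 2, 1970, and, for comparison, the seconds count stored by a POSIXct date-time.

```r
# A Date is stored as a count of days since 1970-01-01.
post_date <- as.Date("20180411", format = "%Y%m%d")
day_count <- as.numeric(post_date)
print(day_count)                            # 17632
print(as.numeric(as.Date("1970-01-02")))    # 1

# A POSIXct date-time is stored as a count of seconds since the same epoch.
sec_count <- as.numeric(as.POSIXct("1970-01-02", tz = "UTC"))
print(sec_count)                            # 86400
```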

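Now I can use the converted date column to compute a new variable: the number of days since each post went up. Because the calculation uses Sys.Date(), the results depend on the day you run it.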

posts$days_since <- Sys.Date() - posts$posts.date  # days_since: illustrative column name
str(posts)
## 'data.frame': 10 obs. of  5 variables:
##  $ posts.postname : chr  "A is for (Cronbach's) Alpha" "B is for Betas (Standardized Regression Coefficients)" "C is for Cross Tabs Analysis" "D is for Data Frame" ...
##  $ posts.date     : Date, format: "2018-04-01" "2018-04-02" ...
##  $ posts.shorturl : chr  "a-is-for-cronbachs-alpha.html" "b-is-for-betas-standardized-regression.html" "c-is-for-cross-tabs-analysis.html" "d-is-for-data-frame.html" ...
##  $ posts.posted   : logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ 'difftime'  atomic [1:10] 8 7 6 5 4 3 2 0 -1 -2
##   .. ..- attr(*, "units")= chr "days"
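Because Sys.Date() changes every day, the days-since values above will differ each time the code runs. A reproducible sketch swaps in a fixed reference date (an assumption: April 9, 2018, which happens to reproduce the 8 through -2 values in the output above) and uses as.numeric() to turn the difftime into a plain number.

```r
# Dates of the ten posts, as stored in the JSON file.
post_dates <- as.Date(c("20180401", "20180402", "20180403", "20180404", "20180405",
                        "20180406", "20180407", "20180409", "20180410", "20180411"),
                      format = "%Y%m%d")

# Assumed fixed reference date in place of Sys.Date(), so the result is reproducible.
ref <- as.Date("2018-04-09")

# Subtracting Dates gives a difftime in days; as.numeric() drops the class
# so the result behaves like an ordinary number in later calculations.
days_since <- as.numeric(ref - post_dates)
print(days_since)  # values: 8 7 6 5 4 3 2 0 -1 -2
```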

But the jsonlite package doesn't just parse JSON; it can also create JSON files for easy sharing. Let's convert the Facebook data file into a JSON file. We'll also set the pretty argument, which adds whitespace (line breaks and indentation) to make the file more readable.

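library(jsonlite)
Facebook <- read.delim(file = "small_facebook_set.txt", header = TRUE)
# dataframe = "rows" writes one JSON object per row; pretty = TRUE adds indentation
Facebook_js <- toJSON(Facebook, dataframe = "rows", pretty = TRUE)
# save() would write an R binary file, so write the JSON text itself instead
writeLines(Facebook_js, "FB_JS.JSON")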
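The dataframe argument controls the shape of the resulting JSON. This sketch uses a tiny invented data frame, not the actual Facebook data, to compare the "rows" and "columns" layouts, then shows jsonlite's write_json() and read_json() as a one-step way to write a file and read it back (the temporary file path is just for illustration).

```r
library(jsonlite)

# A tiny invented stand-in for the Facebook data (not the real columns).
fb <- data.frame(id = 1:2, friends = c(120L, 45L))

# dataframe = "rows": an array with one object per row.
rows_json <- toJSON(fb, dataframe = "rows")
print(rows_json)  # [{"id":1,"friends":120},{"id":2,"friends":45}]

# dataframe = "columns": one object whose values are arrays, one per column.
cols_json <- toJSON(fb, dataframe = "columns")
print(cols_json)  # {"id":[1,2],"friends":[120,45]}

# write_json() converts and writes in one step; read_json() with
# simplifyVector = TRUE turns the array of rows back into a data frame.
path <- tempfile(fileext = ".json")
write_json(fb, path, pretty = TRUE)
back <- read_json(path, simplifyVector = TRUE)
str(back)
```

The rows layout matches the object-per-case structure of my posts file; the columns layout is more compact when a data set has many rows.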

If you're interested in learning more about JSON files, check out the tutorial on
