Sunday, November 18, 2018

Statistics Sunday: Reading and Creating a Data Frame with Multiple Text Files

First Statistics Sunday in far too long! It's going to be a short one, but it describes a great trick I learned recently while completing a time study for our exams at work.

To give a bit of background, this time study involves analyzing the time examinees spent on their exam and whether they were able to complete all items. We've done time studies in the past to select time allowed for each exam, but we revisit on a cycle to make certain the time allowed is still ample. All of our exams are computer-administered, and we receive daily downloads from our exam provider with data on all exams administered that day.

What that means is, to study a year's worth of exam data, I need to read in and analyze 365-ish text files (test centers are generally closed for holidays). Fortunately, I found code that would read all files in a particular folder and bind them into a single data frame. First, I'll set the working directory to the location of those files, and create a list of all files in that directory:

filelist <- list.files()
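As a rough sketch of that first step, here is one way it could look. The folder name `exam_downloads` and the two placeholder files are made up for illustration, and the second option (asking `list.files()` for full paths instead of changing the working directory) is an alternative I'm assuming would suit some setups, not something the original workflow requires.

```r
# Hypothetical folder; replace with the location of your daily download files.
exam_dir <- file.path(tempdir(), "exam_downloads")
dir.create(exam_dir, showWarnings = FALSE)

# Two empty placeholder files so the example has something to list.
file.create(file.path(exam_dir, c("day1.txt", "day2.txt")))

# Option 1: change the working directory, then list bare file names.
# setwd(exam_dir)
# filelist <- list.files()

# Option 2: leave the working directory alone and ask for full paths.
filelist <- list.files(path = exam_dir, full.names = TRUE)
basename(filelist)
```

With full paths, any later step that records the file name will record the whole path unless you strip it with `basename()`.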

For the next part, I'll need the data.table library, which you'll want to install if you don't already have it:

library(data.table)  # run install.packages("data.table") first if needed
Exams2018 <- rbindlist(sapply(filelist, fread, simplify = FALSE), use.names = TRUE, idcol = "FileName")

Now I have a data frame with all exam data from 2018, and an additional column that identifies which file a particular case came from.
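To make the bracket placement easier to see, here is the same idea split into two steps on a pair of tiny made-up files. The folder name, column names, and values are all invented for the demo, and `lapply()` with `setNames()` is swapped in for `sapply(..., simplify = FALSE)`; the two should behave the same here. If `use.names` and `idcol` end up inside the call that reads the files, R passes them along to `fread()`, which is the likely source of an "unused arguments" error.

```r
library(data.table)

# Write two tiny tab-delimited files to a temporary folder (made-up data).
demo_dir <- file.path(tempdir(), "rbindlist_demo")
dir.create(demo_dir, showWarnings = FALSE)
fwrite(data.table(ExamineeID = 1:2, Minutes = c(95, 110)),
       file.path(demo_dir, "day1.txt"), sep = "\t")
fwrite(data.table(ExamineeID = 3:4, Minutes = c(88, 120)),
       file.path(demo_dir, "day2.txt"), sep = "\t")

demo_files <- list.files(demo_dir, full.names = TRUE)

# Step 1: read each file into a named list of data.tables.
# setNames() attaches the file names that idcol will use later.
tables <- lapply(setNames(demo_files, basename(demo_files)), fread)

# Step 2: stack the list. use.names and idcol are arguments to rbindlist(),
# so they go outside the call that reads the files.
combined <- rbindlist(tables, use.names = TRUE, idcol = "FileName")
combined
```

The `FileName` column comes from the names of the list, which is why the list needs to be named before it reaches `rbindlist()`.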

What if your working directory has more files than you want to read? You can still use this code, with some updates. For instance, if you want only the text files from the working directory, you could add a regular expression to the list.files() code to only look for files with a ".txt" extension:

list.files(pattern = "\\.txt$")
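To see what that pattern does and doesn't match, you can try it on a few file names with `grepl()`. The names below are hypothetical stand-ins for what `list.files()` might return in a cluttered folder.

```r
# Hypothetical file names, standing in for what list.files() might return.
candidates <- c("exams_2018-01-02.txt", "notes.txt.bak",
                "summary.csv", "txt_readme.md")

# "\\." matches a literal dot and "$" anchors the match to the end of the name,
# so only names that actually end in ".txt" are kept.
strict <- candidates[grepl("\\.txt$", candidates)]
strict

# Without the escape and the anchor, "txt" matches anywhere in the name.
loose <- candidates[grepl("txt", candidates)]
loose
```

The stricter pattern keeps only the first name, while the looser one also picks up the backup file and the readme.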

If you're only working with a handful of files, you can also manually create the list to be used in the rbindlist function. Like this:

filelist <- c("file1.txt", "file2.txt", "file3.txt")

That's all for now! Hope everyone has a Happy Thanksgiving!


  1. Hi your code is not correct:
    1. You missed a bracket
    2. When I have added the bracket it throws an error message:

    Error in FUN(X[[i]], ...) :
    unused arguments (use.names = TRUE, idcol = "FileName")

    Thank you, Arni

    1. Great catch, Arni! I forgot the bracket after simplify = FALSE. It has been added above. The code should work now.

    2. The code still doesn't work. It throws this error:

      Error in FUN(X[[i]], ...) :
      unused arguments (use.names = TRUE, idcol = T)