Compile many data files into one big data frame

Frequently, we collect multiple data files for a single experiment, and they all need to be combined into one data set.

Here’s how to combine all of your long-format .txt data files into one data frame in R.

# here's where the data is
file_location <- "C:\\data"
# set that as the working directory
setwd(file_location)
# list all the files in that directory whose names end in ".txt"
# (pattern is a regular expression, so the dot has to be escaped)
all_files <- list.files(pattern = "\\.txt$")
# for each item in that list, read it as a comma-separated table with a header,
# and bind all of the rows together (every file needs the same column names).
# Assign the big output to a data.frame called 'all_data'
all_data <- do.call("rbind", lapply(all_files, read.table,
                                    header = TRUE, sep = ","))
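If you want to try the same idea without a C:\data folder, here is a minimal, self-contained sketch. It assumes two small made-up files (S001.txt and S002.txt) written into a temporary folder purely for the demo, and it uses list.files(..., full.names = TRUE) instead of setwd() so your working directory is left alone. The combining step itself is the same do.call(rbind, lapply(...)) pattern as above.

```r
# Sketch: combine every .txt file in a folder into one data frame.
# The folder and both files are hypothetical, created only for this demo.
data_dir <- file.path(tempdir(), "combine_demo")
dir.create(data_dir, showWarnings = FALSE)

writeLines(c("Trial,Stimulus,Response,RT,Accuracy",
             "1,beep,boop,350,1",
             "2,beep,bip,410,1"),
           file.path(data_dir, "S001.txt"))
writeLines(c("Trial,Stimulus,Response,RT,Accuracy",
             "1,beep,bop,364,1",
             "2,beep,bep,390,0"),
           file.path(data_dir, "S002.txt"))

# full.names = TRUE returns complete paths, so setwd() is not needed;
# "\\.txt$" matches only names that end in ".txt"
all_files <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)

# read each file, then stack the resulting data frames row-wise
all_data <- do.call(rbind, lapply(all_files, read.table,
                                  header = TRUE, sep = ","))
print(all_data)
```

Because rbind() matches columns by name, this only works when every file has the same header row.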

You’ll notice that the code above will combine ALL of the txt files in the folder.

But I don’t want to include all the files

Suppose you want to exclude certain files based on a pattern. In this case, I am excluding all files whose names contain “Prac”.

file_location <- "C:\\data"
setwd(file_location)
all_files <- list.files(pattern = "\\.txt$")
# the practice files have "Prac" in the filename
prac_files <- list.files(pattern = "Prac")
# keep only the files that are not practice files
test_files <- all_files[!all_files %in% prac_files]
all_data <- do.call("rbind", lapply(test_files, read.table,
                                    header = TRUE, sep = ","))
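An equivalent way to do the exclusion in one step is grepl(), which returns TRUE for every name that matches a pattern. The sketch below assumes three made-up files in a temporary folder, two test files and one practice file, and treats any name containing "Prac" as practice. Matching against basename() matters here because full.names = TRUE returns whole paths, and you would not want a folder name that happens to contain "Prac" to knock out every file.

```r
# Sketch: drop practice files before combining.
# File names are hypothetical; any name containing "Prac" counts as practice.
data_dir <- file.path(tempdir(), "exclude_demo")
dir.create(data_dir, showWarnings = FALSE)

header <- "Trial,Stimulus,Response,RT,Accuracy"
writeLines(c(header, "1,beep,boop,350,1"), file.path(data_dir, "S001_Test.txt"))
writeLines(c(header, "1,beep,bip,410,1"),  file.path(data_dir, "S001_Prac.txt"))
writeLines(c(header, "1,beep,bop,364,1"),  file.path(data_dir, "S002_Test.txt"))

all_files <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)

# grepl() flags each file name containing "Prac"; negating it keeps the rest.
# basename() restricts the match to the file name, not the folder path.
test_files <- all_files[!grepl("Prac", basename(all_files))]

all_data <- do.call(rbind, lapply(test_files, read.table,
                                  header = TRUE, sep = ","))
print(basename(test_files))
```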

But I have a big header section

Suppose each file begins with a header section that spans several lines before the column names, like this:

Date: 9/12/2010
Time: 4:07pm
Location: Silver Spring, MD
Participant: GAL
Experiment:AFI
Trial,Stimulus,Response,RT,Accuracy
1,beep,boop,350,1
2,beep,bip,410,1
3,beep,bop,364,1
4,beep,bep,390,0
5,beep,bap,333,1
… …

In this case, you could pass a ‘skip’ argument through lapply() to read.table(). Five lines come before the row of column names, so five rows are skipped:

numberofSkippedRows <- 5
all_data <- do.call("rbind", lapply(test_files, read.table,
                                    header = TRUE, sep = ",", skip = numberofSkippedRows))
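To check that skip lands on the right line before running it over a whole folder, you can test it on a single file first. This sketch writes the example header and rows from above into a temporary file (the file name is made up) and reads it back with skip = 5. It assumes every file has exactly five header lines; if the header length varies between files, a fixed skip value will misread some of them.

```r
# Sketch: read one file whose column names are preceded by a 5-line header.
# The file reproduces the example header and rows shown above.
demo_file <- file.path(tempdir(), "header_demo.txt")
writeLines(c("Date: 9/12/2010",
             "Time: 4:07pm",
             "Location: Silver Spring, MD",
             "Participant: GAL",
             "Experiment:AFI",
             "Trial,Stimulus,Response,RT,Accuracy",
             "1,beep,boop,350,1",
             "2,beep,bip,410,1",
             "3,beep,bop,364,1",
             "4,beep,bep,390,0",
             "5,beep,bap,333,1"),
           demo_file)

numberofSkippedRows <- 5   # number of lines before the column-name row
one_file <- read.table(demo_file, header = TRUE, sep = ",",
                       skip = numberofSkippedRows)
str(one_file)
```

If str() shows Trial, Stimulus, Response, RT and Accuracy as the column names, the skip value is right.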

Important note:

All of this code assumes that each of your files has all the information it needs directly in the delimited text. It will NOT work well if essential information lives only in the filename or in a header that is skipped. In other words, be sure that you record and save your data in LONG format.

Your raw data should look like this:

Participant,Experiment,Condition,Trial,Stimulus,Response
S001,LexicalDecision,Masked,1,Doctor,Nurse
S001,LexicalDecision,Masked,2,Black,White
S001,LexicalDecision,Masked,3,Tree,Leaf
S001,LexicalDecision,Masked,4,Shirt,Pants

and NOT like this:

Participant: S001
Experiment: LexicalDecision
Condition:Masked
Trial,Stimulus,Response
1,Doctor,Nurse
2,Black,White
3,Tree,Leaf
4,Shirt,Pants

because if your files are in the second format, the identifying information in the header (participant, experiment, condition) is lost when that header is skipped. (Of course, you could always write a custom function to gather that information.)
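Here is one possible shape for such a custom function, offered as a sketch rather than a recipe. The helper name read_with_header() is hypothetical, and it assumes every file has exactly n_header lines of "Key: Value" pairs followed by a comma-separated table. It turns each header line into a column repeated on every row, and also records the file name, so nothing is lost when the files are stacked.

```r
# Sketch of a hypothetical helper: keep the header information by turning each
# "Key: Value" line into a column, and record the file name as well.
read_with_header <- function(path, n_header = 5) {
  lines <- readLines(path)
  header_lines <- lines[seq_len(n_header)]

  # Split each header line at its first colon:
  # "Time: 4:07pm" -> key "Time", value "4:07pm"
  keys   <- trimws(sub(":.*$", "", header_lines))
  values <- trimws(sub("^[^:]*:", "", header_lines))

  # Read the remaining lines (column names + trials) as a comma-separated table
  trials <- read.table(text = lines[-seq_len(n_header)],
                       header = TRUE, sep = ",")

  # Repeat each header value on every row so the result is in long format
  for (i in seq_along(keys)) {
    trials[[keys[i]]] <- values[i]
  }
  trials$source_file <- basename(path)
  trials
}

# Usage with a made-up file that uses the header format shown earlier
demo_file <- file.path(tempdir(), "GAL_AFI.txt")
writeLines(c("Date: 9/12/2010", "Time: 4:07pm", "Location: Silver Spring, MD",
             "Participant: GAL", "Experiment:AFI",
             "Trial,Stimulus,Response,RT,Accuracy",
             "1,beep,boop,350,1", "2,beep,bip,410,1", "3,beep,bop,364,1",
             "4,beep,bep,390,0", "5,beep,bap,333,1"),
           demo_file)

# Same do.call/rbind pattern as before, with the helper in place of read.table
all_data <- do.call(rbind, lapply(demo_file, read_with_header))
print(all_data)
```

In a real folder you would pass the vector from list.files() to lapply() instead of a single file.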


This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.