R Loop Through Subfolders

[This article was first published on Daniel Marcelino » R, and kindly contributed to R-bloggers.]

I have a loop that runs through the files in a folder and its subfolders. I need it to be recursive, so that it runs through every subfolder until all of them are done. I have never written recursive code, so I would really appreciate being shown how, and if there is better code than what I am using, please suggest it. My code works for the immediate subfolders of the named directory but ignores any subfolders of those subfolders. I'm not sure of the syntax for searching all nested subfolders of a chosen directory. There are a lot of posts about looping through directories, but I can't quite get the job done.
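One way to reach every nested subfolder without writing any recursion by hand is to let base R's `list.files()` do the descent with `recursive = TRUE`. The sketch below is only an illustration: the folder layout and file names are hypothetical and are created in a temporary directory so that the example runs anywhere.

```r
# Hypothetical folder tree, built in a temporary location for the demo
root <- tempfile("loop_demo_")
dir.create(file.path(root, "a", "b"), recursive = TRUE)
writeLines("x", file.path(root, "top.txt"))
writeLines("y", file.path(root, "a", "mid.txt"))
writeLines("z", file.path(root, "a", "b", "deep.txt"))

# recursive = TRUE descends into subfolders of subfolders;
# full.names = TRUE returns paths that can be opened directly
files <- list.files(root, pattern = "\\.txt$",
                    recursive = TRUE, full.names = TRUE)

# Loop over every file found, at any depth
for (f in files) {
  n_lines <- length(readLines(f))
  message(f, ": ", n_lines, " line(s)")
}
```

Because the result is a flat character vector of paths, the loop body stays the same no matter how deeply the files are nested.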

Loop

Today, I finally got inspired to deal with the tons of datasets from the Tribunal Superior Eleitoral on the Brazilian elections. I had put this off simply to avoid the trouble of working with large, messy text files. The data I collected amount to more than 40 GB of plain text files, which report electoral results, candidates’ profiles, campaign revenues and expenditures, etc. If anything, this may be a good example of using R for data management, and it might be useful for students dealing with messy datasets from anywhere.

The task can be stated as follows. Suppose you have a set of data files (data1.txt, data2.txt, […], data27.txt), each holding a subset of the data sliced by state or electoral district. What you want is to stack all the data files into a single file for more aggregated analyses, or simply to relieve the computer of storing so many sliced files. In sum, the task is to obtain one table of all the subsets; more complex cases will be addressed in later posts. This can be done by browsing to the directory where the files are, then looping through them, importing and merging each one. Finally, the aggregated file can be written back to disk.

The code for this task works as follows. The first line sets the path where R must look for the files. The second line creates an empty data object to store the imported files, if any. The third line lists the files under that path, and a loop then reads each existing file of type “.txt” as a table. The last line in the loop builds the final table by appending each subset that was imported into memory. Finally, the last part of the program, which sits outside the loop for efficiency, simply writes the final table to disk as a text file, delimiting the columns with a semicolon ‘;’.
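The steps just described can be sketched roughly as follows. This is a minimal illustration, not the original program: the paths, file names, and column names (`state`, `votes`) are hypothetical, and the small input files are generated in a temporary directory so the example is self-contained. Passing `recursive = TRUE` to `list.files()` is an addition here so that files in subfolders are picked up as well.

```r
# Hypothetical input: three small semicolon-delimited files in a temp folder
path <- tempfile("stack_demo_")
dir.create(path)
for (i in 1:3) {
  part <- data.frame(state = paste0("S", i), votes = c(10 * i, 20 * i))
  write.table(part, file.path(path, paste0("data", i, ".txt")),
              sep = ";", row.names = FALSE, quote = FALSE)
}

# Empty object that will hold the stacked table
out.file <- NULL

# List every ".txt" file under the path, including subfolders
file.names <- list.files(path, pattern = "\\.txt$",
                         full.names = TRUE, recursive = TRUE)

# Read each file as a table and append it to the running result
for (f in file.names) {
  part <- read.table(f, header = TRUE, sep = ";", stringsAsFactors = FALSE)
  out.file <- rbind(out.file, part)
}

# Write the final table once, outside the loop, delimited by ';'
out.path <- file.path(tempdir(), "stacked.txt")
write.table(out.file, file = out.path, sep = ";", row.names = FALSE)
```

Appending with `rbind()` inside a loop copies the accumulated table on every pass, so for very many or very large files it may be worth collecting the pieces in a list and binding them once, for example with `do.call(rbind, lapply(file.names, read.table, header = TRUE, sep = ";"))`.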
