How to read ZIP files in R

In this blog post I discuss how you can load CSV files straight from compressed archives, such as .zip and .tar.gz. Nowadays many packages support this, and we’ll go over the different methods.

To limit network and storage usage, data sets that are ping-ponged across an organization often come in a compressed format. Instead of losing time unzipping the file manually, you can load these files directly into R.

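Using base R, you can load a CSV file from a ZIP archive with the unz function, which opens a connection to a single named file inside the archive. It doesn’t matter whether the archive holds one file or several, and you can even load files that sit in a folder inside that ZIP file.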

read.csv(unz('twofiles.zip', 'second_file.csv'), header = TRUE)
read.csv(unz('onefile.zip', 'only_file.csv'), header = TRUE)
read.csv(unz('twofiles_in_folder.zip', 'twofiles/mtcars2.csv'), header = TRUE)
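
The calls above assume you already know the file names inside the archive. If you don’t, you can list them first. The sketch below is my own addition, not part of any package: read_zipped_csvs is a hypothetical helper name. It uses unzip(list = TRUE) from base R to list the archive’s contents and then reads every CSV through unz. The demo at the bottom only runs if twofiles.zip sits in your working directory.

```r
# Hypothetical helper: read every CSV inside a ZIP archive into a named list.
read_zipped_csvs <- function(zip_path) {
  # list = TRUE returns a data frame of the archive's contents without extracting
  contents <- utils::unzip(zip_path, list = TRUE)

  # Keep only the entries that look like CSV files
  csv_names <- grep("\\.csv$", contents$Name, value = TRUE, ignore.case = TRUE)

  # Open a connection to each entry and parse it
  tables <- lapply(csv_names, function(name) {
    utils::read.csv(unz(zip_path, name), header = TRUE)
  })
  names(tables) <- csv_names
  tables
}

# Demo, assuming the twofiles.zip archive from the examples above exists
if (file.exists("twofiles.zip")) {
  tables <- read_zipped_csvs("twofiles.zip")
  print(names(tables))
}
```

Each element of the returned list is a regular data frame, named after its path inside the archive.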

Reading a zipped file with data.table’s fread() can be done by specifying a CLI command. You need to have (g)unzip on your PATH, or have (g)unzip in your project folder. By the way, you can achieve the same with 7-zip.

library(data.table)
fread(cmd = 'unzip -p onefile.zip') # Windows
fread(cmd = 'gunzip -cq onefile.zip') # Linux
fread(cmd = '7z e -so onefile.zip') # 7-zip
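
Since this approach depends on which tools are installed, it can help to check the PATH before calling fread(). Here is a minimal sketch of that idea. build_unzip_cmd is a hypothetical helper of my own, and I’m assuming unzip or 7z is the tool you want to use. Sys.which() returns an empty string for a program it cannot find, and shQuote() protects file names that contain spaces.

```r
# Hypothetical helper: build the shell command for fread(cmd = ...),
# based on which extraction tool is available on the PATH.
build_unzip_cmd <- function(zip_path, tools = Sys.which(c("unzip", "7z"))) {
  if (nzchar(tools[["unzip"]])) {
    paste("unzip -p", shQuote(zip_path))   # stream the file to stdout
  } else if (nzchar(tools[["7z"]])) {
    paste("7z e -so", shQuote(zip_path))   # same idea with 7-zip
  } else {
    stop("Neither unzip nor 7z was found on the PATH.")
  }
}

# Demo, assuming data.table is installed and onefile.zip exists
if (requireNamespace("data.table", quietly = TRUE) && file.exists("onefile.zip")) {
  one_file <- data.table::fread(cmd = build_unzip_cmd("onefile.zip"))
  print(head(one_file))
}
```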

Using vroom, loading a single zipped file is even easier because you don’t need to specify any command at all.
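
As a minimal sketch, assuming the vroom package is installed and reusing the onefile.zip archive from the earlier examples, you pass the path of the archive straight to vroom() and it handles the decompression for you. The delim argument is optional. I set it here only because we know the file is a CSV.

```r
zip_path <- "onefile.zip"  # assumed: the single-file archive used above

# Only run the demo if vroom is installed and the archive is present
if (requireNamespace("vroom", quietly = TRUE) && file.exists(zip_path)) {
  # vroom recognizes the compressed file from its path and decompresses on the fly
  one_file <- vroom::vroom(zip_path, delim = ",")
  print(head(one_file))
}
```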


Great success!

