How to read ZIP files in R

In this blog post I discuss how you can load compressed CSV files, such as .zip and .tar.gz archives, directly into R. Many packages support this nowadays, and we’ll go over the different methods.

When data sets are ping-ponged across an organization, in order to limit network and storage usage, they often come in a compressed format. Instead of losing time unzipping the file manually, it’s perfectly fine to load these files directly into R.

Using base R, you can read a CSV file from a ZIP archive with the unz() function, which opens a connection to one named file inside the archive. You can even load files that sit in a folder inside the ZIP file, as long as you pass the full path within the archive.

read.csv(unz('twofiles.zip', 'second_file.csv'), header = TRUE)
read.csv(unz('twofiles_in_folder.zip', 'twofiles/mtcars2.csv'), header = TRUE)
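
If you don’t know the file names inside an archive up front, you can look them up first with unzip(list = TRUE), which lists the contents without extracting anything. Here is a minimal sketch, assuming the archive holds plain CSV files; read_zip_csvs() is a hypothetical helper name I made up, not a base R function.

```r
# Hypothetical helper (not part of base R): read every CSV file in a ZIP archive.
read_zip_csvs <- function(zip_path) {
  # unzip(list = TRUE) only lists the archive contents (columns: Name, Length, Date)
  members <- unzip(zip_path, list = TRUE)$Name

  # Keep the CSV files; folder entries and other file types are skipped
  csv_members <- members[grepl("\\.csv$", members, ignore.case = TRUE)]

  # read.csv() opens and closes each unz() connection by itself
  tables <- lapply(csv_members, function(member) {
    read.csv(unz(zip_path, member), header = TRUE)
  })
  names(tables) <- csv_members
  tables
}

# Usage, assuming twofiles.zip sits in the working directory
if (file.exists("twofiles.zip")) {
  tables <- read_zip_csvs("twofiles.zip")
  str(tables, max.level = 1)
}
```

The result is a named list with one data frame per CSV file, so you can pick a table by its path inside the archive.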

Reading a zipped file with data.table’s fread() can be done by passing a command-line call through its cmd argument. The unzip or gunzip executable needs to be on your PATH or in your project folder. By the way, you can achieve the same with 7-Zip.

fread(cmd = 'unzip -p onefile.zip') # Windows
fread(cmd = 'gunzip -cq onefile.zip') # Linux
fread(cmd = '7z e -so onefile.zip') # 7-zip
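
Which of these commands works depends on what is installed on your machine, so you can let R check that for you. The sketch below is only an illustration: zip_stream_cmd() is a hypothetical helper of mine, and it builds nothing more than the three commands shown above, with the file name quoted for the shell.

```r
# Hypothetical helper: build a command that streams a single-file ZIP to stdout.
zip_stream_cmd <- function(zip_path, tool = c("unzip", "gunzip", "7z")) {
  tool <- match.arg(tool)
  # Same flags as in the commands above
  flags <- c(unzip = "-p", gunzip = "-cq", "7z" = "e -so")[[tool]]
  # shQuote() protects file names that contain spaces
  paste(tool, flags, shQuote(zip_path))
}

# Sys.which() returns "" for tools that are not on the PATH
available <- Filter(function(tool) nzchar(Sys.which(tool)), c("unzip", "gunzip", "7z"))

# Only run when data.table, a suitable tool and the example file are all present
if (requireNamespace("data.table", quietly = TRUE) &&
    length(available) > 0 && file.exists("onefile.zip")) {
  dt <- data.table::fread(cmd = zip_stream_cmd("onefile.zip", available[1]))
  print(head(dt))
}
```

If none of the tools is found, available is empty and nothing is read, which is a good moment to fall back to unz() or vroom.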

Using vroom, loading a single zipped file is even easier because you don’t need to specify any commands at all.

vroom('onefile.zip')
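
vroom() also accepts a connection, so you can combine it with unz() when an archive holds more than one file and you want a specific one. A small sketch, assuming the vroom package is installed; vroom_zip_member() is a hypothetical wrapper of mine, not part of vroom.

```r
# Hypothetical wrapper: read one named file from a ZIP archive with vroom.
vroom_zip_member <- function(zip_path, member, ...) {
  # unz() opens a connection to a single file inside the archive;
  # extra arguments such as delim or col_types are passed on to vroom()
  vroom::vroom(unz(zip_path, member), ...)
}

# Usage, assuming vroom is installed and twofiles.zip is in the working directory
if (requireNamespace("vroom", quietly = TRUE) && file.exists("twofiles.zip")) {
  second <- vroom_zip_member("twofiles.zip", "second_file.csv", delim = ",")
  print(head(second))
}
```

Naming the file explicitly means you don’t depend on how vroom treats an archive with several files in it.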

By the way, if you’re having trouble understanding some of the code and concepts, I can highly recommend “An Introduction to Statistical Learning: with Applications in R”, which is the must-have data science bible. If you simply need an introduction to R, and less to the data science part, I can absolutely recommend this book by Richard Cotton. Hope it helps!

Great success!

