How to read ZIP files in R

In this blog post I discuss how you can load compressed CSV files, such as .zip and .tar.gz files, directly into R. Many packages now support this, and we'll go over the different methods.

When data sets are ping-ponged across an organization, they are often compressed to limit network and storage usage. Instead of wasting time unzipping them manually, you can load them directly into R.

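In base R, you can read a CSV file straight out of a ZIP archive with the unz() function, whether the archive holds one file or several: you pass it the archive's path and the name of the file you want. You can even load a file that sits in a folder inside the ZIP file by giving its path within the archive.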

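read.csv(unz('twofiles.zip', 'second_file.csv'), header = TRUE)                 # pick one CSV out of several
read.csv(unz('onefile.zip', 'only_file.csv'), header = TRUE)                    # archive with a single CSV
read.csv(unz('twofiles_in_folder.zip', 'twofiles/mtcars2.csv'), header = TRUE)  # CSV inside a folder in the archive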
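If you don't know the exact path of a file inside the archive, unzip(..., list = TRUE) lists the archive's contents without extracting anything, and its Name column holds exactly the strings unz() expects. The sketch below is self-contained: it first builds a small demo archive in a temporary directory, which assumes a zip program is available on your PATH (utils::zip() calls it). The file and folder names are made up for illustration.

```r
# Build a throwaway archive with a CSV inside a folder.
# Assumption: a `zip` executable is on the PATH (utils::zip() shells out to it).
tmp <- tempfile("zipdemo")
dir.create(file.path(tmp, "twofiles"), recursive = TRUE)
write.csv(mtcars, file.path(tmp, "twofiles", "mtcars2.csv"), row.names = FALSE)

old_wd <- setwd(tmp)  # zip from inside tmp so paths in the archive stay relative
utils::zip("demo.zip", files = "twofiles", flags = "-r9Xq")
setwd(old_wd)
zipfile <- file.path(tmp, "demo.zip")

# List the archive without extracting it; Name holds the in-archive paths
contents <- unzip(zipfile, list = TRUE)
print(contents)

# Keep only CSV entries (folders show up as names ending in "/")
csv_names <- grep("\\.csv$", contents$Name, value = TRUE)

# Read the first CSV straight from the archive
df <- read.csv(unz(zipfile, csv_names[1]))
str(df)
```

The listing step is cheap, so it is a handy first move on any archive whose layout you haven't seen yet.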

Reading a zipped file with data.table's fread() can be done by passing a shell command to its cmd argument; fread() then parses whatever that command writes to standard output. You need (g)unzip on your PATH, or in your project folder. By the way, you can achieve the same with 7-Zip.

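library(data.table)
fread(cmd = 'unzip -p onefile.zip')    # unzip: print the archive's contents to stdout
fread(cmd = 'gunzip -cq onefile.zip')  # gunzip: only works for single-file archives
fread(cmd = '7z e -so onefile.zip')    # 7-Zip: extract to stdout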
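When an archive holds more than one file, unzip -p also accepts the name of the member to stream, and wrapping paths in shQuote() keeps file names with spaces from breaking the command. A minimal sketch, assuming data.table is installed and both the unzip and zip command-line tools are on your PATH (zip is only used to build the demo archive); the file names are hypothetical.

```r
library(data.table)

# Demo archive holding two CSV files (assumes `zip` is on the PATH)
tmp <- tempfile("freaddemo")
dir.create(tmp)
fwrite(mtcars, file.path(tmp, "first file.csv"))
fwrite(iris, file.path(tmp, "second_file.csv"))
zipfile <- file.path(tmp, "two files.zip")
utils::zip(zipfile,
           files = file.path(tmp, c("first file.csv", "second_file.csv")),
           flags = "-j9Xq")  # -j stores bare file names, without folders

# Stream one named member to stdout and let fread() parse it;
# shQuote() protects the spaces in both the archive and member names
cmd <- paste("unzip -p", shQuote(zipfile), shQuote("first file.csv"))
dt <- fread(cmd = cmd)
print(dim(dt))
```

Building the command with paste() and shQuote() rather than typing it by hand makes it safe to reuse in a loop over several archives.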

With vroom, loading a single zipped file is even easier, because you don't need to specify any command at all.
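For example, passing the archive's path is enough: vroom's documentation states that files ending in .gz, .bz2, .xz or .zip are uncompressed automatically. A minimal sketch, assuming the vroom package is installed; the demo archive is built with utils::zip(), which assumes a zip program on your PATH, and the file names are made up for illustration.

```r
library(vroom)

# Demo archive with a single CSV (assumes `zip` is on the PATH)
tmp <- tempfile("vroomdemo")
dir.create(tmp)
csv_path <- file.path(tmp, "only_file.csv")
write.csv(mtcars, csv_path, row.names = FALSE)
zipfile <- file.path(tmp, "onefile.zip")
utils::zip(zipfile, files = csv_path, flags = "-j9Xq")

# No shell command needed: vroom sees the .zip extension and decompresses it
df <- vroom(zipfile, delim = ",", show_col_types = FALSE)
print(dim(df))
```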


By the way, if you're having trouble understanding some of the code and concepts, I highly recommend “An Introduction to Statistical Learning: with Applications in R”, which is the must-have data science bible. If you simply need an introduction to R, and less of the data science part, I can absolutely recommend this book by Richard Cotton. Hope it helps!

Great success!

Say thanks, ask questions or give feedback

Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.
