How to upload files to Google Colab

It’s one of the first hurdles you run into when you use Google Colab: how do I get my data in there? A good question, because other cloud notebook solutions (like Azure Notebooks) allow you to upload your files through the interface. Google Colab does not, but its deep integration with Google Drive offers opportunities. This blog post helps you get this solved in no time.

I will elaborate on the three most convenient options: uploading the file, using PyDrive and mounting your Google Drive.

Option 1: Upload it

The first solution is pretty straightforward. Using the files module from the google.colab package, you can manually select files on your computer and upload them to your notebook kernel, where their contents end up in a local variable. Keep in mind that if your kernel is restarted, you’ll have to upload the files again.

import io

import pandas as pd
from google.colab import files

# Opens a file picker and returns a dict that maps each file name to its bytes
uploaded = files.upload()
pd.read_csv(io.StringIO(uploaded['train.csv'].decode('utf-8')))

Clearly, this is a quick and dirty solution. If you plan on working on a project for a couple of weeks, this might not be the best option for you.
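
If you select several files at once, the upload call returns one dictionary entry per file, with the file name as key and the raw bytes as value. Here is a minimal sketch of how such a dictionary could be parsed using only the standard library. The helper name parse_uploaded_csvs and the sample data are made up for illustration, and the literal dictionary stands in for the real return value so that the snippet also runs outside Colab.

```python
import csv
import io


def parse_uploaded_csvs(uploaded):
    """Parse every CSV in a {file name: bytes} mapping into a list of row dicts."""
    tables = {}
    for name, content in uploaded.items():
        if not name.lower().endswith('.csv'):
            continue  # skip anything that is not a CSV file
        text = content.decode('utf-8')
        tables[name] = list(csv.DictReader(io.StringIO(text)))
    return tables


# Stand-in for `uploaded = files.upload()`; the contents are invented sample data.
uploaded = {
    'train.csv': b'id,label\n1,cat\n2,dog\n',
    'notes.txt': b'not a csv',
}

tables = parse_uploaded_csvs(uploaded)
print(tables['train.csv'][0]['label'])
```

In a real notebook you would replace the literal dictionary with the result of files.upload() and, most likely, swap the csv module for pandas as in the snippet above.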

Option 2: Use PyDrive and Google Drive

PyDrive is a wrapper around Google’s Python client for the Drive API. Among other things, it lets you interact with files that are stored in Google Drive.

When you choose to use PyDrive and run the following code, you’ll be redirected to an authentication page, which will return a key that you can use to identify yourself within Colab. Once you paste the key in the input field in Colab, you can copy files from your Google Drive and use them in your kernel.

import pandas as pd
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

# Authenticate with Google
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)

def read_csv_from_drive(file_id, file_name):
    # Download the Drive file to the kernel's local disk, then load it
    dl = drive.CreateFile({'id': file_id})
    dl.GetContentFile(file_name)
    return pd.read_csv(file_name)

train = read_csv_from_drive('<file_id>', 'train.csv')

To download files to your kernel, you’ll need to know the file ID. The easiest way is to generate a sharing link and get it from the returned URL.
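
Copying the ID out of a sharing link by hand is error-prone, so a small helper can do it for you. The sketch below assumes the two link shapes I have seen most often: one with the ID in the path after /d/ and one with the ID in an id query parameter. Google may use other shapes too. The function name extract_drive_file_id is hypothetical and the ID in the example is a made-up placeholder.

```python
import re
from urllib.parse import parse_qs, urlparse


def extract_drive_file_id(url):
    """Return the file ID from a Google Drive sharing link, or None if not found."""
    parsed = urlparse(url)
    # Assumed shape 1: .../file/d/<file_id>/view carries the ID in the path.
    match = re.search(r'/d/([A-Za-z0-9_-]+)', parsed.path)
    if match:
        return match.group(1)
    # Assumed shape 2: ...?id=<file_id> carries the ID in the query string.
    ids = parse_qs(parsed.query).get('id')
    return ids[0] if ids else None


# The ID below is a made-up placeholder, not a real file.
link = 'https://drive.google.com/file/d/1AbCdEfGhIjKlMnOpQrStUvWxYz/view?usp=sharing'
print(extract_drive_file_id(link))
```

The returned string is what you would pass as the first argument to read_csv_from_drive.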

I really like this solution myself. Because it is reproducible, you only need to map the files once. This is especially useful if you’re dealing with a multitude of files. Even more importantly, you can share the files with your collaborators, and they too will be able to access them and properly run the notebook.

But there’s a drawback. If you’re working with huge files, you might not want to download the files from your Drive to your kernel. It could take a while. That’s why there’s a third solution.

Option 3: Mount your drive

The third and final solution is to mount your complete Google Drive to the kernel. This way, your Google Drive will be treated like it’s a local disk in your kernel. It’s really easy:

import pandas as pd
from google.colab import drive

# Make your Google Drive available under /content/drive
drive.mount('/content/drive')
pd.read_csv('<path>')

To get the path to a file, you simply copy the path from the file explorer in Colab on the left-hand side of the interface.

By the way, you can also mount your drive with the click of a button now.

Just like the PyDrive method, it’s reproducible (for yourself) and is a good way to handle a lot of files. The biggest drawback here is that I currently don’t see how you can share files with your collaborators.
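
Because a mounted Drive behaves like a local folder, handling a lot of files comes down to ordinary path operations. Below is a minimal sketch that collects all CSV files under one folder. The helper name find_csv_files and the folder path in the comment are hypothetical, and a temporary directory stands in for the mount point so that the snippet runs anywhere.

```python
import tempfile
from pathlib import Path


def find_csv_files(folder):
    """Return all CSV files below `folder` (searched recursively), sorted by path."""
    return sorted(Path(folder).rglob('*.csv'))


# In Colab you would pass a folder under your mount point instead,
# for example '/content/drive/MyDrive/data' (a hypothetical path).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'sub').mkdir()
    (Path(tmp) / 'train.csv').write_text('id\n1\n')
    (Path(tmp) / 'sub' / 'test.csv').write_text('id\n2\n')
    (Path(tmp) / 'readme.txt').write_text('ignore me')
    names = [p.name for p in find_csv_files(tmp)]
    print(names)
```

Each returned path can be handed straight to pd.read_csv, so you avoid copying paths from the file explorer one by one.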

Conclusion

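| Requirement     | Upload | PyDrive     | Mounting |
|-----------------|--------|-------------|----------|
| Quick and dirty | yes    | no          | no       |
| Many files      | no     | yes         | yes      |
| Reproducible    | no     | yes         | yes      |
| Sharing         | no     | yes         | no       |
| Huge files      | no     | yes, but no | yes      |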
