Data Science in the Cloud: Azure Notebooks + GitHub

Want to do a random act of kindness? Share this post.

I recently bought a Google Chromebook. It’s light, it’s fast, and the battery is great, but it doesn’t support all the development tools that I’m used to on a Windows computer. There’s no Visual Studio Code (Python) and no RStudio (R). As a data scientist, that hurt in the beginning. In this blog post I explain how I moved my data science workflow to the cloud.

Azure Notebooks with Version Control in GitHub

I prefer the desolation of an uncluttered editor, which is why I was never a big fan of notebooks. However, when moving to the cloud, they seemed like the logical choice, unless you want to set up your own RStudio Server on a GCP Compute Engine instance, for example.

There are multiple services that offer Jupyter notebooks in the cloud.

They all have their pros and cons. I really wanted to work with Google Colab, but it is missing two features: it doesn’t support R and it can’t import complete GitHub repositories. So that’s why I chose Azure Notebooks. Here are its main advantages:

  • It’s free
  • It supports the following languages: Python, R and F#
  • You have access to the terminal
  • Complete version control with GitHub using the terminal
  • Basic point & click version control with GitHub
  • You can install additional packages
  • The same keyboard shortcuts as traditional Jupyter notebooks
  • Hardware: 4GB RAM and 1GB disk space per project
  • Access to additional Azure resources

The only thing missing is collaboration features. There’s no way to work on a file with multiple people at the same time. The only way to do this properly is to agree on version control procedures with your colleagues. Oh, and finally, the interface is ridiculously ugly. I can’t even close the EU cookie banner.

Collaboration brings me to the next tool/platform: GitHub. Microsoft acquired the popular version control platform for 7.5 billion dollars. Although there is already some minor integration between Azure Notebooks and GitHub, we can expect both platforms to integrate further in the future. If you know your basic git commands, Azure Notebooks and GitHub work perfectly together. In the next section of this blog post I explain how I use both tools.

The field of data science is fairly new. New technologies and tools are emerging constantly. Want to make a small difference in your next job interview? Tell them that you know git. DataCamp offers a very intuitive introductory course to git. I even completed it myself. Try it for free or pay $25 a month to get access to hands-on data science training in a wide range of topics.

Workflow

Create a new repository on GitHub.

Give your repository a name and description, then set it to private or public. Optionally add a README. If you will be working with Python, add a .gitignore as well, so your git client won’t suggest adding unwanted files to the repo. Finally, click Create Repository.
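To get a feel for what a Python .gitignore does, here is a minimal sketch. The four entries are only an illustrative subset that I picked, not GitHub’s full Python template, and the scratch repository and file names are made up for the demo.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch repository, used only to try the ignore rules out.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet

# A few typical Python entries (illustrative, not GitHub's full template).
cat > .gitignore <<'EOF'
__pycache__/
*.pyc
.ipynb_checkpoints/
.venv/
EOF

# Hypothetical files: one piece of clutter, one notebook we do want to track.
mkdir -p __pycache__
touch __pycache__/model.pyc analysis.ipynb

# git check-ignore exits with 0 when a path matches an ignore rule.
if git check-ignore --quiet __pycache__/model.pyc; then
  echo "bytecode cache: ignored"
fi
if ! git check-ignore --quiet analysis.ipynb; then
  echo "analysis.ipynb: not ignored"
fi
```

Running `git status` in that folder would list the notebook and the .gitignore itself, but not the bytecode cache.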

Now we are ready to clone this repository into Azure Notebooks. On the top right, click Clone or download and copy the web URL.

On the My Projects page of Azure Notebooks, you can create a New Project or Upload GitHub Repo. There is a caveat here: Azure Notebooks doesn’t support private GitHub repositories (yet, as of December 2019). So you will have to start from a new project and use the terminal to import your repository.

The Project Name doesn’t have to match the name of your GitHub repo, but it makes sense to be consistent, of course. Since we initialized a README in the repo, we don’t have to do that again here. Click Create.

Next, open up the terminal. It takes a couple of seconds for the terminal to load properly.

Change to the project folder, the same one you can access through the My Projects interface. Then clone the repository using the URL you copied from the GitHub repository page.

You can now sync with GitHub for version control, for sharing with your colleagues, or for any other purpose. One final note: the repository will not be the root folder of your project. It will sit in its own subfolder.
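The change-directory-and-clone step looks roughly like the sketch below. So that it runs without a GitHub account, a local bare repository stands in for the remote. The names `project_dir` and `demo-repo` are placeholders of mine. In the Azure Notebooks terminal you would `cd` into your actual project folder and pass the URL you copied from GitHub to `git clone`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-ins so the sketch runs anywhere: a scratch "project folder" and a
# local bare repository playing the role of the GitHub remote.
workspace="$(mktemp -d)"
project_dir="$workspace/project"       # hypothetical project folder
repo_url="$workspace/demo-repo.git"    # in practice: the URL copied from GitHub
mkdir -p "$project_dir"
git init --quiet --bare "$repo_url"

# The two steps from the post: change to the project folder, then clone.
# (git warns that the stand-in repository is empty; that is expected here.)
cd "$project_dir"
git clone --quiet "$repo_url"

# The clone lands in its own subfolder, named after the repository.
ls -d "$project_dir"/*/
```

The last command lists a single `demo-repo/` directory inside the project folder, which is the nesting mentioned above.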

Don’t forget to configure your name and email address.
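A minimal sketch of that configuration, using a placeholder name and email address and a scratch repository so it doesn’t touch any real settings:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Scratch repository so the sketch does not change your real settings.
cd "$(mktemp -d)"
git init --quiet

# Placeholder identity: replace with your own name and email address.
git config user.name "Jane Doe"
git config user.email "jane.doe@example.com"

# Read the values back to confirm they were stored.
git config user.name
git config user.email
```

Adding `--global` to the two `git config` commands applies the identity to every repository of the current user instead of just this one.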

You are now ready to keep your Azure Notebooks project and your GitHub remote repository in sync.

And when you use git push, you’ll be asked to enter your GitHub username and password.
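The day-to-day sync cycle can be sketched as follows. Again, a local bare repository stands in for GitHub so the example runs offline, and the notebook file name, commit message and identity are placeholders. Against the real GitHub remote over HTTPS, the credential prompt appears at the `git push` step. Note that GitHub has since stopped accepting account passwords for git operations, so a personal access token goes where the password used to.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Offline stand-ins: a local bare repository acts as the GitHub remote.
workspace="$(mktemp -d)"
git init --quiet --bare "$workspace/remote.git"
git clone --quiet "$workspace/remote.git" "$workspace/project" 2>/dev/null
cd "$workspace/project"
git config user.name "Jane Doe"                # placeholder identity
git config user.email "jane.doe@example.com"

# Simulate some work: a (hypothetical) notebook file appears in the project.
echo '{}' > analysis.ipynb

# The usual cycle: stage, commit, push.
git add analysis.ipynb
git commit --quiet -m "Add first analysis notebook"
git push --quiet origin HEAD

# Later, fetch and merge whatever colleagues have pushed in the meantime.
git pull --quiet origin "$(git rev-parse --abbrev-ref HEAD)"
```

Pushing `HEAD` sends the current branch to a branch of the same name on the remote, which sidesteps the question of whether your default branch is called master or main.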

Great success!
