Roel Peters

How to take a row-wise sum of columns in Pandas

by roelpi
January 3, 2022 (updated January 10, 2022)
3 min read

In this blog post, I explore the options for taking the row-wise sum of a subset of columns from a Pandas DataFrame. Let’s try to take the row-wise sum of the columns first_column, second_column, and third_column. This means we’re leaving other_column out. But first, let’s take a step back. If…
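As a minimal sketch of the idea in this teaser (the sample values are made up for illustration; only the column names come from the post), one way to sum a subset of columns per row is to select those columns and call `sum(axis=1)`:

```python
import pandas as pd

# Hypothetical sample data; the column names follow the post's description.
df = pd.DataFrame({
    "first_column": [1, 2, 3],
    "second_column": [10, 20, 30],
    "third_column": [100, 200, 300],
    "other_column": [7, 8, 9],
})

# Select only the columns to include, then sum across columns (axis=1).
cols = ["first_column", "second_column", "third_column"]
df["row_sum"] = df[cols].sum(axis=1)

print(df)
```

Selecting the columns explicitly keeps other_column out of the total; the full post may cover other approaches.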

by roelpi
December 16, 2021 (updated February 25, 2022)
11 Comments
3 min read

I’ve been working with Terraform recently for deploying the required architecture for a data pipeline in Google Cloud Platform (GCP). It’s my first project with Terraform and I decided to dive into its random_something resources, because I couldn’t have duplicate resource names. Here are some findings. Randomize a resource name…

by roelpi
December 15, 2021
1 min read

In this blog post we’ll remove leading and trailing whitespace from a DataFrame’s column names. This can occur when working with files that haven’t been properly formatted. There are multiple ways to fix this. First, let’s create some dummy…
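A minimal sketch of one possible fix, assuming made-up column names with stray spaces (the post's own dummy data is not shown in this excerpt): the `str.strip` accessor on the column index removes whitespace on both sides.

```python
import pandas as pd

# Hypothetical dummy data with leading and/or trailing spaces in the headers.
df = pd.DataFrame({" name": ["a"], "value ": [1], " both ": [True]})

# Strip whitespace from both ends of every column name.
df.columns = df.columns.str.strip()

print(list(df.columns))
```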

by roelpi
November 11, 2021 (updated November 12, 2021)
5 Comments
2 min read

In this article, you’ll learn how to execute shell commands using the subprocess package in Python. This article is part of a two-part series related to running shell commands from within Python. Part 1: Execute shell commands with the os package Part 2: Execute shell commands with the subprocess package…
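As a rough illustration of what running a command through the subprocess package can look like, here is a small sketch using `subprocess.run`. The command itself is an assumption chosen for portability: it calls the current Python interpreter instead of a Bash command, so it runs on any platform.

```python
import subprocess
import sys

# Run a command as a list of arguments and capture its output as text.
# check=True raises CalledProcessError if the command exits with a non-zero code.
result = subprocess.run(
    [sys.executable, "-c", "print('hello')"],
    capture_output=True,
    text=True,
    check=True,
)

output = result.stdout.strip()
print(output, result.returncode)
```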

by roelpi
November 11, 2021
2 min read

Running shell (e.g. Bash) commands in Python is fairly easy using the os package. In this article, I outline two ways to run shell commands in Python: using the system method and the popen method. This article is part of a two-part series related to running shell commands from within…
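The two approaches named here can be sketched as follows. The `echo hello` command is only an illustrative assumption; `os.system` returns the command's exit status, while `os.popen` gives a file-like object for reading the command's output.

```python
import os

# os.system runs the command and returns its exit status;
# the command's output goes straight to the terminal.
exit_status = os.system("echo hello")

# os.popen opens a pipe to the command so its output can be read in Python.
with os.popen("echo hello") as stream:
    output = stream.read().strip()

print(exit_status, output)
```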

by roelpi
November 3, 2021
1 Comment
3 min read

In this article, we’ll discuss an error I ran into when trying to create SSH keys for a service account in Google Cloud Platform (GCP). Nevertheless, I assume it’s something that can happen in many situations where you’re trying to perform an operation as a service account. I hope to…

by roelpi
November 1, 2021 (updated November 3, 2021)
2 min read

I would like to briefly elaborate on an error I ran into, which is due to limited documentation on the Google Cloud Platform API. With this article, I hope to save you the time and effort of finding the solution. I’ve been following an infrastructure-as-code approach to deploying GCP resources…

by roelpi
October 18, 2021 (updated October 19, 2021)
4 min read

Because there are (too) many ways to cast a Pandas DataFrame from long to wide format, I decided to list four ways to achieve that goal. The four functions I describe in this article are the following. Function Object aggregation Can handle NaNs pivot DataFrame no no pivot_table DataFrame yes…
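To give a feel for the first two functions in that list, here is a small sketch with hypothetical long-format data (the column names id, variable and value are assumptions, not taken from the post). `pivot` only reshapes, while `pivot_table` also aggregates when an index/column pair occurs more than once.

```python
import pandas as pd

# Hypothetical long-format data: one row per (id, variable) pair.
long_df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "variable": ["x", "y", "x", "y"],
    "value": [10, 20, 30, 40],
})

# pivot: pure reshape; raises an error if an (id, variable) pair is duplicated.
wide = long_df.pivot(index="id", columns="variable", values="value")

# pivot_table: reshape plus aggregation (here the mean) of duplicate pairs.
wide_agg = long_df.pivot_table(
    index="id", columns="variable", values="value", aggfunc="mean"
)

print(wide)
print(wide_agg)
```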

by roelpi
October 11, 2021
1 Comment
2 min read

In this article we elaborate on the multiple ways to remove the final line of a CSV in Python when loading it with Pandas’ read_csv function. Remove final row(s) after loading the file Removing the final row from a Pandas DataFrame can be done with a simple slice. The following…
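A minimal sketch of two possible approaches, using an in-memory CSV with a made-up trailing summary row. The slice matches the description above; the `skipfooter` parameter of `read_csv` is shown as an alternative that may or may not be the one the full post uses.

```python
import io

import pandas as pd

# Hypothetical CSV content whose last line is a summary row we do not want.
csv_text = "a,b\n1,2\n3,4\nTOTAL,6\n"

# Option 1: load everything, then slice off the final row.
df = pd.read_csv(io.StringIO(csv_text))
trimmed = df.iloc[:-1]

# Option 2: skip the final line while reading.
# skipfooter is not supported by the default C parser, so use engine="python".
skipped = pd.read_csv(io.StringIO(csv_text), skipfooter=1, engine="python")

print(trimmed)
print(skipped)
```

Note that with the slice, column a is still parsed as text because the summary row was read first; skipping the row during the read lets Pandas infer a numeric type.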

by roelpi
September 13, 2021
3 min read

I’ve been working with Airflow recently, and I needed to configure an operator by passing it the execution time. Or at least I thought so. I really felt like writing my findings down, because if you’re in the dark here, like I was today, there are some fundamental Airflow principles you…

How to take a row-wise sum of columns in Pandas

How to generate random strings in Terraform

How to remove trailing whitespaces from column headers in Pandas

Python: run shell commands using the subprocess package

Python: execute shell commands (and get the output) with the os package

Fix “End user credentials must match the user specified in the request.” in GCP

Fix “Source url of disk is missing” when attaching disk to instance in GCP using Python

Four ways to cast a Pandas DataFrame from long to wide format

Skip final row when reading a CSV with Pandas

Solve: “name ‘ds’ is not defined” in Airflow