How to check if a column exists in a pandas DataFrame

In some situations, especially when adding basic error handling to your Python scripts, you want to check whether a column exists before performing operations on it. In this blog post I’ll show you how.

For starters, let’s load the iris dataset from the Seaborn data repository on GitHub. I’ll use it to demonstrate several ways of checking whether a particular column, or multiple columns, exist in a pandas DataFrame.

import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

Check if a column exists in a Pandas DataFrame

Checking if one column exists is really easy. The most straightforward way is the following line of code:

'sepal_width' in df

Why does this work? The pandas documentation gives an intuitive explanation:

“You can treat a DataFrame semantically like a dict of like-indexed Series objects. Getting, setting, and deleting columns works with the same syntax as the analogous dict operations.”

So if we can treat a pandas DataFrame as a dictionary, let’s just do that. In the following example, the string is matched against the keys of a dictionary, and the expression also returns True, exactly as in the previous example with the DataFrame.

'sepal_width' in {'sepal_width': 0, 'sepal_length': 0, 'petal_width': 0, 'petal_length': 0}
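To make the dict analogy concrete, here is a minimal sketch. It assumes a tiny hand-built DataFrame called df_small (a made-up stand-in for the iris data, so the snippet runs without a download). It illustrates that the in operator looks at column labels only, just as it looks at keys, not values, in a dict.

```python
import pandas as pd

# Hypothetical stand-in for the iris data; the values are illustrative only.
df_small = pd.DataFrame({
    "sepal_length": [5.1, 4.9],
    "sepal_width": [3.5, 3.0],
    "species": ["setosa", "setosa"],
})

has_width = "sepal_width" in df_small  # a column label
has_color = "color" in df_small        # not a column label
has_value = "setosa" in df_small       # a cell value, which is not checked

print(has_width, has_color, has_value)
```

The last check is the one that tends to surprise people: a value that appears inside a column does not count as a match.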

For readability purposes, you will more often encounter the following. Instead of matching the string with the DataFrame, it is matched with the column names explicitly.

'sepal_width' in df.columns

This works because the column names are stored in a pandas Index, and the in operator on an Index checks whether the string is one of its labels.
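Since the motivation was basic error handling, here is one way the check could be used as a guard. This is only a sketch: column_mean is a hypothetical helper name, and df_small is a made-up stand-in for the iris data.

```python
import pandas as pd

# Hypothetical stand-in for the iris data; the values are illustrative only.
df_small = pd.DataFrame({
    "sepal_length": [5.1, 4.9],
    "sepal_width": [3.5, 3.0],
})


def column_mean(frame, column):
    """Return the mean of `column`, or None if the column does not exist."""
    if column not in frame.columns:
        return None
    return frame[column].mean()


width_mean = column_mean(df_small, "sepal_width")    # column exists
missing_mean = column_mean(df_small, "petal_width")  # column is absent

print(width_mean, missing_mean)
```

Returning None is just one choice. Depending on your script, raising a KeyError with a clear message may be the better option.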

Check if multiple columns exist in a Pandas DataFrame

There are two ways to interpret this title. Do we want to check which of the columns exist, or do we want to check if all of them exist? We can loosely interpret these as an OR and an AND, respectively.

OR

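A list comprehension is our best friend here. By looping over the column names you want to match and testing each one against the columns of the DataFrame, you get a list of True and False values, one per name. Wrap that list in any() if you only need to know whether at least one of the columns exists.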

[c in df.columns for c in ['sepal_width','sepal_length']]
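As a small sketch of the OR case, the snippet below shows the list of flags, a variant that returns the matching names themselves, and a true OR using any(). The names df_small and wanted are made up for the illustration.

```python
import pandas as pd

# Hypothetical stand-in for the iris data; the values are illustrative only.
df_small = pd.DataFrame({"sepal_length": [5.1], "sepal_width": [3.5]})
wanted = ["sepal_width", "petal_width"]

# One boolean per requested name, in the same order as `wanted`.
flags = [c in df_small.columns for c in wanted]

# The requested names that actually exist in the DataFrame.
present = [c for c in wanted if c in df_small.columns]

# True if at least one requested column exists.
any_present = any(flags)

print(flags, present, any_present)
```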

AND

Hands-down the most elegant way is to use a Python set and its issubset method.

{'sepal_width', 'sepal_length'}.issubset(df.columns)

But, more in line with the OR solution above, you can use a list comprehension and the all function.

all([c in df.columns for c in ['sepal_width','sepal_length']])
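When the AND check fails, it usually helps to know which columns are missing. Here is a minimal sketch; missing_columns is a hypothetical helper name and df_small is a made-up stand-in for the iris data.

```python
import pandas as pd


def missing_columns(frame, required):
    """Return the names in `required` that are not columns of `frame`."""
    return [c for c in required if c not in frame.columns]


# Hypothetical stand-in for the iris data; the values are illustrative only.
df_small = pd.DataFrame({"sepal_length": [5.1], "sepal_width": [3.5]})

required = ["sepal_width", "petal_width", "species"]
missing = missing_columns(df_small, required)
all_present = not missing  # an empty list means every column was found

if missing:
    print("Missing columns:", ", ".join(missing))
```

An empty result is equivalent to the issubset check returning True, with the added benefit of a useful error message when it does not.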

By the way, I didn’t necessarily come up with these solutions myself. Although I’m grateful you’ve visited this blog post, you should know I get a lot from websites like Stack Overflow and from my collection of coding books. The one by Matt Harrison (on pandas 1.x!) was updated in 2020 and is an excellent primer on pandas basics. If you want something broad, ranging from data wrangling to machine learning, try “Hands-On Data Analysis with Pandas” by Stefanie Molin.

Say thanks, ask questions or give feedback

Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.
