
Solve Pandas “ValueError: cannot reindex from a duplicate axis”

Recently, I’ve been working with Pandas DataFrames that had a DateTime as the index. When I tried reindexing the DataFrame (using the reindex method), I bumped into an error. Let’s find out what causes it and how to solve it.

The Python error I’m talking about is:

ValueError: cannot reindex from a duplicate axis

A “duplicate axis”? My first assumption was that my DataFrame had the same index for the columns and the rows, which makes no sense.

Apparently, the Python error is the result of an operation on a DataFrame that has duplicate index values. Operations that align data on the index need every label to point to exactly one row. When a label occurs more than once, Pandas cannot decide which row to use, so reindexing a DataFrame raises this error, and other alignment-based operations, such as joining with another DataFrame or resampling, can fail in the same way.
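To make this concrete, here is a minimal sketch that reproduces the error. The data and variable names are made up for illustration, and the exact wording of the message may differ between Pandas versions (as far as I know, newer releases say “cannot reindex on an axis with duplicate labels”).

```python
import pandas as pd

# Hypothetical data: the timestamp 2021-01-02 appears twice in the index.
index = pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-02"])
df = pd.DataFrame({"value": [1, 2, 3]}, index=index)

# Reindexing to a full daily range has to align rows on the index labels.
new_index = pd.date_range("2021-01-01", "2021-01-04", freq="D")

message = None
try:
    df.reindex(new_index)
except ValueError as error:
    # Pandas cannot tell which of the two 2021-01-02 rows to keep.
    message = str(error)
    print(f"ValueError: {message}")
```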

It makes one wonder why Pandas even supports duplicate values in the index. Doing some research, I found out it is something the Pandas team actively contemplated:

If you’re familiar with SQL, you know that row labels are similar to a primary key on a table, and you would never want duplicates in a SQL table. But one of pandas’ roles is to clean messy, real-world data before it goes to some downstream system. And real-world data has duplicates, even in fields that are supposed to be unique.

Unlike many other data wrangling libraries and solutions, Pandas acknowledges that the data you’ll be working with is messy. However, it wants to help you clean it up.

Test if an index contains duplicate values

Simply testing if the index values of a Pandas DataFrame are unique is extremely easy. They’ve even created an attribute for it (note that it is a property, so it takes no parentheses):

df.index.is_unique

This will return a boolean: True if the index is unique. False if there are duplicate values.
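As a quick illustration, the sketch below checks two small, made-up DataFrames, one with a unique index and one without.

```python
import pandas as pd

# Hypothetical example frames: only the second one repeats a label ("a").
unique_df = pd.DataFrame({"value": [1, 2, 3]}, index=["a", "b", "c"])
duplicate_df = pd.DataFrame({"value": [1, 2, 3]}, index=["a", "b", "a"])

# is_unique is a property, so there are no parentheses.
print(unique_df.index.is_unique)     # True
print(duplicate_df.index.is_unique)  # False
```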

Test which values in an index are duplicate

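To test which values in an index are duplicate, one can use the duplicated method, which returns an array of boolean values. By default (keep="first"), the first occurrence of a value is marked False and every later occurrence is marked True. Passing keep="last" marks all but the last occurrence, and keep=False marks every occurrence of a repeated value.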

df.index.duplicated()
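The following sketch shows the difference between the default behaviour and keep=False on a made-up DataFrame. Using the mask to select rows is my own addition, handy for inspecting the offending rows before deciding what to do with them.

```python
import pandas as pd

# Hypothetical data: the label "a" occurs twice.
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=["a", "b", "a", "c"])

# Default keep="first": only the second "a" is flagged.
later_repeats = df.index.duplicated()
print(later_repeats.tolist())  # [False, False, True, False]

# keep=False: every row whose label is repeated is flagged.
all_repeats = df.index.duplicated(keep=False)
print(all_repeats.tolist())    # [True, False, True, False]

# The boolean array can be used as a mask to look at the affected rows.
print(df[all_repeats])
```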

Drop rows with duplicate index values

Using duplicated(), we can also remove rows whose index value is a duplicate. With the following line of code, when multiple rows share the same index value, only the first one encountered remains, reading the DataFrame from top to bottom. All the others are dropped. The expression returns a new DataFrame, so assign the result to a variable if you want to keep it.

df.loc[~df.index.duplicated(), :]
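Here is a small sketch of that line in action, on made-up data. The keep="last" variant is an addition of mine for the case where the most recent row is the one worth keeping. Once the index is unique, the reindex that failed earlier goes through.

```python
import pandas as pd

# Hypothetical data: 2021-01-02 appears twice, with values 2 and 3.
index = pd.to_datetime(["2021-01-01", "2021-01-02", "2021-01-02"])
df = pd.DataFrame({"value": [1, 2, 3]}, index=index)

# Keep the first row for every index value (the default).
first_kept = df.loc[~df.index.duplicated(), :]
print(first_kept["value"].tolist())  # [1, 2]

# Keep the last row for every index value instead.
last_kept = df.loc[~df.index.duplicated(keep="last"), :]
print(last_kept["value"].tolist())   # [1, 3]

# With a unique index, reindexing to a full daily range works;
# days without data are filled with NaN.
daily = first_kept.reindex(pd.date_range("2021-01-01", "2021-01-04", freq="D"))
print(len(daily))  # 4
```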

Prevent duplicate values in a DataFrame index

To make sure a Pandas DataFrame cannot contain duplicate values in the index, one can set a flag (available since Pandas 1.2). Setting the allows_duplicate_labels flag to False tells Pandas to refuse duplicate labels on that DataFrame.

df.flags.allows_duplicate_labels = False

Applying this flag to a DataFrame that already has duplicate values, or performing an operation that would introduce duplicate labels, will result in the following error:

DuplicateLabelError: Index has duplicates.
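A minimal sketch of the flag, assuming Pandas 1.2 or later and made-up data: setting it on a DataFrame that still has duplicates raises the error, while setting it on a de-duplicated copy succeeds.

```python
import pandas as pd

# Hypothetical data: the label "a" occurs twice.
df = pd.DataFrame({"value": [1, 2, 3]}, index=["a", "b", "a"])

flag_error = None
try:
    # The index still has duplicates, so Pandas refuses to set the flag.
    df.flags.allows_duplicate_labels = False
except pd.errors.DuplicateLabelError as error:
    flag_error = error
    print("DuplicateLabelError raised")

# After dropping the duplicates, the flag can be set without complaints.
clean = df.loc[~df.index.duplicated(), :]
clean.flags.allows_duplicate_labels = False
print(clean.flags.allows_duplicate_labels)  # False
```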

Duplicate column names

Column names are stored in an Index too. That’s why each of these methods also applies to columns.

df.columns.is_unique
df.columns.duplicated()
df.loc[:, ~df.columns.duplicated()]
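The three lines above can be tried out on a small, made-up DataFrame in which the column name "a" is used twice:

```python
import pandas as pd

# Hypothetical data: two columns are both named "a".
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a", "b", "a"])

print(df.columns.is_unique)              # False
print(df.columns.duplicated().tolist())  # [False, False, True]

# Keep only the first column for every column name.
deduplicated = df.loc[:, ~df.columns.duplicated()]
print(deduplicated.columns.tolist())     # ['a', 'b']
```

Just like with rows, only the first column with a given name survives, so check that the dropped column really is redundant before throwing it away.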

By the way, I didn’t necessarily come up with this solution myself. Although I’m grateful you’ve visited this blog post, you should know I get a lot from websites like StackOverflow and I have a lot of coding books. This one by Matt Harrison (on Pandas 1.x!) has been updated in 2020 and is an absolute primer on Pandas basics. If you want something broad, ranging from data wrangling to machine learning, try “Mastering Pandas” by Stefanie Molin.

Say thanks, ask questions or give feedback

Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.

Good luck cleaning your data!
