Roel Peters

Pandas: Solve ‘You are trying to merge on object and int64 columns’

Pandas is the go-to package for anything data science in Python. However, if you’re used to R and the convenience of dplyr or data.table, pandas can be confusing now and then.

For example, the following error is a real newb issue.

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

It can occur in two scenarios:

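  1. When using the join method: you are probably joining DataFrames on labels and not on indices. With on='key', join looks up the key column of the calling DataFrame in the index of the other DataFrame, not in its key column.
  2. When using the merge method: you are probably merging DataFrames on two columns that are not of the same type, for example strings in one and integers in the other.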

You are trying to join on labels and not on indices using the join method

This is an example that generates the error:

data_x.join(data_y, on='key')
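To see where the int64 in the error message comes from, here is a minimal reproduction. The contents of data_x and data_y are made up for illustration. Both key columns hold strings, yet the call still fails, because on='key' matches data_x's key column against data_y's default integer index rather than against data_y's own key column.

```python
import pandas as pd

# Hypothetical sample data: both 'key' columns hold strings
data_x = pd.DataFrame({'key': ['1', '2', '3'], 'x': [10, 20, 30]})
data_y = pd.DataFrame({'key': ['1', '2', '3'], 'y': ['a', 'b', 'c']})

# join(on='key') looks up data_x['key'] in data_y's index,
# which here is the default integer RangeIndex
print(data_y.index.dtype)

error_message = None
try:
    data_x.join(data_y, on='key')
except ValueError as err:
    # String keys vs. an integer index triggers the dtype check
    error_message = str(err)
    print(error_message)
```

The exact wording of the message differs slightly between pandas versions, but it starts with "You are trying to merge on".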

In the first scenario, you can edit your code to join on the index. In the following code, I set the key column as the index of both DataFrames before joining.

data_x.set_index('key').join(data_y.set_index('key'))
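A runnable version of that fix, again with hypothetical sample data, is sketched below. It also shows a common variant (not from the original post): set the index only on data_y and keep on='key', so that key stays an ordinary column in the result instead of becoming the index.

```python
import pandas as pd

# Hypothetical sample data: both 'key' columns hold strings
data_x = pd.DataFrame({'key': ['1', '2', '3'], 'x': [10, 20, 30]})
data_y = pd.DataFrame({'key': ['1', '2', '3'], 'y': ['a', 'b', 'c']})

# Both frames are indexed by 'key', so join aligns key against key
joined = data_x.set_index('key').join(data_y.set_index('key'))
print(joined)

# Variant: only data_y is indexed by 'key'; on='key' then looks up
# data_x['key'] in that index and keeps 'key' as a regular column
joined_on = data_x.join(data_y.set_index('key'), on='key')
print(joined_on)
```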

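You could also just change the type of the key column to match the other DataFrame’s integer index. But then you’ll run into another error, because both DataFrames contain a key column and join needs a suffix (lsuffix or rsuffix) to tell the overlapping columns apart.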

ValueError: columns overlap but no suffix specified: Index(['key'], dtype='object')
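The sketch below reproduces this second error with made-up data in which key is already an integer column. It also illustrates why a suffix alone does not solve the real problem: once the call succeeds, data_x's key values are still matched against data_y's row index (0, 1, 2), not against data_y's key column. The lsuffix/rsuffix parameters are part of DataFrame.join; the data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data: 'key' is an integer column in both frames
data_x = pd.DataFrame({'key': [1, 2, 3], 'x': [10, 20, 30]})
data_y = pd.DataFrame({'key': [1, 2, 3], 'y': ['a', 'b', 'c']})

# The dtypes now agree, but both frames still have a column named 'key'
overlap_message = None
try:
    data_x.join(data_y, on='key')
except ValueError as err:
    overlap_message = str(err)
    print(overlap_message)

# With a suffix the call succeeds, but key 1 is looked up at row 1 of
# data_y, key 2 at row 2, and key 3 finds no row at all (NaN)
out = data_x.join(data_y, on='key', rsuffix='_y')
print(out)
```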

If you want to know why, read my other blog post on this topic. If changing indices seems unnecessary, the next section may be of more help: use the merge method on columns of the same type.

You are joining on columns of different types using the merge method

An example that generates the error:

data_x.merge(data_y, on='key')
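Here is a minimal reproduction of the merge scenario with hypothetical sample data. Checking the dtype of the key column in both DataFrames before merging is a quick way to spot the mismatch.

```python
import pandas as pd

# Hypothetical sample data: 'key' holds strings in data_x, integers in data_y
data_x = pd.DataFrame({'key': ['1', '2', '3'], 'x': [10, 20, 30]})
data_y = pd.DataFrame({'key': [1, 2, 3], 'y': ['a', 'b', 'c']})

# Inspect the key dtypes: they differ, which is what merge complains about
print(data_x['key'].dtype, data_y['key'].dtype)

merge_message = None
try:
    data_x.merge(data_y, on='key')
except ValueError as err:
    merge_message = str(err)
    print(merge_message)
```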

In this second scenario, you can simply change the type of one of the columns, or both, so that they match. A convenient way to do this is the astype method.

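# astype returns a new Series, so assign it back to the column
data_x['key'] = data_x['key'].astype(int)
data_y['key'] = data_y['key'].astype(int)
data_x.merge(data_y, on='key')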
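Casting to int only works if every key string looks like an integer. The sketch below, using hypothetical sample data, compares that option with casting the integer side to strings instead. It uses DataFrame.assign, which returns a new DataFrame and leaves data_x and data_y unchanged; that is a design choice for the example, not a requirement.

```python
import pandas as pd

# Hypothetical sample data: 'key' holds strings in data_x, integers in data_y
data_x = pd.DataFrame({'key': ['1', '2', '3'], 'x': [10, 20, 30]})
data_y = pd.DataFrame({'key': [1, 2, 3], 'y': ['a', 'b', 'c']})

# Option 1: cast the string side to int64
# (raises ValueError if a key such as 'abc' is not a valid integer)
merged_int = data_x.assign(key=data_x['key'].astype('int64')).merge(data_y, on='key')

# Option 2: cast the integer side to strings (works for any key values,
# but a string key such as '01' will not match 1 once it becomes '1')
merged_str = data_x.merge(data_y.assign(key=data_y['key'].astype(str)), on='key')

print(merged_int)
print(merged_str)
```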

By the way, I didn’t necessarily come up with this solution myself. Although I’m grateful you’ve visited this blog post, you should know I get a lot from websites like StackOverflow and I have a lot of coding books. This one by Matt Harrison (on Pandas 1.x!) was updated in 2020 and is an absolute primer on Pandas basics. If you want something broad, ranging from data wrangling to machine learning, try “Hands-On Data Analysis with Pandas” by Stefanie Molin.

Say thanks, ask questions or give feedback

Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.

Great success!

