For example, the following error is one that beginners run into all the time.
ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
It can occur in two scenarios:
- When using the join method: you are probably joining DataFrames on labels and not on indices
- When using the merge method: you are probably joining DataFrames on two columns that are not of the same type
You are trying to join on labels and not on indices using the join method
This is an example that generates the error:
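Here is a minimal sketch of how the error can arise with join. The frames data_x and data_y and their columns x and y are made up for illustration. Both key columns hold strings, but join with on='key' compares the caller's key column against the index of the other DataFrame, which here is the default integer index. The exact wording of the message can differ slightly between pandas versions.

```python
import pandas as pd

# Hypothetical data: both frames store the key as strings.
data_x = pd.DataFrame({"key": ["1", "2", "3"], "x": [10, 20, 30]})
data_y = pd.DataFrame({"key": ["1", "2", "3"], "y": [0.1, 0.2, 0.3]})

message = None
try:
    # join(on="key") matches data_x["key"] (strings) against
    # data_y's index (integers 0, 1, 2), not against data_y["key"].
    data_x.join(data_y, on="key")
except ValueError as err:
    message = str(err)

print(message)
```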
In the first scenario, you can edit your code to join on the index. In the following code, I set the index to the column I want to join on.
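A sketch of that fix, assuming the same kind of made-up frames (data_x and data_y with string keys; the column names x and y are hypothetical). Calling set_index on both sides means join compares index with index:

```python
import pandas as pd

# Hypothetical data: both frames store the key as strings.
data_x = pd.DataFrame({"key": ["1", "2", "3"], "x": [10, 20, 30]})
data_y = pd.DataFrame({"key": ["1", "2", "3"], "y": [0.1, 0.2, 0.3]})

# Move the key column into the index on both sides, then join index to index.
joined = data_x.set_index("key").join(data_y.set_index("key"))

print(joined)
```

Because the key now lives in the index of both frames, there is no overlapping key column left, so no suffix is needed either.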
You could just change the column type so that it matches the index of the other DataFrame. But then you’ll run into another error, which asks you to specify a suffix for the column names the two DataFrames have in common:
ValueError: columns overlap but no suffix specified: Index(['key'], dtype='object')
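To see where this second error comes from, here is a small sketch with made-up frames (data_x, data_y, x and y are hypothetical names). Once the caller's key column is converted to integers, the type check against the other DataFrame's integer index passes, and pandas moves on to complain that both frames contain a column named key. The dtype shown at the end of the message may vary with the pandas version.

```python
import pandas as pd

# Hypothetical data: both frames store the key as strings.
data_x = pd.DataFrame({"key": ["1", "2", "3"], "x": [10, 20, 30]})
data_y = pd.DataFrame({"key": ["1", "2", "3"], "y": [0.1, 0.2, 0.3]})

# Convert the caller's key to integers so it matches data_y's integer index.
data_x["key"] = data_x["key"].astype(int)

message = None
try:
    # The dtypes now agree, but both frames have a "key" column
    # and no lsuffix/rsuffix was given.
    data_x.join(data_y, on="key")
except ValueError as err:
    message = str(err)

print(message)
```

Note that even with suffixes supplied, this call would still match data_x's key values against data_y's row index rather than against data_y's key column, which is rarely what you want.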
If you want to know why, read my other blog post on this topic. If you’d rather not change indices, the next section may be more helpful: it uses the merge method on columns of the same type.
You are joining on columns of different types using the merge method
An example that generates the error:
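A minimal sketch, assuming two made-up frames in which the key is stored as strings on one side and as integers on the other (data_x, data_y, x and y are hypothetical names). The exact wording of the message can differ slightly between pandas versions.

```python
import pandas as pd

# Hypothetical data: string keys on the left, integer keys on the right.
data_x = pd.DataFrame({"key": ["1", "2", "3"], "x": [10, 20, 30]})
data_y = pd.DataFrame({"key": [1, 2, 3], "y": [0.1, 0.2, 0.3]})

message = None
try:
    # merge refuses to compare a string column with an integer column.
    data_x.merge(data_y, on="key")
except ValueError as err:
    message = str(err)

print(message)
```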
In this second scenario, you can simply change the column type of one of the columns — or both. A convenient way is through the astype method.
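# astype returns a new Series, so assign the result back to the column
data_x['key'] = data_x['key'].astype(int)
data_y['key'] = data_y['key'].astype(int)
data_x.merge(data_y, on='key')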
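For a version you can run from top to bottom, here is a sketch with made-up frames (data_x, data_y, x and y are hypothetical names). It assumes every key value can actually be converted to an integer; astype(int) raises if one cannot.

```python
import pandas as pd

# Hypothetical data: string keys on the left, integer keys on the right.
data_x = pd.DataFrame({"key": ["1", "2", "3"], "x": [10, 20, 30]})
data_y = pd.DataFrame({"key": [1, 2, 3], "y": [0.1, 0.2, 0.3]})

# Inspect the key dtypes before merging; a mismatch here is what triggers the error.
print(data_x["key"].dtype, data_y["key"].dtype)

# Convert the string keys and assign the result back to the column.
data_x["key"] = data_x["key"].astype(int)

merged = data_x.merge(data_y, on="key")
print(merged)
```

Checking the dtype attribute of both key columns first is a quick way to tell whether you are in this scenario at all.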
By the way, I didn’t necessarily come up with this solution myself. Although I’m grateful you’ve visited this blog post, you should know I get a lot from websites like StackOverflow, and I own a lot of coding books. This one by Matt Harrison (on Pandas 1.x!) was updated in 2020 and is an excellent primer on Pandas basics. If you want something broad, ranging from data wrangling to machine learning, try “Mastering Pandas” by Stefanie Molin.