Pandas can be somewhat puzzling at times. It offers a huge amount of functionality, but that can make its syntax and methods obscure. Judging by the names alone, the ‘join’ and ‘merge’ methods could be the same thing. However, they aren’t.
Here’s an error that I used to run into a lot as a beginning data scientist exploring Python.
ValueError: You are trying to merge on object and float64 columns. If you wish to proceed you should use pd.concat
It can occur in two scenarios:
- When using the join method: you are joining on a column of the calling DataFrame instead of on its index, and that column is not of the same type as the index of the other DataFrame
- When using the merge method: you are joining DataFrames on columns that are not of the same type
You are trying to join on labels and not on indices using the join method
This is an example that generates the error:
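A minimal sketch of this scenario, assuming two small made-up DataFrames (the names df_x and df_y follow the rest of this post, the data is hypothetical). The dtype named in the message depends on your data, so it may say int64 rather than float64:

```python
import pandas as pd

# Hypothetical data: both 'key' columns hold strings.
df_x = pd.DataFrame({"key": ["1", "2", "3"], "x": [10, 20, 30]})
df_y = pd.DataFrame({"key": ["1", "2", "3"], "y": [0.1, 0.2, 0.3]})

# join(on="key") matches df_x["key"] (strings) against the *index* of df_y,
# which is the default integer index, so the types do not line up.
try:
    df_x.join(df_y, on="key")
    join_error = None
except ValueError as err:
    join_error = str(err)

print(join_error)
```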
In the first scenario, you can edit your code to join on the index. I set the index of each DataFrame to the column I want to join on, so that both indices hold the keys (strings, in this case) and the join method can match them.
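Joining on the index could look roughly like this. It is a sketch on the same hypothetical df_x and df_y as before, using set_index on both sides:

```python
import pandas as pd

# Hypothetical data: both 'key' columns hold strings.
df_x = pd.DataFrame({"key": ["1", "2", "3"], "x": [10, 20, 30]})
df_y = pd.DataFrame({"key": ["1", "2", "3"], "y": [0.1, 0.2, 0.3]})

# With 'key' as the index on both sides, join aligns index to index.
joined = df_x.set_index("key").join(df_y.set_index("key"))

print(joined)
```

The result is indexed by key and contains the x and y columns side by side.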
Why don’t we just change the type of the column? Because you’d then run into another error if you keep using the join method.
ValueError: columns overlap but no suffix specified: Index(['key'], dtype='object')
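To see that second error for yourself, here is a small sketch under the same assumptions (hypothetical df_x and df_y that both carry a 'key' column):

```python
import pandas as pd

# Hypothetical data: both 'key' columns hold strings.
df_x = pd.DataFrame({"key": ["1", "2", "3"], "x": [10, 20, 30]})
df_y = pd.DataFrame({"key": ["1", "2", "3"], "y": [0.1, 0.2, 0.3]})

# Cast the left key to int so it matches the integer index of df_y.
df_x["key"] = df_x["key"].astype(int)

# The types now agree, but both frames still have a 'key' column and
# join() has no suffix to tell the two apart.
try:
    df_x.join(df_y, on="key")
    overlap_error = None
except ValueError as err:
    overlap_error = str(err)

print(overlap_error)
```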
If you want to know why, read my other blog post on this topic. If you find changing indices unnecessary, maybe you should try the merge method. But make sure your columns are of the same type, as described below.
You are joining on columns of different types using the merge method
An example that generates the error:
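A minimal sketch, assuming made-up data in which one 'key' column holds strings and the other holds floats:

```python
import pandas as pd

# Hypothetical data: string keys on the left, float keys on the right.
df_x = pd.DataFrame({"key": ["1", "2", "3"], "x": [10, 20, 30]})
df_y = pd.DataFrame({"key": [1.0, 2.0, 3.0], "y": [0.1, 0.2, 0.3]})

# merge refuses to match string keys against float keys.
try:
    df_x.merge(df_y, on="key")
    merge_error = None
except ValueError as err:
    merge_error = str(err)

print(merge_error)
```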
In this second scenario, you can simply change the type of one of the columns, or both. A convenient way is through the astype method. Keep in mind that astype returns a new column, so you have to assign the result back.
df_x['key'] = df_x['key'].astype(int)
df_y['key'] = df_y['key'].astype(int)
df_x.merge(df_y, on='key')