Without a doubt, Pandas is one of the most popular Python packages within Data Science. However, coming from an R background, discovering the difference between the join and merge method was mind-boggling. Especially if it generates confusing errors that urge you to use pd.concat, which you shouldn’t.
This is the error I’m talking about:
ValueError: You are trying to merge on datetime64[ns] and object columns. If you wish to proceed you should use pd.concat
Both the merge and join method can trigger it:
- When using the join method: you are joining DataFrames on labels and not on indices, the labels are of a different type
- When using the merge method: you are joining DataFrames on two columns that are not of the same type
Using ;join
E.g.: this code will produce the error.
data_x.join(data_y, on='key')
In this scenario, you can edit your code to join on the index, rather than on a column. In the following code, I set the index on the columns of both data frames:
data_x.set_index('key').join(data_y.set_index('key'))
You could just change the column type. But then you’ll run into the following error, requiring you to set a suffix for both data frames.
ValueError: columns overlap but no suffix specified: Index([‘key’], dtype=’object’)
If you are interest, read my other blog post on this topic. In the next paragraph, you’ll find out if merge is more of a help.
Using .merge
This is a chunk of code that generates the error:
data_x.merge(data_y, on='key')
In this second scenario, with merge, you can simply change the column type of one of the columns. A convenient way is through the astype method. Since we’re joining dates, you’ll use datetime64[ns].
data_x.key.astype('datetime64[ns]')
data_y.key.astype('datetime64[ns]')
data_x.merge(data_y, on='key')
Good luck!
Thanks but I tried without equating to the same data frame and it didn’t work but equating to the same data frame helped.
data_x=data_x.key.astype(‘datetime64[ns]’)
data_y=data_y.key.astype(‘datetime64[ns]’)
data_x.merge(data_y, on=’key’)
Hope it helps!