Pandas is the go-to package for anything data science in Python. However, if you’re used to R and the convenience of dplyr or data.table, pandas can sometimes be a real pain in the ###.
For example, the following error is a real newb issue.
You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat
It can occur in two scenarios:
- using the join method: you are probably joining DataFrames on column labels and not on indices (join matches against the index of the other DataFrame)
- using the merge method: you are probably joining DataFrames on two columns that are not of the same type
You are trying to join on labels and not on indices using the join method
In the first scenario, you can edit your code to join on the index. In the following code, I set the index on the columns I want to join.
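A minimal sketch of that index-based join. The sample frames and their columns (`x`, `y`) are made up for illustration; only the names `data_x`, `data_y` and `key` come from this post.

```python
import pandas as pd

# Hypothetical sample data: both frames share a string column named "key".
data_x = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
data_y = pd.DataFrame({"key": ["a", "b", "d"], "y": [10, 20, 40]})

# data_x.join(data_y, on="key") would compare data_x["key"] (strings) with
# data_y's default integer index, which is what typically triggers the error.

# Setting "key" as the index on both sides makes join line up index with index.
joined = data_x.set_index("key").join(data_y.set_index("key"))
print(joined)
```

Because join does a left join by default, every key from data_x is kept, and keys missing from data_y get NaN in the y column.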
But what might be even simpler is replacing the join method with the merge method.
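Roughly, that swap could look like this. Again, the sample data is invented for the example.

```python
import pandas as pd

# Hypothetical sample data with a shared string column "key".
data_x = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
data_y = pd.DataFrame({"key": ["a", "b", "d"], "y": [10, 20, 40]})

# merge matches the "key" column of both frames, no index juggling needed.
merged = data_x.merge(data_y, on="key")
print(merged)
```

Keep in mind that merge defaults to an inner join, so only keys present in both frames survive. Pass how="left" if you want the left-join behaviour that join gives you by default.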
You are joining on columns of different types using the merge method
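In this second scenario, you can simply change the type of one of the columns, or both. A convenient way is through the astype method. Note that astype returns a new object instead of modifying the DataFrame in place, so you have to assign the result back to the column.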
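# astype returns a copy, so assign the converted column back
data_x['key'] = data_x['key'].astype(int)
data_y['key'] = data_y['key'].astype(int)
# both key columns are now integers, so the merge goes through
data_x.merge(data_y, on='key')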
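To see the whole thing end to end, here is a small self-contained sketch. The data is made up: I assume the key was read as text in data_x and as integers in data_y, which is a common way to end up in this situation.

```python
import pandas as pd

# Hypothetical data: same keys, but stored as strings on one side.
data_x = pd.DataFrame({"key": ["1", "2", "3"], "x": ["a", "b", "c"]})
data_y = pd.DataFrame({"key": [1, 2, 4], "y": [0.1, 0.2, 0.4]})

# Checking the dtypes first shows the mismatch behind the error.
print(data_x["key"].dtype, data_y["key"].dtype)

# Convert the string keys to integers and assign the result back.
data_x["key"] = data_x["key"].astype(int)

merged = data_x.merge(data_y, on="key")
print(merged)
```

If the keys are not clean numbers, astype(int) will fail. In that case, converting both columns with astype(str) is the safer direction.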