It’s an operation I use quite a lot, but I never took the time to find out which solution is fastest. That’s why I decided to dedicate this blog post on how to concatenate columns in pandas (Python) and their execution speed.
I’ve been working with the “donors” dataset from Kaggle quite a lot recently. During the process of feature engineering, I wanted to concatenate the four essays for every row. I timed several methods and these are my results. All timings are for a subsample of the data of ~140k rows.
It is essential that all of your columns are strings, otherwise, you’ll run into the following error for most of these solutions.
If there are NaN values in one of your columns, you’ll run into the following error:
TypeError: sequence item 0: expected str instance, float found
But even integers are troublesome. It will generate the following error:
TypeError: sequence item 0: expected str instance, int found
Finally, a requirement of my concatenation is that I want a whitespace as the separator between every value from each of the four columns column.
Solution 1: join and agg
df['all'] = df[['a','b','c','d']].agg(' '.join, axis=1)
I expected quite a lot from the agg function, but I must say it let me down. Concatenating the columns by aggregating the join function is a slow solution. It took 7.7 seconds.
Solution 2: f-strings and apply
df['all'] = df.apply(lambda x: f '{x["a"]} {x["b"]} {x["c"]} {x["d"]}', axis = 1)
The next four solutions combine the apply function and some kind of string formatting. First, let’s try the often recommended f-strings. This took 7.5 seconds.
Solution 3: format and apply
df['all'] = df.apply(lambda x: '{} {} {} {}'.format(x["a"], x["b"], x["c"], x["d"]), axis = 1)
Another way of formatting string is using format. The execution time is more or less the same: 7.5 seconds.
Solution 4: % and apply
df['all'] = df.apply(lambda x: '%s %s %s %s' % (x["a"], x["b"], x["c"], x["d"]), axis = 1)
For some reason, C-style formatting is my favorite. I expected it to be faster because I’m specifying the type, but once again it’s 7.5 seconds.
Solution 5: join and apply
df['all'] = df[['a','b','c','d']].apply(lambda x: ' '.join(x), axis = 1)
Now we’re getting closer to the real stuff. By combining the pandas function join with apply, we are able to get our execution time down to 1.3 seconds.
Solution 6: +
df['all'] = df['a'] + ' ' + df['b'] + ' ' + df['c'] + ' ' + df['d']
Pandas allows you to use the + operator element-wise for string concatenation. Apparently, it’s blazingly fast. Although it’s looks kind of dumb and repetitive, the execution time is only 0.8 seconds. That’s nearly a 10-fold improvement compared to our first solution. If you want to know more about the + operator, somebody bluntly asked on StackOverflow why it works and got some great answers.
Solution 7: List Comprehension
df['all'] = [p1 + ' ' + p2 + ' ' + p3 + ' ' + p4 for p1, p2, p3, p4 in zip(df['a'], df['b'], df['c'], df['d'])]
For some reason, I always forget the existence of list comprehension when working with pandas. Yet, it works. And it simply can’t be beaten. Going back to the roots of Python can be rewarding. In my example, it executed the concatenation in 0.4 seconds.
In this blog post, you found seven solutions to concatenate pandas columns. I hope one fits your use case.
By the way, I didn’t necessarily come up with this solution myself. Although I’m grateful you’ve visited this blog post, you should know I get a lot from websites like StackOverflow and I have a lot of coding books. This one by Matt Harrison (on Pandas 1.x!) has been updated in 2020 and is an absolute primer on Pandas basics. If you want something broad, ranging from data wrangling to machine learning, try “Mastering Pandas” by Stefanie Molin.
Great success!
Mobile Phone Monitoring App – hidden tracking app that secretly records location, SMS, call audio, WhatsApp, Facebook, Viber, camera, internet activity. Monitor everything that happens in mobile phone, and track phone anytime, anywhere.
Your article helped me a lot, is there any more related content? Thanks!