Skip to content
Home ยป Concatenate columns in pandas (as fast as possible)

Concatenate columns in pandas (as fast as possible)

It’s an operation I use quite a lot, but I never took the time to find out which solution is fastest. That’s why I decided to dedicate this blog post on how to concatenate columns in pandas (Python) and their execution speed.

I’ve been working with the “donors” dataset from Kaggle quite a lot recently. During the process of feature engineering, I wanted to concatenate the four essays for every row. I timed several methods and these are my results. All timings are for a subsample of the data of ~140k rows.

It is essential that all of your columns are strings, otherwise, you’ll run into the following error for most of these solutions.

If there are NaN values in one of your columns, you’ll run into the following error:

TypeError: sequence item 0: expected str instance, float found

But even integers are troublesome. It will generate the following error:

TypeError: sequence item 0: expected str instance, int found

Finally, a requirement of my concatenation is that I want a whitespace as the separator between every value from each of the four columns column.

Solution 1: join and agg

df['all'] = df[['a','b','c','d']].agg(' '.join, axis=1)

I expected quite a lot from the agg function, but I must say it let me down. Concatenating the columns by aggregating the join function is a slow solution. It took 7.7 seconds.

Solution 2: f-strings and apply

df['all'] = df.apply(lambda x: f '{x["a"]} {x["b"]} {x["c"]} {x["d"]}', axis = 1)

The next four solutions combine the apply function and some kind of string formatting. First, let’s try the often recommended f-strings. This took 7.5 seconds.

Solution 3: format and apply

df['all'] = df.apply(lambda x: '{} {} {} {}'.format(x["a"], x["b"], x["c"], x["d"]), axis = 1)

Another way of formatting string is using format. The execution time is more or less the same: 7.5 seconds.

Solution 4: % and apply

df['all'] = df.apply(lambda x: '%s %s %s %s' % (x["a"], x["b"], x["c"], x["d"]), axis = 1)

For some reason, C-style formatting is my favorite. I expected it to be faster because I’m specifying the type, but once again it’s 7.5 seconds.

Solution 5: join and apply

df['all'] = df[['a','b','c','d']].apply(lambda x: ' '.join(x), axis = 1)

Now we’re getting closer to the real stuff. By combining the pandas function join with apply, we are able to get our execution time down to 1.3 seconds.

Solution 6: +

df['all'] = df['a'] + ' ' + df['b'] + ' ' + df['c'] + ' ' + df['d']

Pandas allows you to use the + operator element-wise for string concatenation. Apparently, it’s blazingly fast. Although it’s looks kind of dumb and repetitive, the execution time is only 0.8 seconds. That’s nearly a 10-fold improvement compared to our first solution. If you want to know more about the + operator, somebody bluntly asked on StackOverflow why it works and got some great answers.

Solution 7: List Comprehension

df['all'] = [p1 + ' ' + p2 + ' ' + p3 + ' ' + p4 for p1, p2, p3, p4 in zip(df['a'], df['b'], df['c'], df['d'])]

For some reason, I always forget the existence of list comprehension when working with pandas. Yet, it works. And it simply can’t be beaten. Going back to the roots of Python can be rewarding. In my example, it executed the concatenation in 0.4 seconds.

In this blog post, you found seven solutions to concatenate pandas columns. I hope one fits your use case.

By the way, I didn’t necessarily come up with this solution myself. Although I’m grateful you’ve visited this blog post, you should know I get a lot from websites like StackOverflow and I have a lot of coding books. This one by Matt Harrison (on Pandas 1.x!) has been updated in 2020 and is an absolute primer on Pandas basics. If you want something broad, ranging from data wrangling to machine learning, try “Mastering Pandas” by Stefanie Molin.

Say thanks, ask questions or give feedback

Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.

Great success!

Say thanks, ask questions or give feedback

Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.

1 thought on “Concatenate columns in pandas (as fast as possible)”

  1. Mobile Phone Monitoring App – hidden tracking app that secretly records location, SMS, call audio, WhatsApp, Facebook, Viber, camera, internet activity. Monitor everything that happens in mobile phone, and track phone anytime, anywhere.

Leave a Reply

Your email address will not be published. Required fields are marked *