Skip to content
Home » Concatenating column values in pandas

Concatenating column values in pandas

Having R as my data science lingua franca, some things in pandas feel less intuitive. As I’m slowly transitioning to pandas, I’ll be posting a lot of examples over the next couple of weeks. Here’s the first one: concatenating columns values of a pandas DataFrame into another (new) column.

First let’s create some dummy data. It contains tastes, colors and fruits. The goal of this example is to make a column that has a concatenation of a taste, a color and a fruit on each row. For example: ‘sweet red apple’.

import pandas as pd
from timeit import timeit

taste = ['sweet','sour','sweet','bitter']
color = ['red','green','yellow','red']
fruit = ['apple','pear','banana','cherry']

data = {'taste': taste, 'color': color, 'fruit': fruit}
df = pd.DataFrame(data)

First, let’s go with map(), which works element-wise. We apply it to the first pandas series we’d like to concatenate, and then use the + operator to concatenate each of the values to corresponding values in the other series. Since it’s element-wise, it is rather slow. — 1.5 microseconds

df['option1'] = df['taste'].map(str) + ' ' + df['color'] + ' ' + df['fruit']

A faster solution is by working column-wise, using the apply function (yay, something that feels like R). Although it’s faster, it feels like clunky coding. But that’s rather subjective. — 1.2 microseconds

df['option2'] = df.apply(lambda x: '%s %s %s' % (x['taste'], x['color'], x['fruit']), axis = 1)

When all else fails, there’s always list comprehension. It works on pandas series and it’s also superfast. — 320 nanoseconds

df['option3'] = ['%s %s %s' % (x, y, z) for x, y, z in zip(df['taste'], df['color'], df['fruit'])]

If you’re a fan of R’s data.table, the next solution feels the most elegant. By accessing the numpy representation of each column, we can use numpy’s string operations, which supports the + operator. — 250 nanoseconds, we have a winner!

df['option4'] = df['taste'].values + ' ' + df['color'].values + ' ' + df['fruit'].values

There are some variants to this last solution, but they are somewhat slower. First variant is switching .values for .to_numpy(). Which is only slightly slower, but is the recommended (and consistent) way according to the pandas documentation (0.24+).

df['option4a'] = df['taste'].to_numpy() + ' ' + df['color'].to_numpy() + ' ' + df['fruit'].to_numpy()

I’ve been using C-style string formatting a lot, so we can do that here to. To my surprise, it’s three times slower.

df['option4b'] = ('%s %s %s' % (df['taste'].values, df['color'].values, df['fruit']))

Great success!

Say thanks, ask questions or give feedback

Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.

4 thoughts on “Concatenating column values in pandas”

  1. Очень трендовые новинки моды.
    Исчерпывающие новости всемирных подуимов.
    Модные дома, торговые марки, haute couture.
    Новое место для стильныех людей.

  2. Полностью трендовые новости индустрии.
    Абсолютно все эвенты всемирных подуимов.
    Модные дома, торговые марки, гедонизм.
    Самое приятное место для трендовых хайпбистов.

  3. Полностью трендовые события модного мира.
    Абсолютно все события известнейших подуимов.
    Модные дома, торговые марки, высокая мода.
    Свежее место для модных хайпбистов.

Leave a Reply

Your email address will not be published. Required fields are marked *