Home » How to: Concatenate strings from multiple rows by group (Pandas)

How to: Concatenate strings from multiple rows by group (Pandas)

  • by
  • 2 min read

There are many situations that you have an 1:n relationship, where you would have wanted an 1:1 relationship. Sometimes, it makes sense to concatenate the strings from rows belonging to the same group. Let’s do that in Python, using Pandas.

Imagine, you have the following table that contains an ID and a fruit variety. Each ID corresponds to a specific kind of fruit: apples (32), pears (84) and grapes (12). Let’s say you’re trying to join this table with its ID on another table.

IDVariety
32Jonagold
32Golden
32Granny Smith
84Conference
84Doyenné
12Thompson Seedless

However, you want a 1:1 relationship, so your table should look like this:

IDVarieties
32Jonagold, Golden, Granny Smith
84Conference, Doyenné
12Thompson Seedless

You can see that the table is grouped by its ID and the varieties are joined and separated by a comma.

df.head() # our table that contains the fruit
df['Variety'] = df.groupby(['ID'])['Variety'].transform(lambda x: ', '.join(x))

To achieve this, we group the DataFrame by the ID, and select the Variety column. We transform by applying a lambda function on all the rows. This lambda function concatenates all the values within the group, separated by a comma and space.

You now have the following table.

IDVarieties
32Jonagold, Golden, Granny Smith
32 Jonagold, Golden, Granny Smith
32 Jonagold, Golden, Granny Smith
84Conference, Doyenné
84 Conference, Doyenné
12Thompson Seedless

Finally, deduplicate the table, and reset the index.

df = df.drop_duplicates().reset_index(drop = True)

Congratulations, you have concatenated strings from multiple rows by category.

Say thanks, ask questions or give feedback

Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.

Leave a Reply

Your email address will not be published. Required fields are marked *