
Replacing multiple values in a pandas DataFrame column


Without going into detail, here’s something I truly hate in R: replacing multiple values. In Python’s pandas, it’s really easy. In this blog post I try several methods: list comprehension, apply(), replace() and map().

First, let’s create some dummy data.

import pandas as pd
from timeit import timeit  # used to time each method

taste = ['sweet','sour','sweet','bitter'] * 1000
color = ['red','green','yellow','red'] * 1000
fruit = ['apple','pear','banana','cherry'] * 1000

data = {'taste': taste, 'color': color, 'fruit': fruit}

df = pd.DataFrame(data)

# old value -> new value; 'yellow' has no entry
val = {'red': 'vermillion', 'green': 'emerald'}
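The timings quoted in this post can be measured with timeit, which is imported above. Here is a minimal sketch of how that might look. The helper name time_ms and the number of repetitions are my own assumptions, not part of the original measurements, and the numbers will differ per machine and pandas version.

```python
import pandas as pd
from timeit import timeit

color = ['red', 'green', 'yellow', 'red'] * 1000
df = pd.DataFrame({'color': color})
val = {'red': 'vermillion', 'green': 'emerald'}


def time_ms(func, number=20):
    """Return the mean run time of func() in milliseconds (hypothetical helper)."""
    total_seconds = timeit(func, number=number)
    return total_seconds / number * 1000


# Time one of the methods; swap in any other expression to compare.
elapsed = time_ms(lambda: df['color'].replace(val))
print(f"replace(): {elapsed:.2f} ms per call")
```

Wrapping each method in a lambda keeps the setup (building the DataFrame) out of the measurement.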

Let’s start with a list comprehension. The dictionary’s get() method looks up each color (the first x) and returns the corresponding replacement. The second x is the fallback that is returned when the key cannot be found, so colors without an entry stay as they are. Timing: 6.8 milliseconds.

pd.Series([val.get(x,x) for x in df['color']])
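To make the role of the two x arguments concrete, here is a small sketch of get() on its own, using a four-item list instead of the full column. The variable names colors and replaced are just for illustration.

```python
val = {'red': 'vermillion', 'green': 'emerald'}

# Key present: the mapped value is returned.
print(val.get('red', 'red'))        # vermillion

# Key absent: the default (here the original value) is returned.
print(val.get('yellow', 'yellow'))  # yellow

colors = ['red', 'green', 'yellow', 'red']
replaced = [val.get(x, x) for x in colors]
print(replaced)  # ['vermillion', 'emerald', 'yellow', 'vermillion']
```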

Of course, we can also use apply() with a lambda. It turns out to be rather slow: 10.8 milliseconds.

df['color'].apply(lambda x: val.get(x,x))

But wait, there are some pandas-native functions available for this purpose. replace() definitely seems to be the most elegant way, but it’s not very fast either. If you go through the pandas source code, you’ll see that this function involves a lot of conversions. Timing: 10.1 milliseconds.

df['color'].replace(val)
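As a quick check of what replace() does with a dictionary, here is a sketch on a four-row frame. Values without an entry in the dictionary are left alone, and a nested dictionary limits the replacement to one column when calling replace() on the whole DataFrame. The names replaced and df_replaced are mine.

```python
import pandas as pd

df = pd.DataFrame({
    'color': ['red', 'green', 'yellow', 'red'],
    'fruit': ['apple', 'pear', 'banana', 'cherry'],
})
val = {'red': 'vermillion', 'green': 'emerald'}

# On a Series: 'yellow' has no entry in val, so it is kept unchanged.
replaced = df['color'].replace(val)
print(replaced.tolist())  # ['vermillion', 'emerald', 'yellow', 'vermillion']

# On a DataFrame: {column: {old: new}} only touches the named column.
df_replaced = df.replace({'color': val})
print(df_replaced)
```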

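map() is faster than replace(), and its source code is a lot easier to follow. To my surprise, it’s also faster than apply(). At 6.8 milliseconds it ties with the list comprehension. We have a winner. One caveat: when given a dictionary, map() returns NaN for every value that has no key (here 'yellow'), and na_action='ignore' only skips values that are already NaN, so the result is not identical to that of the other methods.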

df['color'].map(val, na_action='ignore')
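One thing to watch with map() and a dictionary: values that are not keys come back as NaN rather than being kept. If the goal is the same output as the other methods, one option is to fill the gaps from the original column afterwards. This is a sketch on a four-item Series, and the extra fillna() step is not included in the timing quoted above.

```python
import pandas as pd

colors = pd.Series(['red', 'green', 'yellow', 'red'])
val = {'red': 'vermillion', 'green': 'emerald'}

# Plain map(): 'yellow' is not a key in val, so it becomes NaN.
mapped = colors.map(val)
print(mapped.isna().tolist())  # [False, False, True, False]

# Fill the NaN gaps with the original values to keep unmatched colors.
filled = colors.map(val).fillna(colors)
print(filled.tolist())  # ['vermillion', 'emerald', 'yellow', 'vermillion']
```

If some values in the column are genuinely missing to begin with, they stay missing after this step, since fillna() puts the original NaN back.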

Great success!
