# How to split a Pandas column on a delimiter and one-hot encode it

Today, I tried a data transformation that seemed obvious: splitting the string values of a Pandas column on a delimiter and one-hot encoding the resulting strings. However, it took me quite some time to figure out how to do it elegantly.

Here’s what I wanted to achieve. I had a DataFrame like this:

I wanted to turn it into this:

To break it down, this can be achieved by doing two transformations:

• Split string on a delimiter
• One-hot encode the resulting values

Although it looks easy, I had a hard time imagining how to get from the intermediate state (columns holding the first, second and third string after the split) to the final state.
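To make that intermediate state concrete, here is a minimal sketch of the manual two-step route. The sample values and the variable names are made up for illustration, and this is only one possible way to bridge the gap, not necessarily the neatest one.

```python
import pandas as pd

# Hypothetical sample data; only the column name comes from this post.
df = pd.DataFrame({"string_column": ["a,b", "b,c", "a"]})

# Step 1: split on the delimiter. expand=True spreads the pieces over
# positional columns 0, 1, ... (the "intermediate state").
parts = df["string_column"].str.split(",", expand=True)

# Step 2: stack the pieces into one long Series indexed by (row, position),
# drop the padding left by shorter rows, one-hot encode every piece,
# then collapse back to one row per original row.
stacked = parts.stack().dropna()
manual = pd.get_dummies(stacked).astype(int).groupby(level=0).max()

print(manual)
```

The `groupby(level=0).max()` step is what merges the per-piece indicator rows back into a single row per original record.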

Luckily, Pandas has an out-of-the-box method for achieving both transformations at once: Series.str.get_dummies, which is available through the .str accessor. It is not the same as Pandas’ top-level get_dummies function: that one treats each whole string as a single category and does not split it.
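A quick sketch of the difference between the two, using made-up sample values:

```python
import pandas as pd

s = pd.Series(["a,b", "b,c", "a"])  # hypothetical sample values

# The top-level function treats each whole string as one category.
whole = pd.get_dummies(s)
print(list(whole.columns))   # ['a', 'a,b', 'b,c']

# The .str accessor method splits on sep first, then encodes each piece.
pieces = s.str.get_dummies(sep=",")
print(list(pieces.columns))  # ['a', 'b', 'c']
```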

Its sep parameter (which defaults to '|') sets the delimiter, so a single call one-hot encodes a Series whose values each hold several delimiter-separated strings:

df['string_column'].str.get_dummies(sep=',')
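For a fuller picture, here is a small end-to-end sketch. The `id` column and the sample values are assumptions added for illustration; only `string_column` and the `str.get_dummies` call come from the line above.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "string_column": ["a,b", "b,c", "a"],
})

# Split on "," and one-hot encode in a single call.
dummies = df["string_column"].str.get_dummies(sep=",")

# Swap the original string column for the indicator columns.
result = df.drop(columns="string_column").join(dummies)

print(result)
```

Here `result` ends up with the columns `id`, `a`, `b` and `c`, where each of the last three holds 1 if that string appeared in the row and 0 otherwise.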

Simple as that!

