How to split Pandas column on delimiter and one-hot encode it

Today, I tried a data transformation that seemed obvious: splitting the string values of a Pandas column on a delimiter and one-hot encoding the resulting strings. However, it took me quite some time to figure out how to do it elegantly.

Here’s what I wanted to achieve. I had a DataFrame like this:

index  string_column
1      apple,pear,banana
2      apple
3      pear,apple

I wanted to turn it into this:

index  string_column      apple  pear  banana
1      apple,pear,banana  1      1     1
2      apple              1      0     0
3      pear,apple         1      1     0

To break it down, this can be achieved by doing two transformations:

  • Split string on a delimiter
  • One-hot encode the resulting values

Although it looks easy, I had a hard time imagining how one goes from the intermediate state (columns holding the first, second and third string after the split) to the final state.
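For the curious, the two steps can also be spelled out by hand. The following is only a sketch, assuming the example DataFrame from above. The variable names (exploded, manual) are made up for illustration, and explode needs Pandas 0.25 or later.

```python
import pandas as pd

# The example DataFrame from above (index labels 1, 2, 3).
df = pd.DataFrame(
    {"string_column": ["apple,pear,banana", "apple", "pear,apple"]},
    index=[1, 2, 3],
)

# Step 1: split each string into a list, then give every list item its own row.
# The original index label is repeated for each item.
exploded = df["string_column"].str.split(",").explode()

# Step 2: one-hot encode the individual values, then collapse the rows
# back to one row per original index label.
manual = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()

print(manual)
```

Note that the resulting columns come out in alphabetical order (apple, banana, pear), not in order of first appearance.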

Luckily, Pandas has an out-of-the-box method for achieving both transformations at once: the Series.str.get_dummies string method. It differs from the top-level pandas.get_dummies function of the same name, which treats each whole string as a single category instead of splitting it first.
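To see why the general function is not enough here, this small sketch (again assuming the example DataFrame from above, with a made-up variable name whole_strings) runs it on the unsplit column:

```python
import pandas as pd

df = pd.DataFrame(
    {"string_column": ["apple,pear,banana", "apple", "pear,apple"]},
    index=[1, 2, 3],
)

# The general function sees three distinct strings and creates
# one indicator column per whole string, commas included.
whole_strings = pd.get_dummies(df["string_column"])

print(whole_strings.columns.tolist())
```

Each row gets exactly one indicator set, which is not the per-fruit encoding I was after.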

Its sep parameter sets the delimiter (the default is '|'), so a single Series that holds several delimited values per cell can be one-hot encoded in one call:

df['string_column'].str.get_dummies(sep = ',')

Simple as that!
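The call returns only the indicator columns, so getting the exact table shown at the top takes one more step. A minimal end-to-end sketch, assuming the example data from above; joining the result back with df.join and reordering the columns are my own additions here, and dummies and result are just illustrative names:

```python
import pandas as pd

df = pd.DataFrame(
    {"string_column": ["apple,pear,banana", "apple", "pear,apple"]},
    index=pd.Index([1, 2, 3], name="index"),
)

# Split on the comma and one-hot encode in a single call.
# The new columns are sorted alphabetically: apple, banana, pear.
dummies = df["string_column"].str.get_dummies(sep=",")

# Reorder the indicator columns and attach them to the original DataFrame.
result = df.join(dummies[["apple", "pear", "banana"]])

print(result)
```

If the values have spaces after the commas (for example "apple, pear"), pass sep=", " or strip the whitespace first; otherwise " pear" and "pear" end up as separate columns.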

By the way, I didn’t come up with this solution entirely by myself. Although I’m grateful you’ve visited this blog post, you should know I get a lot from websites like StackOverflow and I have a lot of coding books. This one by Matt Harrison (on Pandas 1.x!) has been updated in 2020 and is an absolute primer on Pandas basics. If you want something broad, ranging from data wrangling to machine learning, try “Hands-On Data Analysis with Pandas” by Stefanie Molin.
