How to flatten a MultiIndex Pandas DataFrame

Coming from R, I’m not a big fan of MultiIndex DataFrames in Pandas. I definitely see the merits, but it just doesn’t feel right within a machine learning and feature engineering context. However, sometimes you will end up with a MultiIndex DataFrame, after some ninja line of code. In this blog post I explain how to flatten a MultiIndex DataFrame.

🐱‍🏍Update (2021-09-03): blog post that uses to_flat_index!

First, some dummy date to demonstrate.

import pandas as pd
df = pd.DataFrame(data = {'id': ['aaa', 'aaa', 'bbb', 'bbb', 'ccc'],
                          'val': [4, 5, 10, 3, 1]})

A typical situation that results in a MultiIndex DataFrame is when you use groupby and apply multiple aggregation functions to a column. Like this:

df.groupby(['id'], as_index = True).agg({'val': ['mean', 'median', 'sum']})

This is the result:

What if I want column names like this: val_mean, val_median and val_sum? Well, this is how you can do it:

df.columns = ['_'.join(col).strip() for col in df.columns.values]

Why does this work?

Well, let’s take a look at df.columns.values. What happens is the following. The column level names get returned as tuples. By looping over these tuples and joining the values within the tuples, tfor each column, the different column level names are concatenated in one column name.

By the way, I didn’t necessarily come up with this solution myself. Although I’m grateful you’ve visited this blog post, you should know I get a lot from websites like StackOverflow and I have a lot of coding books. This one by Matt Harrison (on Pandas 1.x!) has been updated in 2020 and is an absolute primer on Pandas basics. If you want something broad, ranging from data wrangling to machine learning, try “Mastering Pandas” by Stefanie Molin.

Say thanks, ask questions or give feedback

Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.

Great success!

How to flatten a MultiIndex Pandas DataFrame

Say thanks, ask questions or give feedback

Say thanks, ask questions or give feedback

1 thought on “How to flatten a MultiIndex Pandas DataFrame”

Leave a Reply Cancel reply

How to flatten a MultiIndex Pandas DataFrame

Say thanks, ask questions or give feedback

Say thanks, ask questions or give feedback

1 thought on “How to flatten a MultiIndex Pandas DataFrame”

Leave a Reply Cancel reply

Related Posts

How to do a SUMIF in PySpark

Check if Python logger already exists

Spark 3.0: Solving the “dates before 1582-10-15 or timestamps before 1900-01-01T00:00:00Z” error