Home » How to flatten a MultiIndex Pandas DataFrame

How to flatten a MultiIndex Pandas DataFrame

  • by
  • 2 min read

Coming from R, I’m not a big fan of MultiIndex DataFrames in Pandas. I definitely see the merits, but it just doesn’t feel right within a machine learning and feature engineering context. However, sometimes you will end up with a MultiIndex DataFrame, after some ninja line of code. In this blog post I explain how to flatten a MultiIndex DataFrame.

First, some dummy date to demonstrate.

import pandas as pd
df = pd.DataFrame(data = {'id': ['aaa', 'aaa', 'bbb', 'bbb', 'ccc'],
                          'val': [4, 5, 10, 3, 1]})

A typical situation that results in a MultiIndex DataFrame is when you use groupby and apply multiple aggregation functions to a column. Like this:

df.groupby(['id'], as_index = True).agg({'val': ['mean', 'median', 'sum']})

This is the result:

What if I want column names like this: val_mean, val_median and val_sum? Well, this is how you can do it:

df.columns = ['_'.join(col).strip() for col in df.columns.values]

Why does this work?

Well, let’s take a look at df.columns.values. What happens is the following. The column level names get returned as tuples. By looping over these tuples and joining the values within the tuples, tfor each column, the different column level names are concatenated in one column name.

Great success!

Say thanks, ask questions or give feedback

Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.

Leave a Reply

Your email address will not be published. Required fields are marked *