There are very few occasions when you really need a hierarchical column index in Pandas. Mostly, you just want to identify a column with a single column name. In this blog post, we’ll learn how to properly flatten a DataFrame with MultiIndex columns.
One of the things that annoys me a lot in Pandas, is how it returns MultiIndex columns, after using agg() when you apply multiple functions to one column. To get rid of the MultiIndex, we need to take two steps.
๐ Side note: make sure you have Pandas >= 0.24.
Two steps to flatten MultiIndex columns
โ Step 1: flatten the index
Using the to_flat_index function, we can make sure that all columns contain all levels of the index.
df.columns = df.columns.to_flat_index()
โ Step 2: concatenate all levels of the index
After you have flattened the index, you can apply this list comprehension over the index. For each column, it will join the different levels of the index as one string, with a separating _ in between.
df.columns = ['_'.join(column) for column in df.columns]
Or put it all together
If you want to apply the two steps in one go, you can simply do this:
df.columns = ['_'.join(column) for column in df.columns.to_flat_index()]
Good job!