**When you’re an R poweruser, pivoting tables in pandas feels unnecessarily complex. Why are there two pivot functions? Why does it return an index when you wanted a column? Why does it generate multi index columns? Those are the questions I tackle in this blog post.**

💥 This blog post has received an update over here.

To answer some questions about pivoting in pandas, I first generate some dummy data.

```
import random
import pandas as pd
import numpy as np
fruits = random.choices(['apple','pear','plum'], k = 1000)
colors = random.choices(['green','yellow','red','orange','brown'], k = 1000)
tastes = random.choices(['sour','bitter','sweet'], k = 1000)
sizes = random.choices(['small','big'], k = 1000)
calories = np.round(np.random.normal(50,10,1000), decimals = 0)
df = pd.DataFrame({'fruit': fruits,
'color': colors,
'taste': tastes,
'size': sizes,
'calories': calories})
df.head()
```

Let’s say, we want to turn the colors into columns by pivoting them using the *pivot_table()* function. For every fruit we want to know the amount per color. If we do this analogously to how we use dcast in R, we would do something like this. First of all, if we don’t want the fruit as the index, but as a column we have to use the *reset_index()* function. Resetting the index is not necessary. However, as an R user, it feels more natural to me. If you like stacking and unstacking DataFrames, you shouldn’t reset the index.

`df.pivot_table(columns = 'color', index = 'fruit', aggfunc = len).reset_index()`

But more importantly, we get this strange result. Every column we didn’t use in our *pivot_table()* function has been used to calculate the number of fruits per color and the result is constructed in a hierarchical DataFrame. That wasn’t supposed to happen.

There’s two ways we can solve this. First, we can select one column that we want to feed to the *len()* function (the aggfunc parameter). The second option is to limit the amount of columns you want to work with before applying the *pivot_table()* function to it.

```
df.pivot_table(columns = 'color', index = 'fruit', values = 'taste', aggfunc = len).reset_index() # option 1
df[['color','fruit']].pivot_table(columns = 'color', index = 'fruit', aggfunc = len).reset_index() #option 2
```

Both solutions will produce the same result.

Okay, but what does the *pivot()* function offer? This is what the documentation says:

Reshape data (produce a “pivot” table) based on column values. Uses unique values.

from specified index / columns to form axes of the resulting DataFrameThis function does not support data aggregation, multiple values will result in a MultiIndex in the columns.

Basically, the *pivot_table()* function is a generalization of the *pivot()* function that allows aggregation of values — for example, through the *len() *function in the previous example. Pivot only works — or makes sense — if you need to pivot a table and show values without any aggregation. Here’s an example.

```
fruits = ['apple','apple','apple','pear','pear','pear','prune','prune','prune']
colors = ['red','green','orange','red','green','orange','red','green','orange']
calories = np.round(np.random.normal(50,10,9), decimals = 0)
df = pd.DataFrame({'fruit': fruits,
'color': colors,
'calories': calories})
df.pivot(index = 'fruit', columns = 'color', values = 'calories').reset_index()
```

Nevertheless, you can get the same result using pivot_table, but it’s a bit silly to take the mean of a single value.

`df.pivot_table(index = 'fruit', columns = 'color', values = 'calories', aggfunc = np.mean).reset_index()`

Pivotting in pandas offers a lot more functionalities than in R. As a pandas starter, these features felt somewhat overwhelming to me. By sharing my struggles, I hope you have learned a thing or two.

Great success!