**In this blog post, I explore the options for taking the row-wise sum of a subset of columns from a Pandas DataFrame. **

import numpy as np import pandas as pd df = pd.DataFrame({ 'first_column': np.random.randint(0,100,50000), 'second_column': np.random.randint(0,100,50000), 'third_column': np.random.randint(0,100,50000), 'other_column': np.random.randint(0,100,50000) })

Let’s try to take the row-wise sum of the columns *first_column*, *second_column,* and *third_column*. This means we’re leaving *other_column* out.

But first, let’s take a step back. If you’d like to sum all the columns, you simply have to use the sum method and set the *axis* parameter to 1.

df.sum(axis = 1)

If we’d like to sum a subset of the columns, we have to select them, before calling the *sum* method. This can be done in multiple ways.

## Row-wise sum of a subset of columns via integer indices

First, we’ll sum the columns by subsetting them by their integer index. For this, we use the iloc method.

If the columns are in a sequential position, you can use **a slice** object.

df.iloc[:,0:3].sum(axis = 1) # 500 microseconds

If the columns are not in a sequential position, you can use **a list**. Keep in mind that this subsetting method is twice as slow as using a slice.

df.iloc[:,[0,1,2]].sum(axis = 1) # 1 millisecond

You can also use the DataFrame as a NumPy array object, by using *df.values*. On that object, you can use a slice (which uses basic index) or a list indexer (which uses advanced/fancy index mechanisms, slowing it down).

df.values[:,0:3].sum(axis = 1) # 160 microseconds df.values[:,[0,1,2]].sum(axis = 1) # 350 microseconds

**You can see that is by far the fastest way of subsetting columns and summing them.**

## Row-wise sum of a subset of columns via labels

Next thing we try is summing a subset of columns that we selected via their labels. Once again, this is twice to 2.5 times as slow as using a slice. You can use *loc*, or drop it.

df.loc[:,['first_column', 'second_column', 'third_column']].sum(axis = 1) df[['first_column', 'second_column', 'third_column']].sum(axis = 1) # 1.3 milliseconds

Pandas also supports a slice via labels. The following is more concise and once again, a lot faster.

df.loc[:,'first_column':'third_column'].sum(axis = 1) # 650 microseconds

An alternative is to turn each column into a pandas Series and sum them individually. In my opinion, this isn’t a very elegant solution. It’s faster than the list subset but not as fast as the slice subset.

df['first_column'] + df['second_column'] + df['third_column'] # 900 microseconds