**In a previous post, I explained how you can sample two Pandas DataFrames in exactly the same way. In this blog post, I use that helper function to undersample your predictors and target variable.**

When you are working with an imbalanced data set, it’s often good practice to under- or oversample your data for training your model. While there are some great Python packages to under- and oversample your datasets, none are fully built with DataFrames in mind. That’s why I wrote a simple undersample function that returns an undersampled version of your DataFrames.

The code assumes two things (feel free to tailor it to your specific needs and share your version in the comments):

- you have your predictors and target variables in separate data frames.
- you are working on a binary classification problem.
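
If you want to verify these two assumptions before undersampling, a minimal sketch could look like the one below. The helper name *check_inputs*, its *target_col* parameter and the toy data are all hypothetical and only serve as an illustration.

```python
import pandas as pd


def check_inputs(X, y, target_col):
    """Hypothetical helper: verify the two assumptions before undersampling."""
    # Predictors and target must describe the same rows, in the same order.
    if not X.index.equals(y.index):
        raise ValueError("X and y must share the same index")
    # Binary classification: exactly two distinct labels in the target column.
    n_classes = y[target_col].nunique()
    if n_classes != 2:
        raise ValueError(f"expected 2 classes, found {n_classes}")


# Made-up data for illustration only.
X_demo = pd.DataFrame({"feature": [1.0, 2.0, 3.0, 4.0]})
y_demo = pd.DataFrame({"label": [0, 0, 0, 1]})

check_inputs(X_demo, y_demo, "label")  # passes silently
```

The index check matters because the undersample function selects predictor rows by the index labels of the target DataFrame.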

First, let’s load the helper function from the previous blog post.

    import random

    import pandas as pd

    def sample_together(n, X, y):
        # Draw n distinct row positions, then take those rows from both DataFrames.
        rows = random.sample(range(len(X.index)), n)
        return X.iloc[rows], y.iloc[rows]
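
To see what the helper does, here is a small sketch on made-up data. It repeats a working version of the helper so the snippet runs on its own; the names *X_demo* and *y_demo* and the column names are assumptions for illustration.

```python
import random

import pandas as pd


def sample_together(n, X, y):
    # Draw n distinct row positions and apply them to both DataFrames.
    rows = random.sample(range(len(X.index)), n)
    return X.iloc[rows], y.iloc[rows]


# Toy data (made up); both DataFrames share the same index.
X_demo = pd.DataFrame({"feature": [10, 20, 30, 40, 50]}, index=list("abcde"))
y_demo = pd.DataFrame({"label": [0, 1, 0, 1, 0]}, index=list("abcde"))

random.seed(42)  # only to make the draw repeatable
X_sample, y_sample = sample_together(3, X_demo, y_demo)

# The same rows were picked from both DataFrames, so the indexes match.
print(X_sample.index.equals(y_sample.index))
```

Because the rows are drawn by position, this only keeps predictors and targets aligned when both DataFrames list their rows in the same order.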

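Next, we get to the *undersample* function. It takes three arguments: a predictor DataFrame, a target DataFrame and the label of the minority class (*under*, 0 by default). Note that the name of the target column, *project_is_approved*, is hard-coded, so replace it with the column name in your own data.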

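    def undersample(X, y, under=0):
        # Split the target into minority (label == under) and majority rows.
        # project_is_approved is the target column in this example; use your own.
        y_min = y[y.project_is_approved == under]
        y_max = y[y.project_is_approved != under]
        # Select the matching predictor rows by index label.
        X_min = X.loc[y_min.index]
        X_max = X.loc[y_max.index]
        # Downsample the majority class to the size of the minority class.
        X_under, y_under = sample_together(len(y_min.index), X_max, y_max)
        X = pd.concat([X_under, X_min])
        y = pd.concat([y_under, y_min])
        return X, y

    X_train, y_train = undersample(X_train, y_train)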

What happens:

- Both DataFrames are split in two: one for the majority class and one for the minority class.
- The *sample_together* function is called with the sample size set to the number of minority-class rows, and it returns the resampled DataFrames for the majority class.
- I concatenate the DataFrames of the minority and the majority class and return them.
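
The steps above can be checked on a toy example by comparing the class counts before and after. This is a self-contained sketch, not the exact function from this post: *undersample_by* is a hypothetical variant that takes the target column name as a parameter (*target_col*), and the data is made up.

```python
import random

import pandas as pd


def sample_together(n, X, y):
    # Draw n distinct row positions and apply them to both DataFrames.
    rows = random.sample(range(len(X.index)), n)
    return X.iloc[rows], y.iloc[rows]


def undersample_by(X, y, target_col, under=0):
    # Same steps as described above, but the target column is passed in
    # (target_col is an assumption of this sketch).
    y_min = y[y[target_col] == under]
    y_max = y[y[target_col] != under]
    X_min = X.loc[y_min.index]
    X_max = X.loc[y_max.index]
    X_under, y_under = sample_together(len(y_min.index), X_max, y_max)
    return pd.concat([X_under, X_min]), pd.concat([y_under, y_min])


# Made-up data: 8 rows of class 1 (majority), 2 rows of class 0 (minority).
X_demo = pd.DataFrame({"feature": range(10)})
y_demo = pd.DataFrame({"label": [1, 1, 1, 1, 0, 1, 1, 0, 1, 1]})

X_bal, y_bal = undersample_by(X_demo, y_demo, "label", under=0)

print(y_demo["label"].value_counts().to_dict())  # 8 rows vs 2 rows before
print(y_bal["label"].value_counts().to_dict())   # 2 rows per class after
print(X_bal.index.equals(y_bal.index))           # predictors and target stay aligned
```

Note that the result lists the sampled majority rows first and the minority rows last, so you may want to shuffle it before training.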

There you have it: a function to easily undersample a Pandas DataFrame for a binary classification problem.

Great success!

### Say thanks, ask questions or give feedback

**Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.**