Subsetting a Pandas DataFrame on multiple conditions, Part 2: Parentheses

This blog post is the second post in a two-part series on subsetting Pandas DataFrame rows using chained conditions. In this post, we tackle the following TypeError.

TypeError: cannot compare a dtyped [object] array with a scalar of type [bool]


Filtering (or subsetting) a DataFrame is easily done with the loc property, which accesses a group of rows and columns by label(s) or a boolean array. To filter rows, you can also drop loc completely and put the boolean conditions directly between square brackets on the DataFrame.

💥 Watch out: if you pass a list of strings instead of booleans, the same square brackets select columns, not rows.

# explicit row and column filter using loc
df.loc[row_condition, column_condition]

# implicit row filter by passing a list of booleans
condition = [True, False, True]
df[condition] # will filter rows

# implicit column filter by passing a list of column labels
condition = ['column_a', 'column_b']
df[condition] # will filter columns
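
To make the difference between the two bracket behaviours concrete, here is a minimal runnable sketch. The DataFrame and its column names (column_a, column_b, column_c) are made up for illustration and are not taken from a real dataset.

```python
import pandas as pd

# Hypothetical example data; the column names are illustrative only.
df = pd.DataFrame({
    "column_a": ["x", "y", "z"],
    "column_b": [1, 2, 3],
    "column_c": [10.0, 20.0, 30.0],
})

# A boolean list with one entry per row keeps the rows marked True.
row_mask = [True, False, True]
rows = df[row_mask]

# A list of column labels selects columns instead.
cols = df[["column_a", "column_b"]]

# loc makes both axes explicit: rows by mask, columns by label.
both = df.loc[row_mask, ["column_a", "column_b"]]

print(rows.index.tolist())    # [0, 2]
print(cols.columns.tolist())  # ['column_a', 'column_b']
print(both.shape)             # (2, 2)
```

When in doubt about which axis plain brackets will act on, spelling it out with loc is probably the safer choice.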

By chaining conditions, you can filter on multiple conditions at once. Just make sure to wrap each condition in parentheses.

First, let’s see what happens when we leave out the parentheses around our conditions, as in the chunk below.

df.loc[df.column_a == 'some_value' & df.column_b == 'another_value']

The error you’ll run into is the following:

TypeError: cannot compare a dtyped [object] array with a scalar of type [bool]
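
If you want to reproduce this yourself, the sketch below should do it. The data is hypothetical, and dtype=object is an assumption made here so that the string columns are stored as generic Python objects, matching the [object] in the message. The wording of the message may differ between pandas versions, so the sketch only checks the exception type.

```python
import pandas as pd

# Hypothetical data; dtype=object is an assumption so that the columns
# are stored as generic Python objects.
df = pd.DataFrame(
    {"column_a": ["some_value", "other"], "column_b": ["another_value", "x"]},
    dtype=object,
)

raised = None
try:
    # No parentheses: Python evaluates 'some_value' & df.column_b first.
    df.loc[df.column_a == "some_value" & df.column_b == "another_value"]
except TypeError as exc:
    raised = exc

# The exact message text depends on the pandas version.
print(type(raised).__name__)  # TypeError
```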

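This happens because & has higher precedence than ==, so Python first tries to evaluate 'some_value' & df.column_b, which makes no sense for a string and a column of strings. The exact wording of the message can differ between pandas versions. Here’s what the Pandas documentation has to say about it.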

Another common operation is the use of boolean vectors to filter the data. The operators are: | for or, & for and, and ~ for not. These must be grouped by using parentheses, since by default Python will evaluate an expression such as df['A'] > 2 & df['B'] < 3 as df['A'] > (2 & df['B']) < 3, while the desired evaluation order is (df['A'] > 2) & (df['B'] < 3).
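
You don’t have to take the grouping on faith. As a small sketch, Python’s standard ast module can show how the unparenthesized expression from this post is parsed. Nothing is executed here, so no DataFrame is needed. Note that ast.unparse assumes Python 3.9 or newer.

```python
import ast

expr = "df.column_a == 'some_value' & df.column_b == 'another_value'"
tree = ast.parse(expr, mode="eval").body

# The top-level node is one chained comparison with two == operators.
print(type(tree).__name__)  # Compare
print(len(tree.ops))        # 2

# The middle operand is the & expression, which Python evaluates first.
middle = tree.comparators[0]
print(type(middle).__name__)  # BinOp
print(ast.unparse(middle))    # 'some_value' & df.column_b
```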

The correct way to combine multiple conditions, whether with an and (&) or an or (|), is to wrap each condition in parentheses, as follows.

df.loc[(df.column_a == 'some_value') & (df.column_b == 'another_value')]

This makes sure that each == is evaluated before & and that no error is raised.
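
Here is a minimal runnable sketch of the parenthesized form, using made-up data. It also shows | and ~, and one possible alternative: storing each condition in a named variable (is_a and is_b are hypothetical names), which sidesteps the precedence issue because each comparison is already evaluated.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "column_a": ["some_value", "some_value", "other"],
    "column_b": ["another_value", "x", "another_value"],
})

# Parentheses force each comparison to run before &.
both = df.loc[(df.column_a == "some_value") & (df.column_b == "another_value")]

# Named masks are already evaluated, so they need no extra parentheses.
is_a = df.column_a == "some_value"
is_b = df.column_b == "another_value"
either = df.loc[is_a | is_b]
not_a = df.loc[~is_a]

print(both.index.tolist())    # [0]
print(either.index.tolist())  # [0, 1, 2]
print(not_a.index.tolist())   # [2]
```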

As you can see, I’m using the bitwise operator &, and not the boolean operator and. More details can be found in part 1 of this series.
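
As a quick reminder of why that matters, here is a short sketch with hypothetical data. & combines two masks element by element, while and asks pandas for a single truth value of a whole Series, which it refuses to give.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({
    "column_a": ["some_value", "other"],
    "column_b": ["another_value", "another_value"],
})

mask_a = df.column_a == "some_value"
mask_b = df.column_b == "another_value"

# & works element by element.
print((mask_a & mask_b).tolist())  # [True, False]

# "and" needs one truth value per operand, so pandas raises a ValueError.
error = None
try:
    mask_a and mask_b
except ValueError as exc:
    error = exc
print(type(error).__name__)  # ValueError
```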
