
Solve DtypeWarning: Columns have mixed types. Specify dtype option on import or set low_memory=False in Pandas

The title of this blog post says it all: let’s solve a warning. It’s one that occurs when you’re using Pandas’ read_csv. Let’s start by just doing that.

import pandas as pd
pd.read_csv('file.csv')

Let’s say you run into the following warning:

DtypeWarning: Columns (X,X) have mixed types. Specify dtype option on import or set low_memory=False.

When you get this warning from Pandas’ read_csv, it means you are loading a CSV with at least one column that contains values of more than one type. For example: 1,5,a,b,c,3,2,a has a mix of strings and integers.
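If you want to check whether a column you already loaded really holds more than one type, counting the Python type of each value is a quick test. This is only a minimal sketch: the helper name python_types_in_column is made up for this post, and the Series is built by hand to mimic the example values above.

```python
import pandas as pd

def python_types_in_column(column):
    # Hypothetical helper: count the Python type name of every non-null value
    return column.dropna().map(lambda value: type(value).__name__).value_counts()

# Hand-built column mimicking 1,5,a,b,c,3,2,a (dtype=object keeps the ints as ints)
mixed = pd.Series([1, 5, "a", "b", "c", 3, 2, "a"], dtype=object)
type_counts = python_types_in_column(mixed)
print(type_counts)
```

A clean column shows a single type name here; more than one means the column is mixed.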

Pandas is really nice because instead of stopping altogether, it guesses which dtype a column has. However, by default, read_csv uses low_memory=True. This means the file is processed internally in chunks and the dtype is guessed for every chunk. If one chunk looks like integers and the next one contains text, the resulting column ends up holding values of different types. Pandas is kind enough to let you know it was confused and that something might have gone wrong.
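You can see the per-chunk guessing for yourself with a tiny in-memory CSV. Treat this as an analogy rather than the real mechanism: it uses the public chunksize argument with a chunk of 4 rows that I picked, whereas the internal chunks behind low_memory=True are not something you set as a row count. The sample data is made up.

```python
import io
import pandas as pd

# Made-up data: four numeric rows followed by four text rows
csv_text = "first_column\n1\n5\n3\n2\na\nb\nc\na\n"

# Read 4 rows at a time and record the dtype Pandas guesses for each chunk
chunk_dtypes = [
    chunk["first_column"].dtype
    for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4)
]
print(chunk_dtypes)
```

The first chunk is guessed as an integer column and the second as a text column, which is the kind of disagreement the warning is about.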

Ok, so how do we solve this warning? Basically, you tell Pandas what to do when it runs into this kind of situation. There are three solutions.

Set the low_memory argument of read_csv to False

This is the lazy solution, and I don’t consider it good practice. By setting low_memory to False, you’re telling Pandas to stop processing the file in chunks and to look at each column as a whole before deciding on a dtype. That costs more memory, which you can imagine is an issue for really big files. Also, this doesn’t fix the underlying problem: the messy values are still in the column, you just no longer get warned about them.

import pandas as pd
pd.read_csv('file.csv', low_memory=False)

Specify the dtype of the confusing columns manually

A better solution is to specify the column dtypes.

import pandas as pd
pd.read_csv('file.csv', dtype={'first_column': 'str', 'second_column': 'str'})

But be careful: if your column contains strings and you’re manually defining it as an integer, you’ll run into an error.

ValueError: invalid literal for int() with base 10
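Here is a small sketch of both outcomes, using a made-up in-memory CSV instead of file.csv: reading the column as str works, while forcing it to an integer dtype raises the ValueError.

```python
import io
import pandas as pd

# Made-up data: a column with numbers and one text value
csv_text = "first_column\n1\n5\na\n"

# Reading everything as text always works
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"first_column": "str"})
print(as_text["first_column"].tolist())

# Forcing an integer dtype fails as soon as Pandas hits the text value
try:
    pd.read_csv(io.StringIO(csv_text), dtype={"first_column": "int64"})
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```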

Use converters

You can fix the previous error as follows. Instead of specifying the dtypes manually, you specify how to convert the values that are read from the CSV file.

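import pandas as pd

def convert_dtype(x):
    # Each raw cell is passed in as text; empty cells arrive as ''
    if not x:
        return ''
    try:
        return str(x)
    except Exception:
        # Fall back to an empty string if the value cannot be converted
        return ''

pd.read_csv('file.csv', converters={'first_column': convert_dtype, 'second_column': convert_dtype})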

This is a solution I really like when specifying the dtypes generates errors. Keep in mind that converting is a fairly slow process, and can take a while for very large files.
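A converter doesn’t have to return strings. As a variation, assuming you would rather end up with numbers, the sketch below turns anything that isn’t a valid integer into NaN. The name to_int_or_nan and the sample data are made up for illustration.

```python
import io
import math
import pandas as pd

def to_int_or_nan(x):
    # Hypothetical converter: x is the raw cell text from the CSV
    try:
        return int(x)
    except ValueError:
        # Empty cells and text such as 'a' end up here
        return math.nan

# Made-up data: one valid number, one text value, one empty cell
csv_text = "first_column,second_column\n1,x\na,y\n,z\n"
df = pd.read_csv(io.StringIO(csv_text), converters={"first_column": to_int_or_nan})
print(df)
```

The valid number survives and the two bad cells show up as missing values, which you can then count or drop as you see fit.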

Happy coding!


9 thoughts on “Solve DtypeWarning: Columns have mixed types. Specify dtype option on import or set low_memory=False in Pandas”

  1. Hi, I am curious to know how to convert float64 to float32, to avoid memory error.

    MemoryError: Unable to allocate 231. GiB for an array with shape (176058, 176058) and data type float64

  2. Hi Roel,
    Maybe it has to do with Pandas versioning, but I spotted two errors in the specify dtype code above: 1) A closing curly bracket is missing and 2) I think you need to use a colon instead of an equals sign, i.e.:
    dtype={"user_id": int, "username": "string"})

    1. Thanks, Brad! Thanks for the suggestions. Apparently, when I “dumb down” the examples after pasting them into the blog post, I make mistakes without a proper code editor.
