Solve DtypeWarning: Columns have mixed types. Specify dtype option on import or set low_memory=False in Pandas

The title of this blog post says it all: let’s solve a warning. It’s one that occurs when you’re using Pandas’ read_csv. Let’s start by just doing that.

import pandas as pd
pd.read_csv('file.csv')

Let’s say you run into the following warning:

DtypeWarning: Columns (X,X) have mixed types. Specify dtype option on import or set low_memory=False.

When you get this warning when using Pandas’ read_csv, it basically means you are loading a CSV that has a column containing values of more than one dtype. For example, a column with the values 1, 5, a, b, c, 3, 2, a has a mix of strings and integers.

Pandas is really nice because instead of stopping altogether, it guesses which dtype a column has. However, by default, read_csv is called with low_memory=True. This means that the CSV file is processed in multiple chunks and the guess is made for every chunk separately, which can leave you with a single column that holds values of more than one type. Pandas is kind enough to let you know it was confused and that something might have gone wrong.
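To see the per-chunk guessing in action, here is a minimal sketch modeled on the example in the pandas documentation for DtypeWarning. The file name mixed.csv, the column names a and b, and the row counts are assumptions chosen for illustration; whether the warning actually fires can depend on your pandas version and on the size of the file.

```python
import os
import tempfile
import warnings

import pandas as pd

# Build a CSV whose column "a" is numeric for the first 100,000 rows,
# text for the next 100,000 rows, and numeric again after that.
rows = ["1"] * 100_000 + ["X"] * 100_000 + ["1"] * 100_000
path = os.path.join(tempfile.mkdtemp(), "mixed.csv")  # hypothetical file name
pd.DataFrame({"a": rows, "b": ["b"] * 300_000}).to_csv(path, index=False)

# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df = pd.read_csv(path)  # low_memory=True is the default

for w in caught:
    print(w.category.__name__, w.message)

# Count the Python types that actually ended up in column "a".
type_counts = df["a"].map(type).value_counts()
print(type_counts)
```

If the warning was raised, the type counts are the interesting part: the same input value can come back as an integer in one part of the column and as a string in another.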

Ok, so how do we solve this warning? Basically, you tell Pandas what to do when it runs into this kind of situation. Here are three solutions.

Set the low_memory argument of read_csv to False

This is the lazy solution, and it’s generally bad practice. By setting the low_memory argument to False, you’re telling Pandas to stop processing the file in chunks and to determine each column’s dtype from the whole file at once, which costs more memory. You can imagine this is an issue for really big files. Also, this doesn’t clean up the mixed values in your data; it simply makes the warning go away.

import pandas as pd
pd.read_csv('file.csv', low_memory=False)

Specify the dtype of the confusing columns manually

A better solution is to specify the column dtypes.

import pandas as pd
pd.read_csv('file.csv', dtype={'first_column': 'str', 'second_column': 'str'})
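Once a column is read as strings, you usually still want the numbers back where that is possible. The following is a minimal sketch of one way to do that, not the only one. It assumes a small inline CSV (read through io.StringIO instead of file.csv) and uses pd.to_numeric with errors="coerce", which turns every value that can’t be parsed into NaN.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for file.csv.
csv_text = "first_column,second_column\n1,x\n5,y\na,z\n3,w\n"

# Read both columns as strings so no guessing is needed.
df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"first_column": "str", "second_column": "str"},
)
values = df["first_column"].tolist()

# Convert what can be converted; unparseable values become NaN.
numeric = pd.to_numeric(df["first_column"], errors="coerce")

# The rows that failed to convert are the ones worth inspecting.
bad_rows = df[numeric.isna()]
print(bad_rows)
```

Looking at bad_rows tells you which values caused the mixed types in the first place, so you can decide whether to fix them, drop them or keep the column as text.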

But be careful: if your column contains strings and you’re manually defining it as an integer, you’ll run into an error.

ValueError: invalid literal for int() with base 10
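A small sketch that reproduces this failure, assuming an inline CSV with one non-numeric value in first_column (the sample data is made up for illustration):

```python
import io

import pandas as pd

# Hypothetical sample data: the third value can't be parsed as an integer.
csv_text = "first_column\n1\n5\na\n"

error_message = None
try:
    pd.read_csv(io.StringIO(csv_text), dtype={"first_column": int})
except ValueError as exc:
    # Pandas refuses to load the column instead of guessing.
    error_message = str(exc)

print(error_message)
```

Catching the exception like this is mostly useful for diagnosing which value is the culprit; it does not load the data.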

Use converters

You can fix the previous error as follows. Instead of specifying the dtypes manually, you specify how to convert the values that are read from the CSV file.

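import pandas as pd

def convert_dtype(x):
    # Converters receive each raw cell as a string; keep empty cells as ''.
    if not x:
        return ''
    try:
        return str(x)
    except Exception:
        # Fall back to an empty string if the value can't be converted.
        return ''

pd.read_csv('file.csv', converters={'first_column': convert_dtype, 'second_column': convert_dtype})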

This is a solution I really like when specifying the dtypes generates errors. Keep in mind that converting is a fairly slow process, and can take a while for very large files.
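The same pattern works when you want numbers instead of strings. Below is a minimal sketch with a hypothetical converter named to_int_or_nan, which is not part of Pandas: it returns an integer where the cell can be parsed and NaN otherwise. The inline sample data is also an assumption for illustration.

```python
import io
import math

import pandas as pd

def to_int_or_nan(value):
    """Hypothetical converter: int where possible, NaN otherwise."""
    try:
        return int(value)
    except ValueError:
        # Covers both text such as 'a' and empty cells.
        return math.nan

# Hypothetical sample data: one text value and one empty cell.
csv_text = "first_column,second_column\n1,x\n5,y\na,z\n,w\n3,v\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    converters={"first_column": to_int_or_nan},
)
print(df)
```

Whatever you put inside the converter decides the outcome: return str(x) if you want text, or attempt int(x) or float(x) and pick a fallback value for the cells that fail.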

Happy coding!

Say thanks, ask questions or give feedback

Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.

9 thoughts on “Solve DtypeWarning: Columns have mixed types. Specify dtype option on import or set low_memory=False in Pandas”

cindy August 22, 2020 at 6:18 am

Keep on writing, great job!

Onkar November 20, 2020 at 3:02 am

Hi, I am curious to know how to convert float64 to float32, to avoid memory error.

MemoryError: Unable to allocate 231. GiB for an array with shape (176058, 176058) and data type float64

roelpi November 21, 2020 at 2:44 pm
  
  Hey Onkar
  
  Could you share your code and some info in your dataset? Reach out to hallo [at] roelpeters.be.
  
  Best regards
  Roel
  
Brad April 6, 2021 at 9:33 am

Hi Roel,
Maybe it has to do with Pandas versioning, but I spotted two errors in the specify dtype code above: 1) A closing curly bracket is missing and 2) I think you need to use a colon instead of an equals sign, i.e.:
dtype={"user_id": int, "username": "string"})

roelpi April 17, 2021 at 2:40 pm
  
  Thanks, Brad! Thanks for the suggestions. Apparently, when I “dumb down” the examples after pasting them into the blog post, I make mistakes without a proper code editor.
  
roy April 29, 2021 at 11:18 am

Hi ! what do i need to fill inside the converts code? (str/int)?

Jose June 30, 2021 at 6:20 pm

Thank you so much for the time you spent sharing this. Really helped me.

rajendra July 4, 2021 at 9:03 am

thanks

