The title of this blog post says it all: let’s solve a warning. It’s one that occurs when you’re using Pandas’ read_csv. Let’s start by just doing that.
import pandas as pd
pd.read_csv('file.csv')
Let’s say you run into the following warning:
DtypeWarning: Columns (X,X) have mixed types. Specify dtype option on import or set low_memory=False.
When you get this warning when using Pandas’ read_csv, it means you are loading a CSV that has a column consisting of multiple dtypes. For example: 1,5,a,b,c,3,2,a has a mix of strings and integers.
Pandas is really nice because instead of stopping altogether, it guesses which dtype a column has. However, by default, read_csv uses low_memory=True. This means that the CSV file is processed in multiple chunks and the guess is made for every chunk separately, which can result in a column with multiple dtypes. Pandas is kind enough to let you know it was confused and that something might have gone wrong.
OK, so how do we solve this warning? Basically, you give Pandas a hint about what to do when it runs into this kind of situation. There are three solutions.
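To make the per-chunk guessing more tangible, here is a minimal sketch. It uses the public chunksize argument of read_csv as a stand-in for the internal chunking; that substitution, the tiny in-memory CSV, and the column name are assumptions made purely for illustration.

```python
import io

import pandas as pd

# Tiny in-memory CSV: the first four values are numeric, the last four are not.
csv_text = "first_column\n1\n5\n3\n2\na\nb\nc\na\n"

# chunksize=4 is an illustrative stand-in for the chunking done with low_memory=True.
# Each chunk gets its own dtype guess.
chunk_dtypes = [
    str(chunk["first_column"].dtype)
    for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4)
]

# The first chunk is guessed as integers, the second as strings.
print(chunk_dtypes)
```

When chunks with different guesses end up in one column, that column holds a mix of types, which is what the warning is about.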
Set the low_memory argument of read_csv to False
This is the lazy solution, and it’s simply bad practice. By setting the low_memory argument to False, you’re basically telling Pandas not to be memory-efficient and to process the whole file at once. You can imagine this is an issue for really big files. Also, this doesn’t clean up the mixed data in the column, it simply silences the warning.
import pandas as pd
pd.read_csv('file.csv', low_memory=False)
Specify the dtype of the confusing columns manually
A better solution is to specify the column dtypes.
import pandas as pd
pd.read_csv('file.csv', dtype={'first_column': 'str', 'second_column': 'str'})
But be careful: if your column contains strings and you manually define it as an integer, you’ll run into an error:
ValueError: invalid literal for int() with base 10
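As a rough sketch of that failure, the snippet below reads a small in-memory CSV (made up for this example) and forces a column that contains the letter a to be an integer. Declaring the same column as a string reads without complaint.

```python
import io

import pandas as pd

# Made-up sample data: two numbers and one string in the same column.
csv_text = "first_column\n1\n5\na\n"

try:
    pd.read_csv(io.StringIO(csv_text), dtype={"first_column": int})
    failed = False
except ValueError as err:
    # Pandas cannot turn 'a' into an integer.
    failed = True
    print(f"Could not read the column as int: {err}")

# Declaring the column as a string reads every value as text.
df = pd.read_csv(io.StringIO(csv_text), dtype={"first_column": str})
print(df["first_column"].tolist())
```

So if you are not sure a column is purely numeric, str is the safe choice and you can convert afterwards.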
Use converters
You can fix the previous error as follows. Instead of specifying the dtypes manually, you specify how to convert the values that are read from the CSV file.
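def convert_dtype(x):
    # Empty cells become an empty string.
    if not x:
        return ''
    # Return the raw value as a string, or '' if that fails.
    try:
        return str(x)
    except Exception:
        return ''

pd.read_csv('file.csv', converters={'first_column': convert_dtype, 'second_column': convert_dtype})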
This is a solution I really like when specifying the dtypes generates errors. Keep in mind that converting is a fairly slow process, and can take a while for very large files.
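A converter does not have to return strings. As a hedged variation, the sketch below uses a hypothetical to_int_or_default function that tries to parse each value as an integer and falls back to a default when that fails. The function name, the -1 default, and the in-memory sample data are all assumptions for illustration.

```python
import io

import pandas as pd


def to_int_or_default(value, default=-1):
    """Hypothetical converter: parse an integer, fall back to a default."""
    try:
        return int(value)
    except ValueError:
        return default


# Made-up sample data: two numbers and one value that is not a number.
csv_text = "first_column\n1\n5\na\n"

df = pd.read_csv(
    io.StringIO(csv_text),
    converters={"first_column": to_int_or_default},
)
print(df["first_column"].tolist())
```

Pick a default that cannot be confused with a real value in your data, otherwise you will not be able to tell converted failures from genuine entries.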
Happy coding!
Keep on writing, great job!
Hi, I am curious to know how to convert float64 to float32, to avoid memory error.
MemoryError: Unable to allocate 231. GiB for an array with shape (176058, 176058) and data type float64
Hey Onkar
Could you share your code and some info in your dataset? Reach out to hallo [at] roelpeters.be.
Best regards
Roel
Hi Roel,
Maybe it has to do with Pandas versioning, but I spotted two errors in the specify dtype code above: 1) A closing curly bracket is missing and 2) I think you need to use a colon instead of an equals sign, i.e.:
dtype={"user_id": int, "username": "string"})
Thanks, Brad! Thanks for the suggestions. Apparently, when I “dumb down” the examples after pasting them into the blog post, I make mistakes without a proper code editor.
Hi ! what do i need to fill inside the converts code? (str/int)?
Thank you so much for the time you spent sharing this. Really helped me.
thanks