Solve Pandas read_csv: UnicodeDecodeError: ‘utf-8’ codec can’t decode byte […] in position […] invalid continuation byte

by roelpi
August 30, 2021

Reading CSVs is always a bit like living on the edge, especially when the files come from several regions with different default encodings. In this blog post, we’re solving UnicodeDecodeError: ‘utf-8’ codec can’t decode byte […] in position […]: invalid continuation byte.

Important: I’m assuming you got the error when you used Pandas’ read_csv() to read a CSV file into memory.

df = pd.read_csv('your_file.csv')

When Pandas reads a CSV, it assumes by default that the encoding is UTF-8. The following error occurs when the CSV parser encounters a byte sequence that is not valid UTF-8, usually because the file was saved in a different encoding.

UnicodeDecodeError: 'utf-8' codec can't decode byte [...] in position [...]: invalid continuation byte.
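To see where the message comes from, here is a minimal sketch that reproduces it without Pandas. The sample text is made up for illustration: "café au lait" encoded as Latin-1, where é is the single byte 0xE9. UTF-8 treats 0xE9 as the start of a three-byte sequence, so the space that follows it is not a valid continuation byte.

```python
# Hypothetical sample: "café au lait" as Latin-1 bytes (é becomes the single byte 0xE9).
raw = "caf\u00e9 au lait".encode("latin-1")

error = None
try:
    # Decoding as UTF-8 fails: 0xE9 announces a multi-byte character,
    # but the next byte (a space) cannot continue it.
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc)
```

The position in the message is a byte offset into the file, not a character or line number.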

😐 Okay, so how do I solve it?

If you know the encoding of the file, you can simply pass it to the read_csv function, using the encoding parameter. The Python documentation for the codecs module lists all the encodings that are accepted.

df = pd.read_csv('your_file.csv', encoding='ISO-8859-1')
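If you want to try this without a file on disk, the sketch below builds a tiny ISO-8859-1 encoded CSV in memory and reads it back. The column names and values are invented for the example.

```python
import io

import pandas as pd

# Hypothetical data: a small CSV encoded as ISO-8859-1, held in memory.
raw = "name,city\nJos\u00e9,M\u00e1laga\n".encode("iso-8859-1")

# read_csv accepts a binary buffer; the encoding parameter tells it how to decode the bytes.
df = pd.read_csv(io.BytesIO(raw), encoding="ISO-8859-1")
print(df)
```

Reading the same buffer without the encoding argument would raise the UnicodeDecodeError from this post, because the accented characters are not valid UTF-8.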

If you don’t know the encoding, there are multiple things you can do.

Use latin1: In the example below, I use the latin1 encoding (an alias for ISO-8859-1). Latin1 maps each of the 256 possible byte values to a character, so it can decode any file without raising an error, but not necessarily to the characters you’d expect. Consequently, latin1 will read the file, although you should check the result for garbled characters.

df = pd.read_csv('your_file.csv', encoding='latin1')
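A quick sketch of both sides of that trade-off, using only the standard library. The sample character is chosen for illustration.

```python
# Every possible byte value maps to a character in latin1, so decoding cannot fail.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin1")
print(len(decoded))

# But if the data was really UTF-8, multi-byte characters come out garbled:
# the two UTF-8 bytes of "é" are read as two separate latin1 characters.
utf8_bytes = "\u00e9".encode("utf-8")
garbled = utf8_bytes.decode("latin1")
print(garbled)
```

So a successful read with latin1 only proves that nothing crashed. It's worth checking a few rows with accented characters before trusting the result.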

Manual conversion: Your next option would be to manually convert the CSV file to UTF-8. For example, in Notepad++, you can easily do that by selecting Convert to UTF-8 in the Encoding menu.
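The same conversion can be done in Python if you'd rather not open an editor. This is a minimal sketch, assuming you already know the source encoding; the function name, file names and the cp1252 encoding are all placeholders for the example.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(src, dst, source_encoding):
    """Re-encode a text file from source_encoding to UTF-8 (hypothetical helper)."""
    # newline="" leaves the original line endings untouched in both files.
    with open(src, "r", encoding=source_encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)


# Usage with a throwaway file; cp1252 is an assumed source encoding.
tmp = Path(tempfile.mkdtemp())
src, dst = tmp / "legacy.csv", tmp / "utf8.csv"
src.write_bytes("name\nJos\u00e9\n".encode("cp1252"))
convert_to_utf8(src, dst, "cp1252")
```

After the conversion, pd.read_csv can read the new file with its default settings.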

Automatic detection: However, a much easier solution would be to use the chardet package for Python, also known as “The Universal Character Encoding Detector”. In the following code chunk, the detection result is stored in the enc variable, and the name of the detected encoding can be retrieved using enc['encoding'].

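import chardet
import pandas as pd

# Read the raw bytes and let chardet guess the encoding.
with open('your_file.csv', 'rb') as f:
    enc = chardet.detect(f.read())  # or f.readline() if the file is large

# enc is a dict; enc['encoding'] holds the name of the detected encoding.
df = pd.read_csv('your_file.csv', encoding=enc['encoding'])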
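Detection is a guess, and it can miss when the sample it sees contains only plain ASCII. As a fallback that doesn't use chardet, you can simply try a few candidate encodings in order and keep the first one that decodes cleanly. The sketch below is an assumption-laden illustration: the function name and the candidate list are mine, and a clean decode only shows that the bytes are valid in that encoding, not that it's the right one.

```python
def guess_encoding(raw, candidates=("utf-8", "cp1252", "latin1")):
    """Return the first candidate that decodes raw without error (hypothetical helper)."""
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        return name
    return None


# Hypothetical sample: bytes that are not valid UTF-8 but are valid cp1252.
sample = "caf\u00e9 au lait".encode("cp1252")
encoding = guess_encoding(sample)
print(encoding)
```

The result can be passed to read_csv through the encoding parameter. Putting utf-8 first matters: it's strict, so it rejects most files that were saved in another encoding, while latin1 at the end accepts anything.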

Great success!

By the way, I didn’t necessarily come up with this solution myself. Although I’m grateful you’ve visited this blog post, you should know I get a lot from websites like StackOverflow and I have a lot of coding books. This one by Matt Harrison (on Pandas 1.x!) has been updated in 2020 and is an absolute primer on Pandas basics. If you want something broad, ranging from data wrangling to machine learning, try “Mastering Pandas” by Stefanie Molin.

Say thanks, ask questions or give feedback

Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.

38 thoughts on “Solve Pandas read_csv: UnicodeDecodeError: ‘utf-8’ codec can’t decode byte […] in position […] invalid continuation byte”

Rayen December 30, 2021 at 3:04 pm

enc[“encoding”] returns the wrong encoding for some reason.

RuntimeError: Other(“encoding not ascii not implemented.”)

Reply
1. Garett April 8, 2022 at 3:21 pm
  
  I received a similar error. Chardet detected ‘ascii’ at confidence = 1.0, and pandas returned the encoding error. Ultimately, both ‘latin1’ and ‘windows-1252’ worked for me, with the latin1 solution coming from this site and ‘windows-1252’ coming from StackExchange.
  
  Reply
Gary Li June 25, 2022 at 2:33 pm

Same as Garett, I used the codec ‘latin1’ and the problem was resolved. Thank you author for kindly sharing your experiences!

Reply
1. Isaque November 17, 2022 at 1:41 pm
  
  Saved my life too
  
  Reply
amar December 7, 2022 at 12:49 pm

Thanks for the solution. Saved a lot of time.

Reply
Leroux May 3, 2023 at 3:44 pm

Thanks !

Reply
