Removing Ï.., I and two dots or umlaut, when using read.csv in R

Here’s something I used to bump into a lot when working with external files that I receive from clients: some gibberish prepended to the first column name of a data frame when using read.csv. There’s a good reason why this happens, though.

The first character of the file is a magical character, invisible to the human eye but readable by a computer. It is the byte order mark (or BOM), and it tells the computer that the characters that follow are encoded in Unicode. In a UTF-8 file it is stored as the three bytes EF BB BF.
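To see the mark for yourself, here is a small R sketch. It writes a throwaway CSV (the path comes from tempfile(), so it is purely illustrative) that starts with the UTF-8 BOM bytes, reads those bytes back, and shows what they look like if they are interpreted as Latin-1 instead of UTF-8.

```r
# Write a throwaway CSV whose first three bytes are the UTF-8 BOM (EF BB BF)
path <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xef, 0xbb, 0xbf))
writeBin(c(bom, charToRaw("id,name\n1,Ann\n2,Bob\n")), path)

# Read the first three raw bytes back: they are the invisible BOM
first_bytes <- readBin(path, what = "raw", n = 3)
print(first_bytes)
#> [1] ef bb bf

# The same bytes, interpreted as Latin-1 and converted to UTF-8 for display
as_latin1 <- rawToChar(bom)
Encoding(as_latin1) <- "latin1"
shown <- enc2utf8(as_latin1)
print(shown)  # in a UTF-8 locale this prints "ï»¿"
```

Nothing in the file itself is broken; the odd characters only appear when a program reads the BOM bytes with the wrong encoding.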

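However, text editors and other programs that don’t expect this character might interpret it as something else. Read as Latin-1 (Windows-1252), the three BOM bytes show up as ï»¿, and because » and ¿ aren’t allowed in R column names, read.csv turns them into dots, leaving ï.. in front of the first column name. There are two ways to solve it. The first one, just changing the fileEncoding parameter, doesn’t seem to work for everyone.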

df <- read.csv('file.csv', fileEncoding = 'UTF-8-BOM')
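A quick way to check whether this option works in your own setup is to build a small BOM-prefixed file and read it back. This is a minimal sketch using a throwaway tempfile() path rather than a real client file. R's connection documentation says the "UTF-8-BOM" encoding is accepted for reading (and removes the mark) from R 3.0.0 onward, so an older installation may be one reason it fails for some people.

```r
# Build a small CSV that starts with a UTF-8 BOM (EF BB BF) in a temp file
path <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xef, 0xbb, 0xbf)), charToRaw("id,name\n1,Ann\n2,Bob\n")), path)

# "UTF-8-BOM" asks R to drop the byte order mark while reading
df <- read.csv(path, fileEncoding = "UTF-8-BOM")

# If the BOM was removed, the first column is simply called "id"
names(df)
```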

So here’s how I always solved it. I simply removed the first three characters of the first column name.

# Strip the first three characters (the ï.. prefix) from the first column name
colnames(df)[1] <- gsub('^...','',colnames(df)[1])

Great success!
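One caveat with the one-liner above: '^...' removes any first three characters, so running it on a file that has no BOM would chop the start off a real column name. A slightly more defensive sketch uses a hypothetical helper, strip_bom_prefix (not part of base R), that only removes the prefix when it really is ï..:

```r
# Hypothetical helper (not part of base R): remove the "ï.." prefix that a
# misread byte order mark leaves on the first column name, only if present
strip_bom_prefix <- function(df) {
  # "\u00ef" is ï; the two escaped dots match the literal ".." after it
  names(df)[1] <- sub("^\u00ef\\.\\.", "", names(df)[1])
  df
}

# Usage sketch, with 'file.csv' standing in for your own file:
# df <- strip_bom_prefix(read.csv('file.csv'))
```

Because the pattern is anchored to that exact prefix, a clean column name such as idx passes through unchanged.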