
Solving R’s “NAs introduced by coercion”


In this blog post, I will elaborate on a specific warning, the contexts in which it occurs and how you can solve or prevent it. It’s definitely in my top three of generic warnings that I bump into:

NAs introduced by coercion

Apparently, some NAs were added to my data because of something called coercion. What is coercion? Let’s look it up in Joseph Adler’s R in a Nutshell.

When you call a function with an argument of the wrong type,
R will try to coerce values to a different type so that the function will work.

Joseph Adler, R in a Nutshell (2nd ed.), p. 56

When you receive the warning that NAs were introduced by coercion, R has converted values to a different type, and every value it could not convert has been replaced by NA. The following example is straightforward: I try to convert strings to numeric, and because none of them represents a number, every element becomes NA.

z <- c('apple','pear','orange')
as.numeric(z) # NA NA NA, with the warning
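
In real data, usually only a handful of values fail to convert, and the warning does not say which ones. One way to find them is to compare the missing values before and after the conversion. This is a minimal sketch: the vector raw and the name failed_coercion are made up for illustration.

```r
# Hypothetical input: numbers stored as text, plus a few problem values.
raw <- c('10', '3.5', 'apple', NA, '1,200')

# Convert quietly; we inspect the result ourselves instead of relying on the warning.
converted <- suppressWarnings(as.numeric(raw))

# TRUE where the conversion created a new NA (the value was not NA before).
failed_coercion <- is.na(converted) & !is.na(raw)

# The values that need cleaning before they can be converted.
raw[failed_coercion]
```

Here the offending values are 'apple' and '1,200'. The NA that was already in raw is not flagged, because it was missing before the conversion.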

But the warning might also show up while executing other functions.

In the following example, I’m using data.table’s shift() function and I get the same warning. As you can see, I’m trying to create a leading variable from a vector of whole numbers. By using the parameter fill, I’m also trying to insert ‘NO VALUE’ where the leading variable is NA. This is a bad idea, as the vector x is numeric and ‘NO VALUE’ is a string that cannot be converted to a number. Funny thing: even if no NAs were added to the output, you will still get the warning.

library(data.table)
x <- c(0,1,2,3,4,5)
shift(x, n = 3, type = 'lead', fill = 'NO VALUE')
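
A possible way around this is to make sure fill and x have the same type. The sketch below shows two options, assuming the data.table package is installed; the names lead_num and lead_chr are made up for illustration.

```r
library(data.table)

x <- c(0, 1, 2, 3, 4, 5)

# Option 1: keep x numeric and let shift() pad with NA (the default fill).
lead_num <- shift(x, n = 3, type = 'lead')

# Option 2: if a text label is really needed, convert x to character first,
# so that fill and x are both character.
lead_chr <- shift(as.character(x), n = 3, type = 'lead', fill = 'NO VALUE')
```

Option 1 keeps the result usable in calculations. Option 2 gives up the numeric type in exchange for a readable label.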

Here’s another one I found online. In the following example, I’m trying to create a distance matrix (although it’s a silly example) from a data.table using the dist() function. Because of the character column, as.matrix() turns the whole table into a character matrix. dist() then coerces that matrix back to numeric, and the fruit names, which are not numbers, become NAs.

x <- c(10,9,4)
y <- c(5,8,12)
z <- c('apple','pear','orange')

df <- data.table(fruit = z, sold_today = y, sold_yesterday = x)
dist(as.matrix(df))
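
If the distances should be based on the sales figures only, one way to avoid the coercion is to drop the non-numeric columns before calling dist(). The following sketch uses a base data.frame with the same values to stay self-contained; the names num and d are made up for illustration.

```r
# Same data as above, in a base data.frame.
df <- data.frame(
  fruit = c('apple', 'pear', 'orange'),
  sold_today = c(5, 8, 12),
  sold_yesterday = c(10, 9, 4)
)

# Keep only the numeric columns and carry the fruit names along as row names.
num <- as.matrix(df[sapply(df, is.numeric)])
rownames(num) <- as.character(df$fruit)

# No character values are left, so nothing has to be coerced.
d <- dist(num)
d
```

Moving the fruit names to the row names keeps them as labels in the printed distance matrix without making them part of the calculation.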

Here’s another silly example. I’m trying to extract everything from the fifth character onward using the substring() function. Because I pass the string ‘five’, which cannot be coerced to an integer, the function returns NA and I get the warning. However, passing ‘5’ as a character will work, because the coercion succeeds. Finally, because TRUE is treated as 1, it will also work, although it simply returns the whole string.

text <- 'this is an apple'
substring(text, first = 'five') # NA, with the warning
substring(text, first = '5') # This works
substring(text, first = TRUE) # This works
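
To prevent a silent NA from travelling further through a script, the warning can be turned into an error at the point where the conversion happens. The helper below is a hypothetical sketch, not part of base R; it wraps as.integer() in tryCatch().

```r
# Hypothetical helper: convert to integer, but fail loudly instead of returning NA.
strict_integer <- function(value) {
  tryCatch(
    as.integer(value),
    warning = function(w) {
      stop('cannot convert ', sQuote(value), ' to an integer: ', conditionMessage(w))
    }
  )
}

text <- 'this is an apple'
substring(text, first = strict_integer('5'))         # conversion succeeds
try(substring(text, first = strict_integer('five'))) # stops with an error
```

A blunter alternative is options(warn = 2), which turns every warning in the session into an error and makes it easier to trace where the coercion happens.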

Great success!
