Solving R’s “NAs introduced by coercion”

In this blog post, I will elaborate on a specific warning, the contexts in which it occurs and how you can solve or prevent it. It’s definitely in my top three of generic warnings that I bump into:

NAs introduced by coercion

Apparently, some NAs were added to my data because of something called coercion. What is coercion? Let’s look it up in Joseph Adler’s R in a Nutshell.

When you call a function with an argument of the wrong type,
R will try to coerce values to a different type so that the function will work.

Joseph Adler, R in a Nutshell (2nd ed.), p. 56

When you receive the warning that NAs were introduced by coercion, R has coerced values to a different type, but warns us that it wasn’t able to coerce all of them. The following example is straightforward: I try to convert strings to numeric and it fails.

z <- c('apple','pear','orange')
as.numeric(z)
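To see which values are responsible, you can compare the input with the coerced result. The following is a minimal sketch, not a canonical recipe; the vector raw and its values are made up for illustration.

```r
# Made-up input: some strings are valid numbers, one is not
raw <- c("12", "7.5", "apple", NA, "3")

# Coerce quietly; we inspect the result ourselves instead of relying on the warning
converted <- suppressWarnings(as.numeric(raw))

# Values that were present before coercion but are NA afterwards
failed <- !is.na(raw) & is.na(converted)

raw[failed]    # the offending strings
which(failed)  # their positions in the vector
```

Values that were already NA before the conversion are deliberately not flagged, since coercion did not introduce them.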

But the warning might also show up while you are executing other functions.

In the following example, I’m using data.table’s shift() function and I get the same warning. As you can see, I’m trying to create a leading variable from a numeric vector. By using the parameter fill, I’m also trying to insert ‘NO VALUE’ where the leading variable is NA. This is a bad idea: the vector x is numeric, so the string ‘NO VALUE’ has to be coerced to a number, which yields NA. Funny thing: even if no NAs were added to the output, you will still get the warning.

library(data.table)
x <- c(0,1,2,3,4,5)
shift(x, n = 3, type = 'lead', fill = 'NO VALUE')
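One way around this, sketched below, is to settle on a single type up front: either convert x to character before shifting, so that a text placeholder fits, or keep the numbers and fill with NA. Both variants assume data.table is installed; the names lead_chr and lead_num are mine.

```r
library(data.table)

x <- c(0, 1, 2, 3, 4, 5)

# Variant 1: work with characters, so a character fill needs no coercion
lead_chr <- shift(as.character(x), n = 3, type = "lead", fill = "NO VALUE")

# Variant 2: stay numeric and use a numeric missing value as the fill
lead_num <- shift(x, n = 3, type = "lead", fill = NA_real_)
```

Variant 2 is usually the better choice if you still want to calculate with the result; the placeholder text can be added later, when the values are formatted for display.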

Here’s another one I found online. In the following (admittedly silly) example, I’m trying to create a distance matrix from a data.table using the dist() function. Because of the character column, as.matrix() produces a character matrix; dist() then coerces it to numeric, and the fruit names turn into NAs.

x <- c(10,9,4)
y <- c(5,8,12)
z <- c('apple','pear','orange')

df <- data.table(fruit = z, sold_today = y, sold_yesterday = x)
dist(as.matrix(df))
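A possible fix is to keep only the numeric columns before building the matrix and to carry the fruit names along as row names. This is a sketch under the assumption that a plain data.frame is acceptable, so that it runs in base R without data.table; the names sales, numeric_cols, m and d are mine.

```r
x <- c(10, 9, 4)
y <- c(5, 8, 12)
z <- c("apple", "pear", "orange")

sales <- data.frame(fruit = z, sold_today = y, sold_yesterday = x)

# Logical vector marking which columns are numeric
numeric_cols <- vapply(sales, is.numeric, logical(1))

# Numeric matrix only; the labels move to the row names
m <- as.matrix(sales[numeric_cols])
rownames(m) <- as.character(sales$fruit)

d <- dist(m)
d
```

Because the matrix is numeric from the start, dist() has nothing to coerce and the labels still show up in the printed result.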

Here’s another silly example. I’m trying to extract everything from the fifth character onward using the substring() function. Because I pass a string that cannot be converted to an integer, the function returns NA and I get the warning. However, passing ‘5’ as a character will work, because the coercion succeeds. Finally, because TRUE is treated as 1, it will also work.

text <- 'this is an apple'
substring(text, first = 'five')
substring(text, first = '5') # This works
substring(text, first = TRUE) # This works
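If you would rather have your code stop than continue with a silent NA, you can turn the warning into an error at the point of conversion. The helper below is hypothetical (as_integer_strict is my own name, not a base R function) and only a sketch of the idea, using tryCatch() to intercept the warning.

```r
# Hypothetical helper: fail loudly instead of quietly returning NA
as_integer_strict <- function(value) {
  tryCatch(
    as.integer(value),
    warning = function(w) {
      stop("cannot coerce ", deparse(value), " to integer: ",
           conditionMessage(w), call. = FALSE)
    }
  )
}

text <- "this is an apple"

# Coercion succeeds, so substring() receives a proper integer
result <- substring(text, first = as_integer_strict("5"))

# Coercion fails, so we get an error instead of NA plus a warning
outcome <- try(as_integer_strict("five"), silent = TRUE)
```

Note that tryCatch() reacts to any warning raised during the conversion, which is fine for a single as.integer() call but would be too broad around larger blocks of code.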

Great success!

By the way, if you’re having trouble understanding some of the code and concepts, I can highly recommend “An Introduction to Statistical Learning: with Applications in R”, which is the must-have data science bible. If you simply need an introduction to R, and less to the data science part, I can absolutely recommend this book by Richard Cotton. Hope it helps!


2 thoughts on “Solving R’s “NAs introduced by coercion””

  1. Hi,
    I am a beginner in R. When I tried to execute the simple code below:
    x<-readline()
    x<-as.integer(x)
    y<-readline()
    y<-as.integer(y)
    sum<-x+y
    print(sum)

    I keep getting this error:
    Error in x + y : non-numeric argument to binary operator
    Please help me out.
