In this blog post, I will elaborate on a specific warning, the contexts in which it occurs and how you can solve or prevent it. It’s definitely in my top three of generic warnings that I bump into:
NAs introduced by coercion
Apparently, some NAs were added to my data because of something that is called coercion. What is coercion? Let’s look it up in Joseph Adler’s R in a Nutshell.
“When you call a function with an argument of the wrong type, R will try to coerce values to a different type so that the function will work.”
Joseph Adler, R in a Nutshell (2 ed.), p56
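To make the quote concrete, here is a small base-R sketch (the vectors are made up for illustration). When you combine values with c(), R silently coerces them towards the most flexible type, roughly logical < integer < double < character, and that direction never fails. Going the other way with as.numeric() only works when the text actually looks like a number. Otherwise R produces NA and the warning this post is about.

```r
# Implicit coercion: c() converts every element to the most flexible type present
v1 <- c(TRUE, 2L, 3.5)   # logical and integer are promoted to double: 1, 2, 3.5
v2 <- c(1, 'apple')      # the number is promoted to character: "1", "apple"
class(v1)                # "numeric"
class(v2)                # "character"

# Explicit coercion towards a stricter type can fail
as.numeric('3.5')        # 3.5
as.numeric('apple')      # NA, with "NAs introduced by coercion"
```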
When you receive the warning that NAs were introduced by coercion, R tried to convert values to a different type but could not convert all of them, and it replaced the ones that failed with NA. The following example is straightforward: I try to convert strings to numbers and it fails.
z <- c('apple','pear','orange')
as.numeric(z)  # NA NA NA, plus the warning
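Before you silence the warning, it helps to know which values caused it. A minimal base-R sketch, using a made-up vector: convert quietly with suppressWarnings(), then flag the entries that were not NA before the conversion but are NA after it. Those are exactly the values R could not coerce.

```r
z <- c('10', '3.5', 'apple', NA, 'pear')

# Convert without printing the warning; we check the result ourselves below
converted <- suppressWarnings(as.numeric(z))

# A value "failed" if it was present before conversion but is NA afterwards;
# the original NA is not counted as a failure
failed <- !is.na(z) & is.na(converted)

z[failed]    # "apple" "pear"
converted    # 10, 3.5, NA, NA, NA
```

Once you know the culprits, you can fix or drop them deliberately. Wrapping as.numeric() in suppressWarnings() without such a check only hides the problem.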
But the warning can also show up while you are calling other functions.
In the following example, I’m using data.table’s shift() function and I get the same warning. I’m trying to create a leading variable from a numeric vector, and with the fill parameter I’m asking shift() to put ‘NO VALUE’ in the positions that the lead leaves empty. This is a bad idea: x is a numeric (double) vector, so the string ‘NO VALUE’ has to be coerced to a number, and that fails. Funny thing: the failed fill value simply ends up as NA, so the output looks exactly as it would with the default fill, yet you still get the warning.
library(data.table)
x <- c(0,1,2,3,4,5)
shift(x, n = 3, type = 'lead', fill = 'NO VALUE')
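The fix is to keep the fill value and the vector of the same type. The sketch below does the lead in base R so it runs without data.table (the names lead_num and lead_chr are only illustrative). In data.table terms, the two options correspond to shift(x, n = 3, type = 'lead'), whose default fill is NA, and shift(as.character(x), n = 3, type = 'lead', fill = 'NO VALUE').

```r
x <- c(0, 1, 2, 3, 4, 5)
n <- 3

# Numeric lead: drop the first n values and pad the end with a numeric NA,
# so nothing has to be coerced
lead_num <- c(x[-seq_len(n)], rep(NA_real_, n))

# If a text label is really wanted, turn x into character first;
# number-to-character coercion always succeeds, so no NAs are introduced
lead_chr <- c(as.character(x[-seq_len(n)]), rep('NO VALUE', n))

lead_num  # 3 4 5 NA NA NA
lead_chr  # "3" "4" "5" "NO VALUE" "NO VALUE" "NO VALUE"
```

Keeping the column numeric is usually the better choice, since a character column of numbers can no longer be used in arithmetic.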
Here’s another one I found online. In the following (admittedly silly) example, I’m trying to create a distance matrix from a data table using the dist() function. Because the table contains a character column, as.matrix() returns a character matrix, and dist() then has to convert every value back to a number: the sales figures survive, but the fruit names all become NA.
x <- c(10,9,4)
y <- c(5,8,12)
z <- c('apple','pear','orange')
df <- data.table(fruit = z, sold_today = y, sold_yesterday = x)
dist(as.matrix(df))
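A cleaner approach is to give dist() only the numeric columns and keep the fruit names as row labels. This sketch uses a base data.frame instead of a data.table so it runs without extra packages; the idea carries over directly.

```r
# Same toy data as above, as a base data.frame
df <- data.frame(
  fruit          = c('apple', 'pear', 'orange'),
  sold_today     = c(5, 8, 12),
  sold_yesterday = c(10, 9, 4)
)

# Select only the numeric columns so as.matrix() returns a numeric matrix,
# and move the fruit names into the row names
m <- as.matrix(df[, c('sold_today', 'sold_yesterday')])
rownames(m) <- df$fruit

# Euclidean distances between fruits, with no coercion warning
d <- dist(m)
d
```

The row names become the labels of the distance object, so you still know which distance belongs to which pair of fruits.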
Here’s another silly example. I’m trying to extract everything from the fifth character onward using the substring() function. Because I pass the string ‘five’, which R cannot convert to an integer, the function returns NA and I get the warning. However, passing ‘5’ as a string works, because that coercion succeeds. Finally, TRUE is coerced to 1, so that works too (it simply returns the whole string).
text <- 'this is an apple'
substring(text, first = 'five') # Returns NA, with the warning
substring(text, first = '5') # This works
substring(text, first = TRUE) # This works
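The three calls above come down to how each value of first converts to an integer; in current base R, substring() passes first through as.integer(). This sketch shows those conversions directly, followed by a warning-free call that passes a genuine integer. Note that 6L, not 5, is the position right after the fifth character.

```r
text <- 'this is an apple'

# How each value of `first` from the example converts to an integer
as.integer('5')                        # 5
as.integer(TRUE)                       # 1
suppressWarnings(as.integer('five'))   # NA (this is what triggers the warning)

# Passing an integer avoids coercion altogether;
# position 6 is the first character after "this "
substring(text, first = 6L)            # "is an apple"
```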
Great success!
By the way, if you’re having trouble understanding some of the code and concepts, I can highly recommend “An Introduction to Statistical Learning: with Applications in R”, which is the must-have data science bible. If you simply need an introduction to R rather than to the data science part, I can absolutely recommend this book by Richard Cotton. Hope it helps!
Hi,
I am a beginner in R. When I tried to execute the simple code below
x<-readline()
x<-as.integer(x)
y<-readline()
y<-as.integer(y)
sum<-x+y
print(sum)
I keep getting this error:
Error in x + y : non-numeric argument to binary operator
Please help me out.