In this blog post, I will elaborate on a specific warning, the contexts in which it occurs and how you can solve or prevent it. It’s definitely in my top three of generic warnings that I bump into:
NAs introduced by coercion
Apparently, some NAs were added to my data because of something that is called coercion. What is coercion? Let’s look it up in Joseph Adler’s R in a Nutshell — pdf available here.
"When you call a function with an argument of the wrong type, R will try to coerce values to a different type so that the function will work."
Joseph Adler, R in a Nutshell (2 ed.), p56
When you receive the warning that NAs were introduced by coercion, R has coerced values to a different type, but warns us that it wasn’t able to coerce all of them. The following example is straightforward: I try to convert strings to numeric and it fails.
z <- c('apple','pear','orange')
as.numeric(z)
But the warning might also show up while executing other functions.
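To find out which values could not be converted, one option is to compare the missing values before and after the conversion. This is only a minimal sketch: the vector below and the names `converted` and `failed` are made up for illustration.

```r
# Example input: some strings are valid numbers, some are not, one is already NA
z <- c('12', '7.5', 'apple', NA, 'pear')

# Convert, silencing the warning because we inspect the result ourselves
converted <- suppressWarnings(as.numeric(z))

# A value failed if it is NA after conversion but was not NA before
failed <- is.na(converted) & !is.na(z)

# The offending strings
z[failed]
```

Silencing the warning is only reasonable here because the failures are checked explicitly on the next line.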
In the following example, I’m using data.table’s shift() function and I get the same warning. As you can see, I’m trying to create a leading variable from a vector of numbers. By using the parameter fill, I’m also trying to insert ‘NO VALUE’ where the leading variable is NA. This is a bad idea, as the vector x is numeric and the string ‘NO VALUE’ cannot be converted to a number. Funny thing: even if no NAs were added to the output, you will still get the warning.
library(data.table)
x <- c(0,1,2,3,4,5)
shift(x, n = 3, type = 'lead', fill = 'NO VALUE')
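If a text label really is wanted where the lead is missing, one way around the type mismatch is to build the lead first and then convert to character explicitly. The sketch below uses base R only instead of shift(), and the names `n`, `lead3` and `labelled` are made up for illustration.

```r
x <- c(0, 1, 2, 3, 4, 5)
typeof(x)  # "double": c(0, 1, ...) creates doubles, not integers

# Lead by n positions: drop the first n values, pad the end with NA
n <- 3
lead3 <- c(x[-seq_len(n)], rep(NA_real_, n))

# Convert to character on purpose, then label the missing values
labelled <- ifelse(is.na(lead3), 'NO VALUE', as.character(lead3))
labelled
```

The result is a character vector, so it should only be used for display, not for further calculations.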
Here’s another one I found online. In the following example, I’m trying to create a distance matrix (although it’s a silly example) from a data table using the dist() function. Because the table mixes text and numbers, as.matrix() turns everything into a character matrix. When dist() coerces that matrix back to numeric, the fruit names cannot be converted and become NAs.
x <- c(10,9,4)
y <- c(5,8,12)
z <- c('apple','pear','orange')
df <- data.table(fruit = z, sold_today = y, sold_yesterday = x)
dist(as.matrix(df))
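The warning can be avoided by handing dist() only the numeric columns. Here is a minimal sketch, assuming a plain base R data.frame rather than a data.table so that it runs without extra packages; the names `sales`, `numeric_cols`, `m` and `d` are made up for illustration.

```r
sales <- data.frame(
  fruit = c('apple', 'pear', 'orange'),
  sold_today = c(5, 8, 12),
  sold_yesterday = c(10, 9, 4),
  stringsAsFactors = FALSE
)

# Keep only the numeric columns so the matrix stays numeric
numeric_cols <- vapply(sales, is.numeric, logical(1))
m <- as.matrix(sales[, numeric_cols])

# Use the fruit names as row labels instead of as data
rownames(m) <- sales$fruit

d <- dist(m)
d
```

Moving the text column into the row names keeps the labels in the output without involving them in the distance calculation.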
Here’s another silly example. I’m trying to extract everything from the fifth character onward using the substring() function. Because I pass the string ‘five’, which cannot be converted to an integer, the function returns NA and I get the warning. However, passing ‘5’ as a character will work, because the coercion succeeds. Finally, because TRUE is treated as 1, it will also work.
text <- 'this is an apple'
substring(text, first = 'five')
substring(text, first = '5') # This works
substring(text, first = TRUE) # This works
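If you would rather fail loudly than silently get an NA back, you can do the conversion yourself and check it first. The wrapper below is a hypothetical helper, not part of base R, and is only a sketch of the idea.

```r
# Hypothetical helper: convert 'first' explicitly and stop if that fails
substring_from <- function(text, first) {
  pos <- suppressWarnings(as.integer(first))
  if (is.na(pos)) {
    stop("'first' cannot be converted to an integer: ", first)
  }
  substring(text, first = pos)
}

text <- 'this is an apple'
substring_from(text, '5')   # '5' converts to 5
substring_from(text, TRUE)  # TRUE converts to 1, so the whole string is returned
```

Calling substring_from(text, 'five') stops with an error instead of returning NA with a warning.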
By the way, if you’re having trouble understanding some of the code and concepts, I can highly recommend “An Introduction to Statistical Learning: with Applications in R”, which is the must-have data science bible. If you simply need an introduction to R, and less to the data science part, I can absolutely recommend this book by Richard Cotton. Hope it helps!