randomForest gives NA/NaN/Inf in foreign function call and how to solve it

Random Forest is one of my favorite algorithms for supervised learning. It's quick and dirty and still allows for some interpretation. However, R and the randomForest package are rather cryptic when your data does not meet the requirements for training the model. I have run into this error message a lot:

Error in randomForest.default(m,y,...) : 
   NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning messages:
1: In data.matrix(x): NAs introduced by coercion
2: In data.matrix(x): NAs introduced by coercion

In this blog post I'd like to present a solution. There may actually be several, because several things could have gone wrong:

Your data contains NAs
Your data contains NaNs
Your data contains Infs
Your data contains columns of type ‘character’
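To check for all four issues in one go, here is a minimal base R sketch. The helper name check_rf_inputs is hypothetical (it is not part of randomForest or data.table). It returns one row per column with the number of NAs, NaNs and Infs, and whether the column is of type character.

```r
# Hypothetical helper (not part of randomForest): one row per column,
# flagging the four issues listed above.
check_rf_inputs <- function(df) {
  df <- as.data.frame(df)  # accept data.frames and data.tables alike
  # NaN and Inf can only occur in numeric columns
  n_nan <- vapply(df, function(x) if (is.numeric(x)) sum(is.nan(x)) else 0L, integer(1))
  n_inf <- vapply(df, function(x) if (is.numeric(x)) sum(is.infinite(x)) else 0L, integer(1))
  # is.na() is also TRUE for NaN, so subtract the NaNs to count plain NAs
  n_na  <- vapply(df, function(x) sum(is.na(x)), integer(1)) - n_nan
  data.frame(
    column    = names(df),
    n_na      = unname(n_na),
    n_nan     = unname(n_nan),
    n_inf     = unname(n_inf),
    character = unname(vapply(df, is.character, logical(1)))
  )
}

# Small example with one of each problem
df <- data.frame(a = c(1, NA, 3), b = c(NaN, 2, Inf), c = c("x", "y", "z"),
                 stringsAsFactors = FALSE)
res <- check_rf_inputs(df)
print(res)
```

Any nonzero count, or a TRUE in the character column, points to one of the causes above.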

In the following paragraphs, I explain how you can check your data table for each of these issues. Let's create a sample data set: a 10-by-10 data table of standard normal random numbers, into which we then inject each of the issues.

set.seed(19880303) # Setting the seed to my birthday
library(data.table) # install.packages('data.table') if necessary

norms <- list() # Create an empty list

for (i in 1:10) {
  norms[[i]] <- data.table(t(rnorm(10,0,1))) # 10 data tables with 10 norms
}

dt <- rbindlist(norms) # binding it all together in a data table
rm(norms) # remove the list

# Now, let's add some issues to our data
dt[5,5] <- Inf # Add an Inf to the data set
dt[4,10] <- NA # Add an NA to the data set
dt[8,3] <- NaN # Add an NaN to the data set
dt[, V9 := sample(c('a','b','c'), 10, replace = TRUE)] # replace V9 with a character column
dt[, V6 := sample(c('a','b','c'), 10, replace = TRUE)] # replace V6 with a character column

Printing dt now shows the resulting data set: column V3 holds a NaN, V5 an Inf and V10 an NA, while V6 and V9 are character columns.

We start off by checking for NAs and NaNs. Once you have found them, you can use one of several techniques to impute the missing values. The following code prints all rows that contain an NA or a NaN (complete.cases() treats NaN as missing, too):

dt[!complete.cases(dt)]
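One of the simplest imputation techniques is to replace each missing value with the median of its column. The sketch below assumes median imputation is acceptable for your problem; the helper name impute_median is hypothetical. For comparison, the randomForest package ships na.roughfix(), which uses column medians for numeric variables and the most frequent level for factors.

```r
# Hypothetical helper: replace NA/NaN in numeric columns with the column median.
# Inf values are left untouched.
impute_median <- function(df) {
  df <- as.data.frame(df)
  for (col in names(df)) {
    x <- df[[col]]
    if (is.numeric(x) && anyNA(x)) {          # anyNA() is TRUE for NaN, too
      x[is.na(x)] <- median(x, na.rm = TRUE)  # na.rm = TRUE drops both NA and NaN
      df[[col]] <- x
    }
  }
  df
}

imputed <- impute_median(data.frame(a = c(1, NA, 3, NaN, 5),
                                    b = c("x", "y", "z", "x", "y")))
print(imputed$a)  # 1 3 3 3 5
```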

Next, check for Infs. For me personally, they show up a lot when I create ratio features and some of the denominators are zero: in R, dividing a nonzero number by zero returns Inf or -Inf (and 0/0 returns NaN). If your data table contains only numeric columns, you can simply run colSums(dt): a column containing an Inf sums to Inf (unless it also contains an NA, which turns the sum into NA). If your data does not exclusively contain numerical data, the following lines of code count the infinite values in every numeric column instead:

for (i in 1:ncol(dt)) { # For all columns...
  if (is.numeric(dt[[i]])) { # if the column is numeric...
    # ...print its name and its number of Inf/-Inf values (unlike sum(), not masked by NAs).
    cat(names(dt)[i], sum(is.infinite(dt[[i]])), '\n')
  }
}
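Once you have located the Infs, one option is to turn them into NAs so that your NA handling takes care of them; another is to avoid them when you build ratio features in the first place. Both helpers below, safe_ratio and inf_to_na, are hypothetical sketches, and whether NA is the right value for a zero denominator depends on your data.

```r
# Hypothetical helper: a ratio that returns NA instead of Inf/NaN
# when the denominator is zero.
safe_ratio <- function(num, den) {
  ifelse(den == 0, NA_real_, num / den)
}

# Hypothetical helper: turn every Inf/-Inf in the numeric columns into NA.
inf_to_na <- function(df) {
  df <- as.data.frame(df)
  for (col in names(df)) {
    if (is.numeric(df[[col]])) {
      df[[col]][is.infinite(df[[col]])] <- NA
    }
  }
  df
}

ratios  <- safe_ratio(c(1, 2, 0), c(2, 0, 0))  # 1/2, 2/0 and 0/0
cleaned <- inf_to_na(data.frame(a = c(1, Inf, -Inf), b = c("x", "y", "z")))
print(ratios)
print(cleaned)
```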

Finally, to find all the character columns and automatically convert them to factor columns, the following lines of code should do the trick.

for (i in 1:ncol(dt)) { # For every column...
  if (typeof(dt[[i]]) == 'character') { # if the column type is character...
    dt[[i]] <- as.factor(dt[[i]]) # Convert it to factor. 
  }
}
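The loop does the job; a loop-free base R alternative is sketched below. It assumes a plain data.frame, and the helper name chars_to_factors is hypothetical.

```r
# Hypothetical helper: convert every character column to a factor in one pass.
chars_to_factors <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))  # which columns are character?
  df[is_chr] <- lapply(df[is_chr], as.factor)     # convert only those columns
  df
}

df  <- data.frame(x = 1:3, y = c("a", "b", "a"), stringsAsFactors = FALSE)
out <- chars_to_factors(df)
str(out)
```

With data.table, the usual in-place idiom is `cols <- names(dt)[sapply(dt, is.character)]` followed by `dt[, (cols) := lapply(.SD, as.factor), .SDcols = cols]`.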

By the way, if you're having trouble understanding some of the code and concepts, I can highly recommend "An Introduction to Statistical Learning: with Applications in R", which is the must-have data science bible. If you simply need an introduction to R, with less focus on data science, I can absolutely recommend Richard Cotton's introductory book. Hope it helps!

Good luck!

Say thanks, ask questions or give feedback

Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.
