Applying as.factor (or numeric) to multiple columns in R

In this blog post I tackle a problem for which I had long been looking for an off-the-shelf solution: converting all columns of one class to another class, while keeping the data frame intact. To this day I haven’t found one. That’s why I wrote this function.

Let’s say you have a data frame (a data.table, to be precise) named dt. It contains some character and logical columns that you need as factors, and some integer columns that you want as numeric. Here’s the full code I wrote to do it:

library(data.table)

convert_columns <- function(x,from,to) { # (1)
  
  column_order <- colnames(x) # (2)
  column_selection <- grepl(from,sapply(x,class))
  
  if (sum(column_selection) > 0) {
    columns_needed <- colnames(x)[column_selection] # (3)
    columns_not_needed <- colnames(x)[!column_selection]

    # (4)
    dt1 <- x[,lapply(.SD,get(paste0('as.',to))), .SDcols = columns_needed]
    dt2 <- x[,.SD, .SDcols = columns_not_needed]
    
    x <- cbind(dt1,dt2)
    setcolorder(x,column_order)
  }
  x
}

dt <- convert_columns(dt,'character|logical','factor')
dt <- convert_columns(dt,'integer','numeric')
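To try the function out, here is a minimal sketch on a small made-up table. The column names and values (id, name, flag, score) are assumptions for illustration only; the function body is repeated from above so the snippet runs on its own.

```r
library(data.table)

# Same function as above, repeated so this snippet is self-contained
convert_columns <- function(x, from, to) {
  column_order <- colnames(x)
  column_selection <- grepl(from, sapply(x, class))

  if (sum(column_selection) > 0) {
    columns_needed <- colnames(x)[column_selection]
    columns_not_needed <- colnames(x)[!column_selection]

    dt1 <- x[, lapply(.SD, get(paste0('as.', to))), .SDcols = columns_needed]
    dt2 <- x[, .SD, .SDcols = columns_not_needed]

    x <- cbind(dt1, dt2)
    setcolorder(x, column_order)
  }
  x
}

# Hypothetical sample data, not from the original post
dt <- data.table(
  id    = 1:3,                   # integer
  name  = c('a', 'b', 'c'),      # character
  flag  = c(TRUE, FALSE, TRUE),  # logical
  score = c(1.5, 2.5, 3.5)       # already numeric, should stay untouched
)

dt <- convert_columns(dt, 'character|logical', 'factor')
dt <- convert_columns(dt, 'integer', 'numeric')

# Inspect the resulting class of every column
sapply(dt, class)
```

After both calls, id and score should be numeric, name and flag should be factors, and the columns should still be in their original order.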

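(1) Let’s go over it, piece by piece. You need the data.table package. I call the function twice: once to convert the character and logical columns to factors, and a second time to convert the integer columns to numeric. The from argument is a regular expression that is matched against the column classes, and to is the name of the target class.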

library(data.table)

convert_columns <- function(x,from,to) {
  # CODE HERE
}

dt <- convert_columns(dt,'character|logical','factor')
dt <- convert_columns(dt,'integer','numeric')

(2) I store the original order of the columns (so the result can be returned in the same order), and next I flag the columns whose class matches from and therefore need converting.

column_order <- colnames(x)
column_selection <- grepl(from,sapply(x,class))
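To make the selection step concrete, here is a small sketch of what these two lines produce. The table x and its columns are made up for illustration.

```r
library(data.table)

# Hypothetical sample data
x <- data.table(id = 1:2, name = c('a', 'b'), flag = c(TRUE, FALSE))

# One class name per column, as a named character vector:
# id = "integer", name = "character", flag = "logical"
column_classes <- sapply(x, class)

# grepl() treats `from` as a regular expression and returns one logical per column
column_selection <- grepl('character|logical', column_classes)
column_selection
# FALSE TRUE TRUE

# Because the match is a regular expression, 'integer' also matches a class
# name such as 'integer64'; anchoring the pattern restricts it to exact matches
grepl('integer', c('integer', 'integer64'))
# TRUE TRUE
grepl('^integer$', c('integer', 'integer64'))
# TRUE FALSE
```

If a table might contain classes whose names overlap (integer64 from the bit64 package, for instance), passing an anchored pattern like '^integer$' as from is probably the safer choice.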

(3) After checking that there actually are columns to convert (the if statement in the full code, omitted from the snippet below), we select the names of the columns that should be converted and the ones that shouldn’t be.

columns_needed <- colnames(x)[column_selection]
columns_not_needed <- colnames(x)[!column_selection]

(4) The following chunk of code actually has its basis in something I wrote about earlier. I use the get function to look up the function as.X by its name (as.factor or as.numeric, for example), and I apply it to all the columns that were selected. I make a new data frame out of the result.

I also make a data frame that consists of the leftover columns. In the last lines of code in this chunk, I bind both data frames together, and I reorder the columns back to their original order.

dt1 <- x[,lapply(.SD,get(paste0('as.',to))), .SDcols = columns_needed]
dt2 <- x[,.SD, .SDcols = columns_not_needed]

x <- cbind(dt1,dt2)
setcolorder(x,column_order)
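The get trick may be easier to see in isolation. The sketch below uses a plain list with made-up values instead of a data.table, purely to show how the function name is assembled and looked up.

```r
# Build the function name from the target class and fetch the function itself
to <- 'factor'
converter <- get(paste0('as.', to))  # looks up the object named "as.factor"

# Hypothetical columns, held in a plain list for illustration
columns <- list(
  name = c('a', 'b', 'a'),
  flag = c(TRUE, FALSE, TRUE)
)

# Apply the looked-up function to every column, as lapply(.SD, ...) does above
converted <- lapply(columns, converter)
sapply(converted, class)
# name = "factor", flag = "factor"
```

Base R’s match.fun(paste0('as.', to)) would be a close alternative here, since it only looks for functions; which one to use is mostly a matter of taste.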

By the way, if you’re having trouble understanding some of the code and concepts, I can highly recommend “An Introduction to Statistical Learning: with Applications in R”, which is the must-have data science bible. If you simply need an introduction to R, and less to the data science part, I can absolutely recommend this book by Richard Cotton. Hope it helps!

Great success!

Say thanks, ask questions or give feedback

Technologies get updated, syntax changes and honestly… I make mistakes too. If something is incorrect, incomplete or doesn’t work, let me know in the comments below and help thousands of visitors.
