
Applying as.factor (or numeric) to multiple columns in R


In this blog post I tackle a problem for which I have long been looking for an off-the-shelf solution: converting every column of one class to another class, while preserving the data frame. To this day I haven’t found one. That’s why I wrote this function.

Let’s say you have a data frame (a data.table, to be precise) named dt. It contains some character and logical columns that you need as factors, and some integer columns that you want as numeric. Here’s the full code I wrote to do it:

library(data.table)

convert_columns <- function(x,from,to) { # (1)
  
  column_order <- colnames(x) # (2)
  column_selection <- grepl(from,sapply(x,class))
  
  if (sum(column_selection) > 0) {
    columns_needed <- colnames(x)[column_selection] # (3)
    columns_not_needed <- colnames(x)[!column_selection]

    # (4)
    dt1 <- x[,lapply(.SD,get(paste0('as.',to))), .SDcols = columns_needed]
    dt2 <- x[,.SD, .SDcols = columns_not_needed]
    
    x <- cbind(dt1,dt2)
    setcolorder(x,column_order)
  }
  x
}

dt <- convert_columns(dt,'character|logical','factor')
dt <- convert_columns(dt,'integer','numeric')
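To try the function out, a small table along these lines could stand in for dt. This is only a sketch: the column names and values are made up for illustration, and the table has to exist before the two convert_columns calls above are run.

```r
library(data.table)

# Hypothetical sample data; the column names are invented for this example.
dt <- data.table(
  name   = c("a", "b", "c"),       # character
  flag   = c(TRUE, FALSE, TRUE),   # logical
  count  = 1:3,                    # integer
  weight = c(1.5, 2.5, 3.5)        # already numeric (double)
)

# Classes before any conversion
sapply(dt, class)
```

With this table, name and flag should come back as factors and count as numeric after the two calls, while weight is left untouched.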

(1) Let’s go over it, piece by piece. You need data.table. I call the function twice: once to convert the characters and logicals, and a second time for the integers.

library(data.table)

convert_columns <- function(x,from,to) {
  # CODE HERE
}

dt <- convert_columns(dt,'character|logical','factor')
dt <- convert_columns(dt,'integer','numeric')

(2) I store the original column order (so the result can be returned in the same order later), and next I build a logical vector that marks the columns I need to convert.

column_order <- colnames(x)
column_selection <- grepl(from,sapply(x,class))
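To see what the selection step produces, here is a small sketch on made-up data. It uses a plain data.frame so it runs without data.table; sapply(x, class) behaves the same way on a data.table. Note that from is treated as a regular expression, so a pattern such as 'integer' would also match a class called integer64 (the class used by the bit64 package). Anchoring the pattern is one way around that.

```r
# Hypothetical sample; names and values are for illustration only.
x <- data.frame(
  name  = c("a", "b"),
  flag  = c(TRUE, FALSE),
  count = 1:2,
  stringsAsFactors = FALSE
)

classes <- sapply(x, class)    # named character vector, one class per column
from <- "character|logical"    # regular expression matched against class names
column_selection <- grepl(from, classes)
column_selection               # TRUE TRUE FALSE

# An unanchored pattern matches substrings of class names as well
loose  <- grepl("integer",    c("integer", "integer64"))  # TRUE TRUE
strict <- grepl("^integer$",  c("integer", "integer64"))  # TRUE FALSE
```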

(3) Once we have checked that there are actually any columns to convert (that check is in the full function, not in the snippet above), we select the names of the columns that should be converted and the ones that shouldn’t be.

columns_needed <- colnames(x)[column_selection]
columns_not_needed <- colnames(x)[!column_selection]

(4) The following chunk of code actually has its basis in something I wrote about earlier. I use the get function to run the function as.X by its name, and I do this for all the columns that were selected. I make a new data frame out of this.
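The lookup by name can be tried in isolation. A minimal sketch, with a made-up input vector:

```r
to <- "factor"

# Build the name "as.factor" and fetch the function bound to it
converter <- get(paste0("as.", to))
same_function <- identical(converter, as.factor)

# The fetched function is called like any other
result <- converter(c("x", "y", "x"))
levels(result)   # "x" "y"
```

Base R also offers match.fun(), which does a similar lookup but only ever returns functions; whether that is preferable here is a matter of taste.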

I also make a data frame that consists of the leftover columns. In the last lines of code in this chunk, I bind both data frames together, and I reorder the columns back to their original order.

dt1 <- x[,lapply(.SD,get(paste0('as.',to))), .SDcols = columns_needed]
dt2 <- x[,.SD, .SDcols = columns_not_needed]

x <- cbind(dt1,dt2)
setcolorder(x,column_order)
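As a side note, data.table can also overwrite columns in place with :=, which avoids the split, cbind and reorder steps. The following is only a sketch of that alternative on made-up data, not the function from this post. Be aware that := modifies the table by reference, so the original dt itself changes.

```r
library(data.table)

# Hypothetical sample data for illustration only.
dt <- data.table(name = c("a", "b"), flag = c(TRUE, FALSE), count = 1:2)

# Names of the columns whose class matches the pattern
cols <- names(dt)[grepl("character|logical", sapply(dt, class))]

# Overwrite those columns by reference; the column order stays as it was
if (length(cols) > 0) {
  dt[, (cols) := lapply(.SD, as.factor), .SDcols = cols]
}

sapply(dt, class)
```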

Great success!
