In this blog post I tackle a problem for which I have been looking for an off-the-shelf solution: converting all columns of a certain class to another class, while preserving the data frame. I haven't found one to this day, so I wrote this function myself.
Let's say you have a data frame (a data.table) named dt. It contains some character and logical columns that you need as factors, and some integer columns that you want as numeric. Here's the full code I wrote to do it:
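library(data.table)

# Convert every column whose class matches the regular expression `from`
# to class `to`, keeping the original column order. Assumes `x` is a data.table.
convert_columns <- function(x, from, to) { # (1)
  column_order <- colnames(x) # (2)
  column_selection <- grepl(from, sapply(x, class))
  if (sum(column_selection) > 0) {
    columns_needed <- colnames(x)[column_selection] # (3)
    columns_not_needed <- colnames(x)[!column_selection]
    # (4) apply as.<to> (e.g. as.factor) to every selected column
    dt1 <- x[, lapply(.SD, get(paste0('as.', to))), .SDcols = columns_needed]
    # keep the remaining columns unchanged
    dt2 <- x[, .SD, .SDcols = columns_not_needed]
    x <- cbind(dt1, dt2)
    # restore the original column order (setcolorder works by reference)
    setcolorder(x, column_order)
  }
  x
}

# example data (hypothetical)
dt <- data.table(id = 1:3, name = c('a', 'b', 'c'), flag = c(TRUE, FALSE, TRUE))
dt <- convert_columns(dt, 'character|logical', 'factor')
dt <- convert_columns(dt, 'integer', 'numeric')
sapply(dt, class)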
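If you work with a plain data.frame instead of a data.table, the same idea can be sketched in base R. This is a hypothetical variant, not the function above (convert_columns_base is a made-up name): assigning to df[sel] replaces only the selected columns, so they keep their position and no reordering step is needed. It assumes that the as.<to> function you ask for exists, e.g. as.factor or as.numeric.

```r
# Hypothetical base-R helper for plain data.frames (no data.table needed)
convert_columns_base <- function(df, from, to) {
  # look up as.<to> (e.g. as.factor) by name
  convert <- match.fun(paste0("as.", to))
  # first class of each column, as a plain character vector
  first_class <- vapply(df, function(col) class(col)[1], character(1))
  sel <- grepl(from, first_class)
  if (any(sel)) {
    # replacing a subset of columns leaves the others, and the order, intact
    df[sel] <- lapply(df[sel], convert)
  }
  df
}

df <- data.frame(id = 1:3, name = c("a", "b", "c"), flag = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)
df <- convert_columns_base(df, "character|logical", "factor")
df <- convert_columns_base(df, "integer", "numeric")
str(df)
```

The trade-off is that the data.table version scales better to large tables, while this one needs no extra package.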
Let's go over it, piece by piece.
(1) You need the data.table package. I call the function twice: once to convert the character and logical columns to factors, and a second time to convert the integer columns to numeric.
library(data.table)
convert_columns <- function(x, from, to) {
  # CODE HERE
}
dt <- convert_columns(dt, 'character|logical', 'factor')
dt <- convert_columns(dt, 'integer', 'numeric')
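(2) I store the original column order (so the columns can be returned in the same order later), and then I mark which columns need converting. Because grepl treats from as a regular expression, a pattern like 'character|logical' selects columns of either class.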
column_order <- colnames(x)
column_selection <- grepl(from,sapply(x,class))
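To see what this selection step works with, here is a small check on a hypothetical table. When any column has more than one class (a POSIXct date-time column has two, "POSIXct" and "POSIXt"), sapply(x, class) returns a list rather than a character vector, which grepl then coerces to text. Taking only the first class with vapply keeps it a plain character vector, and anchoring the pattern with ^ and $ prevents partial matches such as 'integer' also matching bit64's integer64 class. Both tweaks are suggestions, not part of the function above.

```r
library(data.table)

# Hypothetical example table with a multi-class (POSIXct) column
example_dt <- data.table(
  id   = 1:2,
  name = c("a", "b"),
  flag = c(TRUE, FALSE),
  when = as.POSIXct(c("2020-01-01", "2020-01-02"), tz = "UTC")
)

# sapply() cannot simplify when one column has two classes, so it returns a list
classes_raw <- sapply(example_dt, class)
print(is.list(classes_raw))

# Keeping only the first class of each column gives a character vector
first_class <- vapply(example_dt, function(col) class(col)[1], character(1))
print(first_class)

# Anchored pattern: matches "character" or "logical" exactly, nothing longer
selection <- grepl("^(character|logical)$", first_class)
print(names(example_dt)[selection])
```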
(3) Once we have checked that there actually are columns to convert (the if (sum(column_selection) > 0) check in the full code, not shown above), we select the names of the columns that should be converted and of the ones that shouldn't be.
columns_needed <- colnames(x)[column_selection]
columns_not_needed <- colnames(x)[!column_selection]
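The split in step (3) is plain logical indexing: the same vector selects the names to convert, and its negation selects the rest, so together they cover every column exactly once. A minimal sketch with made-up column names, also showing any() as an equivalent of sum(...) > 0:

```r
# Hypothetical column names and selection vector, mirroring step (3)
cols <- c("id", "name", "flag", "score")
column_selection <- c(FALSE, TRUE, TRUE, FALSE)

# any() tests the same condition as sum(column_selection) > 0
if (any(column_selection)) {
  columns_needed     <- cols[column_selection]
  columns_not_needed <- cols[!column_selection]
}

# The two subsets partition the names: nothing lost, nothing duplicated
print(columns_needed)
print(columns_not_needed)
stopifnot(setequal(c(columns_needed, columns_not_needed), cols))
```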
(4) The following chunk of code builds on something I wrote about earlier. I use the get function to look up the function as.X by its name (for example as.factor when to is 'factor'), and apply it with lapply to all the selected columns in .SD. The result is a new data frame.
I also make a data frame that consists of the leftover columns. In the last lines of code in this chunk, I bind both data frames together, and I reorder the columns back to their original order.
dt1 <- x[,lapply(.SD,get(paste0('as.',to))), .SDcols = columns_needed]
dt2 <- x[,.SD, .SDcols = columns_not_needed]
x <- cbind(dt1,dt2)
setcolorder(x,column_order)
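data.table can also do this conversion in place, which avoids the cbind and setcolorder steps entirely, because := overwrites the selected columns where they are. The sketch below is a hypothetical alternative (convert_columns_inplace is a made-up name), not the function from this post, and it uses match.fun instead of get to look up as.<to> by name. Note that it modifies the table you pass in by reference, whereas the version above leaves its input untouched and returns a new table.

```r
library(data.table)

# Hypothetical alternative: convert matching columns by reference with :=
convert_columns_inplace <- function(x, from, to) {
  # look up as.<to> (e.g. as.factor) by name
  convert <- match.fun(paste0("as.", to))
  first_class <- vapply(x, function(col) class(col)[1], character(1))
  cols <- names(x)[grepl(from, first_class)]
  if (length(cols) > 0) {
    # (cols) := overwrites each selected column in place; column order is unchanged
    x[, (cols) := lapply(.SD, convert), .SDcols = cols]
  }
  invisible(x)
}

dt <- data.table(id = 1:3, name = c("a", "b", "c"), flag = c(TRUE, FALSE, TRUE))
convert_columns_inplace(dt, "character|logical", "factor")
convert_columns_inplace(dt, "integer", "numeric")
print(sapply(dt, class))
```

Because the changes happen by reference, there is no need to assign the result back to dt.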
By the way, if you're having trouble understanding some of the code and concepts, I highly recommend "An Introduction to Statistical Learning: with Applications in R", the must-have data science bible. If you simply need an introduction to R, and less of the data science part, I can absolutely recommend this book by Richard Cotton. Hope it helps!
Great success!