
How to only select categorical or numerical columns in R

Let’s say you want to use principal component analysis on the numerical columns in your data set to reduce the number of features in your model and get rid of multicollinearity. For that, you need to select only the numerical columns. How do you do that properly?

The following code assumes a data table called dt. sapply(dt, class) returns the class of every column, and grepl() turns that into a logical vector: TRUE when the column class matches factor, logical or character, and FALSE when it doesn’t. The ! operator inverts it, so TRUE now marks the numerical columns.

!grepl('factor|logical|character',sapply(dt,class))
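To see what each step produces, here is a minimal sketch on a small, hypothetical data table (the columns id, price, colour, size and sold are made up for illustration) with one column of each common class:

```r
library(data.table)

# Hypothetical toy table with one column of each common class
dt <- data.table(
  id     = 1:3,                      # integer
  price  = c(9.99, 14.5, 3.25),      # numeric (double)
  colour = c("red", "blue", "red"),  # character
  size   = factor(c("S", "M", "L")), # factor
  sold   = c(TRUE, FALSE, TRUE)      # logical
)

# Class of every column, as a named character vector
col_classes <- sapply(dt, class)
print(col_classes)

# TRUE for categorical classes, inverted so TRUE flags the numerical columns
is_num <- !grepl('factor|logical|character', col_classes)
print(is_num)
```

Only id and price should be flagged TRUE here, because integer and numeric are the only classes not matched by the pattern.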

We can use this inverted vector to index colnames(dt), which returns the names of the columns that are of a numerical class.

colnames(dt)[!grepl('factor|logical|character',sapply(dt,class))]
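One thing worth checking before feeding the result into a PCA: the pattern only excludes the classes it names, so any other class passes the inverted test. In this sketch the toy table is hypothetical and the day column is a Date, which ends up among the "numerical" names:

```r
library(data.table)

# Hypothetical toy table; 'day' is a Date column
dt <- data.table(
  id     = 1:3,
  price  = c(9.99, 14.5, 3.25),
  colour = c("red", "blue", "red"),
  day    = as.Date(c("2021-01-01", "2021-01-02", "2021-01-03"))
)

# Names of the columns whose class is not factor, logical or character
num_cols <- colnames(dt)[!grepl('factor|logical|character', sapply(dt, class))]
print(num_cols)
```

If your data contains dates or timestamps, you may want to add Date or POSIXct to the pattern.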

Finally, we pass this vector of column names back into the original data table to select the numerical columns themselves.

dt[,colnames(dt)[!grepl('factor|logical|character',sapply(dt,class))],with=FALSE]

For this, we set with = FALSE, so data.table treats j as a vector of column names to select instead of evaluating it as an expression over the columns. From the data.table documentation:

“The argument is named with after the R function with() because of similar functionality. […] Setting with = FALSE disables the ability to refer to columns as if they are variables, thereby restoring the “data.frame mode”.”
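If the names are stored in a variable first, data.table offers other ways to express the same selection. This sketch, on a hypothetical toy table, compares with = FALSE with the .. prefix (which looks the variable up in the calling scope rather than among the columns) and with .SD restricted through .SDcols; all three should return the same columns:

```r
library(data.table)

# Hypothetical toy table
dt <- data.table(
  id     = 1:3,
  price  = c(9.99, 14.5, 3.25),
  colour = c("red", "blue", "red")
)

num_cols <- colnames(dt)[!grepl('factor|logical|character', sapply(dt, class))]

# with = FALSE: j is treated as a vector of column names ("data.frame mode")
by_with <- dt[, num_cols, with = FALSE]

# '..' prefix: num_cols is looked up outside the table, not as a column
by_dots <- dt[, ..num_cols]

# .SD limited to the chosen columns via .SDcols
by_sdcols <- dt[, .SD, .SDcols = num_cols]

print(by_with)
```

The .. prefix requires a reasonably recent data.table release; on older versions, with = FALSE is the portable choice.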

Putting it all together:

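library(data.table)
dt <- fread('XXX.csv')

# TRUE for columns whose class is factor, logical or character
cat_cols <- grepl('factor|logical|character', sapply(dt, class))

dt_categorical <- dt[, colnames(dt)[cat_cols], with = FALSE]
dt_numerical <- dt[, colnames(dt)[!cat_cols], with = FALSE]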
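As an alternative to matching class names (a swapped-in technique, not the approach above), base R's is.numeric() returns TRUE only for integer and double columns and FALSE for character, factor, logical, Date and POSIXct columns. The sketch below uses a hypothetical toy table in place of fread('XXX.csv') and then runs the principal component analysis mentioned at the start with prcomp():

```r
library(data.table)

# Hypothetical toy data standing in for fread('XXX.csv')
dt <- data.table(
  height = c(150, 160, 170, 180, 175),
  weight = c(50, 60, 72, 85, 77),
  age    = c(23L, 35L, 41L, 29L, 52L),
  colour = c("red", "blue", "red", "green", "blue"),
  day    = as.Date("2021-01-01") + 0:4
)

# is.numeric() is TRUE for integer and double columns only
num_cols <- names(dt)[sapply(dt, is.numeric)]
dt_numerical <- dt[, num_cols, with = FALSE]

# PCA on the numerical columns, each scaled to unit variance
pca <- prcomp(dt_numerical, scale. = TRUE)
print(summary(pca))
```

Unlike the grepl() pattern, this drops the Date column without having to list it. Recent data.table versions may also accept a function directly, as in dt[, .SD, .SDcols = is.numeric]; check your installed version before relying on it.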

Great success!
