Let’s say you want to use principal component analysis on the numerical columns in your data set to reduce the number of features in your model and get rid of multicollinearity. For that, you’d need to select the numerical columns only. Now how could you do that properly?
In the following piece of code I assume I have a data table dt. I call grepl on the column classes, which returns a logical vector: TRUE when the column class matches factor, logical or character, and FALSE when it doesn’t. I invert this using the ! operator, so TRUE now marks the columns that are not categorical.
!grepl('factor|logical|character',sapply(dt,class))
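To make the logical vector concrete, here is a minimal sketch on a toy table. The table and its column names are made up for illustration and are not part of the original example.

```r
library(data.table)

# Hypothetical toy table, for illustration only
dt <- data.table(
  id    = 1:3,                       # integer
  score = c(0.5, 1.2, 3.4),          # numeric (double)
  group = factor(c("a", "b", "a")),  # factor
  label = c("x", "y", "z"),          # character
  flag  = c(TRUE, FALSE, TRUE)       # logical
)

# One class name per column
col_classes <- sapply(dt, class)

# TRUE for columns whose class is NOT factor, logical or character
is_non_categorical <- !grepl('factor|logical|character', col_classes)
names(is_non_categorical) <- names(dt)
print(is_non_categorical)
# id and score are TRUE; group, label and flag are FALSE
```

Note that the pattern keeps every class it does not name, so a Date column, for example, would also come out as TRUE.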
We can use this inverted vector to select the column names of the columns that are of a numerical class.
colnames(dt)[!grepl('factor|logical|character',sapply(dt,class))]
Finally, we can put this expression again in the original data table to actually select the data from the columns we are after.
dt[,colnames(dt)[!grepl('factor|logical|character',sapply(dt,class))],with=FALSE]
For this, we use the ‘with’ parameter, so we can refer to the column names using the vector of strings. From the data.table documentation:
“The argument is named with after the R function with() because of similar functionality. […] Setting with = FALSE disables the ability to refer to columns as if they are variables, thereby restoring the ‘data.frame mode’.”
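For comparison, data.table offers a couple of other ways to select columns from a character vector. The sketch below uses a made-up toy table and base R's is.numeric instead of the regular expression, which is a swapped-in test and not the approach used above. Unlike the pattern match, it only keeps integer and double columns.

```r
library(data.table)

# Hypothetical toy table, for illustration only
dt <- data.table(
  id    = 1:3,
  score = c(0.5, 1.2, 3.4),
  label = c("x", "y", "z")
)

# Names of the columns for which is.numeric() is TRUE
num_cols <- names(dt)[sapply(dt, is.numeric)]

# 1. with = FALSE: treat j as a vector of column names
via_with <- dt[, num_cols, with = FALSE]

# 2. The .. prefix: look up num_cols in the calling scope, not in dt
via_dots <- dt[, ..num_cols]

# 3. .SDcols: restrict .SD (the "subset of data") to these columns
via_sdcols <- dt[, .SD, .SDcols = num_cols]
```

All three should return the same two columns here, so which one to use is mostly a matter of taste.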
Putting it all together:
library(data.table)
dt <- fread('XXX.csv')
dt_numerical <- dt[,colnames(dt)[!grepl('factor|logical|character',sapply(dt,class))],with=FALSE]
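With the numerical columns isolated, the principal component analysis mentioned at the start could look roughly like the sketch below. It uses the built-in iris data instead of a CSV file so that it runs as-is. Centering and scaling are a common choice I am assuming here, not something the selection step requires.

```r
library(data.table)

# iris ships with R: four numeric columns plus one factor (Species)
dt <- as.data.table(iris)

# Same selection pattern as above: drop factor, logical and character columns
num_cols <- colnames(dt)[!grepl('factor|logical|character', sapply(dt, class))]
dt_numeric <- dt[, num_cols, with = FALSE]

# prcomp() comes from the stats package; center and scale before rotating
pca <- prcomp(dt_numeric, center = TRUE, scale. = TRUE)

# Proportion of variance explained per component
summary(pca)
```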
By the way, if you’re having trouble understanding some of the code and concepts, I can highly recommend “An Introduction to Statistical Learning: with Applications in R”, which is the must-have data science bible. If you simply need an introduction to R, and less to the data science part, I can absolutely recommend this book by Richard Cotton. Hope it helps!
Great success!