Calculating shares by row in R using data.table

In R, I often need to calculate each column's share of its row total. In this very simple example, I would like to update the following table:

apples  oranges  bananas  pineapples
2       4        6        3
1       0        9        2
5       1        2        3

and I would like it to become:

apples  oranges  bananas  pineapples
0.13    0.27     0.40     0.20
0.08    0.00     0.75     0.17
0.45    0.09     0.18     0.27

As you can see, every cell now contains the share its absolute value accounts for in the row. Using data.table, there is an easy way to do this.

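# Row totals, computed once before the columns are overwritten
rsums <- rowSums(fruit)
# Divide every column by the row totals
fruit <- fruit[, lapply(.SD, function(x) x / rsums)]
rm(rsums)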
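
For anyone who wants to run this end to end, here is a minimal, self-contained sketch. It assumes the fruit table holds the numbers from the example at the top of this post, stored as plain numeric columns. The name shares is my own, chosen so the original table stays intact for comparison.

```r
library(data.table)

# Example data taken from the input table at the top of this post
fruit <- data.table(
  apples     = c(2, 1, 5),
  oranges    = c(4, 0, 1),
  bananas    = c(6, 9, 2),
  pineapples = c(3, 2, 3)
)

# Row totals, computed before any column is changed
rsums <- rowSums(fruit)

# Divide each column by the row totals; the result is a new data.table
shares <- fruit[, lapply(.SD, function(x) x / rsums)]

# Sanity check: every row of shares should add up to 1
stopifnot(isTRUE(all.equal(unname(rowSums(shares)), c(1, 1, 1))))

# Print the shares rounded to two decimals
print(shares[, lapply(.SD, round, 2)])
```

The stopifnot() line is only there as a quick check that the division did what it should: if the shares in a row do not sum to 1, something went wrong with the row totals.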

In the data.table package, .SD stands for “subset of the data”. By calling lapply over .SD without specifying .SDcols, we apply the function to all columns. However, if you only want to apply the function to apples and oranges, use:

rsums <- rowSums(fruit)
# .SDcols (with a lowercase "c") restricts .SD to the named columns
fruit <- fruit[, lapply(.SD, function(x) x / rsums), .SDcols = c('apples', 'oranges')]
rm(rsums)
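
One thing to be aware of: the call above returns only the columns listed in .SDcols, so bananas and pineapples are dropped from the result. If you would rather keep them, one possible approach (a sketch, not the only way) is to update the selected columns by reference with data.table's := operator. The variable name cols is my own, and the example assumes the columns are stored as doubles; integer columns would likely need converting to numeric first.

```r
library(data.table)

# Example data, created as doubles so the shares can be stored in place
fruit <- data.table(
  apples     = c(2, 1, 5),
  oranges    = c(4, 0, 1),
  bananas    = c(6, 9, 2),
  pineapples = c(3, 2, 3)
)

# Hypothetical helper variable: the columns to convert to shares
cols <- c("apples", "oranges")

# Row totals over all four columns, computed before anything is overwritten
rsums <- rowSums(fruit)

# Overwrite only the selected columns by reference;
# bananas and pineapples keep their absolute values
fruit[, (cols) := lapply(.SD, function(x) x / rsums), .SDcols = cols]

print(fruit)
```

Because rsums is still calculated over the whole row, apples and oranges end up as shares of the full row total, not of apples plus oranges alone.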

By the way, if you’re having trouble understanding some of the code and concepts, I can highly recommend “An Introduction to Statistical Learning: with Applications in R”, which is a must-have data science bible. If you simply need an introduction to R, and less of the data science part, I can absolutely recommend this book by Richard Cotton. Hope it helps!

Good luck!
