Padding: leading & lagged variables in R data frames


I often need to calculate the change in a value between two periods. If each row represents a period, the quickest approach is to lag the variable in question, so that each row holds both the current and the previous value.

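In R, the lag() function from the stats package looks like an option, but you’ll notice it doesn’t do what you need on a data frame column. It is designed for time series (ts) objects and shifts the time index rather than the values, so a plain vector comes back with its values in the same positions.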
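To see why, here is a minimal sketch on a plain numeric vector. The vector x is made-up example data, and stats::lag() is called with its namespace so that a lag() from another loaded package cannot mask it.

```r
# Made-up example values, one per period
x <- c(5, 8, 6, 9)

# stats::lag() attaches and shifts a time-series attribute ("tsp")
lagged <- stats::lag(x, 1)

# The values themselves are in the same positions as before
as.numeric(lagged)

# Only the time base (start, end, frequency) has moved back by one
tsp(lagged)
```

Assigning lagged to a data frame column would therefore store the original values, not a shifted copy.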

There’s a manual, not so elegant solution. For example, I can create a lagged variable with an offset of one simply by adding an NA in front of a vector and removing the last item. A leading variable with an offset of one works the same way in reverse: remove the first item and append an NA at the end.

library(data.table)
dt <- data.table(base = seq(1, 10, 1))

# Lag by one: prepend NA and drop the last value
dt$lagged_manual <- c(NA, head(dt$base, -1))
# Lead by one: drop the first value and append NA
dt$leading_manual <- c(tail(dt$base, -1), NA)
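If you need an offset other than one, the same padding idea can be wrapped in a small helper. This is only a sketch: lag_manual() and lead_manual() are hypothetical names of my own, not functions from any package, and they assume the offset n is a whole number between 0 and the length of the vector.

```r
# Hypothetical helper: shift values down by n positions, padding the front with NA
lag_manual <- function(x, n = 1) {
  stopifnot(n >= 0, n <= length(x))
  keep <- seq_len(length(x) - n)   # positions of the values that survive
  c(rep(NA, n), x[keep])
}

# Hypothetical helper: shift values up by n positions, padding the end with NA
lead_manual <- function(x, n = 1) {
  stopifnot(n >= 0, n <= length(x))
  keep <- seq_len(length(x) - n) + n
  c(x[keep], rep(NA, n))
}

lag_manual(1:5, 2)
lead_manual(1:5, 2)
```

Using seq_len() rather than 1:(length(x) - n) keeps the helpers well behaved when the offset equals the length of the vector, in which case every value becomes NA.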

See, not so elegant. Fortunately, the data.table package has a much nicer way to create leading and lagged variables: the shift() function, which takes the column, the offset, and a type of either 'lag' or 'lead'.

dt[, lagged_base := shift(base, 1, type = 'lag')]
dt[, leading_base := shift(base, 1, type = 'lead')]
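Coming back to the original motivation, the lagged column makes the period-over-period difference a simple subtraction. The sketch below uses made-up data; the table sales and the columns period, revenue, prev_revenue and change are illustrative names, not part of the example above.

```r
library(data.table)

# Made-up example data: one row per period
sales <- data.table(period = 1:5,
                    revenue = c(100, 120, 115, 130, 150))

# Previous period's revenue; the first row has no predecessor, so it is NA
sales[, prev_revenue := shift(revenue, 1, type = "lag")]

# Change from the previous period to the current one
sales[, change := revenue - prev_revenue]

print(sales)
```

The first row of change is NA because there is nothing to compare it with, which is usually what you want rather than a made-up zero.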

Great success!
