Padding: leading & lagged variables in R data frames

I often need to calculate the difference in some quantity between two periods. If each row represents a period, the quickest approach is to lag the variable you want to compute on, so that each row holds both the current and the previous value.

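In R, the lag() function from the stats package looks like an option, but you'll notice it doesn't work on data frames. stats::lag() is designed for time series (ts) objects and only shifts their time base, so on a data frame column it leaves the values exactly where they are.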
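A quick check makes this concrete. The sketch below uses only base R; the vector x is an illustrative example standing in for a data frame column. The values come back unchanged and only the tsp time attribute (start, end, frequency) moves.

```r
# An illustrative numeric vector, like a data frame column
x <- c(10, 20, 30, 40, 50)

# stats::lag() treats x as a time series starting at time 1
lagged <- stats::lag(x, 1)

# The values themselves are not shifted...
print(as.vector(lagged))  # 10 20 30 40 50

# ...only the time base (start, end, frequency) moved back by one period
print(tsp(lagged))        # 0 4 1
```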

There's a manual, not-so-elegant solution. For example, I can create a lagged variable with an offset of one by adding an NA in front of the vector and removing its last item. A leading variable with an offset of one is analogous: remove the first item and append an NA at the end.

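library(data.table)

# A toy table with one numeric column: 1, 2, ..., 10
dt <- data.table(base = seq(1, 10, 1))

# Lag by one: prepend an NA and drop the last element
dt$lagged_manual <- c(NA, dt$base[1:(nrow(dt) - 1)])
# Lead by one: drop the first element and append an NA
dt$leading_manual <- c(dt$base[2:nrow(dt)], NA)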
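The same padding trick extends to any offset n: pad with n NAs on one side and drop n items from the other. The helpers lag_manual() and lead_manual() below are hypothetical names for illustration, not functions from any package; this is a minimal sketch that assumes 1 <= n <= length(x).

```r
# Hypothetical helper: lag x by n positions, padding the front with NAs
lag_manual <- function(x, n = 1L) {
  stopifnot(n >= 1, n <= length(x))
  # Keep the first length(x) - n items and put n NAs in front
  c(rep(NA, n), x[seq_len(length(x) - n)])
}

# Hypothetical helper: lead x by n positions, padding the end with NAs
lead_manual <- function(x, n = 1L) {
  stopifnot(n >= 1, n <= length(x))
  # Drop the first n items and append n NAs at the end
  c(x[-seq_len(n)], rep(NA, n))
}

x <- 1:10
print(lag_manual(x, 2))   # NA NA 1 2 3 4 5 6 7 8
print(lead_manual(x, 2))  # 3 4 5 6 7 8 9 10 NA NA
```

The n >= 1 guard matters: with n = 0, x[-seq_len(0)] returns an empty vector rather than x, a classic R indexing pitfall.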

See, not so elegant. Fortunately, the data.table package offers a much nicer option: the shift() function creates lagged and leading variables through an easy-to-use interface, where n sets the offset and type chooses between 'lag' and 'lead'.

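# Lag by one: each row gets the previous row's value (the first row becomes NA)
dt[, lagged_base := shift(base, 1, type = 'lag')]
# Lead by one: each row gets the next row's value (the last row becomes NA)
dt[, leading_base := shift(base, 1, type = 'lead')]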
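Coming back to the original goal, the difference between two periods, shift() slots straight into an expression, and by = keeps each group separate so values never leak from one group into the next. A minimal sketch: the panel table and its id, period and value columns are made up for illustration, and the setorder() call makes sure rows are in period order within each id before shifting.

```r
library(data.table)

# Illustrative panel: two ids observed over three periods each
panel <- data.table(
  id     = c("a", "a", "a", "b", "b", "b"),
  period = c(1, 2, 3, 1, 2, 3),
  value  = c(5, 8, 12, 100, 90, 95)
)

# Make sure rows are in period order within each id before shifting
setorder(panel, id, period)

# Change from the previous period, computed separately within each id
panel[, change := value - shift(value, 1, type = "lag"), by = id]

# Several offsets at once: n = 1:2 returns one column per offset
panel[, c("lag1", "lag2") := shift(value, 1:2, type = "lag"), by = id]

# Pad with 0 instead of NA via the fill argument
panel[, lag1_zero := shift(value, 1, fill = 0, type = "lag"), by = id]

print(panel)
```

Without by = id, the first row of "b" would be lagged against the last row of "a", a silent error that the manual padding approach also invites.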

Great success!