Padding: leading & lagged variables in R data frames

I often need to calculate the difference in a value between two periods. If each row represents a period, the quickest way is to lag the variable, so that the previous period's value sits on the same row as the current one.

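In R, the lag() function from the stats package looks like the obvious choice, but it won't do the job on data frame columns. It shifts the time base of a time series object and leaves the values themselves where they are, so a column assigned from lag() is identical to the original.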
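A quick way to see this is to call stats::lag() on a plain vector. This is only a small sketch; the names x and lagged are made up for illustration.

```r
# A plain integer vector, as you would find in a data frame column
x <- 1:5

# stats::lag() is called explicitly, in case another package masks lag()
lagged <- stats::lag(x, k = 1)

# The values are not moved; only a "tsp" (time base) attribute is attached
print(as.vector(lagged))
print(attr(lagged, "tsp"))
```

Because the values stay in place, assigning the result to a new column just gives you a copy of the original one.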

There’s a manual, not so elegant solution. To create a lagged variable with an offset of one, put an NA in front of the vector and drop its last item. A leading variable with an offset of one works the same way in reverse: drop the first item and append an NA.

library(data.table)
dt <- data.table(base = seq(1, 10, 1))

# Lag by one: prepend NA and drop the last value
dt$lagged_manual <- c(NA, head(dt$base, -1))
# Lead by one: drop the first value and append NA
dt$leading_manual <- c(tail(dt$base, -1), NA)
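If you need offsets other than one, the same padding idea can be wrapped in a small base R helper. The function pad_shift() below is hypothetical (it is not part of any package), and it assumes the offset is never larger than the length of the vector.

```r
# Hypothetical helper: pad a vector with n NAs to lag or lead it
pad_shift <- function(x, n = 1L, type = c("lag", "lead")) {
  type <- match.arg(type)
  stopifnot(n >= 0, n <= length(x))
  pad <- rep(NA, n)
  if (type == "lag") {
    # NAs in front, drop the last n values
    c(pad, head(x, length(x) - n))
  } else {
    # drop the first n values, NAs at the end
    c(tail(x, length(x) - n), pad)
  }
}

print(pad_shift(1:5, 2, type = "lag"))
print(pad_shift(1:5, 2, type = "lead"))
```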

It works, but it's not so elegant. The data.table package offers a much nicer way to create leading and lagged variables: its shift() function takes the column, the offset, and type = 'lag' or type = 'lead'.

# type = 'lag' moves values down one row, type = 'lead' moves them up one row
dt[, lagged_base := shift(base, 1, type = 'lag')]
dt[, leading_base := shift(base, 1, type = 'lead')]
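To get back to the original goal, the difference between two periods, here is a rough sketch that lags within groups and then subtracts. The id column and the sample values are made up for illustration; shift() also accepts a vector of offsets, which is used for the last two columns.

```r
library(data.table)

# Made-up example: two groups with three periods each
dt2 <- data.table(id = rep(c("a", "b"), each = 3), base = 1:6)

# Lag within each group, so the first row of every group becomes NA
dt2[, lagged_by_id := shift(base, 1, type = "lag"), by = id]

# Period-over-period difference
dt2[, diff_by_id := base - lagged_by_id]

# Several offsets at once: shift() returns one vector per offset
dt2[, c("lag1", "lag2") := shift(base, 1:2, type = "lag")]

print(dt2)
```

Grouping with by keeps the last period of one group from leaking into the first period of the next.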

By the way, if you’re having trouble understanding some of the code and concepts, I can highly recommend “An Introduction to Statistical Learning: with Applications in R”, which I consider the must-have data science bible. If you simply need an introduction to R, and less to the data science part, I can absolutely recommend this book by Richard Cotton. Hope it helps!

Great success!
