I often need to calculate the change in some quantity between two periods. If each row represents a period, the quickest way is to lag the variable you want to calculate on, so that each row also carries the previous period's value.
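In R, the lag() function from the stats package looks like an option, but it is built for time-series objects: it shifts the time index rather than the values, so on a data frame column it hands back the values unchanged.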
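A quick way to see this for yourself is the small sketch below. The vector x is just made-up example data; the call is written as stats::lag() so that it still hits the base function if a package such as dplyr has masked lag().

```r
# Made-up example values
x <- c(10, 20, 30, 40)

# stats::lag() moves the time base (the "tsp" attribute), not the data
lagged <- stats::lag(x, k = 1)

# The values themselves are exactly what we started with
same_values <- identical(as.numeric(lagged), x)

# Only the start and end of the time index have moved back by one step
time_base <- tsp(lagged)

print(same_values)
print(time_base)
```

So assigning the result to a data frame column gives you a copy of the original column, not a lagged one.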
There’s a manual, not-so-elegant solution. For example, I can create a lagged variable with an offset of one simply by putting an NA in front of a vector and dropping its last item. A leading variable with an offset of one is completely analogous: drop the first item and append an NA at the end.
library(data.table)
dt <- data.table(base = seq(1, 10, 1))
# Lag by one: prepend NA, drop the last value
dt$lagged_manual <- c(NA, dt$base[1:(nrow(dt) - 1)])
# Lead by one: drop the first value, append NA
dt$leading_manual <- c(dt$base[2:nrow(dt)], NA)
See, not so elegant. Fortunately, the data.table package offers a much nicer way to create leading and lagged variables: the shift() function provides both through an easy-to-use interface.
dt[, lagged_base := shift(base, 1, type = 'lag')]
dt[, leading_base := shift(base, 1, type = 'lead')]
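To come back to the original goal, here is a minimal sketch of a period-over-period difference using shift(). The sales table, its column names and its numbers are invented for illustration; the by = store part assumes you have several series stacked in one table and don't want values leaking from one series into the next.

```r
library(data.table)

# Invented example data: two stores observed over three periods
sales <- data.table(
  store  = rep(c("A", "B"), each = 3),
  period = rep(1:3, times = 2),
  amount = c(100, 120, 150, 80, 70, 95)
)

# shift() works on row order, so sort by period within each store first
setorder(sales, store, period)

# Change = current amount minus the previous period's amount, per store
sales[, change := amount - shift(amount, n = 1, type = "lag"), by = store]

print(sales)
```

The first period of each store has no previous value, so its change is NA. If a different placeholder suits your analysis better, shift() also takes a fill argument.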
By the way, if you’re having trouble understanding some of the code and concepts, I can highly recommend “An Introduction to Statistical Learning: with Applications in R”, which is the must-have data science bible. If you simply need an introduction to R, and less to the data science part, I can absolutely recommend this book by Richard Cotton. Hope it helps!