
Decomposition of time series with pandas and statsmodels

In this blog post I decompose a time series of monthly data using the pandas and statsmodels packages in Python.

You can find the data that I use in this blog post in my GitHub repo. It contains monthly averages of daily car counts at different hubs on the Belgian highways.

I start off by importing the necessary Python packages and loading the data. I also filter the data so that it only contains traffic from one particular segment of the Belgian highways.

import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
from matplotlib import rcParams

data = pd.read_csv('all_volume.csv')
data = data[data.identifier == 1092002419]

In the next lines of code I concatenate the year (jaar) and month (maand) columns and append the first day of the month using the ‘-01’ substring. Finally, I convert the resulting strings to Timestamps.

data['datum'] = data['jaar'].astype(str) + '-' + data['maand'].astype(str) + '-01'
data['datum'] = pd.to_datetime(data['datum'], yearfirst = True)
data.head()
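As a minimal sketch of this step, the snippet below builds the same kind of date column on a few made-up rows. The rows are invented for illustration; only the column names jaar, maand and voertuigen come from the post. It also shows an alternative that I am adding here, not something the post uses: passing year, month and day columns directly to pd.to_datetime. The string variant zero-pads the month, a small deviation from the code above that keeps every string in the same format.

```python
import pandas as pd

# Made-up rows; only the column names come from the post.
df = pd.DataFrame({
    'jaar': [2015, 2015, 2016],
    'maand': [1, 12, 2],
    'voertuigen': [100, 120, 90],
})

# String approach: build 'YYYY-MM-01' and parse it.
# zfill(2) pads single-digit months so all strings share one format.
as_text = df['jaar'].astype(str) + '-' + df['maand'].astype(str).str.zfill(2) + '-01'
from_strings = pd.to_datetime(as_text)

# Alternative: to_datetime also accepts a DataFrame with
# columns named year, month and day.
parts = df.rename(columns={'jaar': 'year', 'maand': 'month'})[['year', 'month']].assign(day=1)
from_columns = pd.to_datetime(parts)

print(from_strings.tolist())
print(from_columns.tolist())
```

Both variants should yield the same first-of-month timestamps, so which one to use is mostly a matter of taste.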

It is crucial that you convert your date column to a datetime. Otherwise, when decomposing the time series later on, you will get the following error:

AttributeError: 'Index' object has no attribute 'inferred_freq'
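The difference can be checked directly on a tiny index. This is a sketch on made-up dates, not the post's data: an index of date strings is a plain Index without an inferred_freq attribute, while the converted DatetimeIndex has one.

```python
import pandas as pd

# Three made-up month starts, stored as text.
text_index = pd.Index(['2015-01-01', '2015-02-01', '2015-03-01'])

# Converting the same values gives a DatetimeIndex.
date_index = pd.to_datetime(text_index)

# A plain Index of strings has no inferred_freq attribute at all.
print(type(text_index).__name__, hasattr(text_index, 'inferred_freq'))

# A DatetimeIndex does, and pandas can infer a frequency for regular dates.
print(type(date_index).__name__, date_index.inferred_freq)
```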

Finally, before the decomposition, I select only the datum and voertuigen columns and set the datum column as the index of the DataFrame.

data = data[['datum','voertuigen']]
data = data.set_index('datum')

This brings us to our final goal: the decomposition of the time series into a trend component, a seasonal component and a residual.

rcParams['figure.figsize'] = 18, 8
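# In statsmodels 0.11 and later this argument is named period (it was called freq in older releases).
decomposition = sm.tsa.seasonal_decompose(data, model='additive', period=12, extrapolate_trend=12)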
fig = decomposition.plot()
plt.show()

In this case, our time series index had no ‘freq’. Not setting the frequency explicitly results in the following error:

ValueError: You must specify a freq or x must be a pandas object with a timeseries index with a freq not set to None
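The post does not say why the index had no frequency. One common cause, assumed here purely for illustration, is a gap in the dates. The sketch below uses a made-up series with one month missing to show how pandas reports that and how asfreq('MS') exposes the gap. Whether interpolating the missing value is appropriate depends on the data.

```python
import pandas as pd

# Made-up monthly series with March 2015 missing.
idx = pd.to_datetime(['2015-01-01', '2015-02-01', '2015-04-01',
                      '2015-05-01', '2015-06-01'])
s = pd.Series([100.0, 110.0, 130.0, 140.0, 150.0], index=idx, name='voertuigen')

# With a gap, pandas cannot infer a regular frequency.
print(pd.infer_freq(s.index))

# asfreq('MS') reindexes to every month start; missing months become NaN.
regular = s.asfreq('MS')
missing = regular[regular.isna()].index
print(list(missing))

# One possible fix (an assumption, not the post's method): linear interpolation.
filled = regular.interpolate()
print(filled.index.freqstr)
```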

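Since I’m working with monthly data, setting the frequency to 12 (one seasonal cycle per year) seemed like the logical thing to do. This is the freq argument, which is called period in newer statsmodels releases. To avoid NaN values at the start and end of the trend, I also set the extrapolate_trend parameter to 12, so that the trend is extrapolated at both ends using a linear fit through the nearest points.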
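To make the role of the period and the NaN values at the edges more concrete, here is a simplified sketch of a classical additive decomposition written with numpy and pandas only. It runs on a synthetic series (a made-up linear trend plus a fixed yearly pattern), and it is my own illustration of the moving-average idea, not statsmodels' code, which may differ in details.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data: linear trend plus a repeating yearly pattern.
index = pd.date_range('2012-01-01', periods=48, freq='MS')
month = index.month.to_numpy()
true_seasonal = 10 * np.sin(2 * np.pi * (month - 1) / 12)
series = pd.Series(100 + 0.5 * np.arange(48) + true_seasonal, index=index)

period = 12
half = period // 2

# Trend: centred moving average over one full period. For an even period
# the two end points get half weight so the window stays centred.
weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
trend = pd.Series(np.nan, index=series.index)
trend.iloc[half:-half] = np.convolve(series.to_numpy(), weights, mode='valid')

# Seasonal: average of the detrended values per calendar month,
# shifted so the twelve monthly effects sum to zero.
detrended = series - trend
seasonal = detrended.groupby(month).transform('mean')
seasonal -= seasonal.iloc[:period].mean()

# Residual: whatever trend and seasonal component do not explain.
residual = series - trend - seasonal

print(trend.isna().sum())  # half a period is lost at each end
```

The first and last six months of the trend are NaN because the centred window does not fit there. That gap is what extrapolate_trend fills in the statsmodels call.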

Great success!
