You can find the data that I use in this blog post in my GitHub repo. It contains monthly averages of daily car counts at different hubs on the Belgian highways.
I start off by importing the necessary Python packages and loading the data. I also filter the data so that it only contains traffic from one particular segment of the Belgian highways.
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt
from pylab import rcParams

data = pd.read_csv('all_volume.csv')
data = data[data.identifier == 1092002419]
In the next lines of code, I concatenate the year (jaar) and month (maand) columns and append the first day of the month using the ‘-01’ substring. Finally, I convert the resulting string column to datetime values.
data['datum'] = data['jaar'].astype(str) + '-' + data['maand'].astype(str) + '-01'
data['datum'] = pd.to_datetime(data['datum'], yearfirst=True)
data.head()
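The date construction above can be tried out on a small made-up frame. The sketch below is a variation rather than the exact code I ran: it zero-pads the month and passes an explicit format, which is my own assumption about how to keep the parsing unambiguous. The toy values are hypothetical and only reuse the column names jaar, maand and voertuigen.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the real file.
toy = pd.DataFrame({
    'jaar': [2018, 2018, 2018],
    'maand': [1, 2, 12],
    'voertuigen': [1000, 1100, 900],
})

# Build 'YYYY-MM-01' strings; zfill(2) turns '1' into '01'.
date_strings = toy['jaar'].astype(str) + '-' + toy['maand'].astype(str).str.zfill(2) + '-01'

# An explicit format avoids any guessing about the order of year, month and day.
toy['datum'] = pd.to_datetime(date_strings, format='%Y-%m-%d')

print(toy.dtypes)
print(toy.head())
```

With an explicit format, a malformed date string should raise an error instead of being silently reinterpreted.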
It is crucial that you convert your date column to a datetime. Otherwise, when decomposing the time series later on, you will get the following error:
AttributeError: 'Index' object has no attribute 'inferred_freq'
Finally, before the decomposition, I only select the datum and the voertuigen column and set the datum column as the index of the DataFrame.
data = data[['datum', 'voertuigen']]
data = data.set_index('datum')
This brings us to our final goal: the decomposition of the time series into a seasonal component and a trend component.
rcParams['figure.figsize'] = 18, 8
# Note: newer statsmodels releases name this argument period instead of freq.
decomposition = sm.tsa.seasonal_decompose(data, model='additive', freq=12, extrapolate_trend=12)
fig = decomposition.plot()
plt.show()
In this case, our time series index had no ‘freq’. Not setting the freq argument results in the following error:
ValueError: You must specify a freq or x must be a pandas object with a timeseries index with a freq not set to None
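To see whether an index carries a usable frequency, you can inspect its inferred_freq attribute. The snippet below is a small sketch on made-up dates, not on the traffic data, and my guess that a missing month is the kind of thing that prevents a frequency from being inferred is an assumption rather than something I verified on this dataset.

```python
import pandas as pd

# Hypothetical monthly dates, not taken from all_volume.csv.
complete = pd.DatetimeIndex(['2018-01-01', '2018-02-01', '2018-03-01', '2018-04-01'])

# Same idea, but March is missing, so the spacing is no longer regular.
with_gap = pd.DatetimeIndex(['2018-01-01', '2018-02-01', '2018-04-01', '2018-05-01'])

complete_freq = complete.inferred_freq
gap_freq = with_gap.inferred_freq

print('complete index:', complete_freq)
print('index with gap:', gap_freq)
```

The complete index should report a month-start frequency, while the index with a gap should report no frequency at all.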
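Since I’m working with monthly data and expect a yearly pattern, setting the frequency to 12 is the logical choice. The trend is estimated with a moving average, which leaves NaN values at the start and end of the series. To avoid these, I also set the extrapolate_trend parameter to 12, so that the trend is extrapolated linearly at both ends.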
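To make the idea behind an additive decomposition more concrete, here is a simplified pure-Python sketch of the classical approach: a centered moving average for the trend, followed by per-month averages of the detrended values for the seasonal component. It is not the statsmodels implementation, the function names are my own, and the series is synthetic. It assumes an even period such as 12.

```python
from statistics import mean


def centered_moving_average(values, period):
    """Centered moving average for an even period; edge values stay None."""
    half = period // 2
    trend = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        # The two end points get half weight so the window spans one full period.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[i] = total / period
    return trend


def additive_decompose(values, period):
    """Split values into trend, seasonal and residual parts (value = sum of parts)."""
    trend = centered_moving_average(values, period)
    detrended = [v - t if t is not None else None for v, t in zip(values, trend)]

    # Average the detrended values for each position in the cycle.
    seasonal_means = []
    for pos in range(period):
        same_pos = [d for i, d in enumerate(detrended)
                    if d is not None and i % period == pos]
        seasonal_means.append(mean(same_pos))

    # Center the seasonal pattern so that it averages to zero.
    offset = mean(seasonal_means)
    seasonal_means = [s - offset for s in seasonal_means]

    seasonal = [seasonal_means[i % period] for i in range(len(values))]
    resid = [v - t - s if t is not None else None
             for v, t, s in zip(values, trend, seasonal)]
    return trend, seasonal, resid


# Synthetic series: linear trend plus a yearly pattern that sums to zero.
pattern = [-30, -20, -10, 0, 10, 20, 30, 20, 10, 0, -10, -20]
series = [100 + 2 * i + pattern[i % 12] for i in range(36)]

trend, seasonal, resid = additive_decompose(series, period=12)
print(trend[:8])
print(seasonal[:12])
```

In this sketch the first and last six trend values have no estimate, which corresponds to the NaN values at the edges that extrapolate_trend is meant to fill in.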