In the last lab, we learnt how to visualize and manipulate time series data, and how to use ARIMA modeling to produce forecasts from it. We also learnt how to determine a suitable parametrization for ARIMA models. This can be a complicated process, and while statistical programming languages such as R provide automated ways to solve this issue, those have yet to be officially ported over to Python.
Fortunately, the Data Science team at Facebook recently published a new forecasting method called prophet, which enables data analysts and developers alike to perform forecasting at scale in Python. We would encourage you to read this article by Facebook explaining how prophet simplifies the forecasting process and improves predictive ability.
- Understand the difference between ARIMA and additive modeling for time series forecasting
- Model a time series object using the prophet library
- Make predictions for future dates and compare the approach with previously seen techniques
Facebook prophet uses an elegant yet simple method for analyzing and predicting periodic data, known as additive modeling. The idea is straightforward: represent a time series as a sum of patterns at different scales (daily, weekly, seasonal, and yearly) along with an overall trend. Your energy use might rise in the summer and fall in the winter, but show an overall decreasing trend as you increase the energy efficiency of your home. An additive model can show us both the patterns and the trend, and make predictions based on these observations.
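To make the idea concrete, here is a minimal sketch of an additive series built from a trend plus one yearly cycle. The functions trend and yearly and all of their numbers are hypothetical and chosen only for illustration; prophet estimates such components from data rather than taking them as given.

```python
import math

def trend(t):
    """Hypothetical linear trend: the level falls by 0.5 units per month."""
    return 100.0 - 0.5 * t

def yearly(t, period=12):
    """Hypothetical yearly cycle for monthly data, repeating every `period` steps."""
    return 10.0 * math.sin(2 * math.pi * t / period)

months = range(36)  # three years of monthly observations

# The additive model: each observation is the sum of its components.
series = [trend(t) + yearly(t) for t in months]

# The seasonal term repeats every 12 months, so two values one year apart
# differ only by the change in the trend (12 * -0.5).
year_over_year = series[12] - series[0]
print(round(year_over_year, 6))
```

Because the components are summed, each one can be inspected separately, which is what the decomposition plots later in this lab rely on.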
The following image shows an additive model decomposition of a time-series into an overall trend, yearly trend, and weekly trend.
“Prophet has been a key piece to improving Facebook’s ability to create a large number of trustworthy forecasts used for decision-making and even in product features.”
In order to compute its forecasts, the fbprophet library relies on the Stan programming language. Before installing fbprophet, we need to make sure that pystan, the Python wrapper for Stan, is installed. We shall first install pystan and then fbprophet using !pip install.
#!pip install pystan
#!pip install fbprophet
Let's start by reading in our time-series data. We shall cover some data manipulation using pandas, accessing financial data using the Quandl library, and plotting with matplotlib.
#Import necessary libraries
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
# Matplotlib for plotting
import matplotlib.pyplot as plt
import matplotlib
import seaborn as sns
%matplotlib inline
from matplotlib.pylab import rcParams
plt.style.use('fivethirtyeight')
from fbprophet import Prophet as proph
# Import passengers.csv and set it as a time-series object.
The prophet library also imposes the strict condition that the input columns be named ds (the time column) and y (the metric column), so let's rename the columns in our ts dataframe.
# Rename the columns [Month, AirPassengers] to [ds, y]
# ds y
# 1949-01-01 112
# 1949-02-01 118
# 1949-03-01 132
# 1949-04-01 129
# 1949-05-01 121
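One possible way to do the renaming is sketched below with pandas. The frame is built inline from the five rows shown in the expected output, so the snippet runs without passengers.csv; in the lab itself, ts would come from the file instead.

```python
import pandas as pd

# Stand-in for the loaded data: the first five rows of the expected output.
ts = pd.DataFrame({
    "Month": ["1949-01-01", "1949-02-01", "1949-03-01", "1949-04-01", "1949-05-01"],
    "AirPassengers": [112, 118, 132, 129, 121],
})

# prophet expects ds to hold datestamps, so parse the strings first.
ts["Month"] = pd.to_datetime(ts["Month"])

# Rename [Month, AirPassengers] to [ds, y].
ts = ts.rename(columns={"Month": "ds", "AirPassengers": "y"})
print(ts.head())
```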
# Plot the timeseries
In this section, we shall learn how to use the Prophet library to predict future values of our time-series. The Facebook team has abstracted away many of the inherent complexities of time series forecasting and made it more intuitive for analysts and developers alike to work with time series data.
To begin, we will create a new prophet object with proph() and provide a number of arguments. For example, we can specify the desired range of our uncertainty interval by setting the interval_width parameter.
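As a rough illustration of what an interval width means, the sketch below converts a width into the pair of quantiles that bound it. It assumes a central interval with equal probability left in each tail; the helper name interval_quantiles is hypothetical and is not part of prophet.

```python
def interval_quantiles(interval_width):
    """Return the (lower, upper) quantiles of a central interval."""
    if not 0 < interval_width < 1:
        raise ValueError("interval_width must be strictly between 0 and 1")
    tail = (1 - interval_width) / 2  # probability left in each tail
    return tail, 1 - tail

# Compare an 80% interval with a 95% interval.
for width in (0.80, 0.95):
    lower, upper = interval_quantiles(width)
    print(f"{width:.0%} interval: quantiles {lower:.3f} to {upper:.3f}")
```

A wider interval leaves less probability in the tails, so its bounds sit further from the forecast.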
# set the uncertainty interval to 95% (the Prophet default is 80%)
Now that our model has been initialized, we can call its fit method with our DataFrame ts as input. The model fitting should take no longer than a few seconds.
# Fit the timeseries into Model
In order to obtain forecasts of our time series, we must provide the model with a new dataframe containing a ds column that holds the dates for which we want predictions. Conveniently, we do not have to concern ourselves with manually creating this dataframe because prophet provides the make_future_dataframe helper function. We will call this function to generate 36 datestamps in the future. The documentation for this function is available HERE.
It is also important to consider the frequency of our time series. Because we are working with monthly data, we explicitly specify the desired frequency of the timestamps (in this case, MS is the start of the month). make_future_dataframe will therefore generate 36 monthly timestamps for us. In other words, we are looking to predict values of our time series 3 years into the future.
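To see what 36 month-start timestamps look like, the sketch below builds an equivalent date column with pandas alone. It is not prophet's implementation; it only reproduces the shape of the result, and the last observed month (1960-12-01) is inferred from the sample output in this lab rather than stated in it.

```python
import pandas as pd

# Assumed last month of the observed data (inferred, see the note above).
last_observed = pd.Timestamp("1960-12-01")

# Historical datestamps: one per month start from 1949-01-01.
history = pd.date_range(start="1949-01-01", end=last_observed, freq="MS")

# 36 further month starts, beginning with the month after the last observation.
future_dates = pd.date_range(
    start=last_observed + pd.offsets.MonthBegin(1),
    periods=36,
    freq="MS",
)

# History and future dates together, in a single ds column.
future = pd.DataFrame({"ds": history.append(future_dates)})
print(future.tail())
```

With the helper itself, the equivalent request is a call to make_future_dataframe on the fitted model with periods=36 and freq='MS'.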
# Use make_future_dataframe with a monthly frequency and periods = 36 for 3 years
# ds
# 175 1963-08-01
# 176 1963-09-01
# 177 1963-10-01
# 178 1963-11-01
# 179 1963-12-01
This dataframe of future dates can now be used as input to the predict method of the fitted model.
# Predict the values for future dates and take the head of forecast
# ds trend trend_lower trend_upper yhat_lower yhat_upper additive_terms additive_terms_lower additive_terms_upper multiplicative_terms multiplicative_terms_lower multiplicative_terms_upper yearly yearly_lower yearly_upper yhat
# 0 1949-01-01 106.390966 106.390966 106.390966 40.066461 128.916059 -21.935305 -21.935305 -21.935305 0.0 0.0 0.0 -21.935305 -21.935305 -21.935305 84.455661
# 1 1949-02-01 108.569855 108.569855 108.569855 33.931775 120.662906 -30.703975 -30.703975 -30.703975 0.0 0.0 0.0 -30.703975 -30.703975 -30.703975 77.865881
# 2 1949-03-01 110.537884 110.537884 110.537884 65.902441 152.751003 -0.486998 -0.486998 -0.486998 0.0 0.0 0.0 -0.486998 -0.486998 -0.486998 110.050887
# 3 1949-04-01 112.716774 112.716774 112.716774 65.488925 149.317057 -5.184948 -5.184948 -5.184948 0.0 0.0 0.0 -5.184948 -5.184948 -5.184948 107.531826
# 4 1949-05-01 114.825377 114.825377 114.825377 67.562029 153.611413 -3.782347 -3.782347 -3.782347 0.0 0.0
We can see that prophet returns a large table with many interesting columns, but we subset our output to the columns most relevant to forecasting, which are:
- ds: the datestamp of the forecasted value
- yhat: the forecasted value of our metric (in statistics, yhat is the notation traditionally used for the predicted value of a variable y)
- yhat_lower: the lower bound of our forecasts
- yhat_upper: the upper bound of our forecasts
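The subsetting step can be sketched with pandas as follows. The forecast frame here is a small stand-in that holds only a few columns of the first two rows of the sample predict output, so the snippet runs without a fitted model.

```python
import pandas as pd

# Stand-in for the frame returned by predict(): two rows and a few columns
# copied from the sample output in this lab.
forecast = pd.DataFrame({
    "ds": pd.to_datetime(["1949-01-01", "1949-02-01"]),
    "trend": [106.390966, 108.569855],
    "yhat_lower": [40.066461, 33.931775],
    "yhat_upper": [128.916059, 120.662906],
    "yearly": [-21.935305, -30.703975],
    "yhat": [84.455661, 77.865881],
})

# Keep only the columns most relevant to forecasting.
columns = ["ds", "yhat", "yhat_lower", "yhat_upper"]
subset = forecast[columns]
print(subset.tail())
```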
# Subset above mentioned columns and view the tail
# ds yhat yhat_lower yhat_upper
# 175 1963-08-01 649.787427 604.921338 695.757506
# 176 1963-09-01 602.260711 557.213400 645.642244
# 177 1963-10-01 566.233600 524.324314 608.815224
# 178 1963-11-01 534.258296 488.622666 578.243727
# 179 1963-12-01 563.846779 516.242796 609.748779
Some variation from the output presented above is to be expected. By default, Prophet estimates its uncertainty intervals by simulating possible future trend changes, and it uses Markov chain Monte Carlo (MCMC) sampling only when this is explicitly requested. Because the simulation is stochastic, the interval bounds will be slightly different each time.
Prophet also provides a convenient function to quickly plot the results of our forecasts.
# Use prophet's plot function to plot the predictions
Prophet plots the observed values of the time-series (the black dots), the forecasted values (blue line) and the uncertainty intervals of our forecasts (the blue shaded regions).
One other particularly strong feature of Prophet is its ability to return the components of our forecasts. This can help reveal how daily, weekly and yearly patterns of the time series contribute to the overall forecasted values. We can use the plot_components() function to view the individual components.
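The additive structure can also be checked by hand. The sketch below takes the trend, yearly and yhat values from the first four rows of the sample predict output in this lab and confirms that each forecast is the sum of its components, up to rounding in the printed figures.

```python
# (trend, yearly, yhat) for the first four rows of the sample forecast output.
rows = [
    (106.390966, -21.935305, 84.455661),
    (108.569855, -30.703975, 77.865881),
    (110.537884, -0.486998, 110.050887),
    (112.716774, -5.184948, 107.531826),
]

# Difference between the reported forecast and the sum of its components.
residuals = [yhat - (trend + yearly) for trend, yearly, yhat in rows]
for residual in residuals:
    print(f"{residual:+.6f}")

# The printed values are rounded to six decimals, so allow a small tolerance.
all_close = all(abs(residual) < 1e-5 for residual in residuals)
print(all_close)
```

For this monthly series the only seasonal component is yearly, which is why trend plus yearly accounts for the whole forecast.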
# Plot model components
Since we are working with monthly data, Prophet will plot the trend and the yearly seasonality, but if you were working with daily data, you would also see a weekly seasonality plot included.
From the trend and seasonality, we can see that the trend is playing a large part in the underlying time series and that seasonality comes into play mostly toward the beginning and the end of the year. With this information, we've been able to quickly model and forecast some data to get a feel for what might be coming our way in the future from this particular data set.
In this lab, you learned how to use the Prophet library to perform time series forecasting in Python. We have been using out-of-the-box parameters, but Prophet enables us to specify many more arguments. In particular, Prophet lets you bring your own knowledge about a time series to the table.