# Introduction to Time Series Regression and Forecasting {#ittsraf}
```{r, warning = FALSE, message=FALSE, echo=F, purl=F}
library(dynlm)
library(stargazer)
library(scales)
library(readxl)
library(urca)
```
Time series data is data collected for a single entity over time. This is fundamentally different from cross-section data, which is data on multiple entities at the same point in time. Time series data allows the estimation of the effect on $Y$ of a change in $X$ *over time*. This is what econometricians call a *dynamic causal effect*. Let us go back to the application to cigarette consumption of Chapter \@ref(ivr), where we were interested in estimating the effect on cigarette demand of a price increase caused by an increase in the general sales tax. One might use time series data to assess the causal effect of a tax increase on smoking both initially and in subsequent periods.
Another application of time series data is forecasting. For example, weather services use time series data to predict tomorrow's temperature using, inter alia, today's temperature and temperatures of the past. To motivate an economic example, central banks are interested in forecasting next month's unemployment rate.
The remaining chapters of the book deal with econometric techniques for the analysis of time series data and their application to forecasting and the estimation of dynamic causal effects. This chapter covers the basic concepts presented in Chapter 14 of the book, explains how to visualize time series data and demonstrates how to estimate simple autoregressive models, where the regressors are past values of the dependent variable or other variables. In this context we also discuss the concept of stationarity, an important property with far-reaching consequences.
Most empirical applications in this chapter are concerned with forecasting and use data on U.S. macroeconomic indicators or financial time series like Gross Domestic Product (GDP), the unemployment rate or excess stock returns.
The following packages and their dependencies are needed for reproduction of the code chunks presented throughout this chapter:
+ `r ttcode("AER")` [@R-AER]
+ `r ttcode("dynlm")` [@R-dynlm]
+ `r ttcode("forecast")` [@R-forecast]
+ `r ttcode("readxl")` [@R-readxl]
+ `r ttcode("stargazer")` [@R-stargazer]
+ `r ttcode("scales")` [@R-scales]
+ `r ttcode("quantmod")` [@R-quantmod]
+ `r ttcode("urca")` [@R-urca]
Please verify that the following code chunk runs on your machine without any errors.
```{r, warning=FALSE, message=FALSE, eval=FALSE}
library(AER)
library(dynlm)
library(forecast)
library(readxl)
library(stargazer)
library(scales)
library(quantmod)
library(urca)
```
## Using Regression Models for Forecasting
What is the difference between estimating models for the assessment of causal effects and models for forecasting? Consider again the simple example of estimating the causal effect of the student-teacher ratio on test scores introduced in Chapter \@ref(lrwor).
```{r, warning = FALSE, message=FALSE}
library(AER)
data(CASchools)
CASchools$STR <- CASchools$students/CASchools$teachers
CASchools$score <- (CASchools$read + CASchools$math)/2
mod <- lm(score ~ STR, data = CASchools)
mod
```
As has been stressed in Chapter \@ref(rmwmr), the estimate of the coefficient on the student-teacher ratio does not have a causal interpretation due to omitted variable bias. However, in terms of deciding which school to send her child to, it might nevertheless be appealing for a parent to use `r ttcode("mod")` for forecasting test scores in school districts where no public data on scores are available.
As an example, assume that the average class in a district has $25$ students. The prediction will not be perfect, but the following one-liner might nevertheless help the parent decide.
```{r}
predict(mod, newdata = data.frame("STR" = 25))
```
In a time series context, the parent could use data on present and past years' test scores to forecast next year's test scores --- a typical application for an autoregressive model.
## Time Series Data and Serial Correlation {#tsdasc}
GDP is commonly defined as the value of goods and services produced over a given time period. The data set `r ttcode("us_macro_quarterly.xlsx")` is provided by the authors and can be downloaded [here](http://wps.pearsoned.co.uk/wps/media/objects/16103/16489878/data3eu/us_macro_quarterly.xlsx). It provides quarterly data on U.S. real (i.e. inflation adjusted) GDP from 1947 to 2004.
As before, a good starting point is to plot the data. The package `r ttcode("quantmod")` provides some convenient functions for plotting and computing with time series data. We also load the package `r ttcode("readxl")` to read the data into `r ttcode("R")`.
```{r, warning=FALSE, message=FALSE}
# attach the package 'quantmod'
library(quantmod)
```
We begin by importing the data set.
```{r}
# load US macroeconomic data
USMacroSWQ <- read_xlsx("Data/us_macro_quarterly.xlsx",
sheet = 1,
col_types = c("text", rep("numeric", 9)))
# format date column
USMacroSWQ$...1 <- as.yearqtr(USMacroSWQ$...1, format = "%Y:0%q")
# adjust column names
colnames(USMacroSWQ) <- c("Date", "GDPC96", "JAPAN_IP", "PCECTPI",
"GS10", "GS1", "TB3MS", "UNRATE", "EXUSUK", "CPIAUCSL")
```
The first column of `r ttcode("us_macro_quarterly.xlsx")` contains text and the remaining ones are numeric. Using `r ttcode('col_types = c("text", rep("numeric", 9))')` we tell `r ttcode("read_xlsx()")` to take this into account when importing the data.
It is useful to work with time-series objects that keep track of the frequency of the data and are extensible. In what follows we will use objects of the class `r ttcode("xts")`, see `?xts`. Since the data in `r ttcode("USMacroSWQ")` are in quarterly frequency we convert the first column to `r ttcode("yearqtr")` format before generating the `r ttcode("xts")` object `r ttcode("GDP")`.
```{r}
# GDP series as xts object
GDP <- xts(USMacroSWQ$GDPC96, USMacroSWQ$Date)["1960::2013"]
# GDP growth series as xts object
GDPGrowth <- xts(400 * log(GDP/lag(GDP)))
```
The following code chunks reproduce Figure 14.1 of the book.
```{r, fig.align='center'}
# reproduce Figure 14.1 (a) of the book
plot(log(as.zoo(GDP)),
col = "steelblue",
lwd = 2,
ylab = "Logarithm",
xlab = "Date",
main = "U.S. Quarterly Real GDP")
```
```{r, fig.align='center'}
# reproduce Figure 14.1 (b) of the book
plot(as.zoo(GDPGrowth),
col = "steelblue",
lwd = 2,
ylab = "Logarithm",
xlab = "Date",
main = "U.S. Real GDP Growth Rates")
```
### Notation, Lags, Differences, Logarithms and Growth Rates {-}
For observations of a variable $Y$ recorded over time, $Y_t$ denotes the value observed at time $t$. The period between two sequential observations $Y_t$ and $Y_{t-1}$ is a unit of time: hours, days, weeks, months, quarters, years etc. Key Concept 14.1 introduces the essential terminology and notation for time series data we use in the subsequent sections.
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC14.1">
<h3 class = "right"> Key Concept 14.1 </h3>
<h3 class = "left"> Lags, First Differences, Logarithms and Growth Rates </h3>
- Previous values of a time series are called *lags*. The first lag of $Y_t$ is $Y_{t-1}$. The $j^{th}$ lag of $Y_t$ is $Y_{t-j}$. In <tt>R</tt>, lags of univariate or multivariate time series objects are conveniently computed by <tt>lag()</tt>, see <tt>?lag</tt>.
- Sometimes we work with a differenced series. The first difference of a series is $\\Delta Y_{t} = Y_t - Y_{t-1}$, the difference between periods $t$ and $t-1$. If <tt>Y</tt> is a time series, the series of first differences is computed as <tt>diff(Y)</tt>.
- It may be convenient to work with the first difference in logarithms of a series. We denote this by $\\Delta \\log(Y_t) = \\log(Y_t) - \\log(Y_{t-1})$. For a time series <tt>Y</tt>, this is obtained using <tt>log(Y/lag(Y))</tt>.
- $100 \\Delta \\log (Y_t)$ is an approximation for the percentage change between $Y_t$ and $Y_{t-1}$.
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[Lags, First Differences, Logarithms and Growth Rates]{14.1}
\\begin{itemize}
\\item Previous values of a time series are called \\textit{lags}. The first lag of $Y_t$ is $Y_{t-1}$. The $j^{th}$ lag of $Y_t$ is $Y_{t-j}$. In \\texttt{R}, lags of univariate or multivariate time series objects are conveniently computed by \\texttt{lag()}, see \\texttt{?lag}.
\\item Sometimes we work with differenced series. The first difference of a series is $\\Delta Y_{t} = Y_t - Y_{t-1}$, the difference between periods $t$ and $t-1$. If \\texttt{Y} is a time series, the series of first differences is computed as \\texttt{diff(Y)}.
\\item It may be convenient to work with the first difference in logarithms of a series. We denote this by $\\Delta \\log(Y_t) = \\log(Y_t) - \\log(Y_{t-1})$. For a time series \\texttt{Y}, this is obtained using \\texttt{log(Y/lag(Y))}.
\\item $100 \\Delta \\log (Y_t)$ is an approximation for the percentage change between $Y_t$ and $Y_{t-1}$.
\\end{itemize}
\\end{keyconcepts}
')
```
The definitions made in Key Concept 14.1 are useful because of two properties that are common to many economic time series:
- Exponential growth: some economic series grow approximately exponentially such that their logarithm is approximately linear.
- The standard deviation of many economic time series is approximately proportional to their level. Therefore, the standard deviation of the logarithm of such a series is approximately constant.
Furthermore, growth rates of macroeconomic series are commonly reported, which is why $\log$-differences are often used.
Table 14.1 of the book presents the quarterly U.S. GDP time series, its logarithm, the annualized growth rate and the first lag of the annualized growth rate series for the period 2012:Q1 - 2013:Q1. The following simple function can be used to compute these quantities for a quarterly time series `r ttcode("series")`.
```{r}
# compute logarithms, annual growth rates and 1st lag of growth rates
quants <- function(series) {
s <- series
return(
data.frame("Level" = s,
"Logarithm" = log(s),
"AnnualGrowthRate" = 400 * log(s / lag(s)),
"1stLagAnnualGrowthRate" = lag(400 * log(s / lag(s))))
)
}
```
The annual growth rate is computed using the approximation $$\text{Annual Growth of } Y_t = 400 \cdot \Delta\log(Y_t)$$ since $100\cdot\Delta\log(Y_t)$ is an approximation of the quarterly percentage change, see Key Concept 14.1.
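As a quick numerical check of this approximation (not part of the book's code), consider a quarterly growth rate of $1\%$ and compare the log approximation of the annualized rate with the exact annualized rate.
```{r}
# log approximation to the annualized rate for 1% quarterly growth
400 * log(1.01)
# exact annualized growth rate for 1% quarterly growth
100 * (1.01^4 - 1)
```
Both computations yield roughly $4\%$, so the approximation error is small for quarterly growth rates of this magnitude.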
We call `r ttcode("quants()")` on observations for the period 2011:Q3 - 2013:Q1.
```{r}
# obtain a data.frame with level, logarithm, annual growth rate and its 1st lag of GDP
quants(GDP["2011-07::2013-01"])
```
#### Autocorrelation {-}
Observations of a time series are typically correlated. This type of correlation is called *autocorrelation* or *serial correlation*. Key Concept 14.2 summarizes the concepts of population autocovariance and population autocorrelation and shows how to compute their sample equivalents.
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC14.2">
<h3 class = "right"> Key Concept 14.2 </h3>
<h3 class = "left"> Autocorrelation and Autocovariance </h3>
The covariance between $Y_t$ and its $j^{th}$ lag, $Y_{t-j}$, is called the $j^{th}$ *autocovariance* of the series $Y_t$. The $j^{th}$ *autocorrelation coefficient*, also called the *serial correlation coefficient*, measures the correlation between $Y_t$ and $Y_{t-j}$.
We thus have
\\begin{align*}
j^{th} \\text{autocovariance} =& \\, Cov(Y_t,Y_{t-j}), \\\\
j^{th} \\text{autocorrelation} = \\rho_j =& \\, \\rho_{Y_t,Y_{t-j}} = \\frac{Cov(Y_t,Y_{t-j})}{\\sqrt{Var(Y_t)Var(Y_{t-j})}}.
\\end{align*}
Population autocovariance and population autocorrelation can be estimated by $\\widehat{Cov(Y_t,Y_{t-j})}$, the sample autocovariance, and $\\widehat{\\rho}_j$, the sample autocorrelation:
\\begin{align*}
\\widehat{Cov(Y_t,Y_{t-j})} =& \\, \\frac{1}{T} \\sum_{t=j+1}^T (Y_t - \\overline{Y}_{j+1:T})(Y_{t-j} - \\overline{Y}_{1:T-j}), \\\\
\\widehat{\\rho}_j =& \\, \\frac{\\widehat{Cov(Y_t,Y_{t-j})}}{\\widehat{Var(Y_t)}}
\\end{align*}
$\\overline{Y}_{j+1:T}$ denotes the average of $Y_{j+1}, Y_{j+2}, \\dots, Y_T$.
In <tt>R</tt> the function <tt>acf()</tt> from the package <tt>stats</tt> computes the sample autocovariance or the sample autocorrelation function.
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[Autocorrelation and Autocovariance]{14.2}
The covariance between $Y_t$ and its $j^{th}$ lag, $Y_{t-j}$, is called the $j^{th}$ \\textit{autocovariance} of the series $Y_t$. The $j^{th}$ \\textit{autocorrelation coefficient}, also called the \\textit{serial correlation coefficient}, measures the correlation between $Y_t$ and $Y_{t-j}$. We thus have
\\begin{align*}
j^{th} \\text{autocovariance} =& \\, Cov(Y_t,Y_{t-j}), \\\\
j^{th} \\text{autocorrelation} = \\rho_j =& \\, \\rho_{Y_t,Y_{t-j}} = \\frac{Cov(Y_t,Y_{t-j})}{\\sqrt{Var(Y_t)Var(Y_{t-j})}}.
\\end{align*}
Population autocovariance and population autocorrelation can be estimated by $\\widehat{Cov(Y_t,Y_{t-j})}$, the sample autocovariance, and $\\widehat{\\rho}_j$, the sample autocorrelation:
\\begin{align*}
\\widehat{Cov(Y_t,Y_{t-j})} =& \\, \\frac{1}{T} \\sum_{t=j+1}^T (Y_t - \\overline{Y}_{j+1:T})(Y_{t-j} - \\overline{Y}_{1:T-j}), \\\\
\\widehat{\\rho}_j =& \\, \\frac{\\widehat{Cov(Y_t,Y_{t-j})}}{\\widehat{Var(Y_t)}}.
\\end{align*}\\vspace{0.5cm}
$\\overline{Y}_{j+1:T}$ denotes the average of $Y_{j+1}, Y_{j+2}, \\dots, Y_T$.\\newline
In \\texttt{R} the function \\texttt{acf()} from the package \\texttt{stats} computes the sample autocovariance or the sample autocorrelation function.
\\end{keyconcepts}
')
```
Using `r ttcode("acf()")` it is straightforward to compute the first four sample autocorrelations of the series `r ttcode("GDPGrowth")`.
```{r}
acf(na.omit(GDPGrowth), lag.max = 4, plot = F)
```
This is evidence that there is mild positive autocorrelation in the growth of GDP: if GDP grows faster than average in one period, there is a tendency for it to grow faster than average in the following periods.
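The first sample autocorrelation reported by `r ttcode("acf()")` can be verified using the formulas from Key Concept 14.2. The following sketch computes $\widehat{\rho}_1$ "by hand"; since `r ttcode("acf()")` demeans the series with the full-sample mean instead of the subsample means $\overline{Y}_{2:T}$ and $\overline{Y}_{1:T-1}$, the two results agree only up to a small numerical difference.
```{r}
# compute the first sample autocorrelation following Key Concept 14.2
gr <- as.numeric(na.omit(GDPGrowth))
TT <- length(gr)
# sample autocovariance at lag 1, using the subsample means
acov_1 <- sum((gr[-1] - mean(gr[-1])) * (gr[-TT] - mean(gr[-TT]))) / TT
# sample variance with the same 1/T normalization
avar_0 <- sum((gr - mean(gr))^2) / TT
# first sample autocorrelation
acov_1 / avar_0
```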
#### Other Examples of Economic Time Series {-}
Figure 14.2 of the book presents four plots: the U.S. unemployment rate, the U.S. Dollar / British Pound exchange rate, the logarithm of the Japanese industrial production index as well as daily changes in the Wilshire 5000 stock price index, a financial time series. The next code chunk reproduces the plots of the three macroeconomic series and adds percentage changes in the daily values of the New York Stock Exchange Composite index as a fourth one (the data set `r ttcode("NYSESW")` comes with the `r ttcode("AER")` package).
```{r}
# define series as xts objects
USUnemp <- xts(USMacroSWQ$UNRATE, USMacroSWQ$Date)["1960::2013"]
DollarPoundFX <- xts(USMacroSWQ$EXUSUK, USMacroSWQ$Date)["1960::2013"]
JPIndProd <- xts(log(USMacroSWQ$JAPAN_IP), USMacroSWQ$Date)["1960::2013"]
# attach NYSESW data
data("NYSESW")
NYSESW <- xts(Delt(NYSESW))
```
```{r, fig.align='center'}
# divide plotting area into 2x2 matrix
par(mfrow = c(2, 2))
# plot the series
plot(as.zoo(USUnemp),
col = "steelblue",
lwd = 2,
ylab = "Percent",
xlab = "Date",
main = "US Unemployment Rate",
cex.main = 1)
plot(as.zoo(DollarPoundFX),
col = "steelblue",
lwd = 2,
ylab = "Dollar per pound",
xlab = "Date",
main = "U.S. Dollar / B. Pound Exchange Rate",
cex.main = 1)
plot(as.zoo(JPIndProd),
col = "steelblue",
lwd = 2,
ylab = "Logarithm",
xlab = "Date",
main = "Japanese Industrial Production",
cex.main = 1)
plot(as.zoo(NYSESW),
col = "steelblue",
lwd = 2,
ylab = "Percent per Day",
xlab = "Date",
main = "New York Stock Exchange Composite Index",
cex.main = 1)
```
The series show quite different characteristics. The unemployment rate increases during recessions and declines during economic recoveries and growth. The Dollar/Pound exchange rate shows a deterministic pattern until the end of the Bretton Woods system. Japan's industrial production exhibits an upward trend with decreasing growth. Daily changes in the New York Stock Exchange composite index seem to fluctuate randomly around the zero line. The sample autocorrelations support this conjecture.
```{r}
# compute sample autocorrelation for the NYSESW series
acf(na.omit(NYSESW), plot = F, lag.max = 10)
```
The first 10 sample autocorrelation coefficients are very close to zero. The default plot generated by `acf()` provides further evidence.
```{r, fig.align='center'}
# plot sample autocorrelation for the NYSESW series
acf(na.omit(NYSESW), main = "Sample Autocorrelation for NYSESW Data")
```
The blue dashed bands represent values beyond which the autocorrelations are significantly different from zero at the $5\%$ level. Even when the true autocorrelations are zero, we should expect a few exceedances --- recall the definition of a type I error from Key Concept 3.5.
For most lags we see that the sample autocorrelation does not exceed the bands and there are only a few cases that lie marginally beyond the limits.
Furthermore, the `r ttcode("NYSESW")` series exhibits what econometricians call *volatility clustering*: there are periods of high and periods of low variance. This is common for many financial time series.
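One way to visualize volatility clustering is to plot a rolling estimate of the standard deviation of the series. The following sketch (not part of the book's code) uses `r ttcode("rollapply()")` from the `r ttcode("zoo")` package, which is loaded together with `r ttcode("quantmod")`; the window length of $60$ trading days is an arbitrary choice.
```{r, fig.align='center'}
# rolling 60-day standard deviation of daily changes in the NYSESW index
NYSESW_sd <- rollapply(na.omit(NYSESW), width = 60, FUN = sd)
plot(as.zoo(NYSESW_sd),
     col = "steelblue",
     lwd = 2,
     xlab = "Date",
     ylab = "Standard Deviation",
     main = "Rolling 60-Day Standard Deviation of NYSESW Changes")
```
Periods where the rolling standard deviation is elevated correspond to the clusters of large daily changes visible in the series itself.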
## Autoregressions
Autoregressive models are heavily used in economic forecasting. An autoregressive model relates a time series variable to its past values. This section discusses the basic ideas of autoregressive models, shows how they are estimated and presents an application to forecasting GDP growth using `r ttcode("R")`.
#### The First-Order Autoregressive Model {-}
It is intuitive that the immediate past of a variable should have power to predict its near future. The simplest autoregressive model uses only the most recent outcome of the time series observed to predict future values. For a time series $Y_t$ such a model is called a first-order autoregressive model, often abbreviated AR(1), where the 1 indicates that the order of autoregression is one:
\begin{align*}
Y_t = \beta_0 + \beta_1 Y_{t-1} + u_t
\end{align*}
is the AR(1) population model of a time series $Y_t$.
For the GDP growth series, an autoregressive model of order one uses only the information on GDP growth observed in the last quarter to predict a future growth rate. The first-order autoregression model of GDP growth can be estimated by computing OLS estimates in the regression of $GDPGR_t$ on $GDPGR_{t-1}$,
\begin{align}
\widehat{GDPGR}_t = \hat\beta_0 + \hat\beta_1 GDPGR_{t-1}. (\#eq:GDPGRAR1)
\end{align}
Following the book we use data from 1962 to 2012 to estimate \@ref(eq:GDPGRAR1). This is easily done with the function `r ttcode("ar.ols()")` from the package `r ttcode("stats")`.
```{r}
# subset data
GDPGRSub <- GDPGrowth["1962::2012"]
# estimate the model
ar.ols(GDPGRSub,
order.max = 1,
demean = F,
intercept = T)
```
We can check that the computations done by `r ttcode("ar.ols()")` are the same as those done by `r ttcode("lm()")`.
```{r}
# length of data set
N <- length(GDPGRSub)
GDPGR_level <- as.numeric(GDPGRSub[-1])
GDPGR_lags <- as.numeric(GDPGRSub[-N])
# estimate the model
armod <- lm(GDPGR_level ~ GDPGR_lags)
armod
```
As usual, we may use `r ttcode("coeftest()")` to obtain a robust summary on the estimated regression coefficients.
```{r}
# robust summary
coeftest(armod, vcov. = vcovHC, type = "HC1")
```
Thus the estimated model is
\begin{align}
\widehat{GDPGR}_t = \underset{(0.351)}{1.995} + \underset{(0.076)}{0.338} GDPGR_{t-1} (\#eq:gdpgrar1).
\end{align}
We omit the first observation, $GDPGR_{1962 \ Q1}$, from the vector of the dependent variable since its first lag, $GDPGR_{1962 \ Q1 - 1} = GDPGR_{1961 \ Q4}$, is not included in the sample. Similarly, the last observation, $GDPGR_{2012 \ Q4}$, is excluded from the predictor vector since the data does not include $GDPGR_{2012 \ Q4 + 1} = GDPGR_{2013 \ Q1}$. Put differently, when estimating the model, one observation is lost because of the time series structure of the data.
#### Forecasts and Forecast Errors {-}
Suppose $Y_t$ follows an AR(1) model with an intercept and that you have an OLS estimate of the model on the basis of observations for $T$ periods. Then you may use the AR(1) model to obtain $\widehat{Y}_{T+1\vert T}$, a forecast for $Y_{T+1}$ using data up to period $T$ where
\begin{align*}
\widehat{Y}_{T+1\vert T} = \hat{\beta}_0 + \hat{\beta}_1 Y_T.
\end{align*}
The forecast error is
\begin{align*}
\text{Forecast error} = Y_{T+1} - \widehat{Y}_{T+1\vert T}.
\end{align*}
#### Forecasts and Predicted Values {-}
Forecasted values of $Y_t$ are *not* what we refer to as *OLS predicted values* of $Y_t$. Also, the forecast error is *not* an OLS residual. Forecasts and forecast errors are obtained using *out-of-sample* values while predicted values and residuals are computed for *in-sample* values that were actually observed and used in estimating the model.
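To make this distinction concrete for the AR($1$) model of GDP growth estimated above, the following minimal sketch (assuming `r ttcode("armod")` is still in memory) extracts the in-sample predicted values and residuals; an out-of-sample forecast for 2013:Q1 is computed below.
```{r}
# in-sample OLS predicted values and residuals of the AR(1) model
head(cbind("predicted" = fitted(armod), "residual" = residuals(armod)), 3)
```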
The root mean squared forecast error (RMSFE) measures the typical size of the forecast error and is defined as
\begin{align*}
RMSFE = \sqrt{E\left[\left(Y_{T+1} - \widehat{Y}_{T+1\vert T}\right)^2\right]}.
\end{align*}
The $RMSFE$ is composed of the future errors $u_t$ and the error made when estimating the coefficients. When the sample size is large, the former may be much larger than the latter so that $RMSFE \approx \sqrt{Var(u_t)}$, which can be estimated by the standard error of the regression.
#### Application to GDP Growth {-}
Using \@ref(eq:gdpgrar1), the estimated AR(1) model of GDP growth, we perform the forecast for GDP growth for 2013:Q1 (remember that the model was estimated using data for periods 1962:Q1 - 2012:Q4, so 2013:Q1 is an out-of-sample period). Plugging $GDPGR_{2012:Q4} \approx 0.15$ into \@ref(eq:gdpgrar1),
\begin{align*}
\widehat{GDPGR}_{2013:Q1} = 1.995 + 0.338 \cdot 0.15 \approx 2.046.
\end{align*}
The function `r ttcode("forecast()")` from the `r ttcode("forecast")` package has some useful features for forecasting time series data.
```{r, message=FALSE}
library(forecast)
# assign GDP growth rate in 2012:Q4
new <- data.frame("GDPGR_lags" = GDPGR_level[N-1])
# forecast GDP growth rate in 2013:Q1
forecast(armod, newdata = new)
```
Using `r ttcode("forecast()")` produces the same point forecast of about 2.0, along with $80\%$ and $95\%$ forecast intervals, see Section \@ref(apatadlm). We conclude that our AR(1) model forecasts GDP growth to be $2\%$ in 2013:Q1.
How accurate is this forecast? First, the forecast error is quite large: $GDPGR_{2013:Q1} \approx 1.1\%$ while our forecast is $2\%$.
Second, calling `summary(armod)` shows that the model explains little of the variation in the growth rate of GDP and that the $SER$ is about $3.16$. Leaving aside forecast uncertainty due to the estimation of the model coefficients $\beta_0$ and $\beta_1$, the $RMSFE$ must be at least $3.16\%$, the estimate of the standard deviation of the errors. We conclude that this forecast is pretty inaccurate.
```{r}
# compute the forecast error (actual growth minus forecast)
GDPGrowth["2013"][1] - forecast(armod, newdata = new)$mean
# R^2
summary(armod)$r.squared
# SER
summary(armod)$sigma
```
### Autoregressive Models of Order $p$ {-}
For forecasting GDP growth, the AR($1$) model \@ref(eq:gdpgrar1) disregards any information in the past of the series that is more distant than one period. An AR($p$) model incorporates the information of $p$ lags of the series. The idea is explained in Key Concept 14.3.
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC14.3">
<h3 class = "right"> Key Concept 14.3 </h3>
<h3 class = "left"> Autoregressions </h3>
<p>
An AR($p$) model assumes that a time series $Y_t$ can be modeled by a linear function of the first $p$ of its lagged values.
\\begin{align*}
Y_t = \\beta_0 + \\beta_1 Y_{t-1} + \\beta_2 Y_{t-2} + \\dots + \\beta_p Y_{t-p} + u_t
\\end{align*}
is an autoregressive model of order $p$ where $E(u_t\\vert Y_{t-1}, Y_{t-2}, \\dots,Y_{t-p})=0$.
</p>
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[Autoregressions]{14.3}
An AR($p$) model assumes that a time series $Y_t$ can be modeled by a linear function of the first $p$ of its lagged values.
\\begin{align*}
Y_t = \\beta_0 + \\beta_1 Y_{t-1} + \\beta_2 Y_{t-2} + \\dots + \\beta_p Y_{t-p} + u_t
\\end{align*}
is an autoregressive model of order $p$ where $E(u_t\\vert Y_{t-1}, Y_{t-2}, \\dots,Y_{t-p})=0$.
\\end{keyconcepts}
')
```
Following the book, we estimate an AR($2$) model of the GDP growth series from 1962:Q1 to 2012:Q4.
```{r}
# estimate the AR(2) model
GDPGR_AR2 <- dynlm(ts(GDPGR_level) ~ L(ts(GDPGR_level)) + L(ts(GDPGR_level), 2))
coeftest(GDPGR_AR2, vcov. = sandwich)
```
The estimation yields
\begin{align}
\widehat{GDPGR}_t = \underset{(0.40)}{1.63} + \underset{(0.08)}{0.28} GDPGR_{t-1} + \underset{(0.08)}{0.18} GDPGR_{t-2}. (\#eq:GDPGRAR2)
\end{align}
We see that the coefficient on the second lag is significantly different from zero. The fit improves slightly: $\bar{R}^2$ grows from $0.11$ for the AR($1$) model to about $0.14$, and the $SER$ falls to $3.13$.
```{r}
# R^2
summary(GDPGR_AR2)$r.squared
# SER
summary(GDPGR_AR2)$sigma
```
We may use the AR($2$) model to obtain a forecast for GDP growth in 2013:Q1 in the same manner as for the AR(1) model.
```{r}
# AR(2) forecast of GDP growth in 2013:Q1
forecast <- c("2013:Q1" = coef(GDPGR_AR2) %*% c(1, GDPGR_level[N-1], GDPGR_level[N-2]))
```
This leads to a forecast error of roughly $-1\%$.
```{r}
# compute AR(2) forecast error
GDPGrowth["2013"][1] - forecast
```
## Can You Beat the Market? (Part I) {#cybtmpi}
The theory of efficient capital markets states that stock prices embody all currently available information. If this hypothesis holds, it should not be possible to estimate a useful model for forecasting future stock returns using publicly available information on past returns (this is also referred to as the weak-form efficiency hypothesis): if it were possible to forecast the market, traders would be able to arbitrage, e.g., by relying on an AR($2$) model, they would exploit information that is not already priced in, which would push prices until the expected return is zero.
This idea is presented in the box *Can You Beat the Market? (Part I)* on p. 582 of the book. This section reproduces the estimation results.
We start by importing monthly data from 1931:1 to 2002:12 on excess returns of a broad-based index of stock prices, the CRSP value-weighted index. The data are provided by the authors of the book as an Excel sheet which can be downloaded [here](http://wps.aw.com/wps/media/objects/11422/11696965/data3eu/Stock_Returns_1931_2002.xlsx).
```{r}
# read in data on stock returns
SReturns <- read_xlsx("Data/Stock_Returns_1931_2002.xlsx",
sheet = 1,
col_types = "numeric")
```
We continue by converting the data to an object of class `r ttcode("ts")`.
```{r}
# convert to ts object
StockReturns <- ts(SReturns[, 3:4],
start = c(1931, 1),
end = c(2002, 12),
frequency = 12)
```
Next, we estimate AR($1$), AR($2$) and AR($4$) models of excess returns for the time period 1960:1 to 2002:12.
```{r}
# estimate AR models:
# AR(1)
SR_AR1 <- dynlm(ExReturn ~ L(ExReturn),
data = StockReturns, start = c(1960, 1), end = c(2002, 12))
# AR(2)
SR_AR2 <- dynlm(ExReturn ~ L(ExReturn) + L(ExReturn, 2),
data = StockReturns, start = c(1960, 1), end = c(2002, 12))
# AR(4)
SR_AR4 <- dynlm(ExReturn ~ L(ExReturn, 1:4),
data = StockReturns, start = c(1960, 1), end = c(2002, 12))
```
After computing robust standard errors, we gather the results in a table generated by `r ttcode("stargazer()")`.
```{r}
# compute robust standard errors
rob_se <- list(sqrt(diag(sandwich(SR_AR1))),
sqrt(diag(sandwich(SR_AR2))),
sqrt(diag(sandwich(SR_AR4))))
```
```{r, message=F, warning=F, results='asis', eval=F}
# generate table using 'stargazer()'
stargazer(SR_AR1, SR_AR2, SR_AR4,
title = "Autoregressive Models of Monthly Excess Stock Returns",
header = FALSE,
model.numbers = F,
omit.table.layout = "n",
digits = 3,
column.labels = c("AR(1)", "AR(2)", "AR(4)"),
dep.var.caption = "Dependent Variable: Excess Returns on the CSRP Value-Weighted Index",
dep.var.labels.include = FALSE,
covariate.labels = c("$excess return_{t-1}$", "$excess return_{t-2}$",
"$excess return_{t-3}$", "$excess return_{t-4}$",
"Intercept"),
se = rob_se,
omit.stat = "rsq")
```
<!--html_preserve-->
```{r, message=F, warning=F, results='asis', echo=F, purl=F, eval=my_output=="html"}
stargazer(SR_AR1, SR_AR2, SR_AR4,
header = FALSE,
type = "html",
model.numbers = F,
omit.table.layout = "n",
digits = 3,
column.labels = c("AR(1)", "AR(2)", "AR(4)"),
dep.var.caption = "Dependent Variable: Excess returns on the CSRP Value-Weighted Index",
dep.var.labels.include = FALSE,
covariate.labels = c("$excess return_{t-1}$", "$excess return_{t-2}$", "$excess return_{t-3}$", "$excess return_{t-4}$", "Intercept"),
se = rob_se,
omit.stat = c("rsq")
)
stargazer_html_title("Autoregressive Models of Monthly Excess Stock Returns", "amomesr")
```
<!--/html_preserve-->
```{r, message=F, warning=F, results='asis', echo=F, purl=F, eval=my_output=="latex"}
stargazer(SR_AR1, SR_AR2, SR_AR4,
title = "\\label{tab:amomesr} Autoregressive Models of Monthly Excess Stock Returns",
header = FALSE,
type = "latex",
model.numbers = F,
omit.table.layout = "n",
digits = 3,
column.labels = c("AR(1)", "AR(2)", "AR(4)"),
dep.var.caption = "Dependent Variable: Excess returns on the CSRP Value-Weighted Index",
dep.var.labels.include = FALSE,
covariate.labels = c("$excess return_{t-1}$", "$excess return_{t-2}$", "$excess return_{t-3}$", "$excess return_{t-4}$", "Intercept"),
se = rob_se,
omit.stat = c("rsq")
)
```
The results are consistent with the hypothesis of efficient financial markets: there are no statistically significant coefficients in any of the estimated models and the hypotheses that all coefficients are zero cannot be rejected. $\bar{R}^2$ is almost zero in all models and even negative for the AR($4$) model. This suggests that none of the models are useful for forecasting stock returns.
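The joint hypothesis can also be tested explicitly. As an illustration that is not part of the book's code, the following sketch performs a heteroskedasticity-robust $F$-test of the null that all lag coefficients in the AR($2$) model are zero, using `r ttcode("matchCoefs()")` from the `r ttcode("car")` package (loaded with `r ttcode("AER")`) to select the relevant coefficients.
```{r}
# robust F-test: all lag coefficients in the AR(2) model are zero
linearHypothesis(SR_AR2,
                 matchCoefs(SR_AR2, "ExReturn"),
                 vcov. = sandwich)
```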
## Additional Predictors and The ADL Model {#apatadlm}
Instead of only using the dependent variable's lags as predictors, an autoregressive distributed lag (ADL) model also uses lags of other variables for forecasting. The general ADL model is summarized in Key Concept 14.4:
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC14.4">
<h3 class = "right"> Key Concept 14.4 </h3>
<h3 class = "left"> The Autoregressive Distributed Lag Model </h3>
<p>
An ADL($p$,$q$) model assumes that a time series $Y_t$ can be represented by a linear function of $p$ of its lagged values and $q$ lags of another time series $X_t$:
\\begin{align*}
Y_t =& \\, \\beta_0 + \\beta_1 Y_{t-1} + \\beta_2 Y_{t-2} + \\dots + \\beta_p Y_{t-p} \\\\
&+ \\, \\delta_1 X_{t-1} + \\delta_2 X_{t-2} + \\dots + \\delta_q X_{t-q} + u_t
\\end{align*}
is an *autoregressive distributed lag model* with $p$ lags of $Y_t$ and $q$ lags of $X_t$ where $$E(u_t\\vert Y_{t-1}, Y_{t-2}, \\dots, X_{t-1}, X_{t-2}, \\dots)=0.$$
</p>
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[The Autoregressive Distributed Lag Model]{14.4}
An ADL($p$,$q$) model assumes that a time series $Y_t$ can be represented by a linear function of $p$ of its lagged values and $q$ lags of another time series $X_t$:
\\begin{align*}
Y_t =& \\, \\beta_0 + \\beta_1 Y_{t-1} + \\beta_2 Y_{t-2} + \\dots + \\beta_p Y_{t-p} \\\\
&+ \\, \\delta_1 X_{t-1} + \\delta_2 X_{t-2} + \\dots + \\delta_q X_{t-q} + u_t
\\end{align*}
is an \\textit{autoregressive distributed lag model} with $p$ lags of $Y_t$ and $q$ lags of $X_t$ where $$E(u_t\\vert Y_{t-1}, Y_{t-2}, \\dots, X_{t-1}, X_{t-2}, \\dots)=0.$$
\\end{keyconcepts}
')
```
#### Forecasting GDP Growth Using the Term Spread {-}
Interest rates on long-term and short-term Treasury debt are closely linked to macroeconomic conditions. While both types of interest rates have the same long-run tendencies, they behave quite differently in the short run.
The difference between interest rates on two bonds with different maturities is called the *term spread*.
The following code chunks reproduce Figure 14.3 of the book, which displays interest rates on 10-year U.S. Treasury bonds and 3-month U.S. Treasury bills from 1960 to 2012.
```{r}
# 3-month Treasury bill interest rate
TB3MS <- xts(USMacroSWQ$TB3MS, USMacroSWQ$Date)["1960::2012"]
# 10-year Treasury bond interest rate
TB10YS <- xts(USMacroSWQ$GS10, USMacroSWQ$Date)["1960::2012"]
# term spread
TSpread <- TB10YS - TB3MS
```
```{r, fig.align='center'}
# reproduce Figure 14.3 (a) of the book
plot(merge(as.zoo(TB3MS), as.zoo(TB10YS)),
plot.type = "single",
col = c("darkred", "steelblue"),
lwd = 2,
xlab = "Date",
ylab = "Percent per annum",
main = "Interest Rates")
# define a function that transforms years to class 'yearqtr'
YToYQTR <- function(years) {
return(
sort(as.yearqtr(sapply(years, paste, c("Q1", "Q2", "Q3", "Q4"))))
)
}
# recessions
recessions <- YToYQTR(c(1961:1962, 1970, 1974:1975, 1980:1982, 1990:1991, 2001, 2007:2008))
# add color shading for recessions
xblocks(time(as.zoo(TB3MS)),
c(time(TB3MS) %in% recessions),
col = alpha("steelblue", alpha = 0.3))
# add a legend
legend("topright",
legend = c("TB3MS", "TB10YS"),
col = c("darkred", "steelblue"),
lwd = c(2, 2))
# reproduce Figure 14.3 (b) of the book
plot(as.zoo(TSpread),
col = "steelblue",
lwd = 2,
xlab = "Date",
ylab = "Percent per annum",
main = "Term Spread")
# add color shading for recessions
xblocks(time(as.zoo(TB3MS)),
c(time(TB3MS) %in% recessions),
col = alpha("steelblue", alpha = 0.3))
```
Before recessions, the gap between interest rates on long-term bonds and short-term bills narrows; the term spread declines drastically towards zero, or even becomes negative, in times of economic stress. This information might be used to improve forecasts of future GDP growth.
We check this by estimating an ADL($2$, $1$) model and an ADL($2$, $2$) model of the GDP growth rate using lags of GDP growth and lags of the term spread as regressors. We then use both models for forecasting GDP growth in 2013:Q1.
```{r}
# convert growth and spread series to ts objects
GDPGrowth_ts <- ts(GDPGrowth,
start = c(1960, 1),
end = c(2013, 4),
frequency = 4)
TSpread_ts <- ts(TSpread,
start = c(1960, 1),
end = c(2012, 4),
frequency = 4)
# join both ts objects
ADLdata <- ts.union(GDPGrowth_ts, TSpread_ts)
```
```{r}
# estimate the ADL(2,1) model of GDP growth
GDPGR_ADL21 <- dynlm(GDPGrowth_ts ~ L(GDPGrowth_ts) + L(GDPGrowth_ts, 2) + L(TSpread_ts),
start = c(1962, 1), end = c(2012, 4))
coeftest(GDPGR_ADL21, vcov. = sandwich)
```
The estimated equation of the ADL($2$, $1$) model is
\begin{align}
\widehat{GDPGR}_t = \underset{(0.49)}{0.96} + \underset{(0.08)}{0.26} GDPGR_{t-1} + \underset{(0.08)}{0.19} GDPGR_{t-2} + \underset{(0.18)}{0.44} TSpread_{t-1} (\#eq:gdpgradl21)
\end{align}
All coefficients are significant at the $5\%$ level.
```{r}
# 2012:Q3 / 2012:Q4 data on GDP growth and term spread
subset <- window(ADLdata, c(2012, 3), c(2012, 4))
# ADL(2,1) GDP growth forecast for 2013:Q1
ADL21_forecast <- coef(GDPGR_ADL21) %*% c(1, subset[2, 1], subset[1, 1], subset[2, 2])
ADL21_forecast
# compute the forecast error
window(GDPGrowth_ts, c(2013, 1), c(2013, 1)) - ADL21_forecast
```
Model \@ref(eq:gdpgradl21) predicts the GDP growth in 2013:Q1 to be $2.24\%$ which leads to a forecast error of $-1.10\%$.
We estimate the ADL($2$,$2$) specification to see whether additional information on the past term spread improves the forecast.
```{r}
# estimate the ADL(2,2) model of GDP growth
GDPGR_ADL22 <- dynlm(GDPGrowth_ts ~ L(GDPGrowth_ts) + L(GDPGrowth_ts, 2)
+ L(TSpread_ts) + L(TSpread_ts, 2),
start = c(1962, 1), end = c(2012, 4))
coeftest(GDPGR_ADL22, vcov. = sandwich)
```
We obtain
\begin{align}
\begin{split}
\widehat{GDPGR}_t =& \underset{(0.47)}{0.98} + \underset{(0.08)}{0.24} GDPGR_{t-1} \\
& + \underset{(0.08)}{0.18} GDPGR_{t-2} -\underset{(0.42)}{0.14} TSpread_{t-1} + \underset{(0.43)}{0.66} TSpread_{t-2}.
\end{split} (\#eq:gdpgradl22)
\end{align}
The coefficients on both lags of the term spread are not significant at the $10\%$ level.
```{r}
# ADL(2,2) GDP growth forecast for 2013:Q1
ADL22_forecast <- coef(GDPGR_ADL22) %*% c(1, subset[2, 1], subset[1, 1], subset[2, 2], subset[1, 2])
ADL22_forecast
# compute the forecast error
window(GDPGrowth_ts, c(2013, 1), c(2013, 1)) - ADL22_forecast
```
The ADL($2$,$2$) forecast of GDP growth in 2013:Q1 is $2.27\%$, which implies a forecast error of $-1.14\%$.
Do the ADL models \@ref(eq:gdpgradl21) and \@ref(eq:gdpgradl22) improve upon the simple AR($2$) model \@ref(eq:GDPGRAR2)? The answer is yes: while $SER$ and $\bar{R}^2$ improve only slightly, an $F$-test on the term spread coefficients in \@ref(eq:gdpgradl22) provides evidence that the model does better in explaining GDP growth than the AR($2$) model, as the hypothesis that both coefficients are zero can be rejected at the $5\%$ level.
```{r}
# compare adj. R2
c("Adj.R2 AR(2)" = summary(GDPGR_AR2)$r.squared,
"Adj.R2 ADL(2,1)" = summary(GDPGR_ADL21)$r.squared,
"Adj.R2 ADL(2,2)" = summary(GDPGR_ADL22)$r.squared)
# compare SER
c("SER AR(2)" = summary(GDPGR_AR2)$sigma,
"SER ADL(2,1)" = summary(GDPGR_ADL21)$sigma,
"SER ADL(2,2)" = summary(GDPGR_ADL22)$sigma)
# F-test on coefficients of term spread
linearHypothesis(GDPGR_ADL22,
c("L(TSpread_ts)=0", "L(TSpread_ts, 2)=0"),
vcov. = sandwich)
```
#### Stationarity {-}
In general, forecasts can be improved by using multiple predictors --- just as in cross-sectional regression. When constructing time series models one should take into account whether the variables are *stationary* or *nonstationary*. Key Concept 14.5 explains what stationarity is.
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC14.5">
<h3 class = "right"> Key Concept 14.5 </h3>
<h3 class = "left"> Stationarity </h3>
A time series $Y_t$ is stationary if its probability distribution is time independent, that is, the joint distribution of $Y_{s+1}, Y_{s+2},\\dots,Y_{s+T}$ does not change as $s$ is varied, regardless of $T$.
Similarly, two time series $X_t$ and $Y_t$ are *jointly stationary* if the joint distribution of $(X_{s+1},Y_{s+1}, X_{s+2},Y_{s+2} \\dots, X_{s+T},Y_{s+T})$ does not depend on $s$, regardless of $T$.
Stationarity makes it easier to learn about the characteristics of past data.
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[Stationarity]{14.5}
A time series $Y_t$ is stationary if its probability distribution is time independent, that is, the joint distribution of $Y_{s+1}, Y_{s+2},\\dots,Y_{s+T}$ does not change as $s$ is varied, regardless of $T$.\\newline
Similarly, two time series $X_t$ and $Y_t$ are \\textit{jointly stationary} if the joint distribution of $(X_{s+1},Y_{s+1}, X_{s+2},Y_{s+2} \\dots, X_{s+T}, Y_{s+T})$ does not depend on $s$, regardless of $T$.\\newline
In a probabilistic sense, stationarity means that information about how a time series evolves in the future is inherent to its past. If this is not the case, we cannot use the past of a series as a reliable guideline for its future.\\vspace{0.5cm}
Stationarity makes it easier to learn about the characteristics of past data.
\\end{keyconcepts}
')
```
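To build intuition, the following simulation (not taken from the book) contrasts a stationary AR($1$) process, which fluctuates around a constant level, with a random walk, a leading example of a nonstationary process whose variance grows with $t$.
```{r, fig.align='center'}
# simulate a stationary AR(1) process and a nonstationary random walk
set.seed(42)
AR1 <- arima.sim(list(order = c(1, 0, 0), ar = 0.5), n = 200)
RW <- ts(cumsum(rnorm(200)))
# plot both series side by side
par(mfrow = c(1, 2))
plot(AR1, col = "steelblue", lwd = 2,
     main = "Stationary AR(1) Process", ylab = "Value")
plot(RW, col = "darkred", lwd = 2,
     main = "Random Walk", ylab = "Value")
```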
#### Time Series Regression with Multiple Predictors {-}
The concept of stationarity is a key assumption in the general time series regression model with multiple predictors. Key Concept 14.6 lays out this model and its assumptions.
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC14.6">
<h3 class = "right"> Key Concept 14.6 </h3>
<h3 class = "left"> Time Series Regression with Multiple Predictors </h3>
The general time series regression model extends the ADL model such that multiple regressors and their lags are included. It uses $p$ lags of the dependent variable and $q_l$ lags of $l$ additional predictors where $l=1,\\dots,k$:
\\begin{equation}
\\begin{aligned}
Y_t =& \\beta_0 + \\beta_1 Y_{t-1} + \\beta_2 Y_{t-2} + \\dots + \\beta_{p} Y_{t-p} \\\\
&+ \\delta_{11} X_{1,t-1} + \\delta_{12} X_{1,t-2} + \\dots + \\delta_{1q} X_{1,t-q} \\\\
&+ \\dots \\\\
&+ \\delta_{k1} X_{k,t-1} + \\delta_{k2} X_{k,t-2} + \\dots + \\delta_{kq} X_{k,t-q} \\\\
&+ u_t
\\end{aligned}
\\end{equation}
For estimation we make the following assumptions:
1. The error term $u_t$ has conditional mean zero given all regressors and their lags: $$E(u_t\\vert Y_{t-1}, Y_{t-2}, \\dots, X_{1,t-1}, X_{1,t-2} \\dots, X_{k,t-1}, X_{k,t-2}, \\dots) = 0.$$ This assumption is an extension of the conditional mean zero assumption used for AR and ADL models and guarantees that the general time series regression model stated above gives the best forecast of $Y_t$ given its lags, the additional regressors $X_{1,t},\\dots,X_{k,t}$ and their lags.
2. The i.i.d. assumption for cross-sectional data is not (entirely) meaningful for time series data. We replace it by the following assumption which consists of two parts:
(a) The $(Y_{t}, X_{1,t}, \\dots, X_{k,t})$ have a stationary distribution (the "identically distributed" part of the i.i.d. assumption for cross-sectional data). If this does not hold, forecasts *may* be biased and inference *can* be strongly misleading.
(b) $(Y_{t}, X_{1,t}, \\dots, X_{k,t})$ and $(Y_{t-j}, X_{1,t-j}, \\dots, X_{k,t-j})$ become independent as $j$ gets large (the "independently distributed" part of the i.i.d. assumption for cross-sectional data). This assumption is also called *weak dependence*. It ensures that the WLLN and the CLT hold in large samples.
3. Large outliers are unlikely: $E(X_{1,t}^4), E(X_{2,t}^4), \\dots, E(X_{k,t}^4)$ and $E(Y_t^4)$ have nonzero, finite fourth moments.
4. No perfect multicollinearity.
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[Time Series Regression with Multiple Predictors]{14.6}
The general time series regression model extends the ADL model such that multiple regressors and their lags are included. It uses $p$ lags of the dependent variable and $q_l$ lags of $l$ additional predictors where $l=1,\\dots,k$:
\\begin{equation}
\\begin{aligned}
Y_t =& \\, \\beta_0 + \\beta_1 Y_{t-1} + \\beta_2 Y_{t-2} + \\dots + \\beta_{p} Y_{t-p} \\\\
+& \\, \\delta_{11} X_{1,t-1} + \\delta_{12} X_{1,t-2} + \\dots + \\delta_{1q} X_{1,t-q} \\\\
+& \\, \\dots \\\\
+& \\, \\delta_{k1} X_{k,t-1} + \\delta_{k2} X_{k,t-2} + \\dots + \\delta_{kq} X_{k,t-q} \\\\
+& \\, u_t
\\end{aligned}
\\end{equation}
For estimation we make the following assumptions:\\newline
\\begin{enumerate}
\\item The error term $u_t$ has conditional mean zero given all regressors and their lags: $$E(u_t\\vert Y_{t-1}, Y_{t-2}, \\dots, X_{1,t-1}, X_{1,t-2} \\dots, X_{k,t-1}, X_{k,t-2}, \\dots) = 0.$$ This assumption is an extension of the conditional mean zero assumption used for AR and ADL models and guarantees that the general time series regression model stated above gives the best forecast of $Y_t$ given its lags, the additional regressors $X_{1,t},\\dots,X_{k,t}$ and their lags.
\\item The i.i.d. assumption for cross-sectional data is not (entirely) meaningful for time series data. We replace it by the following assumption which consists of two parts:\\newline
\\begin{itemize}
\\item[(a)] The $(Y_{t}, X_{1,t}, \\dots, X_{k,t})$ have a stationary distribution (the "identically distributed" part of the i.i.d. assumption for cross-sectional data). If this does not hold, forecasts may be biased and inference can be strongly misleading.
\\item[(b)] $(Y_{t}, X_{1,t}, \\dots, X_{k,t})$ and $(Y_{t-j}, X_{1,t-j}, \\dots, X_{k,t-j})$ become independent as $j$ gets large (the "independently distributed" part of the i.i.d. assumption for cross-sectional data). This assumption is also called \\textit{weak dependence}. It ensures that the WLLN and the CLT hold in large samples.
\\end{itemize}
\\item Large outliers are unlikely: $E(X_{1,t}^4), E(X_{2,t}^4), \\dots, E(X_{k,t}^4)$ and $E(Y_t^4)$ have nonzero, finite fourth moments.
\\item No perfect multicollinearity.
\\end{enumerate}
\\end{keyconcepts}
')
```
Since many economic time series appear to be nonstationary, the second assumption of Key Concept 14.6 is a crucial one in applied macroeconomics and finance, which is why statistical tests for stationarity or nonstationarity have been developed. Chapters \@ref(llsuic) and \@ref(nit) are devoted to this topic.
#### Statistical inference and the Granger causality test {-}
If $X$ is a useful predictor of $Y$ then, in a regression of $Y_t$ on its own lags and lags of $X_t$, not all of the coefficients on the lags of $X_t$ are zero. This concept is called *Granger causality* and is an interesting hypothesis to test. Key Concept 14.7 summarizes the idea.
```{r, eval = my_output == "html", results='asis', echo=F, purl=F}
cat('
<div class = "keyconcept" id="KC14.7">
<h3 class = "right"> Key Concept 14.7 </h3>
<h3 class = "left"> Granger Causality Tests </h3>
The Granger causality test @granger1969 is an $F$ test of the null hypothesis that *all* lags of a variable $X$ included in a time series regression model do not have predictive power for $Y_t$. The Granger causality test does not test whether $X$ actually *causes* $Y$ but whether the included lags are informative in terms of predicting $Y$.
</div>
')
```
```{r, eval = my_output == "latex", results='asis', echo=F, purl=F}
cat('\\begin{keyconcepts}[Granger Causality Tests]{14.7}
The Granger causality test \\citep{granger1969} is an $F$ test of the null hypothesis that \\textit{all} lags of a variable $X$ included in a time series regression model do not have predictive power for $Y_t$. The Granger causality test does not test whether $X$ actually \\textit{causes} $Y$ but whether the included lags are informative in terms of predicting $Y$.
\\end{keyconcepts}
')
```
We have already performed a Granger causality test on the coefficients of term spread in \@ref(eq:gdpgradl22), the ADL($2$,$2$) model of GDP growth and concluded that at least one of the first two lags of term spread has predictive power for GDP growth.
### Forecast Uncertainty and Forecast Intervals {-}
In general, it is good practice to report a measure of the uncertainty when presenting results that are affected by it. Uncertainty is of particular interest when forecasting a time series. For example, consider a simple ADL$(1,1)$ model
\begin{align*}
Y_t = \beta_0 + \beta_1 Y_{t-1} + \delta_1 X_{t-1} + u_t
\end{align*}
where $u_t$ is a homoskedastic error term. The forecast error is
\begin{align*}
Y_{T+1} - \widehat{Y}_{T+1\vert T} = u_{T+1} - \left[(\widehat{\beta}_0 - \beta_0) + (\widehat{\beta}_1 - \beta_1) Y_T + (\widehat{\delta_1} - \delta_1) X_T \right].
\end{align*}
The mean squared forecast error (MSFE) and the RMSFE are
\begin{align*}
MSFE =& \, E\left[(Y_{T+1} - \widehat{Y}_{T+1\vert T})^2 \right] \\
=& \, \sigma_u^2 + Var\left[ (\widehat{\beta}_0 - \beta_0) + (\widehat{\beta}_1 - \beta_1) Y_T + (\widehat{\delta_1} - \delta_1) X_T \right], \\
RMSFE =& \, \sqrt{\sigma_u^2 + Var\left[ (\widehat{\beta}_0 - \beta_0) + (\widehat{\beta}_1 - \beta_1) Y_T + (\widehat{\delta_1} - \delta_1) X_T \right]}.
\end{align*}
A $95\%$ forecast interval is an interval that covers the true value of $Y_{T+1}$ in $95\%$ of repeated applications. There is a major difference between computing a confidence interval and a forecast interval: when computing a confidence interval of a point estimate we use large sample approximations that are justified by the CLT and thus are valid for a large range of error term distributions. For the computation of a forecast interval of $Y_{T+1}$, however, we must make an additional assumption about the distribution of $u_{T+1}$, the error term in period $T+1$. Assuming that $u_{T+1}$ is normally distributed, one can construct a $95\%$ *forecast interval* for $Y_{T+1}$ using $SE(Y_{T+1} - \widehat{Y}_{T+1\vert T})$, an estimate of the RMSFE:
\begin{align*}
\widehat{Y}_{T+1\vert T} \pm 1.96 \cdot SE(Y_{T+1} - \widehat{Y}_{T+1\vert T})
\end{align*}
Of course, the computation gets more complicated when the error term is heteroskedastic or if we are interested in computing a forecast interval for $T+s, s>1$.
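As an illustration, the following sketch computes an approximate $95\%$ forecast interval for the AR($1$) forecast of GDP growth in 2013:Q1 obtained earlier (assuming `r ttcode("armod")`, `r ttcode("GDPGR_level")` and `r ttcode("N")` are still in memory). The $SER$ serves as an estimate of the $RMSFE$, so the contribution of the estimation error is ignored.
```{r}
# approximate 95% forecast interval for GDP growth in 2013:Q1,
# ignoring uncertainty due to coefficient estimation
point_fc <- coef(armod) %*% c(1, GDPGR_level[N - 1])
SER <- summary(armod)$sigma
c("lower" = point_fc - 1.96 * SER,
  "forecast" = point_fc,
  "upper" = point_fc + 1.96 * SER)
```
The interval is wide, mirroring the earlier conclusion that AR($1$) forecasts of GDP growth are rather inaccurate.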
In some applications it is useful to report multiple forecast intervals for subsequent periods, see the box *The River of Blood* on p. 592 of the book. These can be visualized in a so-called fan chart. We will not replicate the fan chart presented in Figure 14.2 of the book because the underlying model is far more complex than the simple AR and ADL models treated here. Instead, in the example below we use simulated time series data and estimate an AR($2$) model which is then used for forecasting the subsequent $25$ future outcomes of the series.
```{r, fig.align='center'}
# set seed
set.seed(1234)
# simulate the time series
Y <- arima.sim(list(order = c(2, 0, 0), ar = c(0.2, 0.2)), n = 200)
# estimate an AR(2) model using 'arima()', see ?arima
model <- arima(Y, order = c(2, 0, 0))
# compute points forecasts and prediction intervals for the next 25 periods
fc <- forecast(model, h = 25, level = seq(5, 99, 10))
# plot a fan chart
plot(fc,
main = "Forecast Fan Chart for AR(2) Model of Simulated Data",
showgap = F,
fcol = "red",
flty = 2)
```