-
Notifications
You must be signed in to change notification settings - Fork 5
/
Forecasting 3-7 - ARMA Covariates.Rmd
108 lines (84 loc) · 4.52 KB
/
Forecasting 3-7 - ARMA Covariates.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
---
output:
xaringan::moon_reader:
css: "my-theme.css"
lib_dir: libs
nature:
highlightStyle: github
highlightLines: true
---
layout: true
.hheader[<a href="index.html">`r fontawesome::fa("home", fill = "steelblue")`</a>]
---
```{r setup, include=FALSE, message=FALSE}
options(htmltools.dir.version = FALSE, servr.daemon = TRUE)
knitr::opts_chunk$set(fig.height=5, fig.align="center")
library(huxtable)
```
class: center, middle, inverse
# ARIMA Models
## Covariates
.futnote[Eli Holmes, UW SAFS]
.citation[[email protected]]
---
```{r load_packages, echo=FALSE, message=FALSE, warning=FALSE}
library(ggplot2)
library(gridExtra)
library(reshape2)
library(tseries)
library(forecast)
```
## Load in the data and covariates
We will load in the covariates from the NOAA ERDDAP server:
```{r, echo=FALSE, eval=FALSE}
library(RCurl)
library(XML)
library(stringr)
lat <- c(39,39,40)
lon <- c(24,25,25)
covs <- list()
for(i in 1:3){
loc <- paste0("[(",lat[i],".5):1:(",lat[i],".5)][(",lon[i],".5):1:(",lon[i],".5)]")
url <- paste0("https://coastwatch.pfeg.noaa.gov/erddap/griddap/esrlIcoads1ge.htmlTable?air[(1964-01-01):1:(2018-08-01T00:00:00Z)]",loc,",slp[(1964-01-01):1:(2018-08-01T00:00:00Z)]",loc,",sst[(1964-01-01):1:(2018-08-01T00:00:00Z)]",loc,",vwnd[(1964-01-01):1:(2018-08-01T00:00:00Z)]",loc,",wspd3[(1964-01-01):1:(2018-08-01T00:00:00Z)]",loc)
doc <- getURL(url)
cov <- readHTMLTable(doc, which=2, stringsAsFactors=FALSE)
coln <- paste0(colnames(cov),".",cov[1,])
coln <- str_replace(coln, "\n", "")
coln <- str_replace_all(coln, "[*]", "")
cov <- cov[-1,]
colnames(cov) <- coln
cov[,1] <- as.Date(cov[,1])
for(j in 2:dim(cov)[2]) cov[,j] <- as.numeric(cov[,j])
covs[[i]] <- cov
}
covsmean <- covs[[1]]
for(j in 2:dim(cov)[2])
covsmean[,j] <- apply(cbind(covs[[1]][,j], covs[[2]][,j], covs[[3]][,j]),1,mean,na.rm=TRUE)
covsmean <- covsmean[,c(-2,-3)]
covsmean$Year <- as.factor(format(cov[,1],"%Y"))
covsmean.mon <- covsmean
covsmean.year <- data.frame(Year=unique(covsmean$Year))
for(j in 2:(dim(covsmean)[2]-1)) covsmean.year <- cbind(covsmean.year, tapply(covsmean[,j], covsmean$Year, mean, na.rm=TRUE))
colnames(covsmean.year) <- c("Year",colnames(covsmean)[2:(dim(covsmean)[2]-1)])
save(landings, covs, covsmean.mon, covsmean.year, file="landings.RData")
```
```{r load_data2, message=FALSE, warning=FALSE, echo=FALSE}
load("landings.RData")
```
## Multivariate linear regression with ARMA errors
The `stats::arima()` and `forecast::auto.arima()` functions with argument `xreg` fit a multivariate linear regression with ARMA errors. Note, this is not what is termed a ARMAX model. ARMAX models will be addressed separately.
The model fitted when `xreg` is passed in is:
\begin{equation}
\begin{gathered}
x_t = \alpha + \phi_1 c_{t,1} + \phi_2 c_{t,2} + \dots + z_t \\
z_t = \beta_1 z_{t-1} + \dots + \beta_p z_{t-p} + e_t + \theta_1 e_{t-1} + \dots + \theta_q e_{t-q}\\
e_t \sim N(0,\sigma)
\end{gathered}
\end{equation}
where `xreg` is matrix with $c_{t,1}$ in column 1, $c_{t-2}$ in column 2, etc. $z_t$ are the ARMA errors.
## Multivariate regression of first or second differences
In the multivariate regression with ARMA errors, the response variable $x_t$ is not necessarily stationary since the covariates $c_t$'s need not be stationary. If we wish to model the first or second differences of $x_t$, then we are potentially modeling a stationary process if differencing leads to a stationary process.
We need to think carefully about how we set up a multivariate regression if our response variable is stationary.
One recommendation is if $x_t$ is differenced, the same differencing is applied to the covariates. The idea is if the response variable is stationary, we want to make sure that the independent variables are also stationary. However, in a fisheries application $x_t - x_{t-1}$ often has a biological meaning, the yearly (or monthly or hourly) rate of change, and that rate of change is what one is trying explain with a covariate. One would not necessarily expect the first difference to be stationary and one is trying to explain any trend in the one-step rate of change with some set of covariates. On the other hand, if the response variable, the raw data or the first or second difference, is stationary then trying to explain its variability via a non-stationary covariate will clearly lead to the effect size of the covariates being zero. We don't need to fit a model to tell us that.
$x_t - x_{t-1}$.
If $x_t-x_{t-1}$ is the response variable, see section below as there are some subtleties in fitting these types of models.