-
Notifications
You must be signed in to change notification settings - Fork 0
/
TV Regression Lab 2.Rmd
executable file
·150 lines (110 loc) · 3.79 KB
/
TV Regression Lab 2.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
---
title: Time Varying Regression Lab 2
output:
html_document:
toc: true
toc_float: true
---
# Load the data
```{r load_data, fig.align = "center", fig.height = 4, fig.width = 8}
load("landings.RData")
landings$log.metric.tons = log(landings$metric.tons)
landings = subset(landings, Year <= 1987)
```
Add `t` to the data. This is Year minus first Year.
```{r}
landings$t = landings$Year-landings$Year[1]
```
We will look at sardines and a 4th order polynomial. h is the forecast length. h=5 is 5 years.
```{r}
val <- "Sardine"
poly.order <- 4
h <- 5
```
# Fit linear regression
Use the poly function
$$log(Sardine catch) = \alpha + \beta t + \beta_2 t^2 + \beta_3 t^3 + \beta_4 t^4 + e_t$$
```{r tvreg.sardine3}
model <- lm(log.metric.tons ~ poly(t, poly.order), data=landings, subset=Species==val)
```
# Create a forecast
Use the `predict()` function. First you must set up the newdata data.frame. This is specific to `predict()` function. Let's predict for the next 10 t after the end of the data.
```{r}
dat <- subset(landings, Species==val)
dat$t
```
The newdata data.frame must have the name the same as your predictor `t` and have the values at the `t` you want to forecast.
```{r}
nd <- data.frame(t=25 + 1:h)
```
Now you can predict:
```{r}
predict(model, newdata = nd)
```
That is for years
```{r}
max(dat$Year) + 1:h
```
# Show the prediction intervals on your forecast
You can use predict for this too. `lwr` is the lower 95% prediction and upr is the upper 95% prediction.
```{r}
predict(model, newdata = nd, interval="prediction")
```
# Make a nicer table of predictions
Show in metric tons instead of log metric tons.
```{r}
pr <- predict(model, newdata = nd, interval="prediction")
pr <- as.data.frame(exp(pr))
pr$Year <- max(dat$Year) + 1:h
```
Make table in R markdown
```{r}
library(knitr)
kable(pr)
```
# Create plots of forecasts
```{r}
pr <- predict(model, newdata = nd, interval="prediction")
pr <- as.data.frame(exp(pr))
ylims <- c(min(dat$metric.tons, pr$fit), max(dat$metric.tons, pr$fit))
pr$Year <- max(dat$Year) + 1:h
plot(dat$Year, dat$metric.tons, xlim=c(min(dat$Year), max(dat$Year)+h),
ylim=ylims)
lines(dat$Year, exp(fitted(model)), col="red")
lines(pr$Year, pr$fit)
```
Here is a function I created to make forecast plots using ggplot().
```{r plot.TVreg.func, echo=FALSE}
plotforecasttv <- function(object, h=5){
require(ggplot2)
dat <- object$model
dat <- as.matrix(dat)
dat <- data.frame(resp=dat[,1], t=0:(dim(dat)[1]-1), fit=fitted(model))
tlim <- max(dat$t)+1:h
pr95 <- predict(model, newdata = data.frame(t=max(dat$t)+1:h), level=0.95, interval="prediction")
pr80 <- predict(model, newdata = data.frame(t=max(dat$t)+1:h), level=0.80, interval="prediction")
pr95 <- as.data.frame(pr95); pr95$t <- tlim
pr80 <- as.data.frame(pr80); pr80$t <- tlim
ylims <- c(min(dat[,1],pr95$lwr, pr80$lwr), max(dat[,1],pr95$upr, pr80$upr))
p1 <- ggplot(dat, aes(x = t, y = resp))+
theme_bw() +
geom_point(color = "blue") + xlim(0,max(tlim)) + ylim(ylims) +
geom_line(color = "red", aes(x=t, y=fit))
p1 +
geom_ribbon(mapping=aes(x=t, ymin=lwr, ymax=upr), data=pr95, inherit.aes=FALSE, fill = "grey50") +
geom_ribbon(mapping=aes(x=t, ymin=lwr, ymax=upr), data=pr80, inherit.aes=FALSE, fill = "grey75") +
geom_line(aes(x=t, y=fit), pr95)
}
```
You can plot with
```{r}
plotforecasttv(model)
```
# The problem with high order polynomials
You model fits the data nicely, but your forecasts are very uncertain as you move farther out. This is a property of very flexible models. They fit the data, but tend to overfit so forecasts are more not less uncertain.
# Problems
1. Fit a 1st order polynomial to the sardine data. Create a forecast for 10 years into the future.
1. Create forecasts for one of the other species.
```{r}
unique(landings$Species)
```