forked from seankross/bookdown-start
-
Notifications
You must be signed in to change notification settings - Fork 4
/
Evaluation.qmd
177 lines (128 loc) · 8 KB
/
Evaluation.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
# How did my forecast perform?
Forecasts will be evaluated at each site and forecast horizon (i.e., time-step into the future), and a summary score will be assigned evaluating overall performance of all forecast submissions across sites. Forecasts will also be compared to a null model.
## Scoring Metric: Continuous Ranked Probability Score
Forecasts will be scored using the continuous ranked probability score (CRPS), a proper scoring rule for evaluating forecasts presented as distributions or ensembles (Gneiting & Raftery 2007). The CRPS compares the forecast probability distribution to that of the validation observation and assigns a score based on both the accuracy and precision of the forecast. We will use the 'crps_sample' function from the `scoringRules` package in R to calculate the CRPS for each forecast.
We will generate a combined score for all locations and forecast horizons. Forecasts will also be evaluated using the CRPS at each time-step in the forecast horizon and each location included in the forecasts.
Importantly, we use the convention for CRPS where zero is lowest and best possible score, therefore teams want to achieve the lowest score. CPRS can be also expressed as a negative number with zero as highest and best possible score (Gneiting & Raftery 2007). The `scoringRules` package that we use follows the 0 or greater convention.
```{r setup, include=FALSE}
library(scoringRules)
```
### Example of a CRPS calculation from an ensemble forecast
The following uses Equation 2 in Jordan, Kruger, and Lerch 2018
![Equation 1 from Jordan, Kruger, and Lerch 2018.](images/equation_2.png)
First, create a random sample from a probability distribution. This is the "forecast" for a particular point in time. For simplicity, we will use a normal distribution with a mean of 8 and standard deviation of 1
```{r}
x <- rnorm(1000, mean = 8, sd = 1.0)
```
Second, we have our data point (i.e., the target). We will set it to zero as well
```{r}
y <- 8
```
Now calculate CRPS using Equation 2
```{r}
s <- 0
for(i in 1:length(x)){
for(j in 1:length(x)){
s <- s + abs(x[i] - x[j])
}
}
crps_equation_2 <- mean(abs(x - y)) - s / (2 * length(x)^2)
crps_equation_2
```
Now calculate using the `crps_sample()` function in the `scoringRules` package
```{r}
crps_sample(y = y, dat = x)
```
### Exploring the scoring surface
Now lets see how the CRPS changes as the mean and standard deviation of the forecasted distribution change
First, set vectors for the different mean and SD values we want to explore
```{r}
sample_mean <- seq(4, 12, 0.1)
sample_sd <- seq(0.1, 10, 0.1)
```
Second, set our observed value to 8 for simplicity
```{r}
y <- 8
```
Now calculate the CRPS at each combination of forest mean and SD
```{r}
combined <- array(NA, dim = c(length(sample_mean), length(sample_sd)))
for(i in 1:length(sample_mean)){
for(j in 1:length(sample_sd)){
sample <- rnorm(10000, sample_mean[i], sample_sd[j])
combined[i, j] <- crps_sample(y = y, dat = sample)
}
}
```
Finally, visualize the scoring surface with the observed value represented by the red line
```{r}
contour(x = sample_mean, y = sample_sd, z = as.matrix(combined),nlevels = 20, xlab = "Mean", ylab = "SD")
abline(v = y, col = "red")
```
The contour surface highlights the trade-off between the mean and standard deviation.
### CRPS from the Normal Distribution
If the distributional forecast is a normal distribution represented by a mean $\mu$ and standard deviation $\sigma$, an ensemble of predictions is not needed to evaluate CRPS because we can take advantage of the analytic solution to CRPS under the normal assumption (Equation 4 from Gneiting et al. 2005)
Equation 5 from Gneiting et al. (2005) gives
$$\begin{align*}
CRPS(N(\mu, \sigma^2) | y) = \sigma \left( \frac{y - \mu}{\sigma} \left( 2 \Phi\left( \frac{y - \mu}{\sigma} \right) - 1 \right) + 2 \phi \left( \frac{y - \mu}{\sigma} \right) - \frac{1}{\sqrt{\pi}} \right)
\end{align*}$$
for $\Phi(\cdot)$ and $\phi(\cdot)$ the standard normal CDF and PDF, respectively. Therefore, if the forecast distribution is **truly** a normal distribution (often this isn't true in forecasts that only report a mean and sd) a simplified score can be applied as follows:
```{r}
sample_mean <- seq(4, 12, 0.1)
sample_sd <- seq(0.1, 10, 0.1)
combined_norm <- array(NA, dim = c(length(sample_mean), length(sample_sd)))
for(i in 1:length(sample_mean)){
for(j in 1:length(sample_sd)){
combined_norm[i, j] <- crps_norm(y = y, mean = sample_mean[i], sd = sample_sd[j])
}
}
```
Finally, visualize the scoring surface with the observed value represented by the red line.
```{r}
contour(x = sample_mean, y = sample_sd, z = as.matrix(combined_norm), nlevels = 20, xlab = "Mean", ylab = "SD")
abline(v = y, col = "red")
```
Note that at a given value of the sd, the lowest score is achieved at $\mu = y$ as shown for each of the blue lines where the minimum value of the score across each blue line is at the red line. This behavior makes sense because the CRPS is a score that rewards accuracy and precision. Thus, for any given level of precision (represented by the standard deviation), CRPS is optimized by producing the most accurate prediction of the distribution's location.
```{r}
contour(x = sample_mean, y = sample_sd, z = as.matrix(combined_norm), nlevels = 20, xlab = "Mean", ylab = "SD")
abline(v = y, col = "red")
abline(h = 2.5, col = "blue")
abline(h = 4.3, col = "blue")
abline(h = 6.8, col = "blue")
```
Interestingly, for a given mean $\mu \neq y$ we find a pattern that makes intuitive sense given the goal of CRPS to produce forecasts that are both accurate and precise. For a given amount of bias in the prediction (i.e., given a $\mu \neq y$), the optimal score is achieved by a standard deviation that slightly larger than the bias.
```{r}
layout(matrix(1:4, 2, 2, byrow = TRUE))
## plots for mu = 7
mu <- 7
contour(x = sample_mean, y = sample_sd, z = as.matrix(combined_norm), nlevels = 20, xlab = "Mean", ylab = "SD", main = paste0("CRPS contour given mu = ", mu))
abline(v = mu, col = "red")
min_sd <- sample_sd[which.min(crps_norm(y, mean = mu, sd = sample_sd))]
abline(h = min_sd, col = "blue")
plot(sample_sd, crps_norm(y, mean = mu, sd = sample_sd), type = 'l', main = paste0("CRPS profile given mu = ", mu))
abline(v = min_sd, col = "blue")
## plots for mu = 11
mu <- 11
contour(x = sample_mean, y = sample_sd, z = as.matrix(combined_norm), nlevels = 20, xlab = "Mean", ylab = "SD", main = paste0("CRPS contour given mu = ", mu))
abline(v = mu, col = "red")
min_sd <- sample_sd[which.min(crps_norm(y, mean = mu, sd = sample_sd))]
abline(h = min_sd, col = "blue")
plot(sample_sd, crps_norm(y, mean = mu, sd = sample_sd), type = 'l', main = paste0("CRPS profile given mu = ", mu))
abline(v = min_sd, col = "blue")
```
Next, we plot the relationship between a given value of $\mu$ and the $\sigma$ that produces the optimal CRPS. This looks like a linear relationship.
```{r}
optimal_sd <- rep(0, length(sample_mean))
for (i in 1:length(sample_mean)) {
optimal_sd[i] <- sample_sd[which.min(crps_norm(y, mean = sample_mean[i], sd = sample_sd))]
}
plot(sample_mean, optimal_sd, type = 'l')
```
Let's estimate the slope of the relationship. It looks like the optimal $sd$ for a normal distribution forecast that is biased by $|y - \mu|$ is $sd = 1.2|y - \mu|$ which makes sense as this would put the true value in a region of high probability.
```{r}
coef(lm(optimal_sd[sample_mean > 0] ~ sample_mean[sample_mean > 0]))
```
## References
Gneiting, T., A.E. Raftery. 2007. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477): 359--378. https://doi.org/10.1198/016214506000001437
Jordan, A., F. Kruger, and S. Lerch 2018. Evaluating probabilistic forecasts with scoringRules. https://cran.r-project.org/web/packages/scoringRules/vignettes/article.pdf
Gneiting, T., A.E. Raftery, A.H. Westveld III, T. Goldman. 2005. Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Monthly Weather Review, 133(5): 1098-1118. https://doi.org/10.1175/MWR2904.1