---
title: "stackr package"
output: github_document
---
<!-- badges: start -->
![GitHub R package version](https://img.shields.io/github/r-package/v/epiforecasts/stackr)
[![R-CMD-check](https://github.com/epiforecasts/stackr/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/epiforecasts/stackr/actions/workflows/R-CMD-check.yaml)
[![codecov](https://codecov.io/github/epiforecasts/stackr/branch/main/graph/badge.svg?token=rYeyG3kFIa)](https://codecov.io/github/epiforecasts/stackr)
![GitHub contributors](https://img.shields.io/github/contributors/epiforecasts/stackr)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
<!-- badges: end -->
# Overview
The `stackr` package provides an easy way to combine predictions
from individual time-series or panel-data models into an
ensemble. `stackr` stacks models according to the Continuous Ranked Probability
Score (CRPS) over k-step-ahead predictions, which makes it especially
suited to time-series and panel data. A function for
leave-one-out CRPS may be added in the future. Predictions must be
predictive distributions represented by predictive samples. Usually, these will
be sets of posterior predictive simulation draws generated by an MCMC
algorithm.
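As background, the CRPS of a predictive distribution F for an observed value y can be estimated directly from predictive samples through the identity CRPS(F, y) = E|X - y| - ½ E|X - X'|, where X and X' are independent draws from F (Gneiting and Raftery, 2007). The base-R sketch below illustrates this; `crps_from_samples()` is a hypothetical helper written for this example and is not part of `stackr`.

```r
# Hypothetical helper (not part of stackr): sample-based CRPS estimate using
# CRPS(F, y) = E|X - y| - 0.5 * E|X - X'|, where X and X' are independent
# draws from the predictive distribution F, approximated by its samples.
crps_from_samples <- function(y, samples) {
  # expected absolute error between the predictive samples and the observation
  term1 <- mean(abs(samples - y))
  # expected absolute difference between two draws, averaged over all
  # ordered pairs of samples (including each sample paired with itself)
  term2 <- mean(abs(outer(samples, samples, "-")))
  term1 - 0.5 * term2
}

set.seed(1)
draws <- rnorm(1000)
crps_from_samples(0, draws)
```

For a standard normal forecast and y = 0 the exact CRPS is 2φ(0) - 1/√π ≈ 0.234, so the estimate should land close to that value. The pairwise term builds an S × S matrix, so for very large numbers of samples a sorting-based formula is cheaper.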
# Installation
Install the package from GitHub using
``` {r eval = FALSE}
devtools::install_github("epiforecasts/stackr")
```
# CRPS Stacking
Given some training data with observed values as well as predictive samples
generated by different models, `stackr` finds the optimal weights (in the sense of
minimizing the expected cross-validation predictive error, measured by the CRPS)
to form an ensemble of these models. Using these weights, `stackr` can then provide
samples from the optimal model mixture by drawing from the predictive samples
of each model in the correct proportion. The resulting mixture model is based
solely on predictive samples and is in this regard superior to other
ensembling techniques such as Bayesian Model Averaging, which requires the
marginal likelihood of each model. More information can be found in the package vignette.

Weights are computed with the `crps_weights` function. Given these weights
and the predictive samples, the `mixture_from_samples` function can be used to obtain
predictive samples from the optimal mixture model.
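The idea of choosing weights that minimize the CRPS of the mixture can be sketched in base R. This is not `stackr`'s implementation: the helper `stack_crps_weights()`, the input layout (a list of T × S sample matrices, one per model) and the use of a softmax with `optim()` are assumptions made for illustration. The sketch relies on the fact that, for a mixture with weights w_k, the sample-based CRPS is Σ_k w_k E|X_k - y| - ½ Σ_k Σ_l w_k w_l E|X_k - X_l|.

```r
# Hypothetical sketch, not stackr's crps_weights(): choose mixture weights
# that minimize the mean sample-based CRPS of the mixture over T observations.
# samples: list of K matrices (one per model), each T x S
#          (rows = observations, columns = predictive samples)
# y:       vector of the T observed values
stack_crps_weights <- function(samples, y) {
  K <- length(samples)
  n_obs <- length(y)
  a <- matrix(0, n_obs, K)        # a[i, k]    = E|X_k - y_i|
  B <- array(0, c(n_obs, K, K))   # B[i, k, l] = E|X_k - X_l|
  for (i in seq_len(n_obs)) {
    for (k in seq_len(K)) {
      xk <- samples[[k]][i, ]
      a[i, k] <- mean(abs(xk - y[i]))
      for (l in seq_len(K)) {
        B[i, k, l] <- mean(abs(outer(xk, samples[[l]][i, ], "-")))
      }
    }
  }
  # softmax keeps the weights non-negative and summing to one
  softmax <- function(theta) {
    e <- exp(theta - max(theta))
    e / sum(e)
  }
  # mixture CRPS: sum_k w_k a_k - 0.5 * sum_k sum_l w_k w_l B_kl, averaged over i
  objective <- function(theta) {
    w <- softmax(theta)
    crps_i <- vapply(seq_len(n_obs), function(i) {
      sum(w * a[i, ]) - 0.5 * sum(w * (B[i, , ] %*% w))
    }, numeric(1))
    mean(crps_i)
  }
  fit <- optim(rep(0, K), objective, method = "BFGS")
  softmax(fit$par)
}

# toy example: the truth is N(0, 1); one model is biased down, one biased up
set.seed(42)
n_obs <- 200
n_samples <- 100
y <- rnorm(n_obs)
model_low <- matrix(rnorm(n_obs * n_samples, mean = -1), nrow = n_obs)
model_high <- matrix(rnorm(n_obs * n_samples, mean = 1), nrow = n_obs)
w <- stack_crps_weights(list(model_low, model_high), y)
w
```

Because the two toy models are biased in opposite directions by the same amount, both weights should come out close to 0.5, and the equally weighted mixture scores better than either model on its own. The softmax parameterization is just one simple way of keeping the weights on the simplex.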
# Usage
## Load example data and split into train and test data
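``` {r eval = FALSE}
library("stackr")
library("data.table")
# split the example data into training and test sets at a fixed date
splitdate <- as.Date("2020-03-28")
traindata <- example_data[date <= splitdate]
testdata <- example_data[date > splitdate]
```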
## Get weights and create mixture
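``` {r eval = FALSE}
# estimate the stacking weights on the training data
weights <- crps_weights(traindata)
# draw predictive samples from the weighted mixture for the test period
test_mixture <- mixture_from_samples(testdata, weights = weights)
```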
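The mixture step above draws from the predictive samples of each model in proportion to the stacking weights. A base-R sketch of that idea follows; `sample_mixture()` is a hypothetical helper and not `stackr`'s `mixture_from_samples()`, which works on the package's own data format.

```r
# Hypothetical helper (not stackr's mixture_from_samples()): draw n_draws
# values from a mixture by first picking a model with probability equal to
# its weight, then picking one of that model's predictive samples.
sample_mixture <- function(samples, weights, n_draws) {
  stopifnot(length(samples) == length(weights), abs(sum(weights) - 1) < 1e-8)
  # which model each mixture draw comes from
  model_idx <- sample(seq_along(samples), n_draws, replace = TRUE, prob = weights)
  vapply(model_idx, function(k) {
    s <- samples[[k]]
    # sample.int avoids sample()'s special case for length-one vectors
    s[sample.int(length(s), 1)]
  }, numeric(1))
}

set.seed(1)
draws <- sample_mixture(
  samples = list(rnorm(1000, mean = -1), rnorm(1000, mean = 1)),
  weights = c(0.25, 0.75),
  n_draws = 5000
)
mean(draws)
```

With weights 0.25 and 0.75 on models centred at -1 and 1, the mean of the mixture draws should be close to 0.25 × (-1) + 0.75 × 1 = 0.5.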
## Score predictions
``` {r eval = FALSE}
library("scoringutils")
# combine the mixture predictions with the predictions of the individual models
score_df <- rbindlist(list(testdata, test_mixture), fill = TRUE)
# score all predictions using functions from github.com/epiforecasts/scoringutils
score_df[, crps := crps(unique(observed), t(predicted)),
  by = .(geography, model, date)
]
# summarise scores: mean CRPS per model
score_df[, .(CRPS = mean(crps)), by = model]
```
# References
- Using Stacking to Average Bayesian Predictive Distributions, Yuling Yao, Aki Vehtari, Daniel Simpson and Andrew Gelman, 2018, Bayesian Analysis 13(3), pp. 917–1003, DOI 10.1214/17-BA1091
- Strictly Proper Scoring Rules, Prediction, and Estimation, Tilmann Gneiting and Adrian E. Raftery, 2007, Journal of the American Statistical Association 102(477), DOI 10.1198/016214506000001437
- Comparing Bayes Model Averaging and Stacking When Model Approximation Error Cannot be Ignored, Bertrand Clarke, 2003, Journal of Machine Learning Research 4
- Bayesian Model Weighting: The Many Faces of Model Averaging, Marvin Höge, Anneli Guthke and Wolfgang Nowak, 2020, Water, DOI 10.3390/w12020309
- Bayesian Stacking and Pseudo-BMA weights using the loo package, Aki Vehtari and Jonah Gabry, 2019, https://mc-stan.org/loo/articles/loo2-weights.html

## Contributors
<!-- ALL-CONTRIBUTORS-LIST:START - Do not remove or modify this section -->
<!-- prettier-ignore-start -->
<!-- markdownlint-disable -->
All contributions to this project are gratefully acknowledged using the [`allcontributors` package](https://github.com/ropensci/allcontributors) following the [all-contributors](https://allcontributors.org) specification. Contributions of any kind are welcome!
### Code
<a href="https://github.com/epiforecasts/stackr/commits?author=nikosbosse">nikosbosse</a>,
<a href="https://github.com/epiforecasts/stackr/commits?author=sbfnk">sbfnk</a>,
<a href="https://github.com/epiforecasts/stackr/commits?author=seabbs">seabbs</a>
### Issues
<a href="https://github.com/epiforecasts/stackr/issues?q=is%3Aissue+commenter%3Ajonathonmellor">jonathonmellor</a>
<!-- markdownlint-enable -->
<!-- prettier-ignore-end -->
<!-- ALL-CONTRIBUTORS-LIST:END -->