Dataset naming conventions
All prediction datasets (or "runs" of the forecasting models, as they are more commonly known within VIEWS) in the VIEWS API are labelled as follows:
{model name}{model version}_{date}_{try sequence}
, where:
- model name is a short label for the prediction model at hand, e.g. the fatalities model that is currently in production. To learn more about current and deprecated VIEWS models, please visit https://viewsforecasting.org/methodology.
- model version is a numeric identifier that specifies the version of that model, e.g. 001 (from 2022 onwards). Changes to the model(s) in production (such as new ensembling techniques or updates to model compositions) are implemented in batches and documented in the model changelog. Each batch of changes prompts a new model version, and the numeric identifier is incremented by 1.
- date specifies the calendar year (YYYY) and month (MM) of the last data that informs a given set of predictions, or, in the case of predictor/feature datasets, the end date of the period that the dataset covers. 2022_07 would thus refer either to a set of predictions informed by data up to and including July 2022, or to a dataset with predictor/feature data up to and including July 2022.
  - For pre-2022 data releases, date instead refers to the release date of the given dataset.
- try sequence indicates whether the production run required any bug fixes prior to successful completion. If the production run was completed on the first attempt, the try counter is given the default value of t01 (or simply 01 for pre-2022 data releases). For each additional attempt, the counter is incremented by 1. Errors and resolutions are documented in the model changelog.
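As an illustration of the convention above, the following minimal sketch splits a run name into its components. It is not part of the VIEWS tooling: the names `RunName`, `RUN_PATTERN` and `parse_run_name` are hypothetical, and the pattern assumes a lowercase model name, a three-digit model version and the post-2022 try counter (t01, t02, ...). Pre-2022 names, whose date refers to a release date, are not handled.

```python
import re
from typing import NamedTuple


class RunName(NamedTuple):
    """Components of a run name (hypothetical helper type)."""
    model_name: str
    model_version: int
    year: int
    month: int
    attempt: int


# Assumed shape: {model name}{3-digit version}_{YYYY}_{MM}_t{NN}
RUN_PATTERN = re.compile(
    r"^(?P<model>[a-z_]+)(?P<version>\d{3})"
    r"_(?P<year>\d{4})_(?P<month>\d{2})"
    r"_t(?P<attempt>\d{2})$"
)


def parse_run_name(name: str) -> RunName:
    """Split a post-2022 run name into its labelled parts."""
    match = RUN_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised run name: {name!r}")
    month = int(match["month"])
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in run name: {name!r}")
    return RunName(
        model_name=match["model"],
        model_version=int(match["version"]),
        year=int(match["year"]),
        month=month,
        attempt=int(match["attempt"]),
    )


run = parse_run_name("fatalities001_2022_07_t01")
```

Here `run.attempt` is 1 when the production run completed on the first attempt, and a larger value signals that earlier attempts needed fixes.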
Datasets that contain data on selected input variables, or "predictors", informing the current prediction model are labelled as follows:
predictors_{model name}{model version}_0000_00_00
, where:
- model name is a short label for the prediction model at hand, e.g. the fatalities model that is currently in production. To learn more about current and deprecated VIEWS models, please visit https://viewsforecasting.org/methodology.
- model version is a numeric identifier that specifies the version of that model, e.g. 001 (from 2022 onwards). Changes to the model(s) in production (such as new ensembling techniques or updates to model compositions) are implemented in batches and documented in the model changelog. Each batch of changes prompts a new model version, and the numeric identifier is incremented by 1.
- 0000_00_00 indicates that the dataset is updated on a monthly basis.
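A small sketch of how such a name could be assembled from its parts. The function `live_predictors_name` is hypothetical, and the three-digit zero padding of the version is an assumption inferred from the 001 example.

```python
# Placeholder that marks a predictors dataset as still being updated monthly.
LIVE_SUFFIX = "0000_00_00"


def live_predictors_name(model_name: str, model_version: int) -> str:
    """Build the name of a monthly-updated predictors dataset."""
    if model_version < 1:
        raise ValueError("model version must be a positive integer")
    # Assumption: the version is zero-padded to three digits, as in 001.
    return f"predictors_{model_name}{model_version:03d}_{LIVE_SUFFIX}"


print(live_predictors_name("fatalities", 1))
# prints: predictors_fatalities001_0000_00_00
```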
When we transition to a new version of our forecasting model, we often make changes to the list of input data variables/predictors that inform the model.
To avoid confusion on this matter, we separate the input datasets in our API by the models that they inform, and only keep the input datasets updated while the models in question are in use. When a given model is no longer in use, we stop updating its input dataset and rename it to signal that the dataset has turned static.
In practice, we replace 0000_00_00 in the dataset name with {end date}_00.
Input datasets that are no longer updated are thus named as follows:
predictors_{model name}{model version}_{end date}_00
, where:
- model name is a short label for the prediction model at hand, e.g. the fatalities model that is currently in production. To learn more about current and deprecated VIEWS models, please visit https://viewsforecasting.org/methodology.
- model version is a numeric identifier that specifies the version of that model, e.g. 001 (from 2022 onwards). See the model changelog for more information on each version.
- end date specifies the last calendar year (YYYY) and month (MM) that the dataset covers, e.g. 2022_07 for a dataset that runs up to and including July 2022.
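The two predictor naming schemes can be told apart programmatically. The sketch below is illustrative only: `PREDICTORS_PATTERN`, `predictors_end_date` and `freeze_predictors_name` are hypothetical names, and the pattern assumes a lowercase model name and a three-digit version. It returns the end date of a static dataset, or `None` for one that is still updated monthly, and shows the rename applied when a dataset turns static.

```python
import re
from typing import Optional, Tuple

# Assumed shape: predictors_{model name}{3-digit version}_{YYYY}_{MM}_00,
# where YYYY_MM is 0000_00 for datasets that are still updated monthly.
PREDICTORS_PATTERN = re.compile(
    r"^predictors_(?P<model>[a-z_]+)(?P<version>\d{3})"
    r"_(?P<year>\d{4})_(?P<month>\d{2})_00$"
)


def predictors_end_date(name: str) -> Optional[Tuple[int, int]]:
    """Return (year, month) for a static dataset, None for a live one."""
    match = PREDICTORS_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised predictors dataset: {name!r}")
    year, month = int(match["year"]), int(match["month"])
    if year == 0 and month == 0:
        return None  # 0000_00_00: still updated monthly
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in dataset name: {name!r}")
    return year, month


def freeze_predictors_name(live_name: str, year: int, month: int) -> str:
    """Replace the 0000_00_00 placeholder with {end date}_00."""
    suffix = "_0000_00_00"
    if not live_name.endswith(suffix):
        raise ValueError(f"not a live predictors dataset: {live_name!r}")
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return f"{live_name[:-len(suffix)]}_{year:04d}_{month:02d}_00"


static_name = freeze_predictors_name("predictors_fatalities001_0000_00_00", 2022, 7)
print(static_name)
# prints: predictors_fatalities001_2022_07_00
```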