ModelObjects
The viewser toolkit also includes functions for storing and retrieving model objects. These support production runs by removing the need to retrain models. Stored objects can also be shared with other viewser users or moved between different computers.
A CLI is available for exploring and downloading the currently available objects. The relevant commands are:
# Show list of names of currently available models
viewser model list
# Show metadata associated with model object
viewser model inspect $NAME
# Download model object to a pickle-file
viewser model download $NAME
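Once downloaded, the pickle file can be read back with Python's standard pickle module. The following is a minimal sketch, assuming the file is called my-model.pkl (a hypothetical name; the file name actually written by the CLI may differ). The stand-in file is created only so that the example runs on its own.

```python
import pickle
import tempfile
from pathlib import Path


def load_model(path):
    """Load a pickled model object from disk.

    Only unpickle files from sources you trust: unpickling can run arbitrary code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a downloaded file (hypothetical name), so the sketch is self-contained.
model_path = Path(tempfile.mkdtemp()) / "my-model.pkl"
with open(model_path, "wb") as f:
    pickle.dump({"coefficients": [0.1, 0.2]}, f)

model = load_model(model_path)
print(type(model).__name__)
```

In practice, replace model_path with the path of the file produced by the download command.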
For scripting, a Storage class is made available by the views_runs package:
from datetime import datetime
from views_runs import storage, ModelMetadata
# This can currently be any object that you want to store
my_model = ...
# Metadata is added via the ModelMetadata class (see below)
metadata = ModelMetadata(
    author="me",
    run_id="my-run",
    queryset_name="my-queryset",
    train_start=0,
    train_end=100,
    training_date=datetime.now(),
)
store = storage.Storage()
# Store a model object (type Any) and its metadata (type JsonSerializeable)
# The overwrite parameter determines whether or not an existing file would be replaced.
store.store("my-model", my_model, metadata, overwrite = False)
# Fetch a model object
another_model = store.fetch("another_model")
# Show list of available models
store.list()
# Fetch metadata associated with model
assert store.fetch_metadata("my-model") == metadata
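To illustrate what the overwrite flag guards against, here is a small in-memory stand-in. InMemoryStorage is hypothetical and not part of views_runs; it only mirrors the four calls shown above. How the real Storage class reacts to a name collision is not documented here, so raising FileExistsError is an assumption.

```python
from typing import Any, Dict, List, Tuple


class InMemoryStorage:
    """Hypothetical stand-in mirroring store / fetch / list / fetch_metadata."""

    def __init__(self) -> None:
        # Maps a model name to a (model, metadata) pair.
        self._objects: Dict[str, Tuple[Any, Any]] = {}

    def store(self, name: str, model: Any, metadata: Any, overwrite: bool = False) -> None:
        # Refuse to replace an existing entry unless the caller opts in.
        if name in self._objects and not overwrite:
            raise FileExistsError(f"{name!r} already exists; pass overwrite=True to replace it")
        self._objects[name] = (model, metadata)

    def fetch(self, name: str) -> Any:
        return self._objects[name][0]

    def fetch_metadata(self, name: str) -> Any:
        return self._objects[name][1]

    def list(self) -> List[str]:
        return sorted(self._objects)


store = InMemoryStorage()
store.store("my-model", {"version": 1}, {"author": "me"})

# A second store under the same name is rejected by default.
try:
    store.store("my-model", {"version": 2}, {"author": "me"})
    replaced = True
except FileExistsError:
    replaced = False

# With overwrite=True the existing entry is replaced.
store.store("my-model", {"version": 2}, {"author": "me"}, overwrite=True)
```

Keeping overwrite = False as the default means an existing production model is never replaced by accident.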
Metadata is added to models to give some context about authorship, time coverage, training data and training date. These values are primarily used for organization. To avoid imposing restrictions or norms on the preprocessing of data or training of models, we have opted to use code as the most important source of further information about models.
The following values are mandated by the ModelMetadata object:
- author: Your name
- run_id: The name of the run associated with the model
- queryset_name: The name of the queryset used to train the model
- train_start: An integer which translates to a views-month
- train_end: An integer which translates to a views-month
- training_date: The date that the model was trained
As shown above, these values are provided when instantiating a new ModelMetadata.
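Because train_start and train_end are plain integers, it can help to translate them to calendar months when reading metadata. The sketch below assumes that views-month 1 corresponds to January 1980; this convention is not stated in this document, so verify it against your data. views_month_to_year_month is a hypothetical helper, not part of viewser or views_runs.

```python
from typing import Tuple

# Assumption: views-month 1 is January 1980.
BASE_YEAR = 1980


def views_month_to_year_month(month_id: int) -> Tuple[int, int]:
    """Translate a views-month integer to a (year, month) pair."""
    offset = month_id - 1  # months elapsed since January of BASE_YEAR
    year = BASE_YEAR + offset // 12
    month = offset % 12 + 1
    return year, month


train_start = views_month_to_year_month(1)
train_end = views_month_to_year_month(13)
print(train_start, train_end)
```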
The runs mentioned above are managed through the ViewsMetadata system from the views-forecasts package, and are only referenced by the ModelObjects system. To add runs, you need this package installed and loaded. Instructions are available here: https://github.com/UppsalaConflictDataProgram/views_forecasts/blob/master/PredictionsDocs.ipynb
Import the package like this:
from views_forecasts.extensions import *
To see what runs are available in the system, run:
ViewsMetadata().get_runs()
To add a new run, run:
ViewsMetadata().new_run(name : str, description : str, min_month : Union[int,None], max_month : Union[int,None])
where:
- name is the name of the run (e.g. escwa_2021_01).
- description is a human-readable text field. It can be arbitrarily long.
- min_month is the expected first month that should be present in all predictions generated as part of that run.
- max_month is the expected last month that should be present in all predictions generated as part of that run.
The min_month and max_month values are not currently used. They are intended as time annotations for your own use, not as strict constraints enforced by the system.
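Since the system does not enforce these bounds, a check can be run by hand where coverage matters. The sketch below is one way to do that, assuming the months covered by a set of predictions are available as integers. missing_months is a hypothetical helper, not part of views-forecasts.

```python
from typing import Iterable, List, Optional


def missing_months(
    predicted_months: Iterable[int],
    min_month: Optional[int],
    max_month: Optional[int],
) -> List[int]:
    """Return the months in [min_month, max_month] that have no predictions."""
    # A run without both bounds has no expected window to check against.
    if min_month is None or max_month is None:
        return []
    present = set(predicted_months)
    return [m for m in range(min_month, max_month + 1) if m not in present]


gaps = missing_months([500, 501, 503], min_month=500, max_month=504)
print(gaps)
```

An empty list means every month in the run's window is covered.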