
ModelObjects

mihaicroicu edited this page Jan 28, 2022 · 6 revisions

Model object storage

The viewser toolkit also includes functions for storing and retrieving model objects. These support production runs, because stored model objects do not have to be retrained. Storing and retrieving model objects can also be used to share them with other viewser users or to move them between computers.

API

The viewser CLI is useful for exploring and downloading the currently available objects. The relevant commands are:

# Show list of names of currently available models
viewser model list

# Show metadata associated with model object
viewser model inspect $NAME

# Download model object to a pickle-file
viewser model download $NAME
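Once a model has been downloaded with the CLI, it can be loaded back into Python. The sketch below assumes the download is a standard pickle file. The helper name `load_downloaded_model` and the example file name are hypothetical, because the exact file name the CLI writes is not documented here. Only unpickle files from sources you trust, since unpickling can execute arbitrary code.

```python
import pickle
from pathlib import Path
from typing import Any, Union


def load_downloaded_model(path: Union[str, Path]) -> Any:
    """Load a model object from a pickle file, such as one written by
    `viewser model download`.

    Hypothetical helper: assumes the CLI writes a plain pickle file.
    """
    with open(path, "rb") as f:
        # pickle.load can execute arbitrary code: only load trusted files.
        return pickle.load(f)


# Hypothetical file name; use the file the CLI actually created.
# my_model = load_downloaded_model("my-model.pkl")
```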

For scripting, the views_runs package provides a Storage class:

from datetime import datetime
from views_runs import storage, ModelMetadata

# This can currently be any object that you want to store
my_model = ...

# Metadata is added via the ModelMetadata class (see below)
metadata = ModelMetadata(
    author="me",
    run_id="my-run",
    queryset_name="my-queryset",
    train_start=0,
    train_end=100,
    training_date=datetime.now())

store = storage.Storage()

# Store a model object (any type) and its metadata (a JSON-serializable object).
# The overwrite parameter determines whether an existing object with the same name is replaced.
store.store("my-model", my_model, metadata, overwrite=False)

# Fetch a model object
another_model = store.fetch("another_model")

# Show list of available models
store.list()

# Fetch metadata associated with a model
assert store.fetch_metadata("my-model") == metadata
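Production runs typically want to reuse a stored model instead of retraining it. The sketch below shows one way to do this using only the `list`, `fetch` and `store` calls shown above. The helper `fetch_or_train` is hypothetical. It assumes that `store.list()` returns the names of the stored models, which is not confirmed by this page.

```python
from typing import Any, Callable


def fetch_or_train(store: Any, name: str,
                   train: Callable[[], Any],
                   make_metadata: Callable[[], Any]) -> Any:
    """Return the stored model `name` if it exists; otherwise train it,
    store it and return it.

    Hypothetical helper. `store` should behave like views_runs'
    storage.Storage. ASSUMPTION: store.list() returns stored model names.
    """
    if name in store.list():
        # Reuse the stored object instead of retraining.
        return store.fetch(name)
    model = train()
    # overwrite=False: do not replace an existing object with this name.
    store.store(name, model, make_metadata(), overwrite=False)
    return model
```

Usage would look like `fetch_or_train(storage.Storage(), "my-model", train=fit_my_model, make_metadata=lambda: metadata)`, where `fit_my_model` is a placeholder for your own training function. The metadata is built lazily so that `training_date` reflects when training actually happened.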

Metadata

Metadata is attached to models to give context about authorship, time coverage, training data and training date. These values are primarily used for organization. To avoid imposing restrictions or norms on how data is preprocessed or how models are trained, we have opted to treat the code itself as the primary source of further information about a model.

The ModelMetadata class requires the following values:

  • author: Your name
  • run_id: The name of the run associated with the model
  • queryset_name: The name of the queryset used to train the model
  • train_start: An integer which translates to a views-month
  • train_end: An integer which translates to a views-month
  • training_date: The date that the model was trained

As shown above, these values are provided when instantiating a new ModelMetadata.
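The train_start and train_end values are integer month identifiers rather than dates. The sketch below converts between the two. It ASSUMES the usual VIEWS convention, where month_id 1 corresponds to January 1980. Verify this against your own data before relying on it. The helper names are hypothetical.

```python
from datetime import date

# ASSUMPTION: month_id 1 == January 1980 (the usual VIEWS convention).
BASE_YEAR = 1980


def month_id_to_date(month_id: int) -> date:
    """Convert a month_id to the first day of its calendar month."""
    years, month_index = divmod(month_id - 1, 12)
    return date(BASE_YEAR + years, month_index + 1, 1)


def date_to_month_id(d: date) -> int:
    """Convert a calendar date to the month_id of the month it falls in."""
    return (d.year - BASE_YEAR) * 12 + d.month
```

Under this assumption, `month_id_to_date(121)` returns January 1, 1990, and `date_to_month_id(date(2017, 1, 15))` returns 445.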

Run Management

The runs referenced above are managed through the ViewsMetadata system in the views-forecasts package; the ModelObjects system only references them. To add runs, you need this package installed and loaded. Installation instructions are available here: https://github.com/UppsalaConflictDataProgram/views_forecasts/blob/master/PredictionsDocs.ipynb

Import the package like this:

from views_forecasts.extensions import *

To see what runs are available in the system, run:

ViewsMetadata().get_runs()

To add a new run, call:

ViewsMetadata().new_run(name: str, description: str, min_month: Union[int, None], max_month: Union[int, None])

where:

  • name is the name of the run (e.g. escwa_2021_01).
  • description is a human-readable text field of arbitrary length.
  • min_month is the expected first month that should be present in all predictions generated as part of that run.
  • max_month is the expected last month that should be present in all predictions generated as part of that run.

min_month and max_month are currently not used by the system. They are intended as time annotations for your own use, not as constraints that the system enforces.
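Because these bounds are not enforced, you may want to check your own predictions against them. The sketch below lists the month_ids in a run's expected range that are missing from a set of predictions. The helper `missing_months` is hypothetical. It assumes your predictions expose their months as integer month_ids, for example from a month_id column. A bound of None is treated as undeclared, so no check is made.

```python
from typing import Iterable, List, Optional


def missing_months(prediction_months: Iterable[int],
                   min_month: Optional[int],
                   max_month: Optional[int]) -> List[int]:
    """Return month_ids in [min_month, max_month] absent from the predictions.

    Hypothetical user-side check: the system itself does not enforce
    min_month or max_month. Returns [] if either bound is None.
    """
    if min_month is None or max_month is None:
        return []
    present = set(prediction_months)
    return [m for m in range(min_month, max_month + 1) if m not in present]
```

An empty result means that every expected month in the range appears at least once in the predictions.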
