
ModelObjects

peder2911 edited this page Jan 13, 2022 · 6 revisions

Model object storage

The viewser toolkit also includes functions for storing and retrieving model objects. This supports production runs, because stored models can be reused instead of retrained. Storage and retrieval of model objects can also be used to share them with other viewser users or between different computers.

Using storage

Currently, there is no validation or standardization of the model objects or metadata that are stored. Any pickle-serializable object can be stored as a model object, and any JSON-serializable dictionary can be stored as metadata. We chose flexibility over rigor to avoid shoehorning, and to let standards for model objects and metadata develop organically.

A metadata schema is planned; it will make certain metadata fields mandatory.
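Because nothing is validated at upload time, it can help to check locally that an object meets the two requirements above before storing it. The sketch below is a hypothetical helper, not part of viewser or views_runs: it simply tries to serialize the model with Python's standard pickle module and the metadata with the standard json module.

```python
import json
import pickle
from datetime import datetime
from typing import Any


def is_storable(model: Any, metadata: Any) -> bool:
    """Hypothetical pre-flight check (not part of viewser or views_runs).

    Returns True if the model can be pickled and the metadata is a
    dictionary that the standard json module can serialize.
    """
    if not isinstance(metadata, dict):
        return False
    try:
        pickle.dumps(model)
        json.dumps(metadata)
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
    return True


print(is_storable({"coef": [0.1, 0.2]}, {"author": "testuser"}))         # True
print(is_storable(lambda x: x, {"author": "testuser"}))                   # False: lambdas cannot be pickled
print(is_storable({"coef": [0.1]}, {"training_date": datetime.now()}))    # False: datetime is not JSON-serializable
```

Note that a datetime value fails the json check; converting it with `.isoformat()` first keeps the metadata JSON-serializable.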

API

There is a command-line interface (CLI) that is useful for exploring and downloading the currently available objects. The relevant commands are:

# Show list of names of currently available models
viewser model list

# Show metadata associated with model object
viewser model inspect $NAME

# Download model object to a pickle-file
viewser model download $NAME
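A downloaded model object is an ordinary pickle file, so it can be loaded with Python's standard pickle module. The sketch below is illustrative: the file name written by `viewser model download` is not documented on this page, so "my-model.pkl" is a hypothetical name, and a stand-in pickle file is created in a temporary directory to keep the example self-contained.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Union


def load_downloaded_model(path: Union[str, Path]) -> Any:
    """Load a model object from a pickle file.

    Only unpickle files from sources you trust: unpickling can execute
    arbitrary code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a file written by the CLI; the real file name may differ.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "my-model.pkl"  # hypothetical file name
    with open(demo_path, "wb") as f:
        pickle.dump({"coef": [0.1, 0.2]}, f)
    model = load_downloaded_model(demo_path)

print(model)  # {'coef': [0.1, 0.2]}
```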

For scripting, a Storage class is made available by the views_runs package:

from datetime import datetime
from views_runs import storage

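# The model can currently be any pickle-serializable object
my_model = ...

# Metadata is currently any JSON-serializable dictionary. The training date is
# stored as an ISO 8601 string, since datetime objects are not serializable
# by Python's standard json module.
metadata = {"author": "testuser", "training_date": datetime.now().isoformat()}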
 
store = storage.Storage()

# Store a model object (type Any) and its metadata (type JsonSerializable)
store.store("my-model", my_model, metadata)

# Fetch a model object by name
fetched_model = store.fetch("my-model")

# Show list of available models
store.list()

# Fetch metadata associated with a model
assert store.fetch_metadata("my-model") == metadata
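For unit tests or offline experiments it can be convenient to have a stand-in that accepts the same four calls used above. The InMemoryStorage class below is hypothetical and not part of views_runs: it only mimics the call shapes shown in the example, keeps everything in a dictionary, and serializes on `store` so that unpicklable models or non-JSON metadata fail immediately. The return format of the real `Storage.list()` may differ from the sorted list of names used here.

```python
import json
import pickle
from typing import Any, Dict, List


class InMemoryStorage:
    """Hypothetical in-memory stand-in mirroring the Storage calls used above.

    Not part of views_runs: objects live in a dict and disappear when the
    process exits.
    """

    def __init__(self) -> None:
        self._models: Dict[str, bytes] = {}
        self._metadata: Dict[str, str] = {}

    def store(self, name: str, model: Any, metadata: Dict[str, Any]) -> None:
        # Serialize both first, so a failure leaves nothing half-stored.
        model_bytes = pickle.dumps(model)
        metadata_json = json.dumps(metadata)
        self._models[name] = model_bytes
        self._metadata[name] = metadata_json

    def fetch(self, name: str) -> Any:
        # Raises KeyError if nothing has been stored under this name.
        return pickle.loads(self._models[name])

    def fetch_metadata(self, name: str) -> Dict[str, Any]:
        return json.loads(self._metadata[name])

    def list(self) -> List[str]:
        # Sorted model names; the real Storage.list() output may differ.
        return sorted(self._models)


store = InMemoryStorage()
metadata = {"author": "testuser", "training_date": "2022-01-13T00:00:00"}
store.store("my-model", {"coef": [0.1, 0.2]}, metadata)
print(store.list())             # ['my-model']
print(store.fetch("my-model"))  # {'coef': [0.1, 0.2]}
assert store.fetch_metadata("my-model") == metadata
```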

Schema

The team is currently in the process of developing a schema. To avoid early lock-in, we have opted for an open system first. It will be restricted, by adding validation logic to the .store method of the Storage class, as we figure out which objects and metadata we want to store.
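How the validation logic will be wired into `.store` has not been decided. One possible shape, sketched here under the assumption that validation should run before anything is written, is a wrapper that runs a list of checks on the metadata and only then delegates to any object with a `store(name, model, metadata)` method. ValidatingStore, MetadataError, require_author, and RecordingBackend are hypothetical names for illustration only.

```python
from typing import Any, Callable, Dict, List, Tuple

# A validator takes metadata and returns a list of problems (empty = valid).
Validator = Callable[[Dict[str, Any]], List[str]]


class MetadataError(ValueError):
    """Raised when metadata fails validation (hypothetical)."""


class ValidatingStore:
    """Hypothetical wrapper: run validators, then delegate to backend.store()."""

    def __init__(self, backend: Any, validators: List[Validator]) -> None:
        self.backend = backend
        self.validators = validators

    def store(self, name: str, model: Any, metadata: Dict[str, Any]) -> None:
        problems = [msg for check in self.validators for msg in check(metadata)]
        if problems:
            # Nothing reaches the backend if any check fails.
            raise MetadataError("; ".join(problems))
        self.backend.store(name, model, metadata)


def require_author(metadata: Dict[str, Any]) -> List[str]:
    return [] if isinstance(metadata.get("author"), str) else ["author must be a string"]


class RecordingBackend:
    """Stand-in backend that just records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Any, Dict[str, Any]]] = []

    def store(self, name: str, model: Any, metadata: Dict[str, Any]) -> None:
        self.calls.append((name, model, metadata))


backend = RecordingBackend()
store = ValidatingStore(backend, [require_author])
store.store("my-model", object(), {"author": "testuser"})  # accepted
try:
    store.store("bad-model", object(), {})                  # rejected
except MetadataError as err:
    print(err)  # author must be a string
print(len(backend.calls))  # 1
```

Keeping validators as separate functions would let fields be made mandatory one at a time as the schema settles.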

The following fields have been discussed as candidates for model object metadata, and should be included in the metadata dictionary:

  • level_of_analysis: A string, either "priogrid_month", "country_month", or one of the other available LOAs.
  • outcome: The name of the outcome variable on which the model was trained
  • queryset_name: The name of the queryset on which the model was trained
  • train_start: The start month of the training period (integer, views_month)
  • train_end: The end month of the training period (integer, views_month)
  • training_date: The date the model was trained (datetime.datetime.now())
  • author: A string identifying you as the author
  • run_id: Name of the run for which the model was trained
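Since the schema is not final, a metadata dictionary with all the candidate fields can be checked against a simple name-to-type mapping. This is a sketch only: the types are our reading of the list above, the end-of-training field is called train_end, training_date is stored as an ISO 8601 string to keep the dictionary JSON-serializable, and all values in the example (outcome, queryset name, month numbers, run id) are illustrative placeholders.

```python
from datetime import datetime
from typing import Any, Dict, List

# Hypothetical field -> expected type mapping for the candidate fields.
# The final schema may use different names or types.
CANDIDATE_FIELDS: Dict[str, type] = {
    "level_of_analysis": str,
    "outcome": str,
    "queryset_name": str,
    "train_start": int,
    "train_end": int,
    "training_date": str,  # ISO 8601 string, so the dict stays JSON-serializable
    "author": str,
    "run_id": str,
}


def missing_or_mistyped(metadata: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means all candidate fields look fine."""
    problems = []
    for field, expected in CANDIDATE_FIELDS.items():
        if field not in metadata:
            problems.append(f"missing: {field}")
            continue
        value = metadata[field]
        # bool is a subclass of int, so reject it explicitly for month fields.
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            problems.append(f"{field} should be {expected.__name__}")
    return problems


metadata = {
    "level_of_analysis": "country_month",
    "outcome": "my_outcome",          # illustrative
    "queryset_name": "my_queryset",   # illustrative
    "train_start": 121,               # illustrative views_month
    "train_end": 444,                 # illustrative views_month
    "training_date": datetime.now().isoformat(),
    "author": "testuser",
    "run_id": "my_run",               # illustrative
}
print(missing_or_mistyped(metadata))                     # []
print(missing_or_mistyped({"author": "testuser"})[:2])   # ['missing: level_of_analysis', 'missing: outcome']
```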

These fields should eventually become available as query parameters, for example to find all models associated with a given run or author.
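Until such query parameters exist, one client-side workaround is to collect each model's metadata and filter it locally. The sketch below works on a plain dictionary mapping model names to metadata; find_models is a hypothetical helper, and building the dictionary from `store.list()` and `store.fetch_metadata()` assumes that `list()` returns model names.

```python
from typing import Any, Dict, List


def find_models(catalog: Dict[str, Dict[str, Any]], **criteria: Any) -> List[str]:
    """Return sorted names of models whose metadata matches every keyword criterion.

    `catalog` maps model name -> metadata dict (hypothetical structure), e.g.
    catalog = {name: store.fetch_metadata(name) for name in store.list()}
    """
    return sorted(
        name
        for name, meta in catalog.items()
        if all(key in meta and meta[key] == value for key, value in criteria.items())
    )


catalog = {
    "model-a": {"author": "alice", "run_id": "run_1"},
    "model-b": {"author": "bob", "run_id": "run_1"},
    "model-c": {"author": "alice", "run_id": "run_2"},
}
print(find_models(catalog, run_id="run_1"))                  # ['model-a', 'model-b']
print(find_models(catalog, author="alice", run_id="run_2"))  # ['model-c']
```

Filtering client-side means fetching metadata for every model, so server-side query parameters would scale better as the number of stored models grows.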
