ModelObjects
The viewser toolkit also includes functions for storing and retrieving model objects. This is used to support production runs, to avoid having to retrain model objects. In addition, storage and retrieval of model objects could be used to share objects with other viewsers or between different computers.
Currently, there is no validation or standardization of the model objects or metadata that are stored. Any pickle-serializable object can be stored as a model object, and any JSON-serializable object (dictionary) can be stored as metadata. We chose flexibility over rigor to avoid shoehorning and to allow standards for model objects and metadata to develop organically.
A schema for metadata is coming, which will mandate certain values in the metadata.
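Since nothing is validated on storage yet, it can help to check serializability yourself before storing. The following is a minimal sketch using only the Python standard library; the helper names is_pickle_serializable and is_json_serializable are hypothetical and not part of viewser or views_runs.

```python
import json
import pickle
from datetime import datetime


def is_pickle_serializable(obj) -> bool:
    """Return True if pickle.dumps accepts the object (hypothetical helper)."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


def is_json_serializable(obj) -> bool:
    """Return True if json.dumps accepts the object (hypothetical helper)."""
    try:
        json.dumps(obj)
    except (TypeError, ValueError):
        return False
    return True


# A datetime object is not JSON serializable by the standard json module,
# but its ISO 8601 string representation is.
good_metadata = {"author": "testuser", "training_date": datetime(2022, 1, 1).isoformat()}
bad_metadata = {"author": "testuser", "training_date": datetime(2022, 1, 1)}
```

With the standard json module, good_metadata passes the check and bad_metadata does not, so dates are safest stored as strings.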
A command-line interface (CLI) is available for exploring and downloading currently available objects. The relevant commands are:
# Show list of names of currently available models
viewser model list
# Show metadata associated with model object
viewser model inspect $NAME
# Download model object to a pickle-file
viewser model download $NAME
For scripting, a Storage class is made available by the views_runs package:
from datetime import datetime
from views_runs import storage
# This can currently be any object that you want to store
my_model = ...
# Metadata is currently any JSON-serializable dictionary, so the date is stored as an ISO string.
metadata = {"author": "testuser", "training_date": datetime.now().isoformat()}
store = storage.Storage()
# Store a model object (type Any) and its metadata (type JsonSerializeable)
store.store("my-model", my_model, metadata)
# Fetch a model object
another_model = store.fetch("another_model")
# Show list of available models
store.list()
# Fetch metadata associated with model
assert store.fetch_metadata("my-model") == metadata
The team is currently in the process of developing a schema. To avoid early lock-in, we have opted for an open system first, which will be restricted as we figure out what objects and metadata we want to store, by adding validation logic to the .store method of the Storage class.
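Until such validation exists, a thin wrapper on the caller's side can enforce whatever convention a team agrees on. This is only a sketch: REQUIRED_FIELDS, validate_metadata and checked_store are hypothetical names, the set of mandatory keys is an assumption, and the only thing assumed about the store object is the store(name, model, metadata) call shown earlier.

```python
from typing import Any, Dict, Iterable

# Hypothetical set of mandatory keys; the real schema is not yet decided.
REQUIRED_FIELDS = ("author", "training_date")


def validate_metadata(
    metadata: Dict[str, Any], required: Iterable[str] = REQUIRED_FIELDS
) -> None:
    """Raise ValueError if metadata is not a dict or lacks a required key."""
    if not isinstance(metadata, dict):
        raise ValueError("metadata must be a dictionary")
    missing = [key for key in required if key not in metadata]
    if missing:
        raise ValueError(
            f"metadata is missing required fields: {', '.join(missing)}"
        )


def checked_store(store, name: str, model: Any, metadata: Dict[str, Any]) -> None:
    """Validate metadata first, then delegate to store.store(...)."""
    validate_metadata(metadata)
    store.store(name, model, metadata)
```

Calling checked_store(store, "my-model", my_model, metadata) in place of store.store(...) rejects incomplete metadata before anything is uploaded.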
These are some of the fields that have been discussed as candidates for model object metadata, and should be included in the metadata dictionary:
- level_of_analysis: A string, either "priogrid_month", "country_month", or one of the other available LOAs.
- outcome: The name of the outcome variable on which the model was trained
- queryset_name: The name of the queryset on which the model was trained
- train_start: The start month of the training period (integer, views_month)
- train_end: The end month of the training period (integer, views_month)
- training_date: The date the model was trained (datetime.datetime.now())
- author: A string identifying you as the author
- run_id: Name of the run for which the model was trained
These fields should eventually be available as query parameters, for example to find all models associated with a given run or author.
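To make the field list concrete, here is a sketch of a metadata dictionary that uses the candidate fields, together with a small client-side filter that mimics the kind of query described above. All values are illustrative placeholders, the key train_end for the end month is an assumption, and filter_models is a hypothetical helper working on a plain dictionary of name-to-metadata pairs, not a views_runs function.

```python
from datetime import datetime
from typing import Any, Dict, List

# Illustrative values only; month numbers and names are placeholders.
example_metadata = {
    "level_of_analysis": "country_month",
    "outcome": "my_outcome",
    "queryset_name": "my_queryset",
    "train_start": 121,  # start month of the training period
    "train_end": 396,    # end month of the training period (assumed key name)
    "training_date": datetime.now().isoformat(),  # ISO string keeps it JSON serializable
    "author": "testuser",
    "run_id": "my_run",
}


def filter_models(catalog: Dict[str, Dict[str, Any]], **criteria: Any) -> List[str]:
    """Return sorted names of models whose metadata matches every criterion."""
    return sorted(
        name
        for name, meta in catalog.items()
        if all(meta.get(key) == value for key, value in criteria.items())
    )


# A hypothetical local catalog mapping model names to their metadata.
catalog = {
    "model-a": example_metadata,
    "model-b": {**example_metadata, "author": "otheruser"},
    "model-c": {**example_metadata, "run_id": "other_run"},
}

models_in_run = filter_models(catalog, run_id="my_run")
models_by_author_in_run = filter_models(catalog, run_id="my_run", author="testuser")
```

Such a catalog could be assembled by fetching the metadata for each stored model name; server-side querying would make this step unnecessary once it is available.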