jimdale edited this page May 16, 2024 · 33 revisions

viewser 6++ user documentation

This wiki contains documentation for viewser for versions 6.0.0 and above. Much of what follows can also be found in the README.md file in the root directory of this repository.

Older versions of viewser (<6.0.0) retrieve data from a source which is no longer maintained, and these versions can no longer be used - previous users who wish to continue to use the service should upgrade to the latest version of viewser (see below).

Basic concepts

The viewser package is a client that interacts over an HTTPS connection with a remote service, usually known as views3, currently hosted at the Peace Research Institute Oslo (PRIO).

views3 has an actively maintained Postgres database containing on the order of 10,000 features of various kinds (e.g. conflict, economic, developmental, environmental), mostly defined at one of two main temporal levels of analysis (month, year) and one of two main spatial levels of analysis (country, priogrid). Most features are regularly updated, some (e.g. UCDP GED) on monthly timescales.

The viewser client allows users to fetch raw data from the database and apply a wide variety of mathematical transforms to each feature.

The client communicates with a service running on a remote server, which bears all the computational load, thereby relieving users' machines. Communication follows a simple polling model: the client keeps pinging the service until it receives either an error message or the requested data. In the meantime, the service returns status messages giving users some idea of how much longer they need to wait for their data.
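The polling model described above can be illustrated with a minimal sketch. This is not viewser's actual implementation: the function names, the reply format (a dict with `state` and `payload` keys) and the stand-in service are all assumptions made for illustration only.

```python
import time
from typing import Callable, Dict


def poll_until_done(fetch_status: Callable[[], Dict[str, str]],
                    interval_seconds: float = 1.0,
                    max_attempts: int = 100) -> str:
    """Poll a service until it returns data or an error.

    fetch_status is a hypothetical callable returning a dict with a
    'state' key ('pending', 'error' or 'done') and a 'payload' key.
    """
    for attempt in range(max_attempts):
        reply = fetch_status()
        if reply["state"] == "done":
            return reply["payload"]
        if reply["state"] == "error":
            raise RuntimeError(reply["payload"])
        # Still pending: show the status message, then wait before retrying.
        print(f"attempt {attempt + 1}: {reply['payload']}")
        time.sleep(interval_seconds)
    raise TimeoutError("gave up waiting for the service")


def make_fake_service(pending_replies: int) -> Callable[[], Dict[str, str]]:
    """Build a stand-in service that reports 'pending' a fixed number of times."""
    remaining = {"count": pending_replies}

    def fetch_status() -> Dict[str, str]:
        if remaining["count"] > 0:
            remaining["count"] -= 1
            return {"state": "pending", "payload": "still working"}
        return {"state": "done", "payload": "requested data"}

    return fetch_status
```

Calling `poll_until_done(make_fake_service(2), interval_seconds=0.0)` prints two status messages and then returns the payload. A real client would replace the stand-in with an HTTPS request to the remote service.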

Users define what data and which transforms (which can be arbitrarily chained) they require using a Python class called a Queryset, which makes the specification of the requested dataset declarative - users do not need to worry about the various lower-level processes needed to produce a certain dataset. In particular, any aggregation or disaggregation between different levels of analysis is performed automatically. The package allows users to simply state what they want in a straightforward fashion that the data service understands, and requests are fulfilled on the fly, while being cached in case they are needed again.

The service eventually returns a single pandas dataframe (compressed for transfer) containing all requested data.

Installing viewser 6++

The following is a brief quickstart guide. Detailed instructions can be found on the Detailed installation instructions page.

viewser is a Python package publicly available via pip. It was developed for macOS and Linux platforms and has not been tested to any degree on Windows.

It is very strongly recommended that viewser be installed in a dedicated conda environment.

Conda can be obtained here (the miniforge installer is recommended):

https://conda.io/projects/conda/en/latest/user-guide/install/index.html

A minimal viewser environment can be created by executing

conda create -n viewser python=3.11

Once the environment is activated by

conda activate viewser

viewser itself can be installed via

pip install viewser

The viewser package is regularly updated, so users should frequently update it by executing

pip install --upgrade viewser

Once viewser is installed, it needs to be configured to set the URL from which it fetches data. This can be achieved by

viewser config set REMOTE_URL https://viewser.viewsforecasting.org

Getting help

To open this wiki in a browser window from the terminal, run:

viewser help wiki

Using viewser

The viewser client can be used in two ways:

Via command-line interface (CLI)

viewser functionality is exposed via a CLI on your system after installation. An overview of available commands can be seen by running viewser --help.

The CLI is envisaged mainly as a tool to help users with issues such as selecting appropriate transforms, exploring the database, determining the level of analysis of a given feature, etc.

Useful CLI commands

Show all features in the database:

viewser features list <loa>

with <loa> being one of ['pgm', 'cm', 'pgy', 'cy', 'pg', 'c', 'am', 'a', 'ay', 'm', 'y']

Show all transforms sorted by level of analysis:

viewser transforms list

Show all transforms available at a particular level of analysis:

viewser transforms at_loa <loa>

with <loa> being one of ['any', 'country_month', 'priogrid_month', 'priogrid_year']

Show docstring for a particular transform:

viewser transform show <transform-name>

List querysets stored in the queryset database:

viewser querysets list

Produce the code required to generate a queryset:

viewser querysets show <queryset-name>
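When these commands are called from a script, it can help to validate the level-of-analysis argument before invoking the CLI. The following is a minimal sketch: the subcommands and the accepted values are taken from the lists above, while the helper function names are hypothetical and not part of viewser.

```python
from typing import List

# Abbreviations accepted by `viewser features list`, as listed above.
FEATURE_LOAS = ("pgm", "cm", "pgy", "cy", "pg", "c", "am", "a", "ay", "m", "y")

# Names accepted by `viewser transforms at_loa`, as listed above.
TRANSFORM_LOAS = ("any", "country_month", "priogrid_month", "priogrid_year")


def features_list_command(loa: str) -> List[str]:
    """Build the argument list for `viewser features list <loa>`."""
    if loa not in FEATURE_LOAS:
        raise ValueError(f"unknown loa {loa!r}; expected one of {FEATURE_LOAS}")
    return ["viewser", "features", "list", loa]


def transforms_at_loa_command(loa: str) -> List[str]:
    """Build the argument list for `viewser transforms at_loa <loa>`."""
    if loa not in TRANSFORM_LOAS:
        raise ValueError(f"unknown loa {loa!r}; expected one of {TRANSFORM_LOAS}")
    return ["viewser", "transforms", "at_loa", loa]
```

In an environment where viewser is installed, the returned list can be passed to `subprocess.run(command, check=True)` from the standard library.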

Via API

The full functionality of viewser is exposed via its API for use in scripts and notebooks.

The two fundamental objects used to define what data is fetched by the client are the Queryset and the Column, where a Queryset consists of one or more Columns, and a Column is one raw feature to which zero or more transforms have been applied.
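The relationship between the two objects can be pictured with a small stand-alone model. The classes below are illustrative only and are not viewser's own Queryset and Column classes. Their names, fields and methods are assumptions, as are the feature and transform names used in the usage note.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ColumnSketch:
    """One raw feature plus an ordered chain of zero or more transforms."""
    name: str
    raw_feature: str
    transforms: Tuple[str, ...] = ()

    def with_transform(self, transform_name: str) -> "ColumnSketch":
        # Return a new column with the transform appended to the chain.
        return ColumnSketch(self.name, self.raw_feature,
                            self.transforms + (transform_name,))


@dataclass(frozen=True)
class QuerysetSketch:
    """A named collection of columns at a target level of analysis."""
    name: str
    loa: str
    columns: Tuple[ColumnSketch, ...] = ()

    def with_column(self, column: ColumnSketch) -> "QuerysetSketch":
        # Return a new queryset with the column appended.
        return QuerysetSketch(self.name, self.loa, self.columns + (column,))

    def describe(self) -> List[str]:
        # One line per column: the raw feature followed by its transform chain.
        return [
            f"{column.name}: " + " -> ".join((column.raw_feature,) + column.transforms)
            for column in self.columns
        ]
```

For example, `QuerysetSketch("demo", "country_month").with_column(ColumnSketch("log_feature", "raw_feature_a").with_transform("log"))` describes a queryset with one column whose raw feature passes through a single transform. The real classes additionally handle communication with the remote service.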

Follow the links below for guides to creating querysets and fetching data.

Common tasks

Conventions

ViEWS 3 relies on several conventions for data and naming, to make exchange and interoperation between packages easier:

Packages

Documentation is provided for each of the constituent ViEWS 3 packages. Notebooks are also available, which show complete workflows using the various packages in concert. Note that package names are written with a dash (-) on PyPI and with an underscore (_) on GitHub, due to differing naming conventions.

  • viewser is the entrypoint for interacting with the ViEWS 3 cloud, which provides data for the ViEWS team.
  • views-transformation-library contains the data transformation functions available in viewser.
  • views-runs provides helper classes for model run management (soon to be deprecated).
  • stepshift implements the step-shifting algorithm used for predictive modelling.
  • views-partitioning is used to partition data for training.
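The dash and underscore naming convention noted above can be expressed as a pair of small helpers. This is a sketch with hypothetical function names. It assumes the two names differ only in that one character, as the note states.

```python
def pypi_to_github_name(pypi_name: str) -> str:
    """Convert a dash-separated PyPI name to the underscore form used on GitHub."""
    return pypi_name.replace("-", "_")


def github_to_pypi_name(github_name: str) -> str:
    """Convert an underscore-separated GitHub name to the dash form used on PyPI."""
    return github_name.replace("_", "-")
```

Names without a separator, such as viewser or stepshift, are returned unchanged by both helpers.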