kltm edited this page Jun 1, 2015 · 5 revisions

What is the “Schema”?

The schema in question is literally the schema file (probably schema.xml) used by the Solr server. However, this probably isn’t the most useful way to think about it for our purposes.

A better way to think about it might be as a set of information and assumptions about the data in the Solr server that is shared with the client libraries (bbop.manager, widgets, etc.). For what we do, this information is shared either through a JSON variable (in many cases amigo.data.golr, a data blob used to initialize bbop.golr.conf) or through YAML files (such as those found in AmiGO 2’s metadata/ directory). In fact, in our case, the JSON variable is generated fairly directly from the YAML files.
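As a rough illustration of that YAML-to-JSON relationship, the sketch below treats each parsed YAML file as a Python dict and collects them into a single hash keyed by personality id. The structure (the "id", "fields", and "searchable" keys and the "example" personality) is a hypothetical simplification, not the exact layout of AmiGO 2’s metadata/ files or of amigo.data.golr.

```python
import json

# Hypothetical, heavily simplified stand-in for one parsed YAML file;
# the real files in AmiGO 2's metadata/ directory carry many more keys.
example_yaml = {
    "id": "example",
    "fields": [
        {"id": "taxon"},
        {"id": "taxon_label", "searchable": True},
    ],
}


def build_golr_hash(parsed_yaml_files):
    """Collect parsed YAML personalities into one hash keyed by their id."""
    return {conf["id"]: conf for conf in parsed_yaml_files}


golr_hash = build_golr_hash([example_yaml])
# Serialize for the client side, e.g. to embed as a JSON variable.
print(json.dumps(golr_hash, indent=2))
```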

The Current State of the Schema

You can take a direct look at the current schema used for AmiGO 2 at GitHub, or look at the files it is derived from (using OWLTools).

What Are the Fields?

There are special field-name endings (suffixes) that the various managers, widgets, and libraries search for. These should be considered more-or-less reserved tokens and should not be used in field names unless you know what you’re doing.

Although in practice we mostly use the JSON schema blob, it is derived from the YAML files (through a very simple mapping), so we usually do not worry about the exact fields. However, with an eye toward writing a new loader, let’s examine some of the fields of a schema in more detail.

The suffixes follow a partial ordering, in the order they appear after the base name: _list|_closure, then _map|_label, then (optionally) _searchable.
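One way a loader or client could apply this ordering is to peel suffixes off the end of a field name, taking at most one suffix per tier. split_field is a hypothetical helper (not part of any bbop library); it ignores the _json and _graph_json endings, and it assumes base names never themselves end in one of these suffixes.

```python
# Suffix tiers in the order they appear after a base field name:
# _list|_closure, then _map|_label, then (optionally) _searchable.
SUFFIX_TIERS = [("_list", "_closure"), ("_map", "_label"), ("_searchable",)]


def split_field(name):
    """Split a field name into (base, suffixes), taking at most one suffix
    per tier and working from the last tier backward."""
    suffixes = []
    for tier in reversed(SUFFIX_TIERS):
        for suffix in tier:
            # Guard against stripping the whole name down to nothing.
            if name.endswith(suffix) and len(name) > len(suffix):
                suffixes.insert(0, suffix)
                name = name[: -len(suffix)]
                break
    return name, suffixes


for field in ["taxon", "taxon_label_searchable", "taxon_closure_map"]:
    print(field, "->", split_field(field))
```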

Simple fields

Things like synonyms are just simple fields with no further information attached, and they are usually rendered as-is. Another example is:

  • taxon

Searchable fields (*_searchable)

Many fields (descriptions, etc.) need to be processed differently to make them more searchable, rather than just being easy to filter by. (Managers will often scan for these automatically in the JSON configs produced from the YAML files.)

  • taxon_label
  • taxon_label_searchable
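The kind of scan a manager might do over a field list can be sketched as follows; searchable_pairs is a hypothetical helper that maps each field to its _searchable companion when both are present.

```python
def searchable_pairs(field_ids):
    """Map each field to its *_searchable companion when both are present."""
    ids = set(field_ids)
    return {
        f: f + "_searchable"
        for f in field_ids
        if not f.endswith("_searchable") and f + "_searchable" in ids
    }


print(searchable_pairs(["taxon", "taxon_label", "taxon_label_searchable"]))
# {'taxon_label': 'taxon_label_searchable'}
```

A client could then send free-text queries to the values of this mapping while still filtering on the keys.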

Label/ID pairs (*_label)

If there are fields X and X_label, these are considered a pair and will often be rendered with a link out.

Putting together what we have so far gives the grouping:

  • taxon
  • taxon_label
  • taxon_label_searchable
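Detecting such pairs from a list of field ids, and rendering one pair as a link, might look like the sketch below. The helper names and the link base URL (https://example.org/term/) are hypothetical placeholders, not anything AmiGO 2 defines.

```python
from html import escape


def id_label_pairs(field_ids):
    """Return (X, X_label) for every field X that has an X_label sibling."""
    ids = set(field_ids)
    return [(f, f + "_label") for f in field_ids if f + "_label" in ids]


def render_link(doc, field, base="https://example.org/term/"):
    """Render a doc's X/X_label pair as an HTML link (hypothetical base URL)."""
    ident, label = doc[field], doc[field + "_label"]
    return f'<a href="{escape(base + ident)}">{escape(label)}</a>'


fields = ["taxon", "taxon_label", "taxon_label_searchable"]
print(id_label_pairs(fields))  # [('taxon', 'taxon_label')]
doc = {"taxon": "NCBITaxon:9606", "taxon_label": "Homo sapiens"}
print(render_link(doc, "taxon"))
```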

Lists and closures (_list and _closure)

Sometimes the data involved are lists (references, flattened closures, etc.). In these cases, we also add _list or _closure fields to the schema. For taxon, for example:

  • taxon
  • taxon_label
  • taxon_label_searchable
  • taxon_closure
  • taxon_closure_label
  • taxon_closure_label_searchable
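For a single document, a loader might fill these fields from an ordered lineage as sketched below. The lineage is deliberately abbreviated (the real NCBI taxonomy has many intermediate ranks), the closure is assumed to include the taxon itself, and the _searchable values are simply copied on the assumption that the difference lies in how schema.xml analyzes those fields. taxon_fields is a hypothetical helper; check these assumptions against the actual loader.

```python
# Abbreviated, illustrative lineage ordered from the taxon up toward the root.
LINEAGE = [
    ("NCBITaxon:9606", "Homo sapiens"),
    ("NCBITaxon:9605", "Homo"),
    ("NCBITaxon:40674", "Mammalia"),
]


def taxon_fields(lineage):
    """Build the taxon field family for one document (hypothetical helper)."""
    own_id, own_label = lineage[0]
    closure_ids = [ident for ident, _ in lineage]  # assumed reflexive
    closure_labels = [label for _, label in lineage]
    return {
        "taxon": own_id,
        "taxon_label": own_label,
        "taxon_label_searchable": own_label,
        "taxon_closure": closure_ids,
        "taxon_closure_label": closure_labels,
        # Copied so the two list fields do not share one list object.
        "taxon_closure_label_searchable": list(closure_labels),
    }


doc = taxon_fields(LINEAGE)
print(doc["taxon_closure"])  # ['NCBITaxon:9606', 'NCBITaxon:9605', 'NCBITaxon:40674']
```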

Maps (*_map)

Because we’d like to know how lists of IDs and labels relate to each other, there is also the automatic category of maps. These are simple JSON blobs that hold the mapping between IDs and labels when there are lists or closures.

  • taxon
  • taxon_label
  • taxon_label_searchable
  • taxon_closure
  • taxon_closure_label
  • taxon_closure_map
  • taxon_closure_label_searchable
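A minimal sketch of producing a _map value from parallel ID and label lists, assuming the map is a flat JSON object from each ID to its label (the exact blob layout in AmiGO 2 may differ); closure_map is a hypothetical helper.

```python
import json


def closure_map(ids, labels):
    """Serialize parallel ID/label lists into a flat ID-to-label JSON blob."""
    if len(ids) != len(labels):
        raise ValueError("ID and label lists must have the same length")
    return json.dumps(dict(zip(ids, labels)))


taxon_closure_map = closure_map(
    ["NCBITaxon:9606", "NCBITaxon:9605"], ["Homo sapiens", "Homo"]
)
# A client can decode the blob to look up the label for any closure ID.
print(json.loads(taxon_closure_map)["NCBITaxon:9605"])  # Homo
```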

JSON (*_json)

Besides _map, the other JSON category is the general (and unsearchable) _json. These fields are currently used for things like rendering the complex structure of annotation extensions, e.g. annotation_extension_json.
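Because _json values are stored as opaque strings, a client has to decode them before rendering. The hypothetical helper below decodes every field whose name ends in _json (which also catches _graph_json), handling both single strings and lists of strings in case a field is multivalued; the example value is a placeholder, not the real annotation_extension_json format.

```python
import json


def decode_json_fields(doc):
    """Return a copy of a result doc with every *_json value decoded."""
    out = dict(doc)
    for key, value in doc.items():
        if not key.endswith("_json"):
            continue
        if isinstance(value, str):
            out[key] = json.loads(value)
        elif isinstance(value, list):  # multivalued field: decode each entry
            out[key] = [json.loads(v) for v in value]
    return out


doc = {
    "id": "example:1",
    "annotation_extension_json": ['{"placeholder": true}'],
}
print(decode_json_fields(doc)["annotation_extension_json"])  # [{'placeholder': True}]
```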

Graphs (*_graph_json)

Graphs are another slightly special category: these fields hold graphs that are expected to be loadable by bbop-graph’s load_json method.

  • topology_graph_json
  • transitivity_graph_json
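A sketch of emitting a small *_graph_json value. It assumes bbop-graph’s load_json accepts a {"nodes": [...], "edges": [...]} object with id/lbl keys on nodes and sub/obj/pred keys on edges; that shape is an assumption here and should be checked against bbop-graph’s documentation. graph_json and the "is_a" predicate are illustrative.

```python
import json


def graph_json(nodes, edges):
    """Serialize (id, label) nodes and (sub, obj, pred) edges as graph JSON."""
    return json.dumps({
        "nodes": [{"id": ident, "lbl": label} for ident, label in nodes],
        "edges": [{"sub": s, "obj": o, "pred": p} for s, o, p in edges],
    })


topology = graph_json(
    nodes=[("NCBITaxon:9606", "Homo sapiens"), ("NCBITaxon:9605", "Homo")],
    edges=[("NCBITaxon:9606", "NCBITaxon:9605", "is_a")],
)
print(topology)
```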

What is a Personality?

Given what we have now, a personality is just the set of assumptions defined in a single YAML file (or, more practically, defined in a field of the JSON hash).
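If the JSON hash is keyed by personality name, picking a personality amounts to a lookup into that hash. The sketch below assumes that layout, with a hypothetical "example" entry standing in for a real one such as "bbop_ont"; personality_fields is a hypothetical helper, not a bbop.golr.conf method.

```python
def personality_fields(golr_hash, personality):
    """Return the field ids that one personality defines (hypothetical layout)."""
    try:
        conf = golr_hash[personality]
    except KeyError:
        raise KeyError(f"unknown personality: {personality!r}") from None
    return [field["id"] for field in conf.get("fields", [])]


# Hypothetical hash; a real one would come from the YAML-derived JSON blob.
golr_hash = {"example": {"fields": [{"id": "taxon"}, {"id": "taxon_label"}]}}
print(personality_fields(golr_hash, "example"))  # ['taxon', 'taxon_label']
```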

In our example, by setting the personality to “bbop_ont”, we…

TODO