Caching results from common API calls
The Open Tree webapps make frequent calls to a few API methods, often requesting the same data (e.g. arguson views of major clades). For performance reasons, and to relieve stress on the API servers, we've opted to cache these results using web2py.
This solution assumes that all cached APIs (currently treemachine and taxomachine) are in the same domain as phylesystem-api, as in our standard configuration. Systems that distribute APIs across multiple domains should either use non-caching method URLs or modify the cached action below to work across domains.
Initially, cached values are stored in RAM and set to never expire. To clear all cached values (after each synthesis release or other change in source data), simply restart web2py.
Web2py uses a @cache decorator to designate controller actions whose responses will be cached. The arguments to this decorator are evaluated on each request, and one of them lets us define a unique cache key for each method call and its arguments, for example:
taxomachine/v1/getContextsJSON
treemachine/v1/getSyntheticTree?format=arguson&maxDepth=3&subtreeNodeID=170042&treeID=otol.draft.22
The "query string" portion of this key is reconstructed from request.vars, so it captures all arguments, whether originally sent via GET or POST. For example, this is needed to distinguish calls to getSyntheticTree, which would otherwise all return a single response (a single arguson view).
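The key construction described above can be sketched in plain Python 3. This is only an illustration, not the code in phylesystem-api: the function name build_cache_key is hypothetical, a plain dict stands in for request.vars, and sorting the arguments by name is our assumption (it matches the alphabetical order in the example keys above).

```python
from urllib.parse import urlencode


def build_cache_key(path, request_vars):
    """Build a cache key from a method path and its arguments.

    `request_vars` is a plain dict standing in for web2py's request.vars
    (an assumption of this sketch).
    """
    if not request_vars:
        # No arguments: the method path alone identifies the response
        return path
    # Sort by argument name so the same call always yields the same key,
    # regardless of the order in which the arguments were sent
    query = urlencode(sorted(request_vars.items()))
    return "{}?{}".format(path, query)


tree_key = build_cache_key(
    "treemachine/v1/getSyntheticTree",
    {"treeID": "otol.draft.22", "format": "arguson",
     "maxDepth": 3, "subtreeNodeID": 170042},
)
contexts_key = build_cache_key("taxomachine/v1/getContextsJSON", {})
```

Because the key includes every argument, two getSyntheticTree calls for different subtrees get different keys and therefore different cached responses.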
When a cached controller action is called, web2py will check the cache to see if there's a response under the proposed key. If found, this is returned immediately; if not found, the controller action is called normally, and its result stored in the cache before being returned to the caller.
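As a rough illustration of this check-then-compute flow, here is a minimal in-memory sketch. It is not web2py's implementation; SimpleCache, fetch, and the stand-in action below are hypothetical names used only for this example.

```python
class SimpleCache:
    """A minimal in-memory cache with no expiry (illustrative only)."""

    def __init__(self):
        self._store = {}

    def fetch(self, key, compute):
        # Cache hit: return the stored response immediately
        if key in self._store:
            return self._store[key]
        # Cache miss: run the action normally, then store its result
        value = compute()
        self._store[key] = value
        return value


calls = []


def slow_action():
    # Stand-in for a real controller action; records each invocation
    calls.append(1)
    return {"contexts": ["All life"]}


cache = SimpleCache()
first = cache.fetch("taxomachine/v1/getContextsJSON", slow_action)
second = cache.fetch("taxomachine/v1/getContextsJSON", slow_action)
```

The second fetch returns the stored value without running slow_action again, which is the saving we are after on the API servers.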
Since phylesystem-api (a web2py app) is the default recipient for calls to api.opentreeoflife.org, we've added the caching hooks there. This also makes for a single, generic controller action cached in the default controller. Any Open Tree API method can be called via this proxy, simply by adding cached/ immediately after the domain for an API method.
For example, the tree-view app loads arguson views of a target clade (and its nearby descendants) using this method:
https://api.opentreeoflife.org/treemachine/v1/getSyntheticTree
To cache the results for next time (or retrieve the cached results quickly):
https://api.opentreeoflife.org/cached/treemachine/v1/getSyntheticTree
That's it! To use caching for common API calls in the tree-view app, we've simply modified its config file to include CACHED_ versions of some API base URLs, and updated cache-worthy method URLs to use them.
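The URL change amounts to inserting cached/ at the start of the path. A small sketch of that rewrite follows, assuming a hypothetical helper named to_cached_url; the tree-view app itself does this through its config file rather than with code like this.

```python
from urllib.parse import urlsplit, urlunsplit


def to_cached_url(url):
    """Return the caching-proxy version of an API method URL."""
    parts = urlsplit(url)
    if parts.path.startswith("/cached/"):
        # Already points at the caching proxy; leave it unchanged
        return url
    # Insert "cached" immediately after the domain
    return urlunsplit((parts.scheme, parts.netloc, "/cached" + parts.path,
                       parts.query, parts.fragment))


plain = "https://api.opentreeoflife.org/treemachine/v1/getSyntheticTree"
cached = to_cached_url(plain)
```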
For tree-view (arguson) responses, the cache is cleared by simply restarting apache, which resets the RAM cache.
We're also using the @cache decorator in the main webapp to cache "local comments" (tied to a particular node, taxon, or URL) in RAM. These are subject to more frequent changes, from the curation UI or users working directly in the 'feedback' issue tracker on GitHub.
Keeping this cache fresh is handled (imperfectly) using a combination of methods:
- A GitHub webhook provides notification when an issue (or issue comment) is created. This pings the /plugin_localcomments/clear_local_comments URL (described here), which analyses the JSON payload and tries to clear only the related cache items.
- The "delete comment" and "close issue" buttons in curation UI will also clear the cache. Currently this is a brute-force clearing of all cached comments, since we lack the context to be more discriminating.
- Since GitHub's webhook can't be triggered by modifying or deleting comments on GitHub itself, we only store these results for 5 minutes. This should hopefully provide a balance of performance under load and adequate freshness.
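To illustrate the 5-minute expiry, here is a minimal sketch of a time-limited cache. It is not web2py's code: ExpiringCache and its clock parameter are hypothetical, and the injectable clock exists only so the behaviour can be checked without waiting.

```python
import time


class ExpiringCache:
    """In-memory cache whose entries go stale after `time_expire` seconds."""

    def __init__(self, time_expire, clock=time.time):
        self._time_expire = time_expire
        self._clock = clock
        self._store = {}  # key -> (stored_at, value)

    def fetch(self, key, compute):
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None:
            stored_at, value = entry
            if now - stored_at < self._time_expire:
                # Still fresh: serve the stored comments
                return value
        # Missing or stale: recompute and record the time it was stored
        value = compute()
        self._store[key] = (now, value)
        return value


fake_now = [0.0]  # hypothetical fake clock, advanced by hand
comments = ExpiringCache(time_expire=300, clock=lambda: fake_now[0])

fetches = []


def load_comments():
    # Stand-in for a real lookup of comments in the issue tracker
    fetches.append(1)
    return ["comment #%d" % len(fetches)]


a = comments.fetch("node:170042", load_comments)
fake_now[0] = 299.0
b = comments.fetch("node:170042", load_comments)  # within 5 minutes
fake_now[0] = 301.0
c = comments.fetch("node:170042", load_comments)  # older than 5 minutes
```

With this approach, an edit made directly on GitHub shows up at the latest 5 minutes after the comments were last cached.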
If caching these variables in RAM creates problems, we can change a single line of code to switch to a filesystem-based "disk cache". These values would survive a web2py restart, so the cache would need to be cleared explicitly using either of these web2py cache methods (shown here for cache.ram; cache.disk accepts the same calls):
cache.ram(key, None) # clear a single value using its unique key
cache.ram.clear(regex='...') # clear all values with keys matching this regex
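For readers unfamiliar with regex-based clearing, the idea can be sketched over a plain dict. This is an illustration only and makes no claim about web2py's internals; clear_matching is a hypothetical helper, and matching from the start of the key is an assumption of this sketch.

```python
import re


def clear_matching(store, regex):
    """Delete every entry whose key matches `regex`; return the removed keys."""
    pattern = re.compile(regex)
    # Collect the keys first so the dict is not modified while iterating
    doomed = [key for key in store if pattern.match(key)]
    for key in doomed:
        del store[key]
    return doomed


store = {
    "treemachine/v1/getSyntheticTree?format=arguson": "tree view",
    "taxomachine/v1/getContextsJSON": "contexts",
}
removed = clear_matching(store, r"treemachine/")
```

Here only the treemachine entry is dropped, so unrelated cached responses stay available.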