Proposal: Constrain the API to return a single edge per cooccurring entity pair #11
Replies: 2 comments 1 reply
-
Currently the only regular use of |
-
diabetes_lookup_response.txt |
-
Currently, the cooccurrence endpoint returns multiple edges for a given cooccurring entity pair. Each edge corresponds to a distinct document part, e.g. title, abstract, sentence, within which the entity pair was observed to cooccur. Each edge includes six different cooccurrence metric scores as well as a list of document identifiers where the entities were observed to cooccur.
We have observed use of the API in which every edge for an entity pair ends up in the knowledge graph of an ARA. This is potentially problematic, since the presence of multiple edges between the same pair may be interpreted as a sign of increased support or confidence for an assertion. Because the cooccurrence edges are not necessarily independent, that interpretation should not be made. I am concerned that this "default" assumption and use of the API may lead to poor or misunderstood outcomes, especially for ARAs that automatically scan the SmartAPI registry and blindly use any KP that is registered.
One solution: consolidate EPC data for each document part in which an entity pair is observed to cooccur. Serve this EPC data on a single edge for a given entity pair.
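To make the consolidation idea concrete, here is a minimal sketch, not the actual implementation. It assumes each edge is a plain dictionary; the field names (`subject`, `object`, `document_part`, `scores`, `document_ids`, `evidence_by_document_part`), the metric names, and the identifiers in the example data are all hypothetical placeholders rather than the API's real schema. It also assumes cooccurrence is symmetric, so (A, B) and (B, A) are treated as the same pair.

```python
from collections import defaultdict


def consolidate_edges(edges):
    """Merge per-document-part cooccurrence edges into one edge per entity pair.

    All field names here are hypothetical; adapt them to the real response schema.
    """
    grouped = defaultdict(dict)
    for edge in edges:
        # Assumption: cooccurrence is symmetric, so sort the pair to get one key.
        pair = tuple(sorted((edge["subject"], edge["object"])))
        # Keep the scores and documents for each document part separate, so no
        # information is lost and nothing is summed as if it were independent.
        # If the same pair and part appear twice, the later edge wins.
        grouped[pair][edge["document_part"]] = {
            "scores": edge["scores"],
            "document_ids": edge["document_ids"],
        }
    return [
        {"subject": s, "object": o, "evidence_by_document_part": parts}
        for (s, o), parts in grouped.items()
    ]


# Hypothetical example data: two edges for the same pair, one per document part.
example_edges = [
    {
        "subject": "ENTITY:A",
        "object": "ENTITY:B",
        "document_part": "title",
        "scores": {"metric_1": 0.2},
        "document_ids": ["DOC:1"],
    },
    {
        "subject": "ENTITY:B",
        "object": "ENTITY:A",
        "document_part": "abstract",
        "scores": {"metric_1": 0.5},
        "document_ids": ["DOC:1", "DOC:2"],
    },
]

merged = consolidate_edges(example_edges)
```

With this shape, a consumer sees exactly one edge per entity pair and has to look inside it to find the per-part evidence, which makes it harder to mistake several document parts for several independent lines of support.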
An alternative solution: should we disable the `lookup` operation and instead require `overlay` to be used? This solution would likely prevent the unintentional inclusion of cooccurrence edges in ARA KGs. The downside could be that cooccurrence could not be used as a proxy for a relationship that could potentially "connect the dots" in some mechanistic explanation.
I'm not sure of the best path forward, hence this discussion. Please comment below.