Proposal: Constrain the API to return a single edge per cooccurring entity pair #11

bill-baumgartner · 2023-03-08T17:20:02Z

bill-baumgartner
Mar 8, 2023
Maintainer

Currently, the cooccurrence endpoint returns multiple edges for a given cooccurring entity pair. Each edge corresponds to a distinct document part, e.g. title, abstract, sentence, within which the entity pair was observed to cooccur. Each edge includes six different cooccurrence metric scores as well as a list of document identifiers where the entities were observed to cooccur.

We have observed use of the API that results in all edges corresponding to an entity pair being included in the knowledge graph of an ARA. This seems potentially problematic, since the presence of multiple edges may be interpreted as a sign of increased support/confidence for an assertion. Because the cooccurrence edges are not necessarily independent, the assumption that multiple edges mean greater confidence should not be made. I am concerned that this "default" assumption and use of the API may lead to poor or misunderstood outcomes, especially for ARAs that automatically scan the SmartAPI registry and blindly use any KP that is registered.

One solution: consolidate EPC data for each document part in which an entity pair is observed to cooccur. Serve this EPC data on a single edge for a given entity pair.
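To make the first option concrete, here is a minimal sketch of the consolidation step. It is not the actual implementation: the field names (`subject`, `object`, `document_part`, `scores`, `documents`, `evidence_by_part`, `all_documents`), the function name, and the identifiers in the sample data are all hypothetical, and it assumes cooccurrence is symmetric and that each pair has at most one edge per document part.

```python
def merge_cooccurrence_edges(edges):
    """Collapse per-document-part edges into one edge per entity pair.

    All field names here are hypothetical, chosen only for illustration.
    """
    merged = {}
    for edge in edges:
        # Assumption: cooccurrence is symmetric, so (A, B) and (B, A) are the same pair.
        pair = tuple(sorted((edge["subject"], edge["object"])))
        combined = merged.setdefault(
            pair,
            {"subject": pair[0], "object": pair[1], "evidence_by_part": {}},
        )
        # Assumption: at most one edge per pair and document part; a repeat would overwrite.
        combined["evidence_by_part"][edge["document_part"]] = {
            "scores": dict(edge["scores"]),
            "documents": list(edge["documents"]),
        }
    for combined in merged.values():
        # Union of document identifiers across parts, so a document is not counted twice.
        docs = set()
        for part in combined["evidence_by_part"].values():
            docs.update(part["documents"])
        combined["all_documents"] = sorted(docs)
    return list(merged.values())


# Made-up sample data: two edges for the same pair, one per document part.
sample_edges = [
    {"subject": "ENTITY:A", "object": "ENTITY:B", "document_part": "title",
     "scores": {"metric_1": 0.4}, "documents": ["DOC:1", "DOC:2"]},
    {"subject": "ENTITY:B", "object": "ENTITY:A", "document_part": "abstract",
     "scores": {"metric_1": 0.7}, "documents": ["DOC:2", "DOC:3"]},
]
single_edges = merge_cooccurrence_edges(sample_edges)
```

With this shape, a consumer sees one edge per pair and the per-part scores and document lists stay available as supporting data on that edge, rather than as separate edges that could be mistaken for independent evidence.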

An alternative solution: should we disable the lookup operation and instead require overlay to be used? This would likely prevent the unintentional inclusion of cooccurrence edges in ARA KGs. The downside is that cooccurrence could then no longer serve as a proxy for a relationship that might "connect the dots" in some mechanistic explanation.

I'm not sure of the best path forward, hence this discussion. Please comment below.

edgargaticaCU · 2023-03-09T21:18:12Z

edgargaticaCU
Mar 9, 2023
Maintainer

Currently, the only regular use of overlay on the dev instance comes from the SmartAPI status monitor, which runs status checks a few times a day. The lookup operation, on the other hand, is used more frequently, presumably as part of an ARA's direct query.
This makes me think it would be better to go with the first solution, since it would remove the potential confusion of multiple (possibly dependent) edges while keeping the cooccurrence data visible and useful to other tools.


edgargaticaCU · 2023-03-10T20:38:51Z

edgargaticaCU
Mar 10, 2023
Maintainer

diabetes_lookup_response.txt
Attached is a sample response which merges the edges for all document zones into one edge.

1 reply

bill-baumgartner Mar 13, 2023
Maintainer Author

This looks good to me.
