TRAPI 1.5: set_interpretation/MCQ #800

colleenXu · 2024-03-26T02:15:40Z

See notes from 2024-03-27 group meeting.

I have a lot of concerns / questions, and we're going to ask the Translator group for clarification. We'll start with asking about the basic requirements / deadlines.

colleenXu · 2024-03-26T03:07:59Z

This is all of my initial understanding. There seems to be a LOT more going on, which we'll track in the opening post of this issue.

[EDIT: PAUSE FOR DISCUSSION. NEEDS MORE INVESTIGATION ON WHAT THE SPEC IS, based on Eric's reply in Translator Slack]

Previously, we implemented QNode is_set behavior (original issue, behavior with ID/node-expansion).

default behavior (property missing or is_set: false): each result has 1 KG node bound to each QNode
but if a QNode has is_set: true, a result can have >= 1 KG node bound to that QNode. AKA there's a merging/consolidation.

Feature

In TRAPI 1.5, is_set is replaced with set_interpretation, which has more explicit rules for results-assembly (PR, lines 881-896). It's an optional property with string values (enum).

default behavior (property missing or null) == "BATCH": same as before, each result has 1 KG node bound to each QNode
"MANY": same as previous is_set: true behavior.
- Note: This new specification only seems to cover when QNodes have multiple starting IDs. But I'd like to keep our current use of is_set:true / set_interpretation: MANY on QNodes with no starting IDs to merge/consolidate results.
"ALL": new behavior. This should only be set on QNodes that have multiple starting IDs/entities. Similar to the "MANY" behavior, but only keep results that contain all starting IDs/entities.
- AKA if only some of the starting IDs/entities are in the consolidated result, it should be thrown out (and any KG nodes/edges unique to it should be pruned).

Examples

All use the same basic query, just setting set_interpretation to different values. I used HP IDs with no descendants (because ID/node-expansion triggers an automatic use of is_set: true, see #555 (comment))

set to BATCH (default)

Query:

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories":["biolink:PhenotypicFeature"],
                    "ids":["HP:0500041", "HP:0007750"],
                    "set_interpretation": "BATCH"
                },
                "n1": {
                    "categories":["biolink:Disease"]
               }
            },
            "edges": {
                "eA": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:phenotype_of"]
                }
            }
        }
    }
}

Response should have 39 results: current_default.json. This was generated with current default (not setting is_set)

set to MANY

Query:

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories":["biolink:PhenotypicFeature"],
                    "ids":["HP:0500041", "HP:0007750"],
                    "set_interpretation": "MANY"
                },
                "n1": {
                    "categories":["biolink:Disease"]
               }
            },
            "edges": {
                "eA": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:phenotype_of"]
                }
            }
        }
    }
}

Response should have 37 results (rather than 39): current_is_set.json. This was generated with current is_set: true.

set to ALL

Query:

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories":["biolink:PhenotypicFeature"],
                    "ids":["HP:0500041", "HP:0007750"],
                    "set_interpretation": "ALL"
                },
                "n1": {
                    "categories":["biolink:Disease"]
               }
            },
            "edges": {
                "eA": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:phenotype_of"]
                }
            }
        }
    }
}

Response should have 2 results (rather than 39 or 37). Only 2 disease entities are connected to both starting entities. See the first two results of current_is_set.json. This was generated with current is_set: true.

MONDO:0009003 (achromatopsia 2)
MONDO:0013560 (Hermansky-Pudlak syndrome 8)

Complications that need discussion

(1) ID/node expansion

Currently, if we find descendants of a starting ID (ID/node expansion), we set that starting ID's QNode to is_set: true #555 (comment). Can we remove this behavior?

the current behavior has unintended consequences, like being completely unable to do is_set: false / set_interpretation: BATCH for some queries
but I'm not sure if we depend on this behavior (we want to keep the behavior of "subclass_of edges using different descendant IDs are kept in the same result")
Context: we implemented this with an old version of representing subclass info (comment). Now we use "constructed edges" + aux-graphs (issue)

Example

I expect 11 results for following query (not setting is_set), but end up with 10 results which is the same as setting is_set: true.

response to the query below, not setting is_set: 10_not_setting_is_set.json
VS response setting is_set 10_is_set_true.json

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories":["biolink:PhenotypicFeature"],
                    "ids":["HP:0007800", "HP:0025586"]
                },
                "n1": {
                    "categories":["biolink:Disease"]
               }
            },
            "edges": {
                "eA": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:phenotype_of"]
                }
            }
        }
    }
}

This happens because ID/node-expansion finds a descendant for 1 of the starting IDs. Console logs:

  bte:biothings-explorer-trapi:main Expanded ids for node n0: (2 ids -> 3 ids) +0ms
  bte:biothings-explorer-trapi:main Added is_set:true to node n0 +1ms

Note to self with another example

This query has 130 results whether is_set: true or not, when it should have >= 134 results when not. It also has some subclass_of edges/aux-graphs, but I'm not sure if it's a good test for seeing if ID/node-expansion becomes wonky after the changes.

{
    "message": {
        "query_graph": {
            "nodes": {
                "n0": {
                    "categories":["biolink:PhenotypicFeature"],
                    "ids":["HP:0003259", "HP:0000110"]
                },
                "n1": {
                    "categories":["biolink:Disease"]
               }
            },
            "edges": {
                "eA": {
                    "subject": "n0",
                    "object": "n1",
                    "predicates": ["biolink:phenotype_of"]
                }
            }
        }
    }
}

(2) Unclear what the KG Node `is_set` property is for

Asked in Translator Slack:
The PR for set_interpretation also adds an is_set property to KG Nodes (lines 1011-1017). I'm not sure if this is meant to be used, and how (merging KG Nodes??).

Eric's reply in Translator Slack - needs more investigation.

(3) Clarifying an edge case

Asked in Translator Slack:
If set_interpretation is set on a QNode with multiple starting IDs, but these IDs all map to the same entity (using NodeNorm), then there isn't any set behavior to do. Is that fine? Does there need to be any log noting this?

colleenXu added the trapi 1.5 label Mar 26, 2024

colleenXu mentioned this issue Mar 26, 2024

overview and management of TRAPI 1.5 features (excluding set_interpretation/MCQ) #798

Closed

colleenXu changed the title ~~change QNode is_set ➡️set_interpretation, implement "ALL"~~ Major: change QNode is_set ➡️set_interpretation, implement "ALL" Mar 26, 2024

colleenXu added the needs discussion label Mar 27, 2024

colleenXu changed the title ~~Major: change QNode is_set ➡️set_interpretation, implement "ALL"~~ TRAPI 1.5 set_interpretation/MCQ Mar 29, 2024

colleenXu changed the title ~~TRAPI 1.5 set_interpretation/MCQ~~ TRAPI 1.5: set_interpretation/MCQ Mar 29, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TRAPI 1.5: set_interpretation/MCQ #800

TRAPI 1.5: set_interpretation/MCQ #800

colleenXu commented Mar 26, 2024 •

edited

Loading

colleenXu commented Mar 26, 2024 •

edited

Loading

Feature

Examples

Complications that need discussion

(1) ID/node expansion

(2) Unclear what the KG Node `is_set` property is for

(3) Clarifying an edge case

TRAPI 1.5: set_interpretation/MCQ #800

TRAPI 1.5: set_interpretation/MCQ #800

Comments

colleenXu commented Mar 26, 2024 • edited Loading

colleenXu commented Mar 26, 2024 • edited Loading

Feature

Examples

Complications that need discussion

(1) ID/node expansion

(2) Unclear what the KG Node is_set property is for

(3) Clarifying an edge case

colleenXu commented Mar 26, 2024 •

edited

Loading

colleenXu commented Mar 26, 2024 •

edited

Loading

(2) Unclear what the KG Node `is_set` property is for