consolidate results for multi-ID query nodes #331

andrewsu · 2021-10-21T19:47:37Z

This ticket refines the results-building behavior defined in #164. Previously we did not address the case where the TRAPI query specifies multiple allowed IDs for a single node. For example, consider this query (also pasted as the Example query below), which asks BTE to explain the relationship between 6 different-but-related drugs in n00 and 3 different-but-related diseases in n02. In the BTE results rendered in ARAX (see BTE screenshot), there are three different results that differ only by the disease bound to n02. In this case, the desired behavior would be to combine those three entries into a single result object. Further down in the results list, there are also other result objects that differ only by the drug bound to n00 -- those should also be consolidated into a single result object.

The rationale for this change is that the "result" that the user likely cares about is the gene in n01, and the user will likely want to see all the supporting evidence for that gene. So, a gene that is connected to multiple drugs in n00 and multiple diseases in n02 would be viewed very differently than a gene connected to only one drug/disease.

Example query:

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01"
        },
        "e01": {
          "subject": "n01",
          "object": "n02"
        }
      },
      "nodes": {
        "n00": {
          "ids": [
            "DRUGBANK:DB00215",
            "DRUGBANK:DB01175",
            "DRUGBANK:DB00472",
            "DRUGBANK:DB00176",
            "DRUGBANK:DB00715",
            "DRUGBANK:DB01104"
          ],
          "categories": ["biolink:SmallMolecule"]
        },
        "n01": {
          "categories": ["biolink:Gene"]
        },
        "n02": {
          "ids": ["DOID:5844", "DOID:1936", "DOID:3393"],
          "categories": ["biolink:Disease"]
        }
      }
    }
  }
}

BTE screenshot


colleenXu · 2021-10-21T19:53:07Z

note that this would be a change compared to the previous reasoning...which was to bind only 1 entity per qnode...

perhaps this is a "special" case where more than 1 ID / entity is supplied in 1 qnode...

ariutta · 2021-10-21T22:53:34Z

What should the JSON for the consolidated results assembly look like?

andrewsu · 2021-10-22T05:23:18Z

starting with the BTE result here: https://arax.ncats.io/api/arax/v1.2/response/db1e8d54-3d3b-469e-9c8b-caee7e355064 (part of which is shown in the screenshots above), a consolidated record from that might look something like the JSON in https://gist.github.com/andrewsu/bb35aaccbecca4a0f697e33e5570c09c, which would be visualized like this:

ariutta · 2021-10-27T15:57:59Z

My understanding for this example is that each of the different-but-related drugs must have a record (returned from at least one of the external APIs) that connects it with the gene. In other words, if acetaminophen were one of the different-but-related drugs, it wouldn't be enough that fluoxetine had a record connecting it with INS. If acetaminophen didn't have its own record connecting it with INS, it shouldn't be included in the consolidated drugs node. Does this sound right?

Stated another way: if both fluoxetine and acetaminophen were in the list of different-but-related drugs in the query, but only fluoxetine had a record connecting it with INS, then acetaminophen shouldn't be included in the consolidated drugs node in the results, right?

andrewsu · 2021-10-27T16:24:08Z

yes, correct. unless there was an edge connecting acetaminophen to INS, acetaminophen should not appear in the node_bindings section of the result.
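
As a rough illustration of the rule above (not BTE's actual implementation), the consolidation could be sketched as grouping results by the bindings of the nodes that are not being consolidated and taking the union of everything else. The function name `consolidate_results`, the `set_qnodes` argument, the `kg-edge-*` IDs, and the choice of `NCBIGene:3630` for n01 are all made up for this example. Because a drug only enters the merged `node_bindings` if some input result already bound it, a queried drug with no edge to the gene never appears.

```python
def consolidate_results(results, set_qnodes):
    """Merge results that differ only in the bindings of `set_qnodes`.

    `results` is a list of TRAPI-style result dicts; `set_qnodes` is the
    set of query node IDs whose bindings should be pooled together.
    """
    merged = {}
    for result in results:
        # Group on the bindings of every node that is NOT being consolidated.
        key = tuple(sorted(
            (qnode_id, tuple(sorted(b["id"] for b in bindings)))
            for qnode_id, bindings in result["node_bindings"].items()
            if qnode_id not in set_qnodes
        ))
        target = merged.setdefault(key, {"node_bindings": {}, "edge_bindings": {}})
        # Union node and edge bindings, skipping exact duplicates.
        for section in ("node_bindings", "edge_bindings"):
            for qid, bindings in result[section].items():
                kept = target[section].setdefault(qid, [])
                for binding in bindings:
                    if binding not in kept:
                        kept.append(binding)
    return list(merged.values())


def _result(drug, gene, disease, e00, e01):
    # Helper for building illustrative single-binding results.
    return {
        "node_bindings": {"n00": [{"id": drug}], "n01": [{"id": gene}], "n02": [{"id": disease}]},
        "edge_bindings": {"e00": [{"id": e00}], "e01": [{"id": e01}]},
    }


# Illustrative input: two of the six queried drugs have an edge to the gene.
results = [
    _result("DRUGBANK:DB00472", "NCBIGene:3630", "DOID:5844", "kg-edge-1", "kg-edge-2"),
    _result("DRUGBANK:DB00472", "NCBIGene:3630", "DOID:1936", "kg-edge-1", "kg-edge-3"),
    _result("DRUGBANK:DB00215", "NCBIGene:3630", "DOID:5844", "kg-edge-4", "kg-edge-2"),
]
consolidated = consolidate_results(results, {"n00", "n02"})
```

Here the three input results collapse into one result for the gene, and the four queried drugs that had no edge to it are absent from n00.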

ariutta · 2021-10-27T16:58:45Z

In our group meeting, we decided this consolidation will at least for now only happen for lists of ids for nodes. (If needed, we may consider consolidation by node categories and edge predicates in the future, but those may also never be needed.)

ariutta · 2021-11-03T15:51:39Z

I wanted to verify -- we could have ids lists for different nodes, and those nodes could be of the same type. So if we had two disease nodes, they could each have ids lists, and those lists could be identical, partially intersecting or non-intersecting, right?

andrewsu · 2021-11-03T16:35:48Z

yes, I think that's right. But in practice, those lists will be non-intersecting 99% of the time. So if throwing an error where multiple nodes of the same type have any overlapping IDs significantly simplifies the implementation, I think that's totally fine...
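
If the "throw an error on overlap" route were taken, the check might look something like the sketch below. The function name `find_overlapping_ids` and the extra node `n03` are hypothetical and only serve to show the idea; this is not existing BTE code.

```python
from itertools import combinations


def find_overlapping_ids(query_graph):
    """Return (node_a, node_b, shared_ids) for node pairs that share both
    a category and at least one ID."""
    overlaps = []
    for (a, node_a), (b, node_b) in combinations(query_graph["nodes"].items(), 2):
        shared_categories = set(node_a.get("categories") or []) & set(node_b.get("categories") or [])
        shared_ids = set(node_a.get("ids") or []) & set(node_b.get("ids") or [])
        if shared_categories and shared_ids:
            overlaps.append((a, b, sorted(shared_ids)))
    return overlaps


# n03 is a made-up second disease node whose IDs partially intersect n02's.
query_graph = {
    "nodes": {
        "n00": {"ids": ["DRUGBANK:DB00215"], "categories": ["biolink:SmallMolecule"]},
        "n01": {"categories": ["biolink:Gene"]},
        "n02": {"ids": ["DOID:5844", "DOID:1936", "DOID:3393"], "categories": ["biolink:Disease"]},
        "n03": {"ids": ["DOID:1936", "DOID:3393"], "categories": ["biolink:Disease"]},
    }
}
overlaps = find_overlapping_ids(query_graph)
if overlaps:
    print("overlapping IDs between same-type nodes:", overlaps)
```

A caller could raise an error whenever the returned list is non-empty, which would keep the consolidation logic free of the intersecting-list case.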

andrewsu · 2021-11-04T05:17:18Z

Closing this ticket as a special case of #341. If the intent is to consolidate results when multiple IDs are specified, then the query should explicitly state that using the is_set parameter. So the behavior described above should be performed if and only if the query looks like this:

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01"
        },
        "e01": {
          "subject": "n01",
          "object": "n02"
        }
      },
      "nodes": {
        "n00": {
          "ids": [
            "DRUGBANK:DB00215",
            "DRUGBANK:DB01175",
            "DRUGBANK:DB00472",
            "DRUGBANK:DB00176",
            "DRUGBANK:DB00715",
            "DRUGBANK:DB01104"
          ],
          "categories": ["biolink:SmallMolecule"],
          "is_set": true
        },
        "n01": {
          "categories": ["biolink:Gene"],
          "is_set": false
        },
        "n02": {
          "ids": ["DOID:5844", "DOID:1936", "DOID:3393"],
          "categories": ["biolink:Disease"],
          "is_set": true
        }
      }
    }
  }
}
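
To make the "if and only if" condition concrete, here is a minimal sketch of how the nodes to consolidate could be read off the query graph. The helper name `qnodes_to_consolidate` is an assumption for illustration; the query is an abbreviated copy of the one above.

```python
def qnodes_to_consolidate(query_graph):
    """Return the IDs of query nodes that explicitly set is_set to true."""
    return sorted(
        qnode_id
        for qnode_id, qnode in query_graph["nodes"].items()
        if qnode.get("is_set") is True
    )


# Abbreviated version of the query above (ID lists shortened).
query_graph = {
    "nodes": {
        "n00": {"ids": ["DRUGBANK:DB00215", "DRUGBANK:DB01175"],
                "categories": ["biolink:SmallMolecule"], "is_set": True},
        "n01": {"categories": ["biolink:Gene"], "is_set": False},
        "n02": {"ids": ["DOID:5844", "DOID:1936", "DOID:3393"],
                "categories": ["biolink:Disease"], "is_set": True},
    }
}
set_qnodes = qnodes_to_consolidate(query_graph)
```

A node that omits `is_set`, or sets it to false, is left out, so a multi-ID node without the flag would keep the earlier one-entity-per-qnode behavior.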

andrewsu mentioned this issue Oct 29, 2021

consolidate results based on is_set query parameter #341

Closed

andrewsu closed this as completed Nov 4, 2021
