consolidate results for multi-ID query nodes #331

Closed
andrewsu opened this issue Oct 21, 2021 · 9 comments

Comments

@andrewsu
Member

This ticket is a refinement of the results-building behavior defined in #164. Previously we did not address the case where the TRAPI query specifies multiple allowed IDs for a single node. For example, consider this query (also pasted as the Example query below), which asks BTE to explain the relationship between 6 different-but-related drugs in n00 and 3 different-but-related diseases in n02. In the BTE results rendered in ARAX (see BTE screenshot), there are three different results that differ only by the disease bound to n02. In this case, the desired behavior would be to combine those three entries into a single result object. Further down in the results list, there are also other result objects that differ only by which of the query's drugs is bound to n00 -- those should also be consolidated into a single result object.

The rationale for this change is that the "result" that the user likely cares about is the gene in n01, and the user will likely want to see all the supporting evidence for that gene. So, a gene that is connected to multiple drugs in n00 and multiple diseases in n02 would be viewed very differently than a gene connected to only one drug/disease.

Example query:

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01"
        },
        "e01": {
          "subject": "n01",
          "object": "n02"
        }
      },
      "nodes": {
        "n00": {
          "ids": [
            "DRUGBANK:DB00215",
            "DRUGBANK:DB01175",
            "DRUGBANK:DB00472",
            "DRUGBANK:DB00176",
            "DRUGBANK:DB00715",
            "DRUGBANK:DB01104"
          ],
          "categories": ["biolink:SmallMolecule"]
        },
        "n01": {
          "categories": ["biolink:Gene"]
        },
        "n02": {
          "ids": ["DOID:5844", "DOID:1936", "DOID:3393"],
          "categories": ["biolink:Disease"]
        }
      }
    }
  }
}
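To make the requested behavior concrete, here is a minimal sketch of the consolidation step in Python (not BTE's actual implementation). It assumes results use the TRAPI `node_bindings` / `edge_bindings` layout; the function name `consolidate_results`, the edge IDs, and the gene ID are hypothetical and only for illustration.

```python
def consolidate_results(results, set_qnode_ids):
    """Merge results that differ only in the bindings of `set_qnode_ids`.

    Hypothetical helper: results that bind the same entities to every
    other query node (here, the same gene in n01) collapse into one.
    """
    merged = {}
    for result in results:
        # Grouping key: bindings of every query node NOT being consolidated.
        key = tuple(
            (qnode_id, tuple(sorted(b["id"] for b in bindings)))
            for qnode_id, bindings in sorted(result["node_bindings"].items())
            if qnode_id not in set_qnode_ids
        )
        target = merged.setdefault(key, {"node_bindings": {}, "edge_bindings": {}})
        for section in ("node_bindings", "edge_bindings"):
            for graph_id, bindings in result[section].items():
                kept = target[section].setdefault(graph_id, [])
                for binding in bindings:
                    if binding not in kept:  # skip duplicates, keep first-seen order
                        kept.append(binding)
    return list(merged.values())


def make_result(drug, disease, drug_edge, disease_edge):
    # Illustrative data only: the gene ID and edge IDs are made up for this sketch.
    return {
        "node_bindings": {
            "n00": [{"id": drug}],
            "n01": [{"id": "NCBIGene:3630"}],
            "n02": [{"id": disease}],
        },
        "edge_bindings": {"e00": [{"id": drug_edge}], "e01": [{"id": disease_edge}]},
    }


results = [
    make_result("DRUGBANK:DB00215", "DOID:5844", "edge-a", "edge-x"),
    make_result("DRUGBANK:DB00215", "DOID:1936", "edge-a", "edge-y"),
    make_result("DRUGBANK:DB01175", "DOID:5844", "edge-b", "edge-x"),
]
consolidated = consolidate_results(results, {"n00", "n02"})
```

With this input, the three per-drug/per-disease results collapse into a single result for the gene, whose n00 and n02 bindings list every drug and disease that appeared in the originals.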

BTE screenshot
image

@colleenXu
Collaborator

note that this would be a change compared to previous reasoning...which was to bind only 1 entity per qnode...

perhaps this is a "special" case where more than 1 ID / entity is supplied in 1 qnode...

@ariutta
Collaborator

ariutta commented Oct 21, 2021

What should the JSON for the consolidated results assembly look like?

@andrewsu
Member Author

starting with the BTE result here: https://arax.ncats.io/api/arax/v1.2/response/db1e8d54-3d3b-469e-9c8b-caee7e355064 (part of which is shown in the screenshots above), a consolidated record from that might look something like the JSON in https://gist.github.com/andrewsu/bb35aaccbecca4a0f697e33e5570c09c, which would be visualized like this:

image

@ariutta
Collaborator

ariutta commented Oct 27, 2021

My understanding for this example is that each of the different-but-related drugs must have a record (returned from at least one of the external APIs) that connects it with the gene. In other words, if acetaminophen were one of the different-but-related drugs, it wouldn't be enough that fluoxetine had a record connecting it with INS. If acetaminophen didn't have its own record connecting it with INS, it shouldn't be included in the consolidated drugs node. Does this sound right?

Stated another way: if both fluoxetine and acetaminophen were in the list of different-but-related drugs in the query, but only fluoxetine had a record connecting it with INS, then acetaminophen shouldn't be included in the consolidated drugs node in the results, right?

@andrewsu
Member Author

yes, correct. unless there was an edge connecting acetaminophen to INS, acetaminophen should not appear in the node_bindings section of the result.
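As a rough illustration of that rule, the sketch below keeps only the query IDs that have their own edge to the gene. The function name `supported_ids`, the simplified edge records, and the `EX:` identifiers are all assumptions made for this example, not part of BTE or TRAPI.

```python
def supported_ids(query_ids, edges, gene_id):
    """Return the query IDs with at least one edge to `gene_id`, in query order."""
    connected = {edge["subject"] for edge in edges if edge["object"] == gene_id}
    return [query_id for query_id in query_ids if query_id in connected]


# Placeholder identifiers, not real CURIEs.
query_drugs = ["EX:fluoxetine", "EX:acetaminophen"]
edges = [
    {"subject": "EX:fluoxetine", "object": "EX:INS"},
    {"subject": "EX:acetaminophen", "object": "EX:OTHER_GENE"},
]
bound_drugs = supported_ids(query_drugs, edges, "EX:INS")
```

Here only fluoxetine would be bound in the consolidated n00 for INS, because acetaminophen has no edge of its own to that gene.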

@ariutta
Collaborator

ariutta commented Oct 27, 2021

In our group meeting, we decided that, at least for now, this consolidation will only happen for lists of ids on nodes. (If needed, we may consider consolidation by node categories and edge predicates in the future, but those may also never be needed.)

@ariutta
Collaborator

ariutta commented Nov 3, 2021

I wanted to verify -- we could have ids lists for different nodes, and those nodes could be of the same type. So if we had two disease nodes, they could each have ids lists, and those lists could be identical, partially intersecting or non-intersecting, right?

@andrewsu
Member Author

andrewsu commented Nov 3, 2021

yes, I think that's right. But in practice, those lists will be non-intersecting 99% of the time. So if throwing an error when multiple nodes of the same type have any overlapping IDs significantly simplifies the implementation, I think that's totally fine...
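If the simpler "throw an error" route were taken, the check could look roughly like the sketch below. It assumes "same type" means the two nodes share at least one entry in `categories`; the function name `check_no_overlapping_ids` and the example graphs are hypothetical.

```python
from itertools import combinations


def check_no_overlapping_ids(query_graph):
    """Raise ValueError if two same-category query nodes share any ID."""
    nodes = query_graph["nodes"]
    for (id_a, node_a), (id_b, node_b) in combinations(nodes.items(), 2):
        shared_categories = set(node_a.get("categories") or []) & set(
            node_b.get("categories") or []
        )
        shared_ids = set(node_a.get("ids") or []) & set(node_b.get("ids") or [])
        if shared_categories and shared_ids:
            raise ValueError(
                f"query nodes {id_a} and {id_b} share IDs: {sorted(shared_ids)}"
            )


# Two disease nodes with non-intersecting ID lists: accepted.
ok_graph = {
    "nodes": {
        "n00": {"ids": ["DOID:5844"], "categories": ["biolink:Disease"]},
        "n01": {"categories": ["biolink:Gene"]},
        "n02": {"ids": ["DOID:1936", "DOID:3393"], "categories": ["biolink:Disease"]},
    }
}
# Two disease nodes that both list DOID:5844: rejected.
bad_graph = {
    "nodes": {
        "n00": {"ids": ["DOID:5844"], "categories": ["biolink:Disease"]},
        "n02": {"ids": ["DOID:5844", "DOID:3393"], "categories": ["biolink:Disease"]},
    }
}
check_no_overlapping_ids(ok_graph)  # returns None without raising
```

Calling `check_no_overlapping_ids(bad_graph)` raises a `ValueError` naming both nodes and the shared ID.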

@andrewsu
Member Author

andrewsu commented Nov 4, 2021

Closing this ticket as a special case of #341. If the intent is to consolidate results when multiple IDs are specified, then the query should explicitly state that using the is_set parameter. So the behavior described above should be performed if and only if the query looks like this:

{
  "message": {
    "query_graph": {
      "edges": {
        "e00": {
          "subject": "n00",
          "object": "n01"
        },
        "e01": {
          "subject": "n01",
          "object": "n02"
        }
      },
      "nodes": {
        "n00": {
          "ids": [
            "DRUGBANK:DB00215",
            "DRUGBANK:DB01175",
            "DRUGBANK:DB00472",
            "DRUGBANK:DB00176",
            "DRUGBANK:DB00715",
            "DRUGBANK:DB01104"
          ],
          "categories": ["biolink:SmallMolecule"],
          "is_set": true
        },
        "n01": {
          "categories": ["biolink:Gene"],
          "is_set": false
        },
        "n02": {
          "ids": ["DOID:5844", "DOID:1936", "DOID:3393"],
          "categories": ["biolink:Disease"],
          "is_set": true
        }
      }
    }
  }
}
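One way to read the "if and only if" condition is as a small check on the incoming query: only nodes that carry an `ids` list and set `is_set` to true are candidates for consolidation. The sketch below is an assumption about how that could be expressed, with a hypothetical function name `qnodes_to_consolidate`; it is not taken from BTE's code.

```python
def qnodes_to_consolidate(request_body):
    """Return the IDs of query nodes that list `ids` and have `is_set` true."""
    nodes = request_body["message"]["query_graph"]["nodes"]
    return {
        qnode_id
        for qnode_id, node in nodes.items()
        if node.get("is_set") is True and node.get("ids")
    }


# The query from the comment above, written as a Python dict.
request_body = {
    "message": {
        "query_graph": {
            "edges": {
                "e00": {"subject": "n00", "object": "n01"},
                "e01": {"subject": "n01", "object": "n02"},
            },
            "nodes": {
                "n00": {
                    "ids": [
                        "DRUGBANK:DB00215",
                        "DRUGBANK:DB01175",
                        "DRUGBANK:DB00472",
                        "DRUGBANK:DB00176",
                        "DRUGBANK:DB00715",
                        "DRUGBANK:DB01104",
                    ],
                    "categories": ["biolink:SmallMolecule"],
                    "is_set": True,
                },
                "n01": {"categories": ["biolink:Gene"], "is_set": False},
                "n02": {
                    "ids": ["DOID:5844", "DOID:1936", "DOID:3393"],
                    "categories": ["biolink:Disease"],
                    "is_set": True,
                },
            },
        }
    }
}
selected = qnodes_to_consolidate(request_body)
```

For this query the selected nodes are n00 and n02, so results would still be reported per gene in n01. A node that omits `is_set` is treated the same as `is_set: false` in this sketch.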

@andrewsu andrewsu closed this as completed Nov 4, 2021