add support of collections #373
Comments
Following up on today's discussion "use case 3" of collections and enrichment, I maintain that this problem was solved long ago and ARAX implements exactly this with existing TRAPI 1.3 and no change is needed. Here's my example query:
Notably, is_set = true indicates that the list of ids should be treated as a group. And here's the ARAX result for this query: Each result is a disease that is highly connected to that list of proteins (not necessarily all). A higher fraction of that set causes results to bubble to the top, and more edges also cause higher ranking. The set/collection for the query is defined by the QNode.ids list and QNode.is_set=true. I think this is simple and logical and does everything we need. |
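The example query itself did not survive in this copy of the thread. As a rough sketch only, a TRAPI-style query of the shape described above (a pinned list of ids with is_set=true on one QNode, connected to an unpinned disease QNode) might be built like this. The protein CURIEs, the node and edge keys, the categories, and the biolink:related_to predicate are all placeholders chosen for illustration, not the values from the original query.

```python
import json

# Hypothetical identifiers for illustration only; NOT the ids from the thread.
PROTEIN_IDS = ["UniProtKB:P00001", "UniProtKB:P00002", "UniProtKB:P00003"]


def build_set_query(ids):
    """Build a TRAPI-style query: a pinned set of proteins linked to unpinned diseases."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    # is_set=True asks that the ids be treated as one group
                    "n0": {
                        "ids": list(ids),
                        "categories": ["biolink:Protein"],
                        "is_set": True,
                    },
                    # unpinned node: each result binds one disease here
                    "n1": {"categories": ["biolink:Disease"]},
                },
                "edges": {
                    # predicate is an assumption; the thread later questions related_to
                    "e0": {
                        "subject": "n0",
                        "object": "n1",
                        "predicates": ["biolink:related_to"],
                    },
                },
            }
        }
    }


query = build_set_query(PROTEIN_IDS)
print(json.dumps(query, indent=2))
```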
After further thought, I think I agree with @edeutsch here. Originally I was thinking there were two use cases that should be handled separately -- for results merging and for enrichment-based associations. But the query behavior for both is the same, and the enrichment score can be reflected in the results scoring. So I'm on board with |
is_set might be the answer, I agree. But I'm a little unsure how it works. I understand the example that @edeutsch posted above, but I don't really understand the behavior for something like this (@andrewsu is this what you meant by result merging?):
|
This is also valid, but a different use case than we were discussing. In this case, each Result is MONDO:1234 at one end and a disease at the other end, and then a set/collection of proteins that they share in common between them in the middle. Ranking should be something like the results with the most shared proteins appear highest, although there is plenty of room for improvements on the ranking that could take things like the quality of the edges, NGD between the two diseases, etc. into account as well. |
So in the case that I put above every element of n1 in an answer must be attached to both n0 and n2? |
In the ARAX implementation currently, yes. I suppose there might be an opportunity for different implementations to include only partially connected nodes, although I wouldn't recommend it. Seems related to the whole "can you return partial paths" discussion, which I'm not certain we ever really resolved. |
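To make the "fully connected" behavior concrete, here is a minimal sketch of the check being described: a member of the n1 set is kept only if it has an edge to both the n0 node and the n2 node. This is not ARAX code. The function name, the edge representation (undirected pairs of ids), and every identifier other than MONDO:1234 are assumptions made for the example.

```python
def fully_connected_members(edges, set_members, left, right):
    """Return the set members that have an edge to both `left` and `right`."""

    def linked(a, b):
        # treat edges as undirected for this illustration
        return (a, b) in edges or (b, a) in edges

    return [m for m in set_members if linked(m, left) and linked(m, right)]


# Hypothetical knowledge graph edges; P1..P3 and MONDO:9999 are placeholders.
edges = {
    ("MONDO:1234", "P1"),
    ("P1", "MONDO:9999"),
    ("MONDO:1234", "P2"),  # P2 touches only n0
    ("P3", "MONDO:9999"),  # P3 touches only n2
}

kept = fully_connected_members(edges, ["P1", "P2", "P3"], "MONDO:1234", "MONDO:9999")
print(kept)
```

An implementation that allowed partially connected nodes would relax the `and` in the filter, which is the variation being cautioned against above.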
So it seems like there is different behavior for the same construct? If it's a bound node then I do enrichment, but if it's an unbound node then I don't? |
I don't think the behavior needs to be any different whether it is bound or unbound. I suppose it might be, as a refinement decided by the implementer, but I think it would normally be the same. |
Sorry, I might be missing something, but is it up to the server to decide how to implement is_set? It might mean the fully connected, or it might mean partially connected, and that partially connected might mean enrichment or max connectedness, or other versions? |
Until we decide that everyone has to do things the same way, I suppose we're all free to do things a bit differently. Aragorn is doing a whole lot of things differently than ARAX. Our current definition for is_set is this:
So a strict reading means to me that partial connectedness is not permitted (contrary to what I supposed above). It stipulates nothing about how ranking should be done, and I'm sure there is diversity in ideas on how ranking is best done in cases like this, from enrichment to max connectedness. So until we stipulate how it must be done, there can be diversity. |
Example of a KG with a gene set:
|
Here's my graphical representation of what I think the proposal is. Is this right @vdancik ? |
Example query with an
would result in the following result
where the auxiliary graph is
and a KG is in my previous comment |
So here is a slight update to the picture based on today's discussion. The query predicate is updated. And I depicted Result #1 as one that contains all 5 input genes, but Result #2 is the next best match where 3 of the 5 match. There was some discussion of whether this means AND or OR, or a "soft AND", i.e. "as many as possible". I am thinking that the is_set=true construction is interpreted to mean "as many of the set as possible". More members would mean a higher rank, but results that don't contain all members are not automatically discarded. But maybe this is not the desired outcome. Additional note: In this scenario, the Query must have knowledge_type: inferred (i.e. "creative mode"). How is this different from the sort of thing that COHD already does? |
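The "soft AND" reading can be sketched as a ranking by the fraction of the input set that each candidate matches, with partial matches kept rather than discarded. This is only an illustration of that interpretation, not any team's actual ranker. The function name, the gene symbols, and the candidate names are made up for the example.

```python
def rank_by_set_coverage(query_set, candidates):
    """Rank candidates by the fraction of query_set members they contain.

    Candidates matching no member are dropped; partial matches are kept.
    Ties are broken alphabetically by candidate name.
    """
    query = set(query_set)
    scored = []
    for name, members in candidates.items():
        matched = query & set(members)
        if matched:
            scored.append((name, len(matched) / len(query)))
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical input: five query genes and three candidate results.
genes = ["G1", "G2", "G3", "G4", "G5"]
candidates = {
    "result_all_five": ["G1", "G2", "G3", "G4", "G5"],
    "result_three": ["G1", "G3", "G5", "G9"],
    "result_none": ["G8", "G9"],
}

ranking = rank_by_set_coverage(genes, candidates)
print(ranking)
```

A strict AND would instead keep only candidates with a coverage of 1.0, and an OR would keep the same candidates as above but without using coverage for ordering.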
We should probably document why this isn't good enough: Can we capture all the enrichment statistical metrics in each Result.Analysis.attributes[]? The query predicate "related_to" is tripping us up here. Better to consider a query predicate like "enriched_in" (*does not actually exist yet). Or "participates_in"? |
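The thread does not say which enrichment statistics are meant. As one common choice, a hypergeometric tail probability could be computed per result and attached as an attribute. The sketch below assumes that choice, and the attribute_type_id shown is an assumed placeholder rather than an agreed convention. Only attribute_type_id and value are used, since those are the required fields of a TRAPI Attribute.

```python
from math import comb


def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) when drawing `draws` items without replacement
    from `population` items, of which `successes` are of interest."""
    upper = min(successes, draws)
    total = comb(population, draws)
    hits = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return hits / total


def enrichment_attribute(p_value):
    """Wrap a p-value as a TRAPI-style attribute dict (type id is an assumption)."""
    return {"attribute_type_id": "biolink:p_value", "value": p_value}


# Hypothetical numbers: 10 genes overall, 5 annotated to the candidate,
# 5 genes in the query set, all 5 of which are annotated.
p = hypergeom_tail(population=10, successes=5, draws=5, observed=5)
attribute = enrichment_attribute(p)
print(attribute)
```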
I have added an “alternate Support Graph” item #5 with the link to the agenda for today’s meeting. Not sure if this was what you wanted, so please let me know if we need other actions.
Terese Camp, PMP
Research Project Manager
Renaissance Computing Institute (RENCI)
University of North Carolina at Chapel Hill
(In reply to Eric Deutsch, Wednesday, January 17, 2024 at 11:46 PM: "We should probably document why this isn't good enough:")
|
We should add support of collections in TRAPI by adding a Boolean property is_set to KnowledgeGraph.Node and QueryGraph.QNode to indicate that a node represents a collection of entities rather than a single entity. Since there already is is_set in QueryGraph.QNode with a somewhat confusing meaning, we should also add collate to QueryGraph.QNode to indicate that nodes in results should be grouped.