Stand up "creative DTD" endpoint #1818
Comments
I'm thinking that it would be easier to implement this with a suitable "knowledge type" flag/constraint (#1815) on |
@edeutsch I was imagining a separate endpoint as the output format (and input that triggers it) may change, and figured it would be nice to keep a separate endpoint to fiddle with. I do like your idea of indicating what type of result you want by using flags/constraints. I figure, though, that it will be much faster to implement this one type of query via an operation like |
Oh, and another thing @edeutsch: we're going to host some clinical data working group workflows, so I thought we could use the normal route (and query graph interpreter) for that. I was also going to ask during the next AHM if people are cool with us listing this endpoint in the manuscript Chunyu is publishing. Basically for those who want to see it in action, but don't want to learn ARAXi/full TRAPI/Translator to use it. |
Hi @finnagin, I've written some scripts to run my model and uploaded them with the necessary data to the Here are some statistics for generating the top 20 paths for the top 50 drugs predicted to treat Alkaptonuria (MONDO:0008753). The results are here. Memory cost: 40 - 50 GB |
Thanks @chunyuma! Does the model usually take that much RAM to make predictions? I don't think arax.ncats.io will have enough spare RAM to run that, so we might need to spin up another EC2 instance instead of running it as a separate service on arax.ncats.io. |
@finnagin, right. It generally needs that much RAM to make predictions because it needs to read in some pre-training embeddings and the whole RTX-KG2c. Perhaps we need the thoughts of @edeutsch, @saramsey, or @dkoslicki on whether it is worthwhile to set up another EC2 instance for this model. |
running it on arax.ncats.io itself seems risky because the container itself has a limit of around 55 GB. |
From meeting with Amy:
|
Issue: nodes returned by name from the model |
See brain dump file for current state and what to do moving forward |
Multiprocessing may be a red herring: it looks like Finn's code in |
… for today. Run test_ARAX_infer -k test_with_qg to see what's wrong
…w probably_treats edge when a treats edge already exists in the QG) #1818
…precisely, replace it with biolink:NamedThing) since the model can return quite a variety of categories, and does not respect the biolink traversal up the hierarchy #1818
@amykglen I think I've fixed everything, as all tests appear to be passing now. I had to do some jiggering with node categories, which nodes and edges are marked as LMK if things look good to you |
…s provided. Fix leftover hardcoded curie #1818
…at's not yet implemented, so mark as should fail #1818
… done after GC has closed the log #1818
awesome, yep, things are looking good to me! thanks for figuring out the I'm planning to work on handling multi-qedge inferred queries over the next few days, but it may take me a little time as that makes things quite a bit more complex in Expand (having to merge answers/QGs and etc.) I'm thinking I'll do that in a branch off of random question: does XDTD do any |
Sounds good; mixed inferred and lookup edges is ahead of the curve as only the template (single inferred edge) is required by Tuesday, so there's plenty of time to do mixed knowledge types Re: subclass reasoning, no, it only does the inference for the exact curie supplied. |
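To make the distinction above concrete, here is a minimal sketch of how a query graph could be sorted into "lookup", "template" (a single inferred edge), or "multi_edge" (an inferred edge alongside other edges). It assumes the query edges carry a `knowledge_type` field set to "inferred", as in recent TRAPI versions; the function name `classify_query` and the three labels are hypothetical and not taken from the ARAX code.

```python
from typing import Any, Dict


def classify_query(query_graph: Dict[str, Any]) -> str:
    """Label a TRAPI-style query graph by how it uses inferred edges.

    The labels are illustrative only:
      "lookup"     - no edge asks for inference
      "template"   - exactly one edge, and it is inferred
      "multi_edge" - an inferred edge plus at least one other edge
    """
    edges = query_graph.get("edges", {})
    inferred = [
        key for key, edge in edges.items()
        if edge.get("knowledge_type") == "inferred"
    ]
    if not inferred:
        return "lookup"
    if len(edges) == 1:
        return "template"
    return "multi_edge"
```

Only the "template" case is needed for the initial rollout described in this thread; the "multi_edge" case is where merging answers and query graphs becomes necessary.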
So I am ready/trying to deploy this for dev/testing, but seeing this error:
I'm hoping @amykglen or someone can update the central configv2? and that should fix it? |
Yes, an update to |
the authoritative |
ok, the authoritative |
note that when rolling out to prod we'll have to edit the |
okay, we are deployed to /test and /beta. The endpoints pass our basic test query. |
Tested and it's looking good! I will want to make some changes later (à la #1862), but I think it's fine to roll out to all endpoints. |
okay, will do, even production? |
Can we do this together at the hackathon tomorrow? |
Yup, considering the UI team is expecting creative results, we should roll it out everywhere |
sure, sounds good to me! |
Deployed everywhere, so closing |
@chunyuma will create a class/method/script/function that has the following structure:
Input: a single disease CURIE and two integers, M and N
Output: the top M drugs predicted to treat the disease, along with N explanation paths for each drug
This will be using his reinforcement learning model.
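A minimal sketch of the interface described above, under stated assumptions: the names `predict_treatments`, `DrugPrediction`, `score_fn`, and `path_fn` are hypothetical, and the reinforcement learning model itself is not shown in this thread, so scoring and path generation are passed in as stand-in callables.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

# A path is an alternating list of node and edge labels,
# e.g. [drug, predicate, intermediate node, predicate, disease].
Path = List[str]


@dataclass
class DrugPrediction:
    drug_curie: str
    score: float
    paths: List[Path] = field(default_factory=list)


def predict_treatments(
    disease_curie: str,
    m: int,
    n: int,
    score_fn: Callable[[str], Dict[str, float]],
    path_fn: Callable[[str, str], Sequence[Path]],
) -> List[DrugPrediction]:
    """Return the top-m drugs for a disease, each with up to n paths.

    score_fn(disease) maps drug CURIEs to scores (stand-in for the model).
    path_fn(drug, disease) returns ranked explanation paths (stand-in).
    """
    if m < 1 or n < 1:
        raise ValueError("M and N must be positive integers")
    scores = score_fn(disease_curie)
    # Highest-scoring drugs first, truncated to the top m.
    top = sorted(scores.items(), key=lambda item: item[1], reverse=True)[:m]
    return [
        DrugPrediction(drug, score, list(path_fn(drug, disease_curie))[:n])
        for drug, score in top
    ]
```

Keeping the model behind injected callables means the endpoint code can be tested without loading the embeddings or the knowledge graph.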
@finnagin will stand up an endpoint to arax.ncats.io (with a name something like "CreativeDTD") that will:
Take as input a TRAPI query structured like:
(x)-[r]-(y)
where x is biolink:ChemicalEntity (or any of its descendants), r is any biolink relationship (effectively ignoring the relationship type), and y is biolink:DiseaseOrPhenotypicFeature (or any of its descendants). Everything else will be ignored (including the workflow portion of TRAPI).
As output, it will give a standard TRAPI response. The only nuance here is that the paths that Chunyu's method returns can have variable length: anywhere from 1 to 3 hops. As such, the query graph associated with this may need to be something like:
(x)-[r_opt1]-(y)
(x)-[r_opt2]-()-[r_opt2]-(y)
(x)-[r_opt3]-()-[r_opt3]-()-[r_opt3]-(y)
Or something similar to communicate 1 to 3 hops. Note that the current constraint of expand requiring at least one non-optional edge shouldn't matter here, as expand will not be used.
Timeline: Preliminary implementation by May 3, production ready by May 31. LMK if this timeline is reasonable (of course, the earlier the better, but there are other priorities each of you have as well).
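The input check for the (x)-[r]-(y) query described above could be sketched as follows. This is an illustration under assumptions, not the endpoint's actual code: the function name `extract_disease_curie` is hypothetical, the category sets are a small hand-picked subset of the Biolink hierarchy (a real service would resolve descendants from the full Biolink Model), and the query is assumed to follow the usual TRAPI layout of `message.query_graph` with `nodes` and `edges` dictionaries.

```python
from typing import Any, Dict

# Illustrative subsets only; not the complete set of Biolink descendants.
CHEMICAL_CATEGORIES = {
    "biolink:ChemicalEntity", "biolink:SmallMolecule", "biolink:Drug",
}
DISEASE_CATEGORIES = {
    "biolink:DiseaseOrPhenotypicFeature", "biolink:Disease",
    "biolink:PhenotypicFeature",
}


def _has_category(node: Dict[str, Any], allowed: set) -> bool:
    return bool(allowed.intersection(node.get("categories") or []))


def extract_disease_curie(query: Dict[str, Any]) -> str:
    """Validate a one-hop chemical/disease query and return the disease CURIE.

    The edge's predicates are never read, which matches the intent of
    ignoring the relationship type. Anything else in the query is ignored.
    """
    query_graph = query["message"]["query_graph"]
    nodes, edges = query_graph["nodes"], query_graph["edges"]
    if len(nodes) != 2 or len(edges) != 1:
        raise ValueError("expected exactly two nodes and one edge")
    (edge,) = edges.values()
    ends = [nodes[edge["subject"]], nodes[edge["object"]]]
    # The pattern is undirected, so accept either orientation of the edge.
    for chemical, disease in (ends, ends[::-1]):
        if (_has_category(chemical, CHEMICAL_CATEGORIES)
                and _has_category(disease, DISEASE_CATEGORIES)):
            ids = disease.get("ids") or []
            if len(ids) != 1:
                raise ValueError("disease node must carry exactly one CURIE")
            return ids[0]
    raise ValueError("query must connect a chemical node to a disease node")
```

The returned CURIE is the single disease input that the prediction method expects; how the 1 to 3 hop paths are written back into the response is left open here, as it is in the description.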