-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade to Biolink 2.1 (KG2.7.1) - due Aug. 11 #1593
Comments
well the synonymizer build appeared to go well - artifacts are uploaded to arax.ncats.io at only 56 problems listed in going to do some spot-checking tomorrow to make sure things look ok. |
terrific! |
things seem good in the synonymizer based on some spot-checking... although wondering about one thing - is it normal for the
|
It could happen if the concept is not known to the SRI Node Normalizer, but that is not the case here. Looks like a bug. What happens if you run this:
in the location where the SRI NN and NodeSyn was built? |
it returns this:
|
hmm, okay, definitely seems like a bug. do you use those fields for anything? |
no, I don't use them. so unless it's indicative of a larger issue, not a problem for the KG2c build. |
so I think we're at a point now where @chunyuma can start building COHD/DTD databases from KG2.7.1. all necessary files should be on arax.ncats.io at: I'm going to work next on loading KG2.7.1c into Plover (may require a little bit of tweaking to remove mixins from the |
I suppose this is an important point that we need resolve before we move forward: what do we do about these pesky mixins? |
At the moment the category_manager skips biolink:Entity. But I think it is true that all mixins come after this in the list. So it would be an easy tweak to STOP processing at biolink:Entity. This would exclude biolink:Entity and mixins I think. Do we want to do that? |
ah, but now Chris has responded that we want mixins in there. From Slack: Yes, I think we want to include mixins in ancestors / descendents. One of the main reasons to have these mixins is to make querying easy by grouping similar concepts that are not direct ancestors in the model. That relies on applying ancestor logic to mixins.... |
Hi @finnagin, @dkoslicki and @jaredroach, To build the COHD database and DTD model/database for kg2.7.1c based on the Biolink Model 2.1, I plan to use the |
well surprisingly it worked to just load KG2c into Plover as is! (with mixins included as labels.) it did increase baseline memory usage by a decent amount, but not enough for it to cause a problem I think. down the road I may tweak the code a bit to decrease its memory usage, but it seems to be totally fine for now. |
Terrific! It looks like @chunyuma has several to dos in the list, but otherwise, do we just sit tight until you're back and then make the switch? |
I've already started building COHD and DTD now. Both of them should be able to complete next week except for DTD probability precomputed database which might need more time. |
yeah, I think that sounds fine to wait to roll it out until I'm back - but in the meantime other codeowners could go in and fix any usages of and if anyone wants to test their changes, they just need to:
|
I just messaged Chunyu about this in slack but posting here as well so others are aware it was addressed: the |
Do we want to try to implement a local way of getting descendents? I'm asking because we currently we cannot run the DTD tests on Travis because of it's lack of a cache for when it hits the SRI and I'm worried having SRI calls for cohd too might make those tests impossible to run as well. |
actually, I just remembered I already have a local method for category descendants in the RTX/code/ARAX/ARAXQuery/Expand/kp_selector.py Line 300 in 8c95253
|
I think all tests associated with |
great. for some reason the CHP tests are still failing for me in the
the errors are about connection reset:
are you seeing this as well, @chunyuma? |
It seems like this is a CHP API problem. I will figure it out today. |
@amykglen, I met a similar problem like you got when I tested
I think the problem might come from chp team's server. Right now, when I called their APIs, I met |
Hi @yakaboskic, could you please help us take a look if there is something wrong with CHP APIs? Currently, we can't call the |
Hi @chunyuma, so I am currently working on a server rebuild right now, but should be back up tonight. However, we have depreciated the predicates endpoint. |
Thanks for reply @yakaboskic. Please let us know when it is back. Thanks again! |
I've created a pull request for the new predicate tuples in the NCATS testing repo. @dkoslicki when you get a chance could you approve and merge it? |
Hi @chunyuma, CHP server is back up as of around 12AM EST last night. |
Thanks @yakaboskic! |
Hi @yakaboskic, I tried calling the CHP APIs via
But I got a warning message saying I generated the standard query by using the following function:
|
Hi @chunyuma! Oh no! So... we actually turned off support for our weird muli-hop query structure and have transitioned everyone (we thought) to a fully one hop query structure in order to hopefully help teams better integrate with us. I am sorry that this was not communicated!! So in light of that function you used to build those standard queries, I have made an equivalent python script to build equivalent one hop queries that will answer the same question as above. Basically the change is that we have Gene to Disease, Drug to Disease, Gene to Drug, Drug to Gene, and Gene to Gene edges, and you can specify what we call a predicate proxy (EFO for survival) and predicate context (i.e. more gene or drug curies) but you don't have to specify this if you don't want to (we default). Here is the equivalent build script and I have checked and it works. Here is the code below (and also attached as a txt file, can't upload python here). Please let me know if you have any questions/concerns/issues! And many apologies for any unexpected work this may have caused!
|
Thanks @yakaboskic! I really appreciate your help for figuring out the issue! I will have a try later today based on your code and suggestions. |
@amykglen, thanks to the help of @yakaboskic, both
|
awesome, thanks everyone! I guess after the full DTD rebuild is done you can go ahead and close this issue, @chunyuma. |
Also, thanks @amykglen for developing the
|
associated code changes should go in the
kg2integration
branchfirst update NodeSynonymizer:
version=2.1.0
in Biolink Lookup tool URLonce we have what seems to be a good synonymizer:
record_kg2c_meta_info.py
)rebuild or edit and put copies in
/translator/data/orangeboard/databases/KG2.7.1
on arax.ncats.io:* note: saved this as
config_local.json
, since we want it to be used overconfigv2.json
during testingnote: As databases are rebuilt, the new copy of
config_local.json
will need to be updated to point to their new paths.update ARAX codebase:
biolink:ChemicalSubstance
tobiolink:ChemicalEntity
(lots of tests use this)config_local.json
- must locally setforce_local=True
inARAX_expander.py
to avoid using the old KG2 API)other things:
config_local.json
toconfig_local.json_FROZEN_DO-NOT-EDIT-FURTHER
(any remaining edits to the config file, such as when the DTD build is complete, should be made directly to the masterconfigv2.json
on araxconfig.rtx.ai)The text was updated successfully, but these errors were encountered: