Improve synonymization of Technetium Tc-99m Albumin Aggregated #1046
Comments
Well, this is slightly better merged in the new synonymizer (#2003): a CHEMBL node has now been added to the NCIT cluster, though the MESH term is still on its own:
Cluster for NCIT:C87398 (CHEMBL.COMPOUND:CHEMBL1201522) has 2 nodes:
Cluster for MESH:D013668 has 1 nodes:
Kinda funny that the SRI doesn't recognize this MESH term, since they do recognize most MESH terms.
@amykglen can this be closed?
And @amykglen, it looks like the NGD API endpoint handles this: https://arax.ci.transltr.io/api/arax/v1.4/ui/#/PubmedMeshNgd/pubmed_mesh_ngd
So I'm seeing three clusters in our latest synonymizer (KG2.9.2c) for Technetium Tc 99m albumin aggregated:
The first two appear to arise from the fact that the SRI node normalizer assigns those identifiers to two separate clusters:
But the third is for a KEGG node, which the SRI node normalizer doesn't currently support, so with that one I think we could do better in our synonymization and assign it to one of the first two clusters.
@dkoslicki, it's really interesting that the NGD endpoint does appear to map the name for the CHEMBL node to the main cluster for Technetium Tc-99m Albumin Aggregated, even though that CHEMBL node is still in a separate cluster in the synonymizer. I'm a bit perplexed as to how that would be happening. I'm not familiar with that endpoint, but I thought our NGD only uses the synonymizer to map concepts? But I suppose I'll take it!
So in summary: I wrote up an issue in the SRI NN repo about there being two clusters for Technetium Tc 99m albumin aggregated (TranslatorSRI/NodeNormalization#280), but I think it's worth keeping this issue open for now due to the poor synonymization of the KEGG node (which is in our hands, since the SRI NN doesn't appear to support KEGG identifiers currently).
Thanks for the sleuthing, @amykglen! I'll leave this open and tag it as technical debt.
As @edeutsch requested, I traced a couple of instances where the local fastNGD database (#729) 'misses' a concept but eUtils doesn't. This is the write-up for one example: NCIT:C87398 (Technetium Tc-99m Albumin Aggregated).
From kg2canonicalized:
So in this case, we can see there's no MESH curie in the `equivalent_curies`, and I confirmed that neither the equivalent nodes nor their attached edges have `publications` listed in KG2, so it's not surprising fastNGD isn't aware of any PMIDs for this node.
But what is interesting is that there is a MESH node in KG2 named "Technetium Tc 99m Aggregated Albumin" (word order is slightly different):
And there are definitely PubMed articles associated with MESH term D013668, so if `NodeSynonymizer` synonymized these two concepts, then the fastNGD system would no longer 'miss' NCIT:C87398/CHEMBL.COMPOUND:CHEMBL1201522.
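To illustrate the word-order point, here is a minimal sketch of an order-insensitive name key under which the two names quoted in this issue compare equal. It is not how `NodeSynonymizer` actually matches names; `name_key` is a hypothetical helper introduced only for this example.

```python
import re


def name_key(name):
    """Hypothetical helper: build an order-insensitive key for a concept name.

    Lowercases the name, splits it on anything that is not a letter or digit
    (so "Tc-99m" and "Tc 99m" tokenize the same way), and sorts the tokens.
    """
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return tuple(sorted(tokens))


# The two names quoted in this issue.
ncit_name = "Technetium Tc-99m Albumin Aggregated"  # NCIT:C87398
mesh_name = "Technetium Tc 99m Aggregated Albumin"  # MESH:D013668

print(name_key(ncit_name) == name_key(mesh_name))  # prints True
```

A key like this is probably best treated as a candidate signal rather than an automatic merge rule, since ignoring word order can also conflate names whose order is meaningful.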