-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add source column to merged context CSVs #52
Conversation
merged,webbox,http://webbox.ecs.soton.ac.uk/ns#,canonical | ||
merged,WEBELEMENTS,https://www.webelements.com/,canonical | ||
merged,webservice,http://www.openlinksw.com/ontology/webservices#,canonical | ||
merged,webtlab,http://webtlab.it.uc3m.es/,can |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See here - WIKIDATA
comes from the Bioregistry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The prefix in bioregistry is wikidata
, not WIKIDATA
right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
correct, but the processing in this repo uppercases the bioregistry prefixes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right; so should the source be prefixmaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't consider prefixmaps to be its own registry, this is just a transform on a Bioregistry prefix. Given #48 and biopragmatics/bioregistry#969, it should be more obvious how all uppercase prefix synonyms get in.
If we were to change the "source" annotation to be "prefixmaps" from content derived from the Bioregistry, then annotating the source would be sort of self-defeating
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sierra-moxon so to try and explain again, the point of the source column is to help track down where a given expansion came from. In this case, even though a transformation was done on it, it's still from the Bioregistry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about we call it "expansion_source" ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cthoyt - did you see that this was "approved" and had this question for you? Please merge as you see fit, but consider being more explicit in the naming of this column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not see this, thank you for the ping. I will update that as you suggested :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in f263cb8
merged,webbox,http://webbox.ecs.soton.ac.uk/ns#,canonical | ||
merged,WEBELEMENTS,https://www.webelements.com/,canonical | ||
merged,webservice,http://www.openlinksw.com/ontology/webservices#,canonical | ||
merged,webtlab,http://webtlab.it.uc3m.es/,can |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See here - wd
comes from Prefix.cc
@sierra-moxon please let me know if there's anything you'd like changed here / what the way forward is |
@sierra-moxon can we please merge this? It will also solve @glass-ships' problem in #55 |
Closes #51
This PR adds an additional field to the
PrefixExpansion
class that can be optionally annotated with the source from which the class was instantiated. This is then used during theContext.combine()
function such that when themerged
andmerged.oak
contexts are generated, each record gets annotated with the simple context from which it came.This PR also runs the ETL pipeline to regenerate the merged prefix maps, now with the sources annotated in the merged files. The new
source
column does not affect reading in any way.Interestingly, this sheds some light on #49 - now we know that the
WIKIDATA
prefix comes from the Bioregistry andwd
from Prefix.cc.