Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add source column to merged context CSVs #52

Merged
merged 7 commits into from
Nov 21, 2023

Conversation

cthoyt
Copy link
Collaborator

@cthoyt cthoyt commented Oct 20, 2023

Closes #51

This PR adds an additional field to the PrefixExpansion class that can be optionally annotated with the source from which the class was instantiated. This is then used during the Context.combine() function such that when the merged and merged.oak contexts are generated, each record gets annotated with the simple context from which it came.

This PR also runs the ETL pipeline to regenerate the merged prefix maps, now with the sources annotated in the merged files. The new source column does not affect reading in any way.

Interestingly, this sheds some light on #49 - now we know that the WIKIDATA prefix comes from the Bioregistry and wd from Prefix.cc.

@cthoyt cthoyt requested a review from sierra-moxon October 20, 2023 08:52
merged,webbox,http://webbox.ecs.soton.ac.uk/ns#,canonical
merged,WEBELEMENTS,https://www.webelements.com/,canonical
merged,webservice,http://www.openlinksw.com/ontology/webservices#,canonical
merged,webtlab,http://webtlab.it.uc3m.es/,can
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See here - WIKIDATA comes from the Bioregistry

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prefix in bioregistry is wikidata, not WIKIDATA right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct, but the processing in this repo uppercases the bioregistry prefixes

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right; so should the source be prefixmaps?

Copy link
Collaborator Author

@cthoyt cthoyt Nov 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't consider prefixmaps to be its own registry, this is just a transform on a Bioregistry prefix. Given #48 and biopragmatics/bioregistry#969, it should be more obvious how all uppercase prefix synonyms get in.

If we were to change the "source" annotation to be "prefixmaps" from content derived from the Bioregistry, then annotating the source would be sort of self-defeating

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sierra-moxon so to try and explain again, the point of the source column is to help track down where a given expansion came from. In this case, even though a transformation was done on it, it's still from the Bioregistry.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about we call it "expansion_source" ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cthoyt - did you see that this was "approved" and had this question for you? Please merge as you see fit, but consider being more explicit in the naming of this column.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not see this, thank you for the ping. I will update that as you suggested :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in f263cb8

merged,webbox,http://webbox.ecs.soton.ac.uk/ns#,canonical
merged,WEBELEMENTS,https://www.webelements.com/,canonical
merged,webservice,http://www.openlinksw.com/ontology/webservices#,canonical
merged,webtlab,http://webtlab.it.uc3m.es/,can
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See here - wd comes from Prefix.cc

@cthoyt
Copy link
Collaborator Author

cthoyt commented Nov 14, 2023

@sierra-moxon please let me know if there's anything you'd like changed here / what the way forward is

@cthoyt
Copy link
Collaborator Author

cthoyt commented Nov 21, 2023

@sierra-moxon can we please merge this? It will also solve @glass-ships' problem in #55

@sierra-moxon sierra-moxon merged commit b8a2bbd into linkml:main Nov 21, 2023
5 checks passed
@cthoyt cthoyt deleted the add-source branch November 21, 2023 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Include source(s) in merged context
2 participants