Add source column to merged context CSVs #52

cthoyt · 2023-10-20T08:44:13Z

Closes #51

This PR adds an optional field to the PrefixExpansion class that records the source from which the instance was created. Context.combine() then uses this field so that, when the merged and merged.oak contexts are generated, each record is annotated with the simple context from which it came.
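As a rough illustration of the idea (not the actual prefixmaps code), the sketch below uses hypothetical stand-ins `Expansion`, `SimpleContext`, and `combine`, with placeholder example.org namespaces. It assumes a simplified merge in which the first context to supply a prefix wins; the real Context.combine() may resolve conflicts differently.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass(frozen=True)
class Expansion:
    """Hypothetical stand-in for PrefixExpansion; not the real class."""

    context: str
    prefix: str
    namespace: str
    source: Optional[str] = None  # optional: the context this record came from


@dataclass
class SimpleContext:
    """Hypothetical stand-in for a single-source context."""

    name: str
    expansions: List[Expansion] = field(default_factory=list)


def combine(merged_name: str, contexts: List[SimpleContext]) -> List[Expansion]:
    """Merge contexts in priority order, tagging each kept record with its origin."""
    merged: List[Expansion] = []
    seen_prefixes = set()
    for ctx in contexts:
        for exp in ctx.expansions:
            if exp.prefix in seen_prefixes:
                continue  # an earlier context already supplied this prefix
            seen_prefixes.add(exp.prefix)
            # Copy the record into the merged context and note where it came from.
            merged.append(replace(exp, context=merged_name, source=ctx.name))
    return merged


# Placeholder namespaces; only the prefixes and context names echo the discussion.
bioregistry = SimpleContext(
    "bioregistry",
    [Expansion("bioregistry", "WIKIDATA", "http://example.org/wikidata/")],
)
prefixcc = SimpleContext(
    "prefixcc",
    [Expansion("prefixcc", "wd", "http://example.org/wikidata/")],
)
merged = combine("merged", [bioregistry, prefixcc])
for exp in merged:
    print(exp.prefix, exp.source)
```

Keeping the field optional means records built without a source remain valid, which matches the description of the field as something that "can be optionally annotated".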

This PR also runs the ETL pipeline to regenerate the merged prefix maps, so the merged files now carry the source annotations. The new source column does not change how the files are read.
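A minimal sketch of why an extra trailing column can be harmless, assuming the reader looks columns up by header name. The header names (`context`, `prefix`, `namespace`, `status`, `source`), the `read_prefix_map` helper, and the `example_source` value are assumptions for illustration and are not taken from the repository.

```python
import csv
import io

# The same record without and with the extra column (header names are assumed).
OLD = (
    "context,prefix,namespace,status\n"
    "merged,webbox,http://webbox.ecs.soton.ac.uk/ns#,canonical\n"
)
NEW = (
    "context,prefix,namespace,status,source\n"
    "merged,webbox,http://webbox.ecs.soton.ac.uk/ns#,canonical,example_source\n"
)


def read_prefix_map(text: str) -> dict:
    """Build a prefix -> namespace map, ignoring any columns it does not know."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["prefix"]: row["namespace"] for row in reader}


# A reader keyed on column names yields the same map for both layouts.
assert read_prefix_map(OLD) == read_prefix_map(NEW)
```

A reader that indexes columns by position would also be unaffected here, since the new column is appended at the end rather than inserted in the middle.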

Interestingly, this sheds some light on #49 - now we know that the WIKIDATA prefix comes from the Bioregistry and wd from Prefix.cc.


cthoyt · 2023-10-20T08:53:48Z

src/prefixmaps/data/merged.csv

-merged,webbox,http://webbox.ecs.soton.ac.uk/ns#,canonical
-merged,WEBELEMENTS,https://www.webelements.com/,canonical
-merged,webservice,http://www.openlinksw.com/ontology/webservices#,canonical
-merged,webtlab,http://webtlab.it.uc3m.es/,can


See here - WIKIDATA comes from the Bioregistry

The prefix in the Bioregistry is wikidata, not WIKIDATA, right?

Correct, but the processing in this repo uppercases the Bioregistry prefixes.

Right; so should the source be prefixmaps?

I don't consider prefixmaps to be its own registry; this is just a transform on a Bioregistry prefix. Given #48 and biopragmatics/bioregistry#969, it should be more obvious how all uppercase prefix synonyms get in.

If we were to change the "source" annotation to "prefixmaps" for content derived from the Bioregistry, then annotating the source would be sort of self-defeating.

@sierra-moxon so to try and explain again, the point of the source column is to help track down where a given expansion came from. In this case, even though a transformation was done on it, it's still from the Bioregistry.
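To make the point concrete, here is a small sketch of a transform that changes the prefix but leaves the provenance label alone. The `Record` type, the `uppercase_prefix` helper, and the example.org namespace are hypothetical and are not the repository's code.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical record with a provenance label."""

    prefix: str
    namespace: str
    source: str


def uppercase_prefix(record: Record) -> Record:
    # Only the prefix changes; the source still names the upstream registry.
    return record._replace(prefix=record.prefix.upper())


original = Record("wikidata", "http://example.org/wikidata/", "bioregistry")
transformed = uppercase_prefix(original)
print(transformed.prefix, transformed.source)
```

Under this reading, the source answers "which registry did this expansion originate from", not "which tool last touched it".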

How about we call it "expansion_source"?

@cthoyt - did you see that this was "approved" and that I had this question for you? Please merge as you see fit, but consider being more explicit in the naming of this column.

I did not see this, thank you for the ping. I will update that as you suggested :)

Done in f263cb8

cthoyt · 2023-10-20T08:54:06Z

src/prefixmaps/data/merged.csv

-merged,webbox,http://webbox.ecs.soton.ac.uk/ns#,canonical
-merged,WEBELEMENTS,https://www.webelements.com/,canonical
-merged,webservice,http://www.openlinksw.com/ontology/webservices#,canonical
-merged,webtlab,http://webtlab.it.uc3m.es/,can


See here - wd comes from Prefix.cc

cthoyt · 2023-11-14T07:55:14Z

@sierra-moxon please let me know if there's anything you'd like changed here / what the way forward is

cthoyt · 2023-11-21T13:36:53Z

@sierra-moxon can we please merge this? It will also solve @glass-ships' problem in #55

cthoyt added 2 commits October 20, 2023 10:38

Include source in merged context files

d385831

Closes linkml#51

Run ETL

cb8284f

cthoyt requested a review from sierra-moxon October 20, 2023 08:52

cthoyt commented Oct 20, 2023


cthoyt added 3 commits October 20, 2023 10:55

Update context.py

a37144f

Merge remote-tracking branch 'upstream/main' into add-source

7406f1a

Update again

1100d4c

sierra-moxon approved these changes Nov 14, 2023


cthoyt added 2 commits November 21, 2023 18:17

Update name of field

f263cb8

Merge remote-tracking branch 'upstream/main' into add-source

f79de4d

sierra-moxon merged commit b8a2bbd into linkml:main Nov 21, 2023
5 checks passed

cthoyt deleted the add-source branch November 21, 2023 18:05
