Prefix with space character? #1338

Open
matentzn opened this issue Dec 23, 2024 · 3 comments

@matentzn
Collaborator

Check this: https://github.com/mapping-commons/sssom-py/pull/568/files#r1886866051

It contains a prefix with a space:

"prefix_synonyms": [
            "MEDLINE",
            "PubMed",
            "PubMed ID",

For various reasons, I think it makes sense to not allow this. For example:

https://github.com/mapping-commons/sssom-py/pull/568/files#r1886866270

http://bioregistry.io/PubMed ID:

Which really doesn't make any sense at all.

I think we should add a QC check that prevents spaces in prefixes.
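A minimal sketch of what such a check could look like, assuming the registry is available as a plain dict mapping each canonical prefix to a record with an optional "prefix_synonyms" list. The function name, the dict shape, and the "pubmed" key are illustrative assumptions, not the actual Bioregistry API or test suite:

```python
import re

# Matches any whitespace character (space, tab, newline, ...).
WHITESPACE = re.compile(r"\s")


def find_whitespace_prefixes(registry):
    """Return (canonical prefix, offending string) pairs containing whitespace.

    `registry` is a hypothetical dict: canonical prefix -> record dict,
    where the record may carry a "prefix_synonyms" list.
    """
    offenders = []
    for prefix, record in registry.items():
        # Check the canonical prefix itself as well as every synonym.
        candidates = [prefix, *record.get("prefix_synonyms", [])]
        for candidate in candidates:
            if WHITESPACE.search(candidate):
                offenders.append((prefix, candidate))
    return offenders


# Illustrative record built from the synonyms quoted above.
example_registry = {
    "pubmed": {"prefix_synonyms": ["MEDLINE", "PubMed", "PubMed ID"]},
}

offenders = find_whitespace_prefixes(example_registry)
```

A QC test could then simply assert that the returned list is empty and print the offenders otherwise.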

@bgyori
Contributor

bgyori commented Dec 23, 2024

I think the argument here is backwards: the synonyms listed in Bioregistry aren't meant to suggest that these are allowed or encouraged usages. Rather, they reflect what you might encounter in external data sources, and they are necessary for automatically mapping those data sources to standard Bioregistry CURIEs. So, for instance, PubMed ID is very likely in this list because some ontology somewhere used it as a prefix for cross-references.

@matentzn
Collaborator Author

I am not sure I agree, though, with incorporating CURIE or URI prefixes into Bioregistry that are not at the very least standards-compliant: a CURIE with a space is not a CURIE anymore. Tools will just needlessly choke, IMO, if they see a case like this. Charlie spent thousands of hours fixing malformed CURIEs across dozens of curation projects for that reason!

@bgyori
Contributor

bgyori commented Dec 24, 2024

What I'm saying is that the "curation" of malformed CURIE prefixes often comes down to adding a synonym to Bioregistry so that the given non-standard prefix can be mapped automatically to the corresponding Bioregistry record (this is in addition to the standardization of prefixes that use, e.g., weird capitalization, which can be mapped without needing to be explicitly enumerated as synonyms). Still, I see why the space is an issue. There are a few possible solutions to eliminate it, but I think we should hear from @cthoyt first.
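To illustrate the mapping described here, a rough sketch under stated assumptions: the registry is a plain dict of canonical prefix to a record with a "prefix_synonyms" list, and lookups are case-insensitive. The function names and the dict shape are hypothetical and are not how Bioregistry actually implements this:

```python
def build_synonym_index(registry):
    """Map every case-folded prefix or synonym to its canonical prefix.

    `registry` is a hypothetical dict: canonical prefix -> record dict.
    """
    index = {}
    for prefix, record in registry.items():
        for name in [prefix, *record.get("prefix_synonyms", [])]:
            index[name.casefold()] = prefix
    return index


def standardize_prefix(external_prefix, index):
    """Return the canonical prefix for a prefix seen in the wild, or None."""
    # Case folding handles unusual capitalization without an explicit synonym;
    # anything else (such as "PubMed ID") needs an enumerated synonym.
    return index.get(external_prefix.strip().casefold())


# Illustrative record built from the synonyms quoted earlier in the thread.
index = build_synonym_index(
    {"pubmed": {"prefix_synonyms": ["MEDLINE", "PubMed", "PubMed ID"]}}
)

from_caps = standardize_prefix("PUBMED", index)         # capitalization only
from_synonym = standardize_prefix("PubMed ID", index)   # explicit synonym
unknown = standardize_prefix("not-a-prefix", index)     # no match
```

In this sketch the space-containing string is only ever an input to the lookup; the output is always the canonical prefix.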
