Further restrict license identifier/keywords in JSON Schema? #937

Closed
m-mohr opened this issue Jan 2, 2021 · 4 comments
Labels
json schema; prio: must-have (required for release associated with); question

@m-mohr
Collaborator

m-mohr commented Jan 2, 2021

I started crawling the STAC Collections for STAC Index. The issue I ran into most often with collections was that people just dump everything into the license field: some use HTML, some put in full license texts, etc. In the spec we say it should be an SPDX license identifier, "proprietary" or "various". I had limited the license field in my database to 50 chars, which would fit the longest SPDX identifier (BSD-3-Clause-No-Nuclear-License-2014, 36 chars), and still ran into problems storing those values. Of course I can work around that by not storing license information for the collections that break the rules, but maybe we should also add further restrictions to the JSON Schema? For example, a regexp that only allows a-z, A-Z, 0-9, dots (.) and dashes (-) with a maximum of x characters (e.g. 36 or somewhat more). That should accept all allowed values and report all those invalid license "identifiers".
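A minimal Python sketch of the proposed check, assuming the character set from the proposal (ASCII letters, digits, dots and dashes) and a hypothetical maximum length of 40 characters. The names `MAX_LICENSE_LENGTH`, `LICENSE_PATTERN` and `is_plausible_license` are illustrative only and not part of the STAC JSON Schema; in a schema, the same rule would be expressed with `pattern` and `maxLength`.

```python
import re

# Hypothetical limit: a little more than the 36 characters of
# "BSD-3-Clause-No-Nuclear-License-2014".
MAX_LICENSE_LENGTH = 40

# Only ASCII letters, digits, dots and dashes, 1..MAX_LICENSE_LENGTH chars.
# Roughly equivalent JSON Schema (patterns there are not implicitly anchored):
#   {"type": "string", "pattern": "^[A-Za-z0-9.-]{1,40}$"}
LICENSE_PATTERN = re.compile(r"[A-Za-z0-9.\-]{1,%d}" % MAX_LICENSE_LENGTH)


def is_plausible_license(value: str) -> bool:
    """True for values shaped like an SPDX identifier or a keyword such as
    "proprietary"/"various"; False for HTML snippets, full texts, etc."""
    return LICENSE_PATTERN.fullmatch(value) is not None


if __name__ == "__main__":
    samples = [
        "CC-BY-4.0",
        "proprietary",
        "BSD-3-Clause-No-Nuclear-License-2014",
        "<a href='https://example.com/license'>License</a>",
        "Permission is hereby granted, free of charge, to any person ...",
    ]
    for sample in samples:
        # Print the verdict and the first 50 characters of the value.
        print(is_plausible_license(sample), sample[:50])
```

Note that this only checks the shape of the value; it does not verify that an identifier actually appears on the SPDX license list.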

@m-mohr m-mohr added this to the 1.0.0-beta.3 milestone Jan 2, 2021
@m-mohr m-mohr self-assigned this Jan 2, 2021
@m-mohr
Collaborator Author

m-mohr commented Jan 2, 2021

Further license information of the kind considered invalid here could be added later; see the related issue #836.

@m-mohr m-mohr changed the title from "Further restrict license identifier in JSON Schema?" to "Further restrict license identifier/keywords in JSON Schema?" Jan 3, 2021
@m-mohr
Collaborator Author

m-mohr commented Jan 3, 2021

Another thing I found is that some catalogs use keywords as a dump for long sentences and classifications. Quite a number of individual keywords were longer than 50 chars, although I had expected that limit to be enough. Should keywords also be restricted to a sensible length? Even 100 chars is not enough in some cases.

Here is an example that doesn't feel like a keyword anymore, more like multiple "keysentences":

"Horizontal resolution at measurement scale: 3 km; Horizontal resolution at observation scale for Rayleigh/Mie: 87/10 km"

Both overly long licenses and keywords are bad for display (STAC Browser and others) as well as for searching and indexing (STAC Index and others).

(From my side: I'll likely allow those long keywords in STAC Index, but will reject overly long non-SPDX licenses.)

@m-mohr
Collaborator Author

m-mohr commented Jan 4, 2021

Another finding: some people put DOI URIs into sci:doi instead of adding them to the links and providing just the DOI.
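A minimal sketch of how a crawler (or a publisher, before validating) might split a value found in sci:doi into the bare DOI and a resolvable URI for the links. `split_doi` and the list of recognized prefixes are hypothetical, and the "starts with `10.` and contains `/`" test is a deliberately loose check, not the pattern used by the scientific extension's schema.

```python
from typing import Optional, Tuple

# Hypothetical prefixes that turn a bare DOI into a URI or label.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def split_doi(value: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (bare_doi, uri) for a value found in sci:doi.

    bare_doi is what sci:doi should contain; uri is the resolvable link
    that belongs in the links instead. Returns (None, None) if the value
    does not look like a DOI (loose check: starts with "10." and has a "/").
    """
    doi = value.strip()
    lowered = doi.lower()
    for prefix in DOI_PREFIXES:
        # Prefix matching is case-insensitive ("DOI:", "HTTPS://DOI.ORG/").
        if lowered.startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not (doi.startswith("10.") and "/" in doi):
        return None, None
    return doi, "https://doi.org/" + doi


if __name__ == "__main__":
    # Illustrative values only, not DOIs taken from real catalogs.
    for raw in ["https://doi.org/10.1234/abc", "10.1234/abc", "not a doi"]:
        print(raw, "->", split_doi(raw))
```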

@cholmes cholmes added the prio: must-have required for release associated with label Jan 4, 2021
@m-mohr
Collaborator Author

m-mohr commented Feb 2, 2021

License identifiers are in fact already validated by the JSON Schema, so people apparently just don't validate their catalogs. The same applies to the DOI field.

I can't really come up with a reasonable limit for keywords, as every length limit I put there seems arbitrary. Thus, I'll leave them as they are.

No further ToDos here.

@m-mohr m-mohr closed this as completed Feb 2, 2021
2 participants