I started crawling the STAC Collections for STAC Index. The issue I faced most often with collections was that people just dump everything into the license field: some use HTML, some use full license texts, etc. In the spec we say it should be an SPDX license identifier, `proprietary` or `various`. I had limited the license field in the database to a maximum of 50 chars, which fits the longest SPDX identifier (`BSD-3-Clause-No-Nuclear-License-2014`, 36 chars), and ran into problems storing those values. Of course I can fix that by simply not storing license information for collections that break the rules, but maybe we should also include some further restrictions in the JSON Schema? For example, a regexp that only allows a-z, A-Z, 0-9, dots (.) and dashes (-), with a maximum of x characters (e.g. 37 or somewhat more). That should accept all allowed values and report all those invalid license "identifiers".
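To make the proposal concrete, here is a minimal Python sketch of such a check. The names `MAX_LICENSE_LENGTH`, `LICENSE_SCHEMA_FRAGMENT` and `is_plausible_license` are hypothetical, and the limit of 37 is just the example value from above, not something the spec defines. The dictionary shows roughly what the corresponding JSON Schema restriction could look like, using the standard `pattern` and `maxLength` keywords.

```python
import re

# Assumed limit, taken from the "e.g. 37" suggestion above; not part of the spec.
MAX_LICENSE_LENGTH = 37

# Only a-z, A-Z, 0-9, dots and dashes are allowed.
LICENSE_PATTERN = re.compile(r"[A-Za-z0-9.-]+")

# Hypothetical JSON Schema fragment expressing the same restriction.
LICENSE_SCHEMA_FRAGMENT = {
    "type": "string",
    "pattern": "^[A-Za-z0-9.-]+$",
    "maxLength": MAX_LICENSE_LENGTH,
}


def is_plausible_license(value):
    """Return True if value looks like an SPDX identifier, 'proprietary' or 'various'.

    This only checks the character set and length; it does not look the
    value up in the SPDX license list.
    """
    if not isinstance(value, str):
        return False
    if len(value) > MAX_LICENSE_LENGTH:
        return False
    # fullmatch requires the whole string to consist of allowed characters.
    return LICENSE_PATTERN.fullmatch(value) is not None
```

Note that `proprietary` and `various` pass this check without special handling because they consist only of letters. HTML snippets and full license texts fail it, either because of the character set or because of the length.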
Another thing I found out is that some catalogs use keywords as a dumping ground for long sentences and classifications. I had quite a number of individual keywords longer than 50 chars, although I expected that limit to be enough. Should keywords also be restricted to a sensible length? Even 100 chars is not enough in some cases.
Here is an example that doesn't feel like a keyword anymore, more like multiple "keysentences":
"Horizontal resolution at measurement scale: 3 km; Horizontal resolution at observation scale for Rayleigh/Mie: 87/10 km"
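For illustration, a crawler could flag such entries with a simple length check. This is only a sketch: `MAX_KEYWORD_LENGTH` and `overlong_keywords` are hypothetical names, and the limit of 100 is an assumption picked from the numbers mentioned above, not a value from the spec.

```python
# Assumed limit for illustration only; the spec does not define one.
MAX_KEYWORD_LENGTH = 100


def overlong_keywords(keywords, max_length=MAX_KEYWORD_LENGTH):
    """Return the keywords that are longer than max_length characters."""
    return [kw for kw in keywords if len(kw) > max_length]


example_keywords = [
    "wind",
    "Horizontal resolution at measurement scale: 3 km; "
    "Horizontal resolution at observation scale for Rayleigh/Mie: 87/10 km",
]

# Only the long "keysentence" is reported; "wind" passes.
flagged = overlong_keywords(example_keywords)
```

Whether flagged keywords are rejected, truncated, or stored anyway is a separate decision for each tool.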
Both overly long licenses and overly long keywords are awkward for display (STAC Browser and others) and for searching and indexing (STAC Index and others).
(From my side: I'll likely allow those long keywords in STAC Index, but will reject overly long non-SPDX licenses.)