Export datasets catalog as DCAT, document undocumented API, provide metadata export and other standards #27

Open
ivbeg opened this issue Jan 15, 2024 · 2 comments

ivbeg commented Jan 15, 2024

Description of Feature:

There is currently no export of the data catalog, no documented API, and no export of the metadata for a single record.
It would be great if:

  • the whole data catalog were available as DCAT 3 https://www.w3.org/TR/vocab-dcat-3/
  • the API were well documented
  • metadata as JSON were available via a link on each dataset page

It would be even better if Schema.org Dataset markup and an OAI-PMH interface were implemented too.
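
To illustrate the Schema.org part, here is a minimal sketch of building a Schema.org Dataset description as JSON-LD. The input record and its field names (`title`, `description`, `url`, `tags`) are hypothetical and are not the actual Source Cooperative metadata model.

```python
import json


def to_schema_org_dataset(record: dict) -> dict:
    """Map a hypothetical repository record to a Schema.org Dataset (JSON-LD)."""
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": record["title"],
        "description": record["description"],
        "url": record["url"],
        "keywords": record.get("tags", []),
    }


# Hypothetical record, for illustration only.
example_record = {
    "title": "Example dataset",
    "description": "A placeholder description of an example dataset.",
    "url": "https://example.org/datasets/example",
    "tags": ["example", "demo"],
}

schema_org_doc = to_schema_org_dataset(example_record)
print(json.dumps(schema_org_doc, indent=2))
```

The resulting JSON would typically be embedded in each dataset page inside a `<script type="application/ld+json">` element so that crawlers can pick it up.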

What value is this feature adding to Source Cooperative?

  1. DCAT support is required to integrate with data.europe.eu and other custom data aggregators and search engines.
  2. Schema.org Dataset support is required to be indexed by Google Dataset Search.
  3. A documented API demonstrates product quality and team sophistication.
  4. OAI-PMH support is required to be indexed by the OpenAIRE or BASE search engines for scientific outputs.
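
As a rough illustration of item 4, this is how a harvester would typically request records over OAI-PMH. The `ListRecords` verb and the `oai_dc` metadata prefix come from the OAI-PMH protocol itself; the base URL below is a placeholder, since no such endpoint exists for Source Cooperative today.

```python
from urllib.parse import urlencode


def oai_pmh_list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH ListRecords request URL for the given endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


# Placeholder endpoint, for illustration only.
harvest_url = oai_pmh_list_records_url("https://example.org/oai")
print(harvest_url)
# https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```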
@cboettig

Thanks for raising this @ivbeg, I'd be interested in this too. I'm curious how you see this being implemented, though -- would the necessary metadata become required fields in the "New repository" page? What about optional / format-specific fields? Source.coop seems to deliberately make the 'new repo' process as simple as possible -- I imagine the devs wouldn't want metadata entry to become a barrier there.

A related option might be for users to provide the necessary metadata in the STAC format, given that STAC is widely used in this space and has close ties to Radiant? STAC Browser already supports rendering metadata to Schema.org for SEO purposes... (of course users could directly upload DCAT3 or schema.org json-ld files to the repo too, though probably few want to do that and even so it probably wouldn't be crawled?)

ivbeg commented Jan 15, 2024

@cboettig DCAT isn't really difficult to implement; it shouldn't take long to map the existing metadata to DCAT fields. Still, it's important to document the existing Source Cooperative API and metadata. Also, DCAT requires only very basic fields, so if they can't be extracted from the existing dataset descriptions, it would be great to review how dataset/repo creation is organized and update it.
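
To give an idea of how small such a mapping can be, here is a minimal sketch that turns a repository record into a DCAT dataset expressed as JSON-LD. The input record and its field names (`title`, `description`, `url`, `tags`, `data_urls`) are hypothetical, since the actual Source Cooperative metadata is not documented; the `dcat:` and `dct:` terms are standard DCAT and Dublin Core vocabulary.

```python
import json


def to_dcat_dataset(record: dict) -> dict:
    """Map a hypothetical repository record to a dcat:Dataset (JSON-LD)."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": record["url"],
        "@type": "dcat:Dataset",
        "dct:title": record["title"],
        "dct:description": record["description"],
        "dcat:keyword": record.get("tags", []),
        # One dcat:Distribution per downloadable location of the data.
        "dcat:distribution": [
            {"@type": "dcat:Distribution", "dcat:accessURL": {"@id": data_url}}
            for data_url in record.get("data_urls", [])
        ],
    }


# Hypothetical record, for illustration only.
repo_record = {
    "title": "Example dataset",
    "description": "A placeholder description of an example dataset.",
    "url": "https://example.org/datasets/example",
    "tags": ["example"],
    "data_urls": ["https://example.org/datasets/example/data.parquet"],
}

dcat_doc = to_dcat_dataset(repo_record)
print(json.dumps(dcat_doc, indent=2))
```

A whole-catalog export would wrap a list of such datasets in a `dcat:Catalog`; the mapping for a single record stays the same.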

Sure, any thematic data catalog has a lot of additional metadata fields, but you don't need to export all of them; most aggregators actually use only basic information. Still, DCAT supports extensions, and many EU countries implement them to aggregate country-specific metadata from regional and local data catalogs into the national one.

As for STAC, I'd say it's a different case, since it's well documented and widely used. Data search engines are more likely to support STAC API harvesting. A Schema.org Dataset implementation actually works for Google only, whereas a DCAT implementation works for the European data.europe.eu and national data catalogs. If there is a STAC server in an EU country, it could be indexed seamlessly once DCAT, and especially DCAT-AP, is implemented.

But anyway, I would say that running a public data catalog without a public API and/or access to the metadata of the whole catalog is bad practice. I reviewed more than 40 data catalog software products during 2023: an API is the key feature, and support for common standards is the second.

2 participants