Shift format registry downloads and normalisation to use Siegfried #15

Open
4 tasks
anjackson opened this issue Sep 2, 2022 · 0 comments

The current Python implementation is fine, if messy. But for format registries, it essentially reimplements what Siegfried's roy tool already does. Rather than keeping this separate tool updated, it could be merged into roy, perhaps by modifying roy so that it can output the full normalised registry contents as YAML/JSON. This might be quite a lot of work, though, and would need to be written in Go rather than Python, so it is probably a long-term goal.

Some of the steps appear to be:

  • add an option to roy inspect so it emits the whole normalised dataset as YAML or similar.
  • add support for all known format registries to roy (FFW, GitHub Linguist, TRiD, ???).
  • modify the wikidata.sig build so the Archivematica extensions can be omitted (like -pronom).
  • modify the digipres.org and sentinel systems to run roy to gather the data and aggregate that instead.
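
As a rough illustration of the last step, here is a minimal Python sketch of aggregating normalised records from several registries. It assumes roy could emit one record per format with `registry`, `id` and `extensions` fields; that schema, the function names and the sample values are all hypothetical placeholders, not roy's actual output.

```python
import json
from collections import defaultdict

# Hypothetical normalised records. The field names and ids are placeholders,
# NOT the real output schema of roy or real registry identifiers.
SAMPLE_RECORDS = [
    {"registry": "FFW", "id": "placeholder-1", "extensions": ["PDF"]},
    {"registry": "TRiD", "id": "placeholder-2", "extensions": ["pdf"]},
    {"registry": "GitHub Linguist", "id": "placeholder-3", "extensions": ["yaml", "yml"]},
]


def aggregate_by_extension(records):
    """Group registry entries by lower-cased file extension."""
    index = defaultdict(list)
    for rec in records:
        for ext in rec["extensions"]:
            index[ext.lower()].append({"registry": rec["registry"], "id": rec["id"]})
    # Sort extensions and entries so the aggregated output is stable between runs.
    return {
        ext: sorted(entries, key=lambda e: (e["registry"], e["id"]))
        for ext, entries in sorted(index.items())
    }


def to_json(records):
    """Serialise the aggregated index as JSON."""
    return json.dumps(aggregate_by_extension(records), indent=2)


if __name__ == "__main__":
    print(to_json(SAMPLE_RECORDS))
```

Keying on the lower-cased extension is only one possible choice; an aggregator could equally key on MIME type or registry identifier.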