Versioning #2

Open
gipert opened this issue Sep 14, 2022 · 5 comments

Comments

@gipert
Member

gipert commented Sep 14, 2022

Even if we don't expect it to change often, I propose tagging our LH5 format specification for data preservation and reproducibility. We should then store this version as LH5 file metadata.

See also: legend-exp/legend-pydataobj#6

@oschulz
Contributor

oschulz commented Sep 14, 2022

Yes, good point. Let's wait a little longer, though, until I've had time to update the writeup and the docs build, OK?

@jasondet

What if we simply tag this repo and then specify that data objects should have an attribute, something like rev, set to the revision number in the tag? This would allow for schema evolution of the data_objects.
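
A minimal sketch of this idea, assuming the revision is a plain integer taken from the repo tag. The function name `stamp_revision`, the constant `SPEC_REV`, and the example datatype string are hypothetical and only illustrate the proposal; a plain dict stands in for an object's attribute set (for example, the mapping an HDF5 library exposes for attributes).

```python
from typing import Any, MutableMapping

# Hypothetical: revision number taken from a tag on the spec repo.
SPEC_REV = 2


def stamp_revision(attrs: MutableMapping[str, Any], rev: int = SPEC_REV) -> None:
    """Record the spec revision in a data object's attributes."""
    attrs["rev"] = rev


# A plain dict stands in for the attribute set of one data object.
waveform_attrs = {"datatype": "array<1>{real}"}  # illustrative value
stamp_revision(waveform_attrs)
print(waveform_attrs)
```

A reader could then branch on `rev` when the schema of a data object changes between revisions.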

@oschulz
Contributor

oschulz commented Jan 22, 2023

I'm not sure tagging the repo will make a lot of sense, since it's supposed to cover all our data formats. As for LH5, we'll have to see if we need a version number or if "capabilities used" can be inferred from the file (should be the case currently). In any case we should document both older and newer formats in this repo if we have a format evolution, not just the latest format.

@jasondet

I worry that requiring “capabilities used to be inferable from the file” would ultimately limit the possibilities for schema evolution.

I agree that documentation of all older formats must be part of newer specs, so that the new spec supports decoding of all available formats.

@oschulz
Contributor

oschulz commented Jan 22, 2023

> I worry that requiring “capabilities used to be inferable from the file” would ultimately limit the possibilities for schema evolution.

We can definitely add a dataformat-version attribute to all datasets, in addition to datatype, defaulting to v1.0 if it's not present. That also means that we don't have to do it until we have a v1.1 or so (but we can, of course). :-)
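
On the reading side, the default could be handled roughly as follows. This is a sketch under assumptions, not an agreed design: the helper names `dataformat_version` and `parse_version` are hypothetical, the `vMAJOR.MINOR` string layout is inferred from the "v1.0" and "v1.1" examples above, and plain dicts stand in for dataset attributes.

```python
from typing import Any, Mapping, Tuple

# Assumed when a dataset carries no version attribute, as proposed above.
DEFAULT_VERSION = "v1.0"


def dataformat_version(attrs: Mapping[str, Any]) -> str:
    """Return the dataset's format version, falling back to the default."""
    return str(attrs.get("dataformat-version", DEFAULT_VERSION))


def parse_version(version: str) -> Tuple[int, int]:
    """Split a 'vMAJOR.MINOR' string into comparable integers (assumed layout)."""
    major, minor = version.lstrip("v").split(".")
    return int(major), int(minor)


# Plain dicts stand in for dataset attributes; the values are illustrative.
old_attrs = {"datatype": "real"}
new_attrs = {"datatype": "real", "dataformat-version": "v1.1"}

for attrs in (old_attrs, new_attrs):
    print(dataformat_version(attrs), parse_version(dataformat_version(attrs)))
```

Because files written before the attribute exists resolve to v1.0, existing data would stay readable without being rewritten.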
