Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ztoc v1.0 #465

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

Add ztoc v1.0 #465

wants to merge 1 commit into from

Conversation

Kern--
Copy link
Contributor

@Kern-- Kern-- commented Mar 3, 2023

This needs a lot of work to be mergeable, but I wanted to discuss the direction before putting more effort into it. What do you all think?

Issue #, if available:
N/A

Description of changes:
This change copies our existing ztoc flatbuffer definition to ztoc09.fbs and removes build_tool_identifier from ztoc.fbs.

The problem I'm trying to solve is that build_tool_identifier is a versioned string (e.g. AWS SOCI CLI v0.1) which means that different versions of the tool produce different outputs for the same inputs. Removing the field is a breaking change so the thought was to make that breaking change for ztoc 1.0, but keep the original definition in SOCI so that existing v0.9 ztocs still work. The only way to do that is to duplicate the ztoc serialization logic.

Alternatively, we could keep the field and remove the version (which still produces different output from different tools, even if they use the same library) or we could mark the field deprecated and just ignore it.

If we want ztoc to eventually live outside of SOCI, I think it makes sense to try to have a clean v1.0 that doesn't have deprecated fields. If we accept this change, then when we eventually split off ztoc, the v0.9 definitions would stay in SOCI and only the v1.0 logic would be split. That way we get backwards compatibility in SOCI and a clean ztoc package. That might artificially restrict the ztoc API because we would need a way for an external package (in this case SOCI) to serialize and deserialize a ztoc.

Testing performed:
N/A

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Signed-off-by: Kern Walster <[email protected]>
@Kern-- Kern-- requested a review from a team as a code owner March 3, 2023 18:23
@@ -0,0 +1,209 @@
/*
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file has the functions that used to be called flatbufToZtoc and ztocToFlatbuffer. The only change is all the methods are suffixed with 09 now.

Copy link
Contributor

@djdongjin djdongjin Mar 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the above change I mentioned early, we also only need a single ztoc_marshaller.go that has:

  • ztocToFlatbufZtoc
  • flatbufZtocToZtoc
  • flatbufZtoc09ToZtoc

(3 instead of 4 is because we really don't need a ztocToFlatbufZtoc09, since we don't want to produce old flatbuf anymore; we need flatbufZtoc09ToZtoc because we need to be able to unmarsha old data to a Ztoc that without buildertool).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we do want ztocToFlatbuffer09 in the short term because we need to give some time for snapshotters to update before we produce ztoc 1.0 by default and in the long term because we're working with content addressable storage so it's important that roundtripping bytes -> ztoc -> bytes produces exactly the same bytes as before.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds like the BuildToolIdentifier will exist in the Ztoc go struct forever?

Unless we have a Ztoc09 struct and Ztoc struct in our go code, and the Ztoc will be moved to the standalone Ztoc library in the future while Ztoc09 stays in SOCI. Because otherwise it doesn't make sense in the standalone Ztoc lib the Ztoc struct has a deprecated BuildToolIdentifier field.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. To get around this, I think we can just hard code the value or mark the field deprecated. Hardcoding should mean we continue to produce the same indices forever. Deprecating means everything gets new digests after the change (but at least the old ones still work).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean we remove the field in flatbuf, but keep the field in our Ztoc struct (and comment that this field is deprecated; or hardcode this value)?

I think to really "have a clean v1.0 that doesn't have deprecated fields", we need to remove the field in both flatbuf (to work around we make a 09 flatbuf copy now) AND Ztoc struct (e.g., this field also shouldn't appear in Ztoc struct, either as deprecated or hardcoded, in the standalone ztoc lib.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean we have 1 Ztoc struct without the field and we hardcode the value that goes into the flatbuffer in ztocToFlatbuffer09. Thinking a little deeper, we have to hardcode it to what we currently have (which is ”AWS SOCI CLI v0.1”) to make sure that re-serializing produces the same digests.

@@ -0,0 +1,190 @@
package ztoc
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file has the modified flatbufToZtoc and ztocToFlatbuffer. The only change is that I removed all of the references to build_tool_identifier.

@Kern-- Kern-- marked this pull request as draft March 3, 2023 18:26
Copy link
Contributor

@djdongjin djdongjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll look at the change shortly, but I have a general question.

Maybe I'm wrong, but I remember the reason to use flatbuffer is for its good forward/backward compatibility. But so far, I didn't see how that benefit us? (e.g. here to deprecreate a field, we need to make a copy). This seems we're handling forward/backward by ourselves through our code instead of any protocols.

Can we utilize the forward/backward functionality brought by these protocols, regardless protobuf or flatbuf?

For example, I remember protobuf fields are numbered, so you can delete the deprecated field directly but you can not reuse the number even that field is removed/deprecated. Similarly, flatbuf seems have the deprecated keyword for similar purpose.

If adding deprecated to this flatbuf line can resolve the compatibility issue and enable us to remove the field in SOCI code, I think it's worthwhile to do so, even we need to keep the single line in the fbs file (tag the field as deprecated)

ztoc/fbs/ztoc09.fbs Show resolved Hide resolved
@Kern--
Copy link
Contributor Author

Kern-- commented Mar 3, 2023

Maybe I'm wrong, but I remember the reason to use flatbuffer is for its good forward/backward compatibility. But so far, I didn't see how that benefit us? (e.g. here to deprecreate a field, we need to make a copy). This seems we're handling forward/backward by ourselves through our code instead of any protocols.

Yes. This is not the normal way to handle this even with flatbuffers. The reason for doing this is specifically to get a clean v1.0 that can exist outside of SOCI. If we want ztocs to be useful in general, I think it hurts our case to have a v1.0 with deprecated fields.

@djdongjin
Copy link
Contributor

djdongjin commented Mar 3, 2023

But still, if we want to completely remove this field in 1.0, we can still remove this deprecated field right?

The only different is that:

Now:

"keep 2 fbs files: one with the field, and one without the fileld" v.s. "make the field deprecated"

1.0:

"only keep 1 fbs file w/o field" (remove the other file) v.s. "remove the deprecated field"

@@ -0,0 +1,50 @@
namespace v09;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we change this to ztoc09?

I feel it's hard to figure out that ztoc and v09 are the same kind of folders (except version)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will change.

Comment on lines +41 to +48
table Ztoc {
version : string; // The version of the Ztoc in format <major_version>.<minor_version>, e.g. 1.0
build_tool_identifier : string;
compressed_archive_size : long;
uncompressed_archive_size : long;
toc : TOC;
compression_info : CompressionInfo;
}
Copy link
Contributor

@djdongjin djdongjin Mar 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to just add a new table Ztoc09 (within the same zotc.fbs instead of a completely new ztoc09.fbs) that without the builder_tool_identifier field? This can reduce a lot of duplicated code. And we're not changing anything else (e.g. table CompressionInfo/TOC/etc)

So the workflow will be:

for new data:

Ztoc struct (defined by SOCI) <-> Ztoc struct (generated by flatbuf from Ztoc flatbuf table) <-> Ztoc flatbuf bytes

for old data (only unmarshal because we don't want to produce new data in old definition, i.e., Ztoc09):

Ztoc09 flatbuf bytes -> Ztoc09 struct (generated by flatbuf from Ztoc09 flatbuf table) -> Ztoc struct (defined by SOCI / just need to remove the builder tool field in Ztoc09 struct).

This also make it clean (only a new Ztoc09 due to removal of build_tool_identifier) and clear(no changes on other tables like CompressionInfo/etc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we want these in the same flatbuffer definition because the goal is to keep v0.9 inside soci but move v1.0 to another repository eventually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ztoc09 could be defined in a new file that includes this file.

@@ -0,0 +1,209 @@
/*
Copy link
Contributor

@djdongjin djdongjin Mar 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the above change I mentioned early, we also only need a single ztoc_marshaller.go that has:

  • ztocToFlatbufZtoc
  • flatbufZtocToZtoc
  • flatbufZtoc09ToZtoc

(3 instead of 4 is because we really don't need a ztocToFlatbufZtoc09, since we don't want to produce old flatbuf anymore; we need flatbufZtoc09ToZtoc because we need to be able to unmarsha old data to a Ztoc that without buildertool).

compression_info : CompressionInfo;
}

root_type Ztoc;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The common contents of ztoc and ztoc09 could go in a third file that the two main files include, reducing duplication. Then ztoc and the third file could be moved to an external repo later and just the smaller ztoc09 kept here. Example: https://gist.github.com/sparr/eeb7e407220d8e8a9451312c1dd08616

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would this work if ztoc09.fbs existed here and ztoc_types.fbs existed in another git repository? Would we have to make it a submodule to get the import path right?

@djdongjin
Copy link
Contributor

also we need to have necessary unit (or integration) tests to test those expected conversions.

@Kern-- Kern-- mentioned this pull request Mar 9, 2023
Comment on lines +33 to +37
if ztoc.Version == "0.9" {
flatbuf, err = ztocToFlatbuffer09(ztoc)
} else {
flatbuf, err = ztocToFlatbuffer(ztoc)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use switch here and return err in default case? I think that makes it more clear and won't have wrong value by accident (if happens, we'll have error).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants