-
Notifications
You must be signed in to change notification settings - Fork 348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
The ContentHash implementation for std::Option is inconsistent with the documentation #3051
Comments
I should add, if we make this change enums that implement |
This commit also updates the documentation regarding how ContentHash should be implemented for enums to address the concerns raised in #3051.
FYI, @Ralith (who wrote the current version of the macro) |
I'm not sure this is an issue with the current macro actually since I don't think it supports enum types. It's only an issue for the hand coded implementation for std::Optional. That gives me an idea actually. We could implement the proc macro per the "spec" and just allow the divergence for std::Optional. I don't think there should be any ill effects from allowing this divergence unless we need a guarantee that different enum types with the same layouts and contents will have equivalent hashes. |
iirc, changing the |
RE: #3051 (comment), commit 2447dfe suggests that you're right and changing the way a type hashes won't break anything. Perhaps we should document above the trait that the hash value should be computed once and persisted and not used for integrity checking or re-verification. Assuming this is correct, then I'd suggest we just change the implementation for This is probably a question for @Ralith, but I'm wondering why we can't simply use the built in |
This also affects the |
It looks like the impl for |
The main thing is that If we can't use off-the-shelf |
Also, +1 to normalizing the assorted enum impls. |
I'll mail a change to fix the u8 / u32 issue first, and then replace the custom impls with the macro afterwards. How important is the stability and portability currently? Portability seems broken since we're using the hardware endianness (although most systems are le). It sounds like consistency isn't that important currently either since we can apparently change the hash impl for enums without breaking anything. Edit: I'm not suggesting we use |
At minimum, the hash generated by using the same jj codebase should be stable across platforms. Otherwise the test would break. In addition to that, it's probably better to not change the output unless necessary (to save disk space.)
Don't we use |
The `ContentHash` documentation specifies that implementations for enums should hash the ordinal number of the variant contained in the enum as a 32-bit little-endian number and then hash the contents of the variant, if any. The current implementations for `std::Option` and `MergedTreeId` are non-conformant since they hash the ordinal number as a u8 with platform specific endianness. Fixes #3051
The `ContentHash` documentation specifies that implementations for enums should hash the ordinal number of the variant contained in the enum as a 32-bit little-endian number and then hash the contents of the variant, if any. The current implementations for `std::Option` and `MergedTreeId` are non-conformant since they hash the ordinal number as a u8 with platform specific endianness. Fixes #3051
As far as I can tell this is only used for the impl for I mailed out #3061 to fix this in the current implementations, which will be superseded by the proc macro once it's submitted. @yuja I think your explanation makes sense. If we didn't have consistency and portability the snapshot tests would break on other platforms even if jj would continue to work fine for actual repos. |
The `ContentHash` documentation specifies that implementations for enums should hash the ordinal number of the variant contained in the enum as a 32-bit little-endian number and then hash the contents of the variant, if any. The current implementations for `std::Option` and `MergedTreeId` are non-conformant since they hash the ordinal number as a u8 with platform specific endianness. Fixes #3051
All multi-byte integers' implementations should be calling
These are currently (erroneously) using |
Right.. there's only a single byte so the platform endianness doesn't matter. |
…d RemoteRefState The `ContentHash` documentation specifies that implementations for enums should hash the ordinal number of the variant contained in the enum as a 32-bit little-endian number and then hash the contents of the variant, if any. The current implementations for `std::Option`, `MergedTreeId`, and `RemoteRefState` are non-conformant since they hash the ordinal number as a u8 with platform specific endianness. Fixes #3051
The documentation for the
ContentHash
trait states that:This does not match the implementation for
std::Option
, which actually hashes the enum variant's ordinal number asu8
, presumably with architecture specific endianness.https://github.com/martinvonz/jj/blob/2e64bf83fd9b8abf4c9880482ea4ce19492f3139/lib/src/content_hash.rs#L76-L86
In
state.update(&[0])
, the&[0]
is inferred as a&[u8]
due to the function signature ofdigest::Update.update)
.I'm not sure what the implication is of changing the implementation after the fact. Presumably things will break. I think therefore we should update the documentation to say that the ordinal number should be hashed as a LE u8, and we should write the implementation as
state.update(&0.as_le_bytes())
to ensure portability.The text was updated successfully, but these errors were encountered: