Update EIP-7495: Compact serialization for Variant[S]
Merged by EIP-Bot.
etan-status authored Apr 15, 2024
1 parent 41fb826 commit 589eb30
Showing 3 changed files with 342 additions and 140 deletions.
25 changes: 16 additions & 9 deletions EIPS/eip-7495.md
@@ -118,31 +118,36 @@ Merkleization `hash_tree_root(value)` of an object `value` is extended with:

### `Variant[S]`

For the purpose of type safety, `Variant[S]` is defined to serve as a subset of `StableContainer` `S`. While `S` still determines how the `Variant[S]` is serialized and merkleized, `Variant[S]` MAY implement additional restrictions on valid combinations of fields.
`Variant[S]` serves as a subset of `StableContainer` `S`. While `S` still determines how the `Variant[S]` is merkleized, `Variant[S]` MAY implement additional restrictions on valid combinations of fields, and its serialization is optimized for a more compact representation based on these restrictions.

- Fields in `Variant[S]` may have a different order than in `S`; the canonical order in `S` is always used for serialization and merkleization regardless of any alternative orders in `Variant[S]`
- Fields in `Variant[S]` may have a different order than in `S`; this only affects serialization, as the canonical order in `S` is always used for merkleization
- Fields in `Variant[S]` may be required, despite being optional in `S`
- Fields in `Variant[S]` may be missing, provided that they are optional in `S`
- All fields that are required in `S` must be present in `Variant[S]`
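
The constraints in this list can be checked mechanically. The following is a minimal sketch, not part of the EIP's reference implementation: `check_variant` is a hypothetical helper, and fields are modelled as plain dicts mapping a field name to an `is_optional` flag. It only checks the two constraints that the list states outright (a variant may only use fields of `S`, and every field that is required in `S` must be present).

```python
from typing import Dict


def check_variant(stable: Dict[str, bool], variant: Dict[str, bool]) -> None:
    """Raise ValueError if `variant` is not a valid subset of `stable`.

    Both arguments map field name -> is_optional (hypothetical modelling).
    """
    for name in variant:
        if name not in stable:
            raise ValueError(f"{name!r} is not a field of the StableContainer")
    for name, is_optional in stable.items():
        if not is_optional and name not in variant:
            raise ValueError(f"required field {name!r} is missing from the variant")


# Field order may differ from `S`; optional fields may become required or be dropped.
shape = {"side": True, "color": False, "radius": True}
check_variant(shape, {"side": False, "color": False})    # two required fields
check_variant(shape, {"radius": False, "color": False})  # different order, `side` dropped
```

Dropping `color` from either variant would raise, because `color` is required in the modelled `S`.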

Serialization of a specific `Variant` follows a scheme similar to the one for its underlying `StableContainer`, except that the leading `Bitvector` is replaced by a sparse representation that only indicates presence or absence of optional fields `Optional[T]`. Bits for required fields as well as trailing padding bits are not included when serializing `Variant[S]`. If there are no optional fields, the entire `Bitvector` is omitted. While this serialization is more compact, note that it is not forward compatible and that context information that determines the underlying data type has to be indicated out of band. If forward compatibility is required, the `Variant`'s underlying data SHALL be serialized as defined by the underlying `StableContainer`.

`Variant[S]` is considered ["variable-size"](https://github.com/ethereum/consensus-specs/blob/67c2f9ee9eb562f7cc02b2ff90d92c56137944e1/ssz/simple-serialize.md#variable-size-and-fixed-size) iff it contains any `Optional[T]` or any "variable-size" fields.
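
To make the compact encoding concrete, here is a minimal sketch under simplifying assumptions: every field is a fixed-size unsigned integer, so no offsets are needed, and the names `FieldSpec`, `is_variable_size` and `serialize_compact` are hypothetical helpers rather than part of the reference implementation. The prefix holds one bit per optional field, packed in SSZ `Bitvector` bit order, and is omitted entirely when the variant has no optional fields.

```python
from typing import Dict, Optional, Sequence, Tuple

# (field name, byte width of the uint, is_optional), in the variant's field order.
FieldSpec = Tuple[str, int, bool]


def is_variable_size(fields: Sequence[FieldSpec]) -> bool:
    # With only fixed-size uints, any optional field makes the variant variable-size.
    return any(is_optional for _, _, is_optional in fields)


def serialize_compact(fields: Sequence[FieldSpec], values: Dict[str, Optional[int]]) -> bytes:
    optional = [name for name, _, is_optional in fields if is_optional]
    # One bit per optional field, packed like an SSZ Bitvector (bit i -> byte i // 8, bit i % 8).
    prefix = bytearray((len(optional) + 7) // 8)
    for i, name in enumerate(optional):
        if values.get(name) is not None:
            prefix[i // 8] |= 1 << (i % 8)
    body = bytearray()
    for name, width, is_optional in fields:
        value = values.get(name)
        if value is None:
            if not is_optional:
                raise ValueError(f"required field {name!r} is missing")
            continue
        body += value.to_bytes(width, "little")
    return bytes(prefix) + bytes(body)


# No optional fields: no prefix at all, just the field data.
square = [("side", 2, False), ("color", 1, False)]
print(serialize_compact(square, {"side": 0x42, "color": 1}).hex())  # 420001

# One optional field: a single prefix byte carries its presence bit.
maybe_sided = [("side", 2, True), ("color", 1, False)]
print(serialize_compact(maybe_sided, {"side": None, "color": 1}).hex())  # 0001
```

A real implementation also has to handle variable-size fields through offsets, which this sketch leaves out.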

```python
# Serialization and merkleization format
# Defines the common merkleization format and a portable serialization format across variants
class Shape(StableContainer[4]):
    side: Optional[uint16]
    color: uint8
    radius: Optional[uint16]

# Valid variants
# Inherits merkleization format from `Shape`, but is serialized more compactly
class Square(Variant[Shape]):
    side: uint16
    color: uint8

# Inherits merkleization format from `Shape`, but is serialized more compactly
class Circle(Variant[Shape]):
    radius: uint16
    color: uint8
```

In addition, `OneOf[S]` is defined to provide a `select_variant` helper function for determining the `Variant[S]` to use when parsing `S`. The `select_variant` helper function MAY incorporate environmental information, e.g., the fork schedule.
In addition, `OneOf[S]` is defined to provide a `select_variant` helper function for determining the `Variant[S]` to use when parsing `StableContainer` `S`. The `select_variant` helper function MAY incorporate environmental information, e.g., the fork schedule.

```python
class AnyShape(OneOf[Shape]):
@@ -156,16 +161,18 @@ class AnyShape(OneOf[Shape]):
        assert False
```

The extent and syntax in which `Variant[S]` and `OneOf[S]` are supported MAY differ among underlying SSZ implementations. Where it supports clarity, specifications SHOULD use `Variant[S]` and `OneOf[S]` as defined here.

## Rationale

### What are the problems solved by `StableContainer[N]`?

Current SSZ types are only stable within one version of a specification, i.e., one fork of Ethereum. This is alright for messages pertaining to a specific fork, such as attestations or beacon blocks. However, it is a limitation for messages that are expected to remain valid across forks, such as transactions or receipts. In order to support evolving the features of such perpetually valid message types, a new SSZ scheme needs to be defined.
Current SSZ types are only stable within one version of a specification, i.e., one fork of Ethereum. This is alright for messages pertaining to a specific fork, such as attestations or beacon blocks. However, it is a limitation for messages that are expected to remain valid across forks, such as transactions or receipts. In order to support evolving the features of such perpetually valid message types, a new SSZ scheme needs to be defined. Furthermore, consumers of Merkle proofs may have a different software update cadence than Ethereum; an implementation should not break just because a new fork introduces unrelated new features.

To avoid restricting design space, the scheme has to support extension with new fields, obsolescence of old fields, and new combinations of existing fields. When such adjustments occur, old messages must still deserialize correctly and must retain their original Merkle root.

### What are the problems solved by `Variant[S]`?

The forward compatible merkleization of `StableContainer` may be desirable even in situations where only a single variant is valid at any given time, e.g., as determined by the fork schedule. In such situations, message size can be reduced and type safety increased by exchanging `Variant[S]` instead of the underlying `StableContainer`. This can be useful, e.g., for consensus data structures such as `BeaconState`, to ensure that Merkle proofs for its fields remain compatible across forks.

### Why not `Union[T, U, V]`?

Typically, the individual `Union` cases share some form of thematic overlap, sharing certain fields with each other. In a `Union`, shared fields are not necessarily merkleized at the same [generalized indices](https://github.com/ethereum/consensus-specs/blob/67c2f9ee9eb562f7cc02b2ff90d92c56137944e1/ssz/merkle-proofs.md). Therefore, Merkle proof systems would have to be updated each time that a new flavor is introduced, even when the actual changes are not of interest to the particular system.
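
By contrast, a field of a `StableContainer` keeps the same generalized index in every variant. The sketch below illustrates this under stated assumptions: `field_gindex` is a hypothetical helper, the container root has generalized index 1, the field data sits in the left subtree and the active-fields `Bitvector` in the right one (matching the `get_left()` / `get_right()` calls in the accompanying `stable_container.py`), and the depth is the number of levels needed for `N` leaves.

```python
def field_gindex(capacity: int, field_index: int) -> int:
    """Generalized index of a field, counted from the container root (gindex 1)."""
    if not 0 <= field_index < capacity:
        raise ValueError("field index out of range")
    depth = (capacity - 1).bit_length()  # levels needed to hold `capacity` leaves
    data_root = 2  # assumed: left child holds field data, right child the bitvector
    return data_root * 2**depth + field_index


# For a StableContainer[4], the fields at positions 0, 1, 2 map to 8, 9, 10.
for index in range(3):
    print(index, field_gindex(4, index))
```

The result depends only on the capacity `N` and the field's position in `S`, so a proof for a given field verifies the same way regardless of which variant produced the value.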
@@ -180,7 +187,7 @@ Additionally, every time that the number of fields reaches a new power of 2, the

## Backwards Compatibility

`StableContainer[N]` is a new SSZ type and does not conflict with other SSZ types currently in use.
`StableContainer[N]` and `Variant[S]` are new SSZ types and do not conflict with other SSZ types currently in use.

## Test Cases

181 changes: 166 additions & 15 deletions assets/eip-7495/stable_container.py
@@ -1,5 +1,5 @@
import io
from typing import BinaryIO, Dict, List as PyList, Optional, TypeVar, Type, Union as PyUnion, \
from typing import BinaryIO, Dict, List as PyList, Optional, Tuple, TypeVar, Type, Union as PyUnion, \
    get_args, get_origin
from textwrap import indent
from remerkleable.bitfields import Bitvector
@@ -12,8 +12,18 @@
N = TypeVar('N')
S = TypeVar('S', bound="ComplexView")


def all_fields(cls) -> Dict[str, Tuple[Type[View], bool]]:
    fields = {}
    for k, v in cls.__annotations__.items():
        fopt = get_origin(v) == PyUnion and type(None) in get_args(v)
        ftyp = get_args(v)[0] if fopt else v
        fields[k] = (ftyp, fopt)
    return fields


class StableContainer(ComplexView):
    _field_indices: Dict[str, tuple[int, Type[View], bool]]
    _field_indices: Dict[str, Tuple[int, Type[View], bool]]
    __slots__ = '_field_indices'

    def __new__(cls, backing: Optional[Node] = None, hook: Optional[ViewHook] = None, **kwargs):
@@ -71,13 +81,8 @@ class StableContainerView(StableContainer):
        return StableContainerView

    @classmethod
    def fields(cls) -> Dict[str, tuple[Type[View], bool]]:
        fields = {}
        for k, v in cls.__annotations__.items():
            fopt = get_origin(v) == PyUnion and type(None) in get_args(v)
            ftyp = get_args(v)[0] if fopt else v
            fields[k] = (ftyp, fopt)
        return fields
    def fields(cls) -> Dict[str, Tuple[Type[View], bool]]:
        return all_fields(cls)

    @classmethod
    def is_fixed_byte_length(cls) -> bool:
@@ -247,7 +252,10 @@ def serialize(self, stream: BinaryIO) -> int:

        return num_prefix_bytes + num_data_bytes


class Variant(ComplexView):
    _o: int

    def __new__(cls, backing: Optional[Node] = None, hook: Optional[ViewHook] = None, **kwargs):
        if backing is not None:
            if len(kwargs) != 0:
@@ -268,28 +276,171 @@ def __new__(cls, backing: Optional[Node] = None, hook: Optional[ViewHook] = None
        value = cls.S(backing, hook, **kwargs)
        return cls(backing=value.get_backing())

    def __init_subclass__(cls, *args, **kwargs):
        super().__init_subclass__(*args, **kwargs)
        cls._o = 0
        for _, (_, fopt) in cls.fields().items():
            if fopt:
                cls._o += 1

    def __class_getitem__(cls, s) -> Type["Variant"]:
        if not issubclass(s, StableContainer):
            raise Exception(f"invalid variant container: {s}")

        class VariantView(Variant, s):
            S = s

            @classmethod
            def fields(cls) -> Dict[str, tuple[Type[View], bool]]:
                return s.fields()

        VariantView.__name__ = VariantView.type_repr()
        return VariantView

    @classmethod
    def fields(cls) -> Dict[str, Tuple[Type[View], bool]]:
        return all_fields(cls)

    @classmethod
    def is_fixed_byte_length(cls) -> bool:
        if cls._o > 0:
            return False
        for _, (ftyp, _) in cls.fields().items():
            if not ftyp.is_fixed_byte_length():
                return False
        return True

    @classmethod
    def type_byte_length(cls) -> int:
        if cls.is_fixed_byte_length():
            return cls.min_byte_length()
        else:
            raise Exception("dynamic length variant does not have a fixed byte length")

    @classmethod
    def min_byte_length(cls) -> int:
        total = Bitvector[cls._o].type_byte_length() if cls._o > 0 else 0
        for _, (ftyp, fopt) in cls.fields().items():
            if fopt:
                continue
            if not ftyp.is_fixed_byte_length():
                total += OFFSET_BYTE_LENGTH
            total += ftyp.min_byte_length()
        return total

    @classmethod
    def max_byte_length(cls) -> int:
        total = Bitvector[cls._o].type_byte_length() if cls._o > 0 else 0
        for _, (ftyp, _) in cls.fields().items():
            if not ftyp.is_fixed_byte_length():
                total += OFFSET_BYTE_LENGTH
            total += ftyp.max_byte_length()
        return total

    def active_fields(self) -> Bitvector:
        active_fields_node = super().get_backing().get_right()
        return Bitvector[self.__class__.S.N].view_from_backing(active_fields_node)

    def optional_fields(self) -> Bitvector:
        assert self.__class__._o > 0
        active_fields = self.active_fields()
        optional_fields = Bitvector[self.__class__._o]()
        oindex = 0
        for fkey, (_, fopt) in self.__class__.fields().items():
            if fopt:
                (findex, _, _) = self.__class__.S._field_indices[fkey]
                optional_fields.set(oindex, active_fields.get(findex))
                oindex += 1
        return optional_fields

    @classmethod
    def type_repr(cls) -> str:
        return f"Variant[{cls.S.__name__}]"

    @classmethod
    def deserialize(cls: Type[S], stream: BinaryIO, scope: int) -> S:
        value = cls.S.deserialize(stream, scope)
        return cls(backing=value.get_backing())
        if cls._o > 0:
            num_prefix_bytes = Bitvector[cls._o].type_byte_length()
            if scope < num_prefix_bytes:
                raise ValueError("scope too small, cannot read Variant optional fields")
            optional_fields = Bitvector[cls._o].deserialize(stream, num_prefix_bytes)
            scope = scope - num_prefix_bytes

        field_values: Dict[str, Optional[View]] = {}
        dyn_fields: PyList[FieldOffset] = []
        fixed_size = 0
        oindex = 0
        for fkey, (ftyp, fopt) in cls.fields().items():
            if fopt:
                have_field = optional_fields.get(oindex)
                oindex += 1
                if not have_field:
                    field_values[fkey] = None
                    continue
            if ftyp.is_fixed_byte_length():
                fsize = ftyp.type_byte_length()
                field_values[fkey] = ftyp.deserialize(stream, fsize)
                fixed_size += fsize
            else:
                dyn_fields.append(FieldOffset(
                    key=fkey, typ=ftyp, offset=int(decode_offset(stream))))
                fixed_size += OFFSET_BYTE_LENGTH
        assert oindex == cls._o
        if len(dyn_fields) > 0:
            if dyn_fields[0].offset < fixed_size:
                raise Exception(f"first offset {dyn_fields[0].offset} is "
                                f"smaller than expected fixed size {fixed_size}")
            for i, (fkey, ftyp, foffset) in enumerate(dyn_fields):
                next_offset = dyn_fields[i + 1].offset if i + 1 < len(dyn_fields) else scope
                if foffset > next_offset:
                    raise Exception(f"offset {i} is invalid: {foffset} "
                                    f"larger than next offset {next_offset}")
                fsize = next_offset - foffset
                f_min_size, f_max_size = ftyp.min_byte_length(), ftyp.max_byte_length()
                if not (f_min_size <= fsize <= f_max_size):
                    raise Exception(f"offset {i} is invalid, size out of bounds: "
                                    f"{foffset}, next {next_offset}, implied size: {fsize}, "
                                    f"size bounds: [{f_min_size}, {f_max_size}]")
                field_values[fkey] = ftyp.deserialize(stream, fsize)

        return cls(**field_values)  # type: ignore

    def serialize(self, stream: BinaryIO) -> int:
        if self.__class__._o > 0:
            optional_fields = self.optional_fields()
            num_prefix_bytes = optional_fields.serialize(stream)
        else:
            num_prefix_bytes = 0

        num_data_bytes = 0
        oindex = 0
        for _, (ftyp, fopt) in self.__class__.fields().items():
            if fopt:
                have_field = optional_fields.get(oindex)
                oindex += 1
                if not have_field:
                    continue
            if ftyp.is_fixed_byte_length():
                num_data_bytes += ftyp.type_byte_length()
            else:
                num_data_bytes += OFFSET_BYTE_LENGTH
        assert oindex == self.__class__._o

        temp_dyn_stream = io.BytesIO()
        data = super().get_backing().get_left()
        active_fields = self.active_fields()
        for fkey, (ftyp, _) in self.__class__.fields().items():
            (findex, _, _) = self.__class__.S._field_indices[fkey]
            if not active_fields.get(findex):
                continue
            fnode = data.getter(2**get_depth(self.__class__.N) + findex)
            v = ftyp.view_from_backing(fnode)
            if ftyp.is_fixed_byte_length():
                v.serialize(stream)
            else:
                encode_offset(stream, num_data_bytes)
                num_data_bytes += v.serialize(temp_dyn_stream)  # type: ignore
        temp_dyn_stream.seek(0)
        stream.write(temp_dyn_stream.read(num_data_bytes))

        return num_prefix_bytes + num_data_bytes


class OneOf(ComplexView):
    def __class_getitem__(cls, s) -> Type["OneOf"]: