Dozer cache sorted inverted index key serialization format

Bei Chu edited this page Apr 27, 2023 · 4 revisions

Format Description

Single Field Key

  • field type, 1 byte.
  • field data, variable length.

Reference: Field::encode.

Multiple Fields Key

For every field:

  • field total length, big endian u64, 8 bytes.
  • field, same as single field key.

Reference: get_secondary_index.

The field total length counts both the 8-byte length segment itself and the field segment that follows it. Because the field segment always contains at least the 1-byte field type, the value is at least 9.

We deliberately do not encode the total number of fields. As a result, serializing a prefix slice of the fields yields exactly a byte prefix of the serialization of all the fields.
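The layout described above can be sketched in a few lines of Rust. This is only an illustration, not Dozer's actual code: the helper names `encode_string_field` and `encode_composite_key` are hypothetical, only string fields are handled, and the type tag 4 for strings is taken from the example on this page.

```rust
/// Type tag for string fields, as used in the example on this page.
const STRING_TAG: u8 = 4;

/// Single field key: 1 type byte followed by the field data.
/// Hypothetical helper; this sketch only handles strings.
fn encode_string_field(value: &str) -> Vec<u8> {
    let mut out = Vec::with_capacity(1 + value.len());
    out.push(STRING_TAG);
    out.extend_from_slice(value.as_bytes());
    out
}

/// Multiple fields key: for every field, an 8-byte big endian total length
/// (which counts the 8 length bytes themselves) followed by the single field key.
fn encode_composite_key(values: &[&str]) -> Vec<u8> {
    let mut out = Vec::new();
    for value in values {
        let field = encode_string_field(value);
        let total_len = (8 + field.len()) as u64;
        out.extend_from_slice(&total_len.to_be_bytes());
        out.extend_from_slice(&field);
    }
    out
}
```

Because no field count is written up front, `encode_composite_key(&["a"])` is a byte prefix of `encode_composite_key(&["a", "bc"])`, which is what lets a key built from the leading fields be used as a prefix when searching the sorted index.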

Example

Single Field Key

Field::String("abc") -> [4, b'a', b'b', b'c']

Explanation: 4 is the type for String, followed by the bytes of the string.

Multiple Fields Key

[Field::String("a"), Field::String("bc")] -> [0, 0, 0, 0, 0, 0, 0, 10, 4, b'a', 0, 0, 0, 0, 0, 0, 0, 11, 4, b'b', b'c']

Explanation:

First field:

[0, 0, 0, 0, 0, 0, 0, 10, 4, b'a']

The first 8 bytes are the big endian representation of 10u64: the serialization of the first field is 10 bytes long in total (8 length bytes plus 2 field bytes).

The last 2 bytes are the single field Field::String("a") representation.

Second field:

[0, 0, 0, 0, 0, 0, 0, 11, 4, b'b', b'c']

The first 8 bytes are the big endian representation of 11u64: the serialization of the second field is 11 bytes long in total (8 length bytes plus 3 field bytes).

The last 3 bytes are the single field Field::String("bc") representation.
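Reading the explanation in the other direction, a multiple fields key can be split back into its single field keys by following the length prefixes. The function below is a minimal sketch under that assumption. The name `split_composite_key` is hypothetical and not part of Dozer, and interpreting the field data for each type is left out.

```rust
/// Splits a multiple fields key into its single field key slices.
/// Returns `None` if a length prefix is truncated, is smaller than 9,
/// or points past the end of the input.
fn split_composite_key(mut bytes: &[u8]) -> Option<Vec<&[u8]>> {
    let mut fields = Vec::new();
    while !bytes.is_empty() {
        // Read the 8-byte big endian total length.
        let mut len_bytes = [0u8; 8];
        len_bytes.copy_from_slice(bytes.get(..8)?);
        let total_len = u64::from_be_bytes(len_bytes);
        // The total length covers the 8 length bytes plus at least 1 type byte.
        if total_len < 9 || total_len > bytes.len() as u64 {
            return None;
        }
        let total_len = total_len as usize;
        // Everything after the length prefix is the single field key.
        fields.push(&bytes[8..total_len]);
        bytes = &bytes[total_len..];
    }
    Some(fields)
}
```

Applied to the 21-byte example above, this yields two slices, `[4, b'a']` and `[4, b'b', b'c']`, matching the single field representations of the two strings.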