This repository has been archived by the owner on Dec 24, 2023. It is now read-only.

Switch from Flatbuffers to ASN.1 ? #10

Open
ChristopherRabotin opened this issue Dec 29, 2021 · 2 comments
Assignees
Labels
proposed A proposed functionality specification Related to the "how should this be done" question

Comments

@ChristopherRabotin
Member

ChristopherRabotin commented Dec 29, 2021

ASN.1 is a highly efficient standardized platform-independent binary encoding used in lots of applications, including telecommunications and cryptographic key exchanges. This just looks amazing, seriously. Huge thanks to @pwnorbitals for letting me know about this.

Some references:

Note: all of the ASN.1 specs below can be tested directly on this playground: https://asn1.io/asn1playground/ .

Benchmark encoding sizes

One issue with the der library is that it does not support the Real type defined in section 2.4 of the specs (PDF).

Rebuilding the REAL type

Built-in

ANISE DEFINITIONS AUTOMATIC TAGS ::= 
BEGIN
  Real ::= SEQUENCE       
  {                                                     
     data REAL
  }                                                     
END

Encode:

value Real ::= {
  data 3.141592653589793
}

DER: 26 bytes

Naive

ANISE DEFINITIONS AUTOMATIC TAGS ::= 
BEGIN
  Real ::= SEQUENCE       
  {                                                     
     mantissa INTEGER DEFAULT 0,
     realbase INTEGER DEFAULT 10,
     exponent INTEGER DEFAULT 0
  }                                                     
END

Encode Pi:

value Real ::= {
  mantissa 3141592653589793,
  realbase 10,
  exponent -15
}

DER: 14 bytes
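
The 26-byte and 14-byte figures can be reproduced by hand. The sketch below is a minimal hand-rolled DER encoder, not the playground's implementation. It assumes that AUTOMATIC TAGS assigns the context tags [0], [1], [2] to the fields in order, that DER omits `realbase` because it equals its DEFAULT, that the built-in REAL uses the base-10 NR3 character form, and that the exponent is -15 so that mantissa × 10^exponent equals pi. The helper names (`der_length`, `tlv`, `int_content`) are made up for illustration.

```python
def der_length(n: int) -> bytes:
    """DER length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def tlv(tag: int, content: bytes) -> bytes:
    """Tag byte, length octets, then the content."""
    return bytes([tag]) + der_length(len(content)) + content


def int_content(value: int) -> bytes:
    """Minimal two's-complement content octets of an INTEGER."""
    magnitude = value if value >= 0 else ~value
    size = magnitude.bit_length() // 8 + 1
    return value.to_bytes(size, "big", signed=True)


MANTISSA = 3141592653589793
EXPONENT = -15  # assumption: value = mantissa * 10**exponent

# Naive structure: realbase (tag 0x81) is omitted because it equals its DEFAULT.
naive = tlv(0x30, tlv(0x80, int_content(MANTISSA)) + tlv(0x82, int_content(EXPONENT)))

# Built-in REAL, base 10: one 0x03 header byte (NR3) followed by ASCII text.
nr3 = f"{MANTISSA}.E{EXPONENT}".encode("ascii")
builtin = tlv(0x30, tlv(0x80, b"\x03" + nr3))

print(len(builtin), len(naive))  # 26 14
```

Under these assumptions, the built-in REAL is larger because it stores the digits as text, one byte per character, while the naive structure stores the mantissa as a 7-byte binary integer.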

Full specs

(Took me one hour to fix...)

ANISE DEFINITIONS AUTOMATIC TAGS ::=
BEGIN
  Normal ::= SEQUENCE
  {
     mantissa INTEGER DEFAULT 0,
     realbase INTEGER DEFAULT 10,
     exponent INTEGER DEFAULT 0
  }

  Subnormal ::= ENUMERATED
  {
     plus-infinity,
     neg-infinity
  }

  Real ::= CHOICE
  {
     as_normal Normal,
     as_subnormal Subnormal
  }
END

Encoding a normal number:

realdata Real ::= as_normal : {
  mantissa 3141592653589793,
  realbase 10,
  exponent -15
}

DER: 14 bytes (no overhead, it seems!)

Encoding a subnormal number:

realdata Real ::= as_subnormal : plus-infinity

DER: 1 byte (yes, ONE!)

@ChristopherRabotin ChristopherRabotin added specification Related to the "how should this be done" question proposed A proposed functionality labels Dec 29, 2021
@ChristopherRabotin ChristopherRabotin self-assigned this Dec 29, 2021
@ChristopherRabotin
Member Author

Trade study

The ASN.1 representation has been weighed against Flatbuffers, Protocol Buffers, and the current SPICE representation. @pwnorbitals and I have decided to go ahead with ASN.1 given the decision matrix below.

Decision matrix

Each item is rated from 1 (worst) to 5 (best).

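| Criterion            | SPICE DAF | Flatbuffer | ASN1 | Protocol buffers |
|----------------------|-----------|------------|------|------------------|
| Compactness          | 4         | 3          | 5    | 5                |
| Spec clarity         | 2         | 5          | 4    | 5                |
| Network transfer     | 1         | 3          | 5    | 1                |
| Extensibility        | 1         | 5          | 4    | 5                |
| Zero alloc read      | 5         | 4          | 5    | 1                |
| Zero alloc write     | 4         | 1          | 5    | 1                |
| Runtime parsing      | 5         | 4          | 4    | 1                |
| Certifiability       | 4         | 2          | 5    | 2                |
| Multi-arch           | 4         | 4          | 5    | 4                |
| Subset serialization | 1         | 1          | 5    | 5                |
| TOTALS               | 31        | 32         | 47   | 30               |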
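
The totals are plain unweighted sums of the ten criteria. A small sketch that recomputes them from the scores in the matrix (the variable names are only for illustration):

```python
# Scores in matrix order: compactness, spec clarity, network transfer,
# extensibility, zero alloc read, zero alloc write, runtime parsing,
# certifiability, multi-arch, subset serialization.
scores = {
    "SPICE DAF":        [4, 2, 1, 1, 5, 4, 5, 4, 4, 1],
    "Flatbuffer":       [3, 5, 3, 5, 4, 1, 4, 2, 4, 1],
    "ASN1":             [5, 4, 5, 4, 5, 5, 4, 5, 5, 5],
    "Protocol buffers": [5, 5, 1, 5, 1, 1, 1, 2, 4, 5],
}

# Unweighted sum per candidate format.
totals = {name: sum(values) for name, values in scores.items()}
best = max(totals, key=totals.get)

print(totals)  # {'SPICE DAF': 31, 'Flatbuffer': 32, 'ASN1': 47, 'Protocol buffers': 30}
print(best)    # ASN1
```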

Detailed criteria

Compactness

How small is the file on disk?

  • DAF: this format is basically a byte array of IEEE-754 doubles in big endian encoding. It hardly gets more compact than this for storing arbitrary 64-bit floats. The drawbacks of DAF are how it stores strings and how it pads some structs to make sure they deserialize correctly in FORTRAN. Reference size of 16.7 MB.

  • Flatbuffers: serialization is quite small thanks to its variable lengths, but this requires additional control bytes for correct deserialization (vtable_offset), making it less compact than the equivalent DAF (20.1 MB).

  • ASN.1: like Flatbuffers, ASN.1 uses variable lengths to encode data. This adds two bytes per structure (one Tag byte and one Length byte). Hence, arbitrarily large real data is encoded on anywhere from 3 bytes (tag + length + subnormal kind, e.g. NaN, zero) to 12 bytes (tag + length + exponent on two bytes + mantissa on 8 bytes). Interpolation coefficients seem to typically be encoded on 10 to 11 bytes. This limitation can be bypassed for arrays of 64-bit floats by encoding all of the floats as a single OctetString and parsing it as a contiguous array (in fact, this allowed me to create a 6.7 MB file with all of the data from the 16.7 MB de421.bsp SPK file). That parsing would be custom, but it would allow for O(1) access to any item in the array.

  • Protobuf: everything is extremely compact. In a previous project, I found that a protobuf encoding is about as compact as the equivalent SPK DAF file.

Note: these tests were done on the DE421.bsp file, looking specifically at the encoding of 64-bit floats.
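
The OctetString workaround mentioned for ASN.1 can be sketched as follows. This is a hypothetical illustration, not the actual ANISE code: it assumes big endian IEEE-754 doubles inside the payload, and the function names (`pack_f64_octet_string`, `read_f64`) are invented for the example.

```python
import struct


def der_length(n: int) -> bytes:
    """DER length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def pack_f64_octet_string(values) -> bytes:
    """Store all doubles contiguously in one OCTET STRING (tag 0x04)."""
    payload = struct.pack(f">{len(values)}d", *values)  # assumed big endian
    return bytes([0x04]) + der_length(len(payload)) + payload


def read_f64(encoded: bytes, index: int) -> float:
    """O(1) access: skip the tag and length header, then jump to index * 8."""
    first = encoded[1]
    header = 2 if first < 0x80 else 2 + (first & 0x7F)
    return struct.unpack_from(">d", encoded, header + 8 * index)[0]


coefficients = [float(i) / 8 for i in range(1000)]
blob = pack_f64_octet_string(coefficients)

print(len(blob))            # 8004
print(read_f64(blob, 500))  # 62.5
```

Here the tag and length cost 4 bytes for the whole array, so each float takes 8 bytes instead of the 10 to 11 bytes seen when every coefficient carries its own tag and length.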

Spec clarity

When reading the format specifications, how clearly is the message conveyed?

  • DAF: It's a bit complicated. The endianness depends on the platform on which the file was generated, and the specifications do not say which is preferred. Instead, they recommend converting to the equivalent ASCII file, transferring that, and then converting back to the natively encoded DAF file. Further, the specifications are only laid out in the NAIF documentation, without providing a "specs" file.

  • Flatbuffer: Very clear specification from the .fbs files, which can then be used to automatically generate bindings in many languages.

  • ASN.1: Clear specification from the .asn files. Some compilers (e.g. https://github.com/ttsiodras/asn1scc and https://www.erlang.org/doc/apps/asn1/asn1_getting_started.html) can generate structures in some languages, but most languages require creating those structures manually. However, ASN.1 is an ITU and ISO standardized specification language!

  • Protobufs: Very clear specification from the .proto files, which can then be used to automatically generate bindings in many languages.

Network transfer

When transferring data across a network, how can the receiving party prepare for receiving the data?

  • DAF: The format does not provide any length information in a header. Therefore, a network transfer will first need to send the length so that the receiver can prepare the memory arena for storing the information. The order of packets matters (i.e. no UDP/IP transfers).

  • Flatbuffer: Flatbuffers are designed for TCP transfers but I cannot find whether the root structure has any length data associated (it is available in the table structure).

  • ASN1: All structures start with a tag and a length, on one byte each (at least for the length, which may take more). Hence, the receiving party only needs to read the first few bytes to determine how large the arena needs to be before reading the rest of the stream.

  • Protobufs: Like DAF, it does not provide any length data.
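
For the ASN.1 case, reading the tag and length before sizing the arena could look like the sketch below. It assumes DER (definite lengths only) and a single-byte tag, and it uses `io.BytesIO` as a stand-in for a socket; a real socket may return partial reads and would need a loop. The function names are hypothetical.

```python
import io


def read_der_header(stream):
    """Read the tag byte and the definite length that follows it."""
    tag = stream.read(1)[0]
    first = stream.read(1)[0]
    if first < 0x80:
        return tag, first            # short form: the byte is the length
    count = first & 0x7F             # long form: number of length bytes
    if count == 0:
        raise ValueError("indefinite length is not allowed in DER")
    return tag, int.from_bytes(stream.read(count), "big")


def receive(stream):
    """Allocate the arena once, sized from the header, then fill it."""
    tag, length = read_der_header(stream)
    arena = bytearray(length)
    if stream.readinto(arena) != length:
        raise ValueError("stream ended before the announced length")
    return tag, arena


# A 300-byte OCTET STRING: tag 0x04, long-form length 0x82 0x01 0x2C.
message = bytes([0x04, 0x82, 0x01, 0x2C]) + bytes(300)
tag, arena = receive(io.BytesIO(message))
print(tag, len(arena))  # 4 300
```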

Extensibility

How easy is it to extend these specifications?

  • DAF: Extremely difficult as these would need to be agreed by NAIF. Only a few data types have been added in the last two decades.

  • Flatbuffer: Excellent support for updating the specs and marking fields as obsolete while providing a default value. Supports versioning.

  • ASN.1: Although it does not support versioning per se, it isn't difficult to add a field and to provide a default value if it isn't specified.

  • Protobuf: Excellent support for updating the specs and marking fields as obsolete while providing a default value. Supports versioning.
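
To illustrate the ASN.1 point about default values, here is a minimal decoder sketch for the `Normal` structure from the first comment. It assumes automatic context tags 0x80, 0x81, 0x82 for `mantissa`, `realbase`, `exponent` and short-form lengths only; `decode_normal` is a made-up name, not a library function. A field missing from the encoding simply falls back to its DEFAULT.

```python
DEFAULTS = {"mantissa": 0, "realbase": 10, "exponent": 0}
FIELD_BY_TAG = {0x80: "mantissa", 0x81: "realbase", 0x82: "exponent"}


def decode_normal(encoded: bytes) -> dict:
    """Decode a SEQUENCE of context-tagged INTEGERs, filling in defaults."""
    if encoded[0] != 0x30:
        raise ValueError("expected a SEQUENCE")
    end = 2 + encoded[1]             # short-form length assumed
    fields = dict(DEFAULTS)          # start from the DEFAULT values
    pos = 2
    while pos < end:
        tag, size = encoded[pos], encoded[pos + 1]
        if tag not in FIELD_BY_TAG:
            raise ValueError(f"unknown field tag 0x{tag:02x}")
        content = encoded[pos + 2 : pos + 2 + size]
        fields[FIELD_BY_TAG[tag]] = int.from_bytes(content, "big", signed=True)
        pos += 2 + size
    return fields


# Only mantissa (5) and exponent (-2) are present; realbase is absent.
old_message = bytes([0x30, 0x06, 0x80, 0x01, 0x05, 0x82, 0x01, 0xFE])
print(decode_normal(old_message))
# {'mantissa': 5, 'realbase': 10, 'exponent': -2}
```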

Zero alloc read

Does reading require dynamic memory allocation with the standard tools?

  • DAF: No.
  • Flatbuffers: Yes by default ... but the byte array can be mmap'd and then passed to the deserialization library.
  • ASN.1: No, the der Rust library is no_std and guarantees zero allocation on read and write.
  • Protobuf: Yes, and the variable length of each field needs a memory allocation of its own.

Zero alloc write

Does writing require dynamic memory allocation with the standard tools?

  • DAF: Unknown because I haven't analyzed the tools that create DAF files. I suspect it ought to be possible to bypass memory allocation for writing.
  • Flatbuffers: Yes, the library requires mallocs for writing.
  • ASN.1: No, the der Rust library is no_std and guarantees zero allocation on read and write.
  • Protobuf: Yes.

Runtime parsing

How involved is the reading of the byte array and conversion to native types?

  • DAF: Basically none (a few integers are deliberately encoded on 8 bytes) as long as you've correctly read the file and used the correct endianness (that's the hard part).
  • Flatbuffers: Limited, because the library decodes the byte array into the native types.
  • ASN.1: Limited, because the library decodes the byte array into the native types.
  • Protobuf: Extensive, and it requires mallocs.

Certifiability

If we wanted to certify this format for use throughout the industry, how hard would that be?

  • DAF: Possible given its ubiquitous use in the space industry.
  • Flatbuffers: Would be possible but difficult, and it would probably be branded as a library designed by people without experience in embedded systems.
  • ASN.1: Quite possible! It's an ITU and ISO standardized language, and cellular networks and cryptography exchanges certify their communication with it.
  • Protobuf: Would be quite hard given all the mallocs, and it would probably be branded as a library designed by people without experience in embedded systems.

Multi-arch

Does it support big and little endian machines, and does it support many programming languages?

  • DAF: Somewhat supports big and little endianness, but it isn't recommended by NAIF. Language support is very limited and writing parsers is very hard in my experience.

  • Flatbuffers: Yes, works on multi-arch systems out of the box and supports several languages (but not Julia or FORTRAN).

  • ASN.1: Yes, superb! It's designed for that! Many libraries in many languages too.

  • Protobuf: Yes, works on several architectures and is widely used in the software industry.

Subset serialization

If we wanted to serialize only part of what the specs provide (e.g. only a Vector3 and not a whole ephemeris), could we do that while following the specs and allowing for transfer of that information?

  • DAF: Basically not possible, since there isn't any specification for different types; it's just a blob of bytes.
  • Flatbuffers: Not trivial as one would need to add a root element possibility for every new kind that
  • ASN1: Yes! Each struct has its own tag, and its own deserialization of bytes regardless of what's around it.
  • Protobuf: Yes.

@pwnorbitals

Great work from your side, @ChristopherRabotin. Incredible work :)
