spec: add channel UUID #922

Draft
wants to merge 2 commits into
base: main
45 changes: 37 additions & 8 deletions website/docs/spec/index.md
@@ -111,12 +111,14 @@ All MCAP records are serialized as follows:

Record type is a single byte opcode, and record content length is a uint64 value.

Records may be extended by adding new fields at the end of existing fields. Readers should ignore any unknown fields.
Future changes to this specification may extend records by adding new fields at the end of existing fields. Readers should ignore any unknown fields.

Contributor:

if we are introducing this language it would be good to do in a separate PR, and ensure we have conformance tests for readers well ahead of introducing any new fields.


> The Footer and Message records will not be extended, since their formats do not allow for backward-compatible size changes.

Each record definition below contains a `Type` column. See the [Serialization](#serialization) section on how to serialize each type.

Record content may end before all known fields have been serialized. Readers should treat a missing field as having that field type's [zero value](#zero-values).

Contributor:

if we are going to make this statement, we should probably be explicit that records must still be fully formed, even if they are missing fields (i.e. records must match their stated length). otherwise the file should be considered corrupt.
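
A minimal Python sketch of how a reader might apply the two rules discussed here: record content must match the record's stated length, and a field missing from the end of the content takes its zero value. The function names `read_record` and `read_trailing_uuid` are hypothetical, and the big-endian RFC 4122 byte layout for the 16-byte `uuid` field is an assumption, since the proposed spec text does not state a byte order.

```python
import struct
import uuid

NIL_UUID = uuid.UUID(int=0)  # zero value proposed for the uuid type


def read_record(data: bytes, pos: int = 0) -> tuple[int, bytes, int]:
    """Read one record: 1-byte opcode, little-endian uint64 length, content.

    Returns (opcode, content, position of the next record).
    """
    if pos + 9 > len(data):
        raise ValueError("truncated record header")
    opcode, length = struct.unpack_from("<BQ", data, pos)
    start = pos + 9
    end = start + length
    if end > len(data):
        # The record does not match its stated length: treat as corrupt.
        raise ValueError("record content shorter than stated length")
    return opcode, data[start:end], end


def read_trailing_uuid(content: bytes, pos: int) -> uuid.UUID:
    """Read a 16-byte uuid at pos, or return the nil UUID if content ends there.

    Assumption: the 16 bytes use the RFC 4122 (big-endian) layout.
    """
    remaining = len(content) - pos
    if remaining == 0:
        return NIL_UUID  # field absent: use the zero value
    if remaining < 16:
        raise ValueError("partial uuid field: record is corrupt")
    # Bytes after the first 16 would be unknown future fields and are ignored.
    return uuid.UUID(bytes=content[pos:pos + 16])
```

In this sketch a field is either entirely present or entirely absent; a partially written field is rejected rather than padded with zeros, which is one way to read the reviewer's "fully formed" requirement.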


### Header (op=0x01)

| Bytes | Name | Type | Description |
@@ -155,13 +157,17 @@ A Channel record defines an encoded stream of messages on a topic.

Channel records are uniquely identified within a file by their channel ID. A Channel record must occur at least once in the file prior to any message referring to its channel ID. Any two channel records sharing a common ID must be identical.

| Bytes | Name | Type | Description |
| ----- | ---------------- | --------------------- | ----------------------------------------------------------------------------------------------------------- |
| 2 | id | uint16 | A unique identifier for this channel within the file. |
| 2 | schema_id | uint16 | The schema for messages on this channel. A schema_id of 0 indicates there is no schema for this channel. |
| 4 + N | topic | String | The channel topic. |
| 4 + N | message_encoding | String | Encoding for messages on this channel. The [well-known message encodings][message_encodings] are preferred. |
| 4 + N | metadata | `Map<string, string>` | Metadata about this channel |
The `uuid` field may be used to associate channels between files. For example, an application that merges MCAP files
may merge channels with the same non-nil `uuid` into the same output channel.

| Bytes | Name | Type | Description |
| ----- | ---------------- | --------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| 2 | id | uint16 | A unique identifier for this channel within the file. |
| 2 | schema_id | uint16 | The schema for messages on this channel. A schema_id of 0 indicates there is no schema for this channel. |
| 4 + N | topic | String | The channel topic. |
| 4 + N | message_encoding | String | Encoding for messages on this channel. The [well-known message encodings][message_encodings] are preferred. |
| 4 + N | metadata | `Map<string, string>` | Metadata about this channel |
| 16 | uuid | uuid | A globally unique identifier for this channel. A [nil UUID](https://datatracker.ietf.org/doc/html/rfc4122#section-4.1.7) indicates no globally unique identifier is available. |

Contributor:

seems like this could be solved by using channel metadata. is there something that makes this special enough to be an explicitly named field?

Contributor:

Being a first class citizen means that tooling can rely on this field. Metadata is better read as "user-data" that is unspecified.

Contributor:

Do we have information about such a use case? What if I use something other than UUID to identify channels?

I think we should have hard coded fields where there is a demonstrated need for cross-tool interoperability. If this is only to support needs of individual companies then using a named metadata field would be sufficient.

Contributor:

I am not convinced that UUID is the appropriate term or requirement. I would describe it as an identifier that is stable across multiple separate MCAP files (possibly via rolling recording).

I think this PR needs to be split up into separate proposals that have clear problem statements. Right now it's not in a state that makes a case for what problems it is solving.
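
To make the merge use case under discussion concrete, here is a minimal Python sketch of how a merging tool might map input channels to output channels when a non-nil `uuid` is present. The `Channel` dataclass, its field names, and `assign_output_channels` are hypothetical and not part of any MCAP library; the sketch only illustrates the behavior the proposed spec text describes.

```python
import uuid
from dataclasses import dataclass

NIL_UUID = uuid.UUID(int=0)


@dataclass(frozen=True)
class Channel:
    file: str  # hypothetical: name of the input file
    id: int  # channel ID, unique only within that file
    topic: str
    channel_uuid: uuid.UUID = NIL_UUID  # nil means "no global identifier"


def assign_output_channels(channels: list[Channel]) -> dict[tuple[str, int], int]:
    """Map (file, id) to an output channel ID.

    Channels with the same non-nil UUID share one output ID; channels with a
    nil UUID always get their own.
    """
    by_uuid: dict[uuid.UUID, int] = {}
    mapping: dict[tuple[str, int], int] = {}
    next_id = 0
    for ch in channels:
        if ch.channel_uuid != NIL_UUID and ch.channel_uuid in by_uuid:
            out = by_uuid[ch.channel_uuid]
        else:
            out = next_id
            next_id += 1
            if ch.channel_uuid != NIL_UUID:
                by_uuid[ch.channel_uuid] = out
        mapping[(ch.file, ch.id)] = out
    return mapping
```

The same grouping could be keyed on a named metadata entry instead of a dedicated field, which is the alternative raised in this thread; only the lookup of the identifier would change.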


Channel records may be duplicated in the summary section.

@@ -319,22 +325,33 @@ A Summary Offset record contains the location of records within the summary sect

## Serialization

### Zero Values

Record field types have a defined _zero value_. This is the value that readers should use if that field
is missing from the serialized MCAP record.

### Fixed-width types

Multi-byte integers (`uint16`, `uint32`, `uint64`) are serialized using [little-endian byte order](https://en.wikipedia.org/wiki/Endianness).

The [zero value](#zero-values) of all integer types is 0.

### String

Strings are serialized using a `uint32` byte length followed by the string data, which should be valid [UTF-8](https://en.wikipedia.org/wiki/UTF-8).

<byte length><utf-8 bytes>

The [zero value](#zero-values) of a string is the empty string.

Contributor @amacneil, Jul 26, 2023:

Suggested change:
- The [zero value](#zero-values) of a string is the empty string.
+ The [zero value](#zero-values) of a string is a string with length zero (i.e. `0x00000000`).
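
For reference, a minimal Python sketch of the String serialization described in the diff (a little-endian `uint32` byte length followed by UTF-8 data). The helper names `encode_string` and `decode_string` are hypothetical. It shows that the empty string serializes to the four bytes `00 00 00 00`, which is the form the suggestion spells out.

```python
import struct


def encode_string(value: str) -> bytes:
    """Serialize as <uint32 little-endian byte length><utf-8 bytes>."""
    data = value.encode("utf-8")
    return struct.pack("<I", len(data)) + data


def decode_string(buf: bytes, pos: int = 0) -> tuple[str, int]:
    """Return (decoded string, position just past it)."""
    (length,) = struct.unpack_from("<I", buf, pos)
    start = pos + 4
    end = start + length
    if end > len(buf):
        raise ValueError("string runs past the end of the buffer")
    return buf[start:end].decode("utf-8"), end


print(encode_string("").hex())  # the zero value on the wire
```

Note that the length prefix counts bytes, not characters, so a string containing multi-byte UTF-8 sequences has a prefix larger than its character count.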


### Bytes

Bytes is a sequence of bytes with no additional requirements.

<bytes>

The [zero value](#zero-values) of a byte sequence is the empty sequence.

### Tuple<first_type, second_type>

Tuple represents a pair of values. The first value has type first_type and the second has type second_type.
@@ -353,6 +370,8 @@ Example `Tuple<uint16, string>`:

<uint16><uint32><utf-8 bytes>

The [zero value](#zero-values) of a tuple is the first value type's zero value followed by the second value type's zero value.

Contributor:

do tuples ever occur outside of an array? I don't think this case is needed.


### Array<array_type>

Arrays are serialized using a `uint32` byte length followed by the serialized array elements.
@@ -365,6 +384,8 @@ An array of uint64 is specified as `Array<uint64>` and serialized as:

> Since arrays use a `uint32` byte length prefix, the maximum size of the serialized array elements cannot exceed 4,294,967,295 bytes.

The [zero value](#zero-values) of an array is the empty array.

Contributor:

Suggested change:
- The [zero value](#zero-values) of an array is the empty array.
+ The [zero value](#zero-values) of an array is an array with length zero (i.e. `0x00000000`).
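
A minimal Python sketch of `Array<uint64>` serialization as the diff describes it: a little-endian `uint32` byte length followed by the serialized elements. The helper names are hypothetical. The prefix is the size of the element data in bytes, not the element count, so the empty array is just the four-byte prefix `00 00 00 00`.

```python
import struct


def encode_uint64_array(values: list[int]) -> bytes:
    """Serialize as <uint32 byte length><uint64 elements>, all little-endian."""
    body = b"".join(struct.pack("<Q", v) for v in values)
    return struct.pack("<I", len(body)) + body


def decode_uint64_array(buf: bytes, pos: int = 0) -> tuple[list[int], int]:
    """Return (decoded values, position just past the array)."""
    (byte_length,) = struct.unpack_from("<I", buf, pos)
    if byte_length % 8 != 0:
        raise ValueError("byte length is not a multiple of the element size")
    start = pos + 4
    end = start + byte_length
    if end > len(buf):
        raise ValueError("array runs past the end of the buffer")
    values = [struct.unpack_from("<Q", buf, i)[0] for i in range(start, end, 8)]
    return values, end
```

For example, `encode_uint64_array([1])` produces a prefix of 8 (one 8-byte element), followed by the element itself.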


### Timestamp

`uint64` nanoseconds since a user-understood epoch (e.g. Unix epoch, robot boot time, etc.)
@@ -383,6 +404,14 @@ A `Map<string, string>` would be serialized as:

A serialization which has duplicate keys may cause indeterminate decoding.

The [zero value](#zero-values) of a map is an empty map.

### UUID

A UUID is a 128-bit identifier described in [RFC 4122](https://datatracker.ietf.org/doc/html/rfc4122).

The [zero value](#zero-values) of a UUID is the [nil UUID](https://datatracker.ietf.org/doc/html/rfc4122#section-4.1.7).

## Diagrams

The following diagrams demonstrate various valid MCAP files.