I discovered a nefarious bug in the process of writing tests - sneaky-creeper is reassembling packets that aren't meant to go together.
For instance, a recent test returned this message as one of the final outputs:
u'FXHIXaCJJxUIdyusIsUcpWJTRYGSvbfXZajosdkxHVxllqXicErFSWwXYsHPfpIVYkObFurlFsWIkYhHleHeiighsberapKnQJFwCwsAnXSrdiwPVYwjUKqWqRIHuSyuCzonkgvwbHGedIimnEfUCoSdyVAfqHMYqiKvwQdynJHjYqNzRyyMJQGXJSewxiLbupsNPmGDuMGoYNrSaIGCLcGmxmHvgSpeWswchzMBIzTVVjccVFdjEhHkNsfOoHCXYReCvuKOwwGupYEoiNOqdIeAdKuMFUhblhRgctMmveDhqoObMLqGgyAskveIkNxBxoIAESAqnGyswZUhexDkDBNdynpfiYWgqOOnDtEFCOzyufbXKsHxORgEtuYfdMgDTPKwqRtzTsQovery secret and private message'
Which looks to me like these two packets combined:
Note that packet 2 is newer; packet 1 was followed by two more tweets, then packet 2.
This leads me to believe that our current packet headers are insufficient. I think I have a better idea.
Terminology
Message - One call to Exfil.send('some stuff here') - something the user would recognize as one "unit"
Packet - One chunk of a message - the largest that the channel can fit. Each has its own header, described below.
Series - A number of packets which make up a complete message when reassembled.
In the following diagrams, each series of - characters represents the position of one character in the final protocol.
Old Format
Packet # - This packet's zero-indexed position in the overall message. This is a base-94 encoded number so as to ensure it is ASCII printable and therefore compatible with text-only channels (Twitter being a prime example).
Space - A literal space character
Total Packets - The total number of packets in this particular message. Base-94 encoded as above.
However, this header format is not resilient to intermingling, a property the protocol needs because packets are not guaranteed to come back from the channel in order. If two messages composed of the same number of packets are sent and their packets are intermingled, it is impossible to piece the two messages together correctly.
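To make the ambiguity concrete, here is a minimal sketch. It is not sneaky-creeper's actual code: the tuples and names below are hypothetical, and plain integers stand in for the base-94 encoded header fields.

```python
from collections import defaultdict

# Hypothetical old-style packets, simplified to
# (packet_number, total_packets, payload). Integers stand in for the
# base-94 encoded fields of the real header.
message_a = [(0, 2, "hello "), (1, 2, "world")]
message_b = [(0, 2, "top "), (1, 2, "secret")]

# The channel hands back packets from both messages intermingled.
received = [message_a[0], message_b[0], message_b[1], message_a[1]]

# Group payloads by the only information the old header carries.
by_header = defaultdict(list)
for number, total, payload in received:
    by_header[(number, total)].append(payload)

# Every header value maps to two candidate payloads, so nothing in the
# header says which packet 0 belongs with which packet 1.
ambiguous = {header: payloads
             for header, payloads in by_header.items()
             if len(payloads) > 1}
```

With only position and total in the header, "hello secret" is as valid a reassembly as "hello world".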
Proposed New Format
Packet Identifier - A randomly-generated string to uniquely identify a series of packets.
Packet # - This packet's zero-indexed position in the overall message. This is a base-94 encoded number so as to ensure it is ASCII printable and therefore compatible with text-only channels (Twitter being a prime example).
L - If this is the last packet in the series, either a literal L or l (uppercase or lowercase L) character. If this is not the last packet in the series, any other acceptable character as padding.
The five base-94 encoded characters of the packet identifier field provide 7,339,040,224 possible combinations, making collisions very unlikely as long as the identifiers are sufficiently random.
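The arithmetic can be checked with a short sketch. The alphabet is an assumption (the 94 printable ASCII characters from '!' to '~', i.e. everything printable except the space), and the collision estimate is the usual birthday approximation, which only holds if identifiers are drawn uniformly at random.

```python
import math

# Assumed alphabet: printable ASCII '!' (33) through '~' (126), which
# leaves the space character free for use as a separator.
ALPHABET = "".join(chr(c) for c in range(33, 127))
BASE = len(ALPHABET)


def encode_base94(value, width):
    """Encode a non-negative integer as a fixed-width base-94 string."""
    if value < 0 or value >= BASE ** width:
        raise ValueError("value does not fit in the requested width")
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode_base94(text):
    """Decode a base-94 string back into an integer."""
    value = 0
    for ch in text:
        value = value * BASE + ALPHABET.index(ch)
    return value


# Number of distinct five-character identifiers.
ID_SPACE = BASE ** 5


def collision_probability(n_series):
    """Birthday-bound estimate that at least two of n_series uniformly
    random identifiers coincide."""
    return 1.0 - math.exp(-n_series * (n_series - 1) / (2.0 * ID_SPACE))
```

Under these assumptions, even 1,000 series sharing a channel at once collide with probability below 1 in 10,000.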
This is heavily inspired by Section 3.1 of RFC 791, which describes IP headers. In particular, IP headers use the following indicators to reassemble fragments:
Source address (excluded in this protocol)
Destination address (also excluded in this protocol)
Identification (16 bits; "An identifying value assigned by the sender to aid in assembling the fragments of a datagram")
Fragment Offset (13 bits; "This field indicates where in the datagram this fragment belongs.")
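Borrowing the identification-plus-offset idea, reassembly under the proposed header could look roughly like the sketch below. Several details are assumptions rather than part of the proposal: the two-character width of the packet number, the '-' padding character, the base-94 alphabet, and the function names packetize and reassemble.

```python
import secrets

ALPHABET = "".join(chr(c) for c in range(33, 127))  # assumed base-94 alphabet
BASE = len(ALPHABET)
ID_WIDTH = 5    # five identifier characters, as in the proposal
NUM_WIDTH = 2   # assumption: the proposal does not state this width
HEADER_WIDTH = ID_WIDTH + NUM_WIDTH + 1  # identifier + packet # + L flag


def encode(value, width):
    """Fixed-width base-94 encoding of a non-negative integer."""
    if value < 0 or value >= BASE ** width:
        raise ValueError("value does not fit in the requested width")
    digits = []
    for _ in range(width):
        value, remainder = divmod(value, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode(text):
    """Inverse of encode()."""
    value = 0
    for ch in text:
        value = value * BASE + ALPHABET.index(ch)
    return value


def packetize(message, payload_size):
    """Split one message into a series of packets sharing a random identifier."""
    series_id = "".join(secrets.choice(ALPHABET) for _ in range(ID_WIDTH))
    chunks = [message[i:i + payload_size]
              for i in range(0, len(message), payload_size)] or [""]
    packets = []
    for index, chunk in enumerate(chunks):
        # 'L' marks the last packet; '-' is an arbitrary padding choice.
        flag = "L" if index == len(chunks) - 1 else "-"
        packets.append(series_id + encode(index, NUM_WIDTH) + flag + chunk)
    return packets


def reassemble(packets):
    """Return every complete message found in packets, in any arrival order.
    Series with a missing packet or no last-packet flag are left out."""
    series = {}
    for packet in packets:
        series_id = packet[:ID_WIDTH]
        number = decode(packet[ID_WIDTH:ID_WIDTH + NUM_WIDTH])
        flag = packet[ID_WIDTH + NUM_WIDTH]
        entry = series.setdefault(series_id, {"chunks": {}, "last": None})
        entry["chunks"][number] = packet[HEADER_WIDTH:]
        if flag in "Ll":
            entry["last"] = number
    messages = []
    for entry in series.values():
        last = entry["last"]
        if last is not None and all(i in entry["chunks"] for i in range(last + 1)):
            messages.append("".join(entry["chunks"][i] for i in range(last + 1)))
    return messages
```

Because the identifier groups packets before the packet number orders them, two messages with the same packet count no longer get mixed, and a series is only emitted once its last-flagged packet and every earlier position have arrived.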
Also worth considering: checksums? Worth the extra space they would take up? Perhaps only the final packet contains a checksum, removing the need for the L character but adding the need for a header length field.
Yeah, I recognize that we need something better, even if this situation happens only on Twitter (or on any other channel that limits message length).
I'm not sure we need the extra complexity of the checksum. Yes, it would ensure that our data arrives intact, but do we really need it?
Why not start a branch with this new format and see how it goes? Then we can merge at some point in the future when we have more real-scenario tests.