Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Headers due to Packet Reassembly Error #96

Open
DakotaNelson opened this issue Jun 8, 2016 · 2 comments
Open

New Headers due to Packet Reassembly Error #96

DakotaNelson opened this issue Jun 8, 2016 · 2 comments
Labels

Comments

@DakotaNelson
Copy link
Owner

I discovered a nefarious bug in the process of writing tests - sneaky-creeper is reassembling packets that aren't meant to go together.

For instance, a recent test returned this message as one of the final outputs:

u'FXHIXaCJJxUIdyusIsUcpWJTRYGSvbfXZajosdkxHVxllqXicErFSWwXYsHPfpIVYkObFurlFsWIkYhHleHeiighsberapKnQJFwCwsAnXSrdiwPVYwjUKqWqRIHuSyuCzonkgvwbHGedIimnEfUCoSdyVAfqHMYqiKvwQdynJHjYqNzRyyMJQGXJSewxiLbupsNPmGDuMGoYNrSaIGCLcGmxmHvgSpeWswchzMBIzTVVjccVFdjEhHkNsfOoHCXYReCvuKOwwGupYEoiNOqdIeAdKuMFUhblhRgctMmveDhqoObMLqGgyAskveIkNxBxoIAESAqnGyswZUhexDkDBNdynpfiYWgqOOnDtEFCOzyufbXKsHxORgEtuYfdMgDTPKwqRtzTsQovery secret and private message'

Which looks to me like these two packets combined:

Packet 1:
image

Packet 2:
image

Note that packet 2 is newer; packet 1 was followed by two more tweets, then packet 2.

This leads me to believe that our current packet headers are insufficient. I think I have a better idea.

Terminology

Message - One call to Exfil.send('some stuff here') - something the user would recognize as one "unit"
Packet - One chunk of a message - the largest that the channel can fit. Each has its own header, described below.
Series - A number of packets which make up a complete message when reassembled.

In the following diagrams, each series of - characters represents the position of one character in the final protocol.

Old Format

 0     1     2     3     4     5     6     7
 +-----+-----+-----+-----+-----+-----+-----+-----+
 |    Packet #     |Space|  Total Packets  |Space|
 +-----+-----+-----+-----+-----+-----+-----+-----+

Packet # - This packet's zero-indexed position in the overall message. This is a base-94 encoded number so as to ensure it is ASCII printable and therefore compatible with text-only channels (Twitter being a prime example.)
Space - A literal space character
Total Packets - The total number of packets in this particular message. Base-94 encoded as above.

However, this header format is not resilient to intermingling - which is a necessary condition given that the packets are not guaranteed to come back from the channel in order. If two messages are sent composed of the same number of packets, and those two messages' packets are intermingled, it is impossible to piece together the two messages correctly.

Proposed New Format

 0     1     2     3     4     5     6     7     8
 +-----+-----+-----+-----+-----+-----+-----+-----+-----+
 |      Packet Identifier      |    Packet #     |  L  |
 +-----+-----+-----+-----+-----+-----+-----+-----+-----+

Packet Identifier - A randomly-generated string to uniquely identify a series of packets.
Packet # - This packet's zero-indexed position in the overall message. This is a base-94 encoded number so as to ensure it is ASCII printable and therefore compatible with text-only channels (Twitter being a prime example.)
L - If this is the last packet in the series, either a literal L or l (uppercase or lowercase L) character. If this is not the last packet in the series, any other acceptable character as padding.

The five base-94 encoded characters of the packet identifier field provide 7,339,040,224 possible combinations, making collisions very unlikely as long as the identifiers are sufficiently random.

This is heavily inspired by Section 3.1 of RFC 791, which describes IP headers. In particular, IP headers use the following indicators to reassemble fragments:

  • Source address (excluded in this protocol)
  • Destination address (also excluded in this protocol)
  • Identification (16 bits; "An identifying value assigned by the sender to aid in assembling the fragments of a datagram")
  • Fragment Offset (13 bits; "This field indicates where in the datagram this fragment belongs.")
@DakotaNelson
Copy link
Owner Author

Thoughts on this are very welcome!

Also worth considering: checksums? Worth the extra space they would take up? Perhaps only the final packet contains a checksum, removing the need for the L character but adding the need for a header length field.

@davinerd
Copy link
Contributor

Yeah I recognize that we need something better, even if this situation happens only on Twitter (or in any other channel that has a kind of limitation on messages' length).

I'm not sure we need the extra complexity of the checksum. Yes, it will ensure that our data arrives well and sound, but we really need it?

Why not starting a branch with this new format and see how it goes? Then we can merge at some point in the future when we have more real-scenario tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants