TEP Draft: Conversion Between Bencode and JSON Using Hexadecimal Encoding #16

Open
josecelano opened this issue Oct 18, 2024 Discussed in #15 · 1 comment · May be fixed by #17
@josecelano

Discussed in #15

Originally posted by josecelano February 1, 2024

  • Draft: ChatGPT
  • Todo: research

Draft

Abstract: This document proposes a standard for converting data between Bencode, the encoding format used by the BitTorrent protocol, and JSON (JavaScript Object Notation), a widely used data interchange format. The main challenge is representing binary data in JSON, which is text-based and whose strings are encoded in UTF-8. This proposal recommends hexadecimal encoding for binary data within JSON and provides a JSON Schema for all valid converted objects.

  1. Introduction

Bencode is a binary format widely used in peer-to-peer file sharing systems, particularly BitTorrent. JSON, on the other hand, is a text-based format used for data interchange on the web. Converting between these two formats requires careful handling of binary data, as JSON does not natively support raw binary data.

  2. Hexadecimal Encoding for Binary Data

The primary method proposed for handling binary data in Bencode when converting it to JSON is hexadecimal encoding. Each byte of binary data is represented as a two-digit hexadecimal number. For example, a byte with the value 0x1F is represented as the string "1F" in JSON.
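As a minimal sketch of the byte-to-hex mapping described above, using only the Python standard library. The helper names `to_hex` and `from_hex` are hypothetical and not part of the proposal, and the uppercase output is an assumption chosen to match the "1F" example.

```python
def to_hex(data: bytes) -> str:
    """Encode each byte as two uppercase hexadecimal digits."""
    return data.hex().upper()


def from_hex(text: str) -> bytes:
    """Decode a hexadecimal string back into the original bytes."""
    return bytes.fromhex(text)


encoded = to_hex(b"\x1f")
print(encoded)            # 1F
print(from_hex(encoded))  # b'\x1f'
```

`bytes.fromhex` accepts both uppercase and lowercase digits, so a decoder written this way would not depend on the case chosen by the encoder.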

Advantages:

  • Hexadecimal encoding is a straightforward, widely understood method.
  • It ensures compatibility with JSON's text-based format.
  • The encoded data is somewhat human-readable, which can be beneficial for debugging.

Disadvantages:

  • Increased data size due to the encoding (each byte of binary data becomes two characters in JSON).

  3. Alternative Methods (Discarded)

Other methods considered and discarded include:

a. Base64 Encoding: Converts binary data into a base-64 representation. While efficient in terms of space, it is less human-readable and can complicate encoding and decoding processes.

b. Array Representation: Involves representing binary data as an array of byte values in JSON. This method is inefficient in terms of space and handling.

c. Escape Non-UTF8 Sequences: Attempts to represent binary data as UTF-8 strings by escaping invalid sequences. This approach is complex and not universally applicable.

d. Custom Encoding Scheme: Utilizes a custom scheme for specific types of binary data. This method would require custom logic for parsing and is less generalizable.
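To make the space trade-offs between hexadecimal, Base64, and array representation concrete, the following sketch serializes the same 20 bytes in each form and compares the length of the resulting JSON text. The payload is an arbitrary stand-in (the size of a SHA-1 hash), and the exact numbers depend on the byte values chosen, especially for the array form.

```python
import base64
import json

# Arbitrary 20-byte payload, used only for illustration.
payload = bytes(range(20))

as_hex = json.dumps(payload.hex())
as_base64 = json.dumps(base64.b64encode(payload).decode("ascii"))
as_array = json.dumps(list(payload), separators=(",", ":"))

sizes = {
    "hex": len(as_hex),
    "base64": len(as_base64),
    "array": len(as_array),
}
print(sizes)  # {'hex': 42, 'base64': 30, 'array': 51}
```

For this payload Base64 is the most compact and the array form the largest; hexadecimal sits in between at two characters per byte plus the surrounding quotes.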

  4. JSON Schema for Valid Objects

The JSON schema for representing Bencode data in JSON is as follows:

{
  "type": "object",
  "properties": {
    "integers": {"type": "integer"},
    "strings": {"type": "string"},
    "lists": {
      "type": "array",
      "items": {"$ref": "#"}
    },
    "dictionaries": {
      "type": "object",
      "additionalProperties": {"$ref": "#"}
    },
    "binary": {"type": "string", "pattern": "^[0-9A-Fa-f]*$"}
  }
}
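
One possible reading of the schema's intent is that a converted value is an integer, a string, a list of converted values, or a dictionary whose values are converted values. The sketch below checks that shape with the standard library only. It is an assumption about what the draft means, not an implementation of the schema as written, and `is_converted_value` is a hypothetical name.

```python
import json


def is_converted_value(value) -> bool:
    """Return True if value uses only the JSON types Bencode can map to."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        return False
    if isinstance(value, (int, str)):
        return True
    if isinstance(value, list):
        return all(is_converted_value(item) for item in value)
    if isinstance(value, dict):
        return all(
            isinstance(key, str) and is_converted_value(item)
            for key, item in value.items()
        )
    # Floats and null have no Bencode counterpart.
    return False


print(is_converted_value(json.loads('{"bar": "spam", "foo": 42}')))  # True
print(is_converted_value(json.loads('{"ratio": 1.5}')))              # False
```

This check cannot tell a hexadecimal-encoded byte string from ordinary text, since both are plain JSON strings.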

  5. Examples of Conversion

Example 1: Bencode to JSON Conversion

Bencode: d3:bar4:spam3:fooi42ee
JSON: {"bar": "spam", "foo": 42}

Example 2: Handling Binary Data

Bencode: 4:\x8A\xE2\x9C\x93 (binary data in Bencode string)
JSON: {"binary": "8AE29C93"} (hexadecimal encoded)
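
The two examples can be reproduced with a small decoder. This is a sketch under one assumption that the draft does not state explicitly: a Bencode string that is valid UTF-8 is emitted as text, and any other string is hexadecimal-encoded. The function names are hypothetical, error handling for malformed input is omitted, and the sketch returns the bare hexadecimal string for Example 2 rather than wrapping it in a "binary" key.

```python
import json


def string_to_json(raw: bytes) -> str:
    """Emit valid UTF-8 as text; hex-encode anything else (assumed rule)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.hex().upper()


def decode(data: bytes, pos: int = 0):
    """Decode one Bencode value at pos; return (value, next position)."""
    lead = data[pos:pos + 1]
    if lead == b"i":  # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":  # list: l<values>e
        items, pos = [], pos + 1
        while data[pos:pos + 1] != b"e":
            item, pos = decode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":  # dictionary: d<key><value>...e
        result, pos = {}, pos + 1
        while data[pos:pos + 1] != b"e":
            key, pos = decode(data, pos)
            value, pos = decode(data, pos)
            result[key] = value
        return result, pos + 1
    # byte string: <length>:<bytes>
    colon = data.index(b":", pos)
    length = int(data[pos:colon])
    start = colon + 1
    return string_to_json(data[start:start + length]), start + length


value, _ = decode(b"d3:bar4:spam3:fooi42ee")
print(json.dumps(value))  # {"bar": "spam", "foo": 42}

value, _ = decode(b"4:\x8a\xe2\x9c\x93")
print(json.dumps(value))  # "8AE29C93"
```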

  6. Conclusion

This proposal provides a standardized method for converting between Bencode and JSON, with a focus on the proper representation of binary data. By using hexadecimal encoding, we ensure compatibility with JSON's text-based format while maintaining the integrity of the binary data from Bencode.

  7. Links

Other approaches:

cc @da2ce7

@josecelano josecelano self-assigned this Oct 18, 2024
@josecelano josecelano linked a pull request Oct 22, 2024 that will close this issue
@josecelano

Someone found an issue with this representation: it is not reversible. See torrust/bencode2json#7.
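
One way such a loss of reversibility can arise is sketched below. It assumes the rule suggested by the draft's examples, namely that valid UTF-8 strings are emitted as text and other strings are hexadecimal-encoded. This may or may not be the exact case reported in the linked issue, and `string_to_json` is a hypothetical helper.

```python
def string_to_json(raw: bytes) -> str:
    """Emit valid UTF-8 as text; hex-encode anything else (assumed rule)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.hex().upper()


text_input = b"CAFE"        # four ASCII characters
binary_input = b"\xca\xfe"  # two bytes that are not valid UTF-8

text_json = string_to_json(text_input)
binary_json = string_to_json(binary_input)

print(text_json, binary_json)    # CAFE CAFE
print(text_json == binary_json)  # True
```

Two different Bencode strings produce the same JSON string, so a JSON-to-Bencode converter working under this rule cannot tell which one was the original.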
