---
title: JSON Module
status: Draft
created: 2019-08-26
updated: 2019-08-26
authors:
  - [jmillikin](https://john-millikin.com)
reviewers:
  - Starlark core reviewers
discussion thread: [#83](https://github.com/bazelbuild/starlark/pull/83)
---

# JSON Module

## Abstract

This document proposes an API for encoding and decoding JSON from Starlark. If accepted, Starlark implementations
that support JSON would be expected to implement this API. The goal is to let users write JSON-handling code that
behaves consistently across Starlark implementations.

## Background

JSON is a popular syntax for representing basic data structures. Partial support for parsing or serializing JSON from
Starlark has been independently requested and/or implemented several times:

* Bazel's [`struct`](https://docs.bazel.build/versions/master/skylark/lib/struct.html) type has a `to_json()` method
  that can generate JSON documents. Issue [bazelbuild/bazel#7879](https://github.com/bazelbuild/bazel/issues/7879)
  requests additional control over the behavior of this method (e.g. optional whitespace).

Review comment: This is addressed in my Go implementation by separating encoding and prettyprinting, in exactly the same way as these concepts are separated in the Go standard library's encoding/json package. Existing JSON strings can be prettyprinted in a single pass with very little state, without the need to decode and reencode.

* [bazelbuild/bazel#3732](https://github.com/bazelbuild/bazel/issues/3732) requests support for parsing JSON from
  within a Bazel repository rule, for introspecting the state of external tools.

* [`json_parser.bzl`](https://github.com/erickj/bazel_json) is a JSON parser implemented entirely within Starlark.
  Limitations of the Starlark language prevent this parser from being recommended for production use.

* [google/starlark-go#179](https://github.com/google/starlark-go/pull/179) proposes a JSON module for the Go
  implementation of Starlark.

* [Skycfg](https://github.com/stripe/skycfg) has a `json.marshal()` method that can generate JSON documents, for use
  with config formats based on JSON syntax.

JSON libraries in the broader community have widely varying APIs. Without a shared specification, ad-hoc JSON
extensions to Starlark are likely to expose different APIs in different Starlark implementations.

## Proposed API

The JSON module is a value named `json` in the global namespace. Its API comprises the following functions. Starlark
implementations are not required to support the entire API, but should avoid extending the API with non-standard
functions or parameters.

JSON implementations should comply with the format documented as
[ECMA-404](https://www.ecma-international.org/publications/standards/Ecma-404.htm) and
[RFC 8259](https://www.rfc-editor.org/rfc/rfc8259.html), which supersedes earlier JSON specifications. In particular,
RFC 8259 requires UTF-8 as the character encoding for JSON text exchanged between systems.

To maintain compatibility with existing callers, new required parameters should not be added to these functions. New
optional parameters should be defined as keyword-only parameters
([PEP 3102](https://www.python.org/dev/peps/pep-3102/)).
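
To illustrate why keyword-only parameters protect existing callers, here is a minimal Python sketch that reuses the `json.encode()` signature proposed below (which already places `indent` and `sort_keys` after a bare `*`). The body is a placeholder that only reports how arguments were bound; it is not an encoder.

```python
def encode(value, *, indent=None, sort_keys=False):
    # Placeholder body: it only reports how the arguments were bound.
    # Parameters after the bare `*` can only be passed by keyword, so a
    # new optional keyword-only parameter added later cannot shift the
    # meaning of any existing positional argument.
    return (value, indent, sort_keys)


print(encode([1, 2]))            # ([1, 2], None, False)
print(encode([1, 2], indent=2))  # ([1, 2], 2, False)

try:
    encode([1, 2], 2)            # `indent` cannot be passed positionally
except TypeError as err:
    print("rejected:", err)
```

A positional call such as `encode(x, 2)` fails loudly instead of silently binding `2` to whichever parameter happens to come second.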

### json.decode()

The `json.decode()` function decodes a JSON document into a Starlark value.

```python
json.decode(data)
```

Review comment: Very cool! Is there any reason we wouldn't follow the python json interface, perhaps only the string

Review comment: I'm fine with following the Python API here. Historically Bazel hasn't worried too much about cross-compatibility with the Python stdlib, and I thought

Type conversions are:
* JSON arrays are decoded to Starlark `list` values.

Review comment: I would add: new, unfrozen list values. Ditto for dicts.

* JSON objects are decoded to Starlark `dict` values. Keys are in the same order as in the input data.
* JSON `true`, `false`, and `null` literals are decoded to Starlark `True`, `False`, and `None` respectively.
* JSON strings are decoded to Starlark `string` values.
* JSON numbers with no fractional component are decoded to Starlark `int` values. Starlark implementations without
  arbitrary-precision integers should reject numbers that exceed their supported range.

Review comment: The Go implementation now emits a string of decimal integers for a Starlark int value, no matter how large; it does not truncate big ints to uint64 or int53 or int32, nor use scientific notation. This does mean that decoder implementations with integer width limitations may not be able to read these files. However, the meaning of the JSON files is quite clear. The Go impl also uses a %g floating point representation for a Starlark value of type float, even if the value is integral. This means round-tripping Starlark values via JSON preserves int vs float representation type, which seems desirable. (The decoder uses the presence of a decimal point to indicate 'float'.)

* JSON numbers with a fractional component may be decoded to an arbitrary-precision or floating-point value, if
  supported by the current Starlark implementation.

Review comment: Is it conforming for implementations without floating-point values to keep them as a string? Or do they have to error?

Review comment: An implicit type conversion of float to string would be very surprising to me as a user, so I think I'd prefer the implementation to error if it can't accept a given input value.

* Starlark implementations without arbitrary-precision numeric values should reject numbers that exceed their
  supported range.

Review comment: I agree that rejecting unrepresentable values is the best course of action.

Review comment: What if they are within the bounds, but exceed the precision to represent them unambiguously?

Review comment: I guess "range" is a little wrong here, but I mean numbers that the implementation can't represent. In the trivial case, a value … I hope that issues of floating point are mostly academic because most Starlark implementations don't allow floats by default, and the primary client codebase (Bazel extensions) doesn't allow floats at all.
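
The decoding rules above can be approximated with Python's standard `json` module, whose data model is close to Starlark's. This is a minimal sketch, not a Starlark implementation: `make_decoder` and its `supports_bigint`/`supports_float` flags are hypothetical knobs that simulate what a given Starlark implementation can represent (they are not proposed parameters of `json.decode()`), and modeling "no arbitrary-precision integers" as signed 64-bit integers is an assumption made for illustration.

```python
import json
import math

# Assumption for illustration: an implementation without arbitrary-precision
# integers is modeled as supporting signed 64-bit ints.
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def make_decoder(supports_bigint=True, supports_float=True):
    """Return a decode(data) function that follows the proposed conversions.

    The flags are hypothetical knobs simulating what a particular Starlark
    implementation can represent; they are not parameters of json.decode().
    """

    def parse_int(text):
        # Called for JSON numbers with no fraction and no exponent.
        n = int(text)
        if not supports_bigint and not INT64_MIN <= n <= INT64_MAX:
            raise ValueError("integer out of range: " + text)
        return n

    def parse_float(text):
        # Called for JSON numbers with a fraction or an exponent ("1.5", "1e3").
        if not supports_float:
            raise ValueError("floating-point values are not supported: " + text)
        f = float(text)
        if math.isinf(f):  # e.g. "1e400" overflows a double
            raise ValueError("number out of range: " + text)
        return f

    def parse_constant(name):
        # Python accepts NaN/Infinity/-Infinity by default; RFC 8259 does not.
        raise ValueError("invalid JSON number: " + name)

    def decode(data):
        # Arrays -> new lists, objects -> dicts in input key order,
        # true/false/null -> True/False/None, strings -> str.
        return json.loads(
            data,
            parse_int=parse_int,
            parse_float=parse_float,
            parse_constant=parse_constant,
        )

    return decode


decode = make_decoder()
print(decode('{"b": [1, 2.5, true], "a": null}'))  # {'b': [1, 2.5, True], 'a': None}
```

Following Python's parser, the sketch classifies a number with an exponent (such as `1e3`) as non-integral; the proposal does not say how such numbers should be classified, so treat that choice as an assumption. Out-of-range values raise an error rather than being truncated or converted to another type.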

### json.encode()

The `json.encode()` function encodes a Starlark value as a JSON document.

```python
json.encode(value, *, indent=None, sort_keys=False)
```

Type conversions are:
* Starlark `list` and `tuple` values are encoded to JSON arrays.

Review comment: The Go implementation generalizes this to all iterable sequences (that are not iterable mappings).

* Starlark `dict` values are encoded to JSON objects.

Review comment: The Go implementation also encodes Starlark struct values as JSON objects.

* Starlark values `True`, `False`, and `None` are encoded to JSON `true`, `false`, and `null` respectively.
* Starlark strings are encoded to JSON strings.
* Starlark `int` values are encoded to JSON numbers.

Starlark implementations may support encoding other types.
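
Because the list above is the only set of types every implementation must encode, a caller may want to check a value before serializing it. The following Python sketch walks a value and raises `TypeError` for anything outside that list. The helper name `check_encodable`, the choice to reject every other type (the proposal permits implementations to support more), and the requirement that `dict` keys be strings are assumptions of this sketch.

```python
def check_encodable(value, path="value"):
    """Raise TypeError unless `value` uses only the types listed for json.encode().

    Assumptions of this sketch: dict keys must be strings, every other type
    (including float) is rejected, and cyclic values are not detected.
    """
    if value is None or isinstance(value, (bool, int, str)):
        return  # None, True/False, ints and strings map to JSON scalars
    if isinstance(value, (list, tuple)):
        for i, item in enumerate(value):  # lists and tuples become arrays
            check_encodable(item, "%s[%d]" % (path, i))
        return
    if isinstance(value, dict):  # dicts become objects
        for key, item in value.items():
            if not isinstance(key, str):
                raise TypeError("%s: dict key %r is not a string" % (path, key))
            check_encodable(item, "%s[%r]" % (path, key))
        return
    raise TypeError("%s: cannot encode %s" % (path, type(value).__name__))
```

For example, `check_encodable({"a": [1, 2.5]})` raises `TypeError` with the message `value['a'][1]: cannot encode float`, pointing at the offending element.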

If `indent` is an `int`, it gives the number of spaces used for each level of nesting. Values less than 1 insert
newlines but no indentation. If `indent` is `None` (the default), JSON is encoded on one line with no extra spaces.

If `sort_keys` is `True`, the keys of encoded objects are sorted in lexicographical order. If `sort_keys` is `False`
(the default), object keys appear in the iteration order of the `dict`, which in Starlark is insertion order.

Review comment: Why is this feature necessary? (Update: During review of the Java implementation, we decided that structs and dicts should both sort their fields/keys.)
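
The `indent` and `sort_keys` behavior maps closely onto Python's `json.dumps()`. The sketch below is an illustration under that assumption, not a reference implementation: it delegates to `json.dumps()` and skips the type checks listed above (`json.dumps()` also accepts values such as `float`). Without explicit `separators`, `json.dumps()` puts a space after `,` and `:` in single-line output, which would break the "no extra spaces" rule.

```python
import json


def encode(value, *, indent=None, sort_keys=False):
    """Format `value` following the proposed indent/sort_keys rules."""
    if indent is None:
        # Single line with no extra spaces; json.dumps would otherwise
        # emit ", " and ": " between items.
        separators = (",", ":")
    else:
        # Multi-line output; a bare "," avoids trailing spaces at line ends.
        separators = (",", ": ")
        # Indent levels below 1 only insert newlines.
        indent = max(indent, 0)
    return json.dumps(
        value,
        indent=indent,
        sort_keys=sort_keys,
        separators=separators,
        ensure_ascii=False,  # emit non-ASCII characters directly, not as \u escapes
    )


doc = {"b": 1, "a": [1, 2]}
print(encode(doc))                  # {"b":1,"a":[1,2]}
print(encode(doc, sort_keys=True))  # {"a":[1,2],"b":1}
print(encode(doc, indent=2))
```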

Review comment: Thanks for sending this. I'm not sure how it escaped my attention till today, when I happened to resume work on the go.starlark.net JSON encoder/decoder (google/starlark-go#179) that you mention below.