proposal: JSON Module #83

Open
wants to merge 3 commits into
base: master
99 changes: 99 additions & 0 deletions proposals/2019-08-26-json-module.md
@@ -0,0 +1,99 @@
---
title: JSON Module
status: Draft
created: 2019-08-26
updated: 2019-08-26
authors:
- [jmillikin](https://john-millikin.com)
reviewers:
- Starlark core reviewers
discussion thread: [#83](https://github.com/bazelbuild/starlark/pull/83)
---

# JSON Module

## Abstract

This document proposes an API for encoding and decoding JSON from Starlark. If accepted, implementations of Starlark
Contributor

Thanks for sending this. I'm not sure how it escaped my attention till today, when I happened to resume work on the go.starlark.net JSON encoder/decoder (google/starlark-go#179) that you mention below.

that implement JSON support would be expected to implement this API. The goal is to allow users to write JSON-handling code that
behaves consistently across Starlark implementations.

## Background

JSON is a popular syntax for representing basic data structures. Partial support for parsing or serializing JSON from
Starlark has been independently requested and/or implemented several times:

* Bazel's [`struct`](https://docs.bazel.build/versions/master/skylark/lib/struct.html) type has a `to_json()` method
that can generate JSON documents. Issue [bazelbuild/bazel#7879](https://github.com/bazelbuild/bazel/issues/7879)
requests additional control over the behavior of this method (e.g. optional whitespace).
Contributor

This is addressed in my Go implementation by separating encoding and prettyprinting, in exactly the same way as these concepts are separated in the Go standard library's encoding/json package. Existing JSON strings can be prettyprinted in a single pass with very little state, without the need to decode and reencode.


* [bazelbuild/bazel#3732](https://github.com/bazelbuild/bazel/issues/3732) requests support for parsing JSON from
within a Bazel repository rule, for use with introspecting the state of external tools.

* [`json_parser.bzl`](https://github.com/erickj/bazel_json) is a JSON parser implemented entirely within Starlark.
Limitations of the Starlark language prevent this parser from being recommended for production use.

* [google/starlark-go#179](https://github.com/google/starlark-go/pull/179) proposes a JSON module for the Go
implementation of Starlark.

* [Skycfg](https://github.com/stripe/skycfg) has a `json.marshal()` method that can generate JSON documents, for use
with config formats based on JSON syntax.

JSON implementations in the broader community have widely varying APIs. Without a shared specification, ad-hoc JSON
extensions are likely to expose different APIs in different Starlark implementations.

## Proposed API

The JSON module is a value named `json` in the global namespace. Its API comprises the following functions. Starlark
implementations are not required to support the entire API, but should avoid extending the API with non-standard
functions or parameters.

JSON implementations should comply with the format documented as
[ECMA-404](https://www.ecma-international.org/publications/standards/Ecma-404.htm) and
[RFC 8259](https://www.rfc-editor.org/rfc/rfc8259.html), which supersedes earlier drafts of JSON. In particular, the
only permitted character encoding for contemporary JSON is UTF-8.

To maintain compatibility with existing callers, new required parameters should not be added to these functions. New
optional parameters should be defined using keyword-only parameters
([PEP-3102](https://www.python.org/dev/peps/pep-3102/)).
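
As a rough illustration of the keyword-only convention, the sketch below uses plain Python rather than Starlark. The function `encode_stub` is a hypothetical stand-in invented for this example; it is not part of the proposed API.

```python
# Hypothetical stand-in for a module function; not the proposed API itself.
def encode_stub(value, *, indent=None):
    """Return the arguments as received; `indent` can only be passed by keyword."""
    return (value, indent)

# Passing the optional parameter by keyword works.
result = encode_stub([1, 2], indent=2)

# Passing it positionally is rejected, so adding further optional
# parameters later cannot change the meaning of existing calls.
try:
    encode_stub([1, 2], 2)
    positional_accepted = True
except TypeError:
    positional_accepted = False
```

Because everything after the bare `*` must be named at the call site, new options can be appended in any order without breaking existing callers.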

### json.decode()

The `json.decode()` function decodes a JSON document into a Starlark value.

```python
def json.decode(data):
```

Very cool! Is there any reason we wouldn't follow the Python json interface, perhaps only the string loads/dumps versions? They might have a few extra flags, but 99% of the uses would suffice with the flags already included in this spec.

Author

I'm fine with following the Python API here. Historically Bazel hasn't worried too much about cross-compatibility with the Python stdlib, and I thought loads / dumps is a little opaque to people who don't use Python regularly, but I'd accept the functionality under ~any name.

Type conversions are:
* JSON arrays are decoded to Starlark `list` values.
Contributor

I would add: new, unfrozen list values. Ditto for dicts.

* JSON objects are decoded to Starlark `dict` values. Keys are in the same order as the input data.
* JSON `true`, `false`, and `null` literals are decoded to Starlark `True`, `False`, and `None` respectively.
* JSON strings are decoded to Starlark `string` values.
* JSON numbers with no fractional component are decoded to Starlark `int` values. Starlark implementations without
Contributor

The Go implementation now emits a string of decimal digits for a Starlark int value, no matter how large; it does not truncate big ints to uint64 or int53 or int32, nor use scientific notation. This does mean that decoder implementations with integer width limitations may not be able to read these files. However, the meaning of the JSON files is quite clear.

The Go impl also uses a %g floating point representation for a Starlark value of type float, even if the value is integral. This means round-tripping Starlark values via JSON preserves int vs float representation type, which seems desirable. (The decoder uses the presence of a decimal point to indicate 'float'.)

arbitrary-precision integers should reject numbers that exceed their supported range.
* JSON numbers with a fractional component may be decoded to an arbitrary-precision or floating-point value, if
Contributor

Is it conforming for implementations without floating-point values to keep them as a string? Or do they have to error?

Author

An implicit type conversion of float to string would be very surprising to me as a user, so I think I'd prefer the implementation to error if it can't accept a given input value.

supported by the current Starlark implementation.
* Starlark implementations without arbitrary-precision numeric values should reject numbers that exceed their
Contributor

I agree that rejecting unrepresentable values is the best course of action.

Contributor

What if they are within the bounds, but exceed the precision to represent them unambiguously?

Author

I guess "range" is a little wrong here, but I mean numbers that the implementation can't represent. In the trivial case, a value 0.00[...]01 with sufficient zeros to overflow a 64-bit IEEE754 would be rejected by most implementations.

I hope that issues of floating point are mostly academic because most Starlark implementations don't allow floats by default, and the primary client codebase (Bazel extensions) doesn't allow floats at all.

supported range.
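
The decoding rules above, including the "reject what cannot be represented" behaviour discussed in the review, can be approximated with Python's standard `json` module. This is only a sketch under stated assumptions: the `decode` wrapper, the helper names, and the choice of a 64-bit integer range with no float support are all hypothetical, picked to model a restricted Starlark implementation.

```python
import json

# Assumed limits for a hypothetical implementation with fixed-width ints.
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1

def _bounded_int(text):
    # Called by json.loads() for numbers without a fraction or exponent.
    n = int(text)
    if not INT64_MIN <= n <= INT64_MAX:
        raise ValueError("integer out of supported range: " + text)
    return n

def _reject_float(text):
    # Called by json.loads() for numbers with a fraction or exponent.
    raise ValueError("fractional numbers are not supported: " + text)

def decode(data):
    """Hypothetical decode() for an implementation with 64-bit ints and no floats."""
    return json.loads(data, parse_int=_bounded_int, parse_float=_reject_float)

value = decode('{"b": [1, true, null], "a": "text"}')
keys = list(value)  # dict keys keep the order of the input document
```

An implementation that does support floats or arbitrary-precision integers would simply omit the corresponding hook; Python's own `json.loads()` already decodes integers of any size and fractional numbers to `float`.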

### json.encode()

```python
json.encode(value, *, indent=None, sort_keys=False)
```

Type conversions are:
* Starlark `list` and `tuple` values are encoded to JSON arrays.
Contributor

The Go implementation generalizes this to all iterable sequences (that are not iterable mappings).

* Starlark `dict` values are encoded to JSON objects.
Contributor

The Go implementation also encodes Starlark struct values as JSON objects.

* Starlark values `True`, `False`, and `None` are encoded to JSON `true`, `false`, and `null` respectively.
* Starlark strings are encoded to JSON strings.
* Starlark `int` values are encoded to JSON numbers.

Starlark implementations may support encoding other types.

If `indent` is a number, it specifies how many spaces to indent each nesting level by. Indent levels less than 1 will only insert newlines.
If `indent` is `None` (the default), JSON will be encoded in one line with no extra spaces.

If `sort_keys` is `True`, then encoded objects' keys are sorted in lexicographical order. If `sort_keys` is `False`
Contributor

Why is this feature necessary?

(Update: During review of the Java implementation, we decided that structs and dicts should both sort their fields/keys.)

(the default), then object keys are in the same order as the `dict` keys.
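
The `indent` and `sort_keys` semantics described above closely follow Python's `json.dumps()`, so a minimal sketch can be built on top of it. The `encode` wrapper below is a hypothetical illustration in plain Python, not the proposed Starlark implementation. One assumption worth flagging: `json.dumps()` adds a space after `,` and `:` by default, so the compact case passes explicit separators to get "no extra spaces".

```python
import json

def encode(value, *, indent=None, sort_keys=False):
    """Hypothetical encode() built on Python's json.dumps()."""
    if indent is None:
        # Compact form: one line, no spaces after ',' or ':'.
        return json.dumps(value, sort_keys=sort_keys, separators=(",", ":"))
    # With an indent below 1, json.dumps() inserts newlines but no indentation.
    return json.dumps(value, indent=indent, sort_keys=sort_keys)

data = {"b": [1, 2], "a": None}

compact = encode(data)                        # keys in dict order
pretty = encode(data, indent=2, sort_keys=True)  # keys sorted, 2-space indent
newlines_only = encode({"a": 1}, indent=0)
```

Note that `json.dumps()` also encodes Python tuples as JSON arrays, which matches the `list` and `tuple` rule in the type conversions.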
1 change: 1 addition & 0 deletions proposals/README.md
@@ -6,5 +6,6 @@ proposal, see the [design process](../process.md).

Last updated | Status | Title | Author(s) | Category
------------ | ------ | ------| ----------| --------
2019-08-26 | Draft | [JSON Module](https://github.com/bazelbuild/starlark/blob/master/proposals/2019-08-26-json-module.md) | [jmillikin](https://john-millikin.com) | Modules
2018-08-17 | Draft | [Genrule setup for Starlark](https://github.com/bazelbuild/starlark/blob/master/proposals/2018-08-17-genrule-setup-for-starlark.md) | [brandjon@](https://github.com/brandjon) | Actions
2018-10-03 | Draft | [ToolchainInfo Schema](https://github.com/bazelbuild/starlark/blob/master/proposals/2018-10-03-toolchaininfo-schema.md) | [brandjon@](https://github.com/brandjon) | Toolchains