Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

proposal: JSON Module #83

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

jmillikin-stripe
Copy link

This document proposes an API for encoding and decoding JSON from Starlark. If accepted, implementations of Starlark that implement JSON support would be expected to implement this API. The goal is to allow users to write JSON code that behaves consistently across Starlark implementations.

The `json.decode()` function decodes a JSON document into a Starlark value.

```python
def json.decode(data):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very cool! Is there any reason we wouldn't follow the python json interface, perhaps only the string loads/dumps versions? They might have a few extra flags, but 99% of the uses would suffice with the flags already included in this spec.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with following the Python API here. Historically Bazel hasn't worried too much about cross-compatibility with the Python stdlib, and I thought loads / dumps is a little opaque to people who don't use Python regularly, but I'd accept the functionality under ~any name.

@bpowers
Copy link

bpowers commented Feb 24, 2020

@laurentlb do you (or anyone else) have concerns with this?

I'd love to see this merged so that we could move towards an implementation. A tool I'm working on needs JSON, and I've ended up copy-pasting files from this PR as there isn't a standard here yet: google/starlark-go#179

alandonovan pushed a commit to google/starlark-go that referenced this pull request Jun 11, 2020
This change defines a standard Starlark module for JSON encoding and decoding.
See json.go for documentation.

It is intended to subsume, generalize, and eventually
replace Bazel's ill-conceived struct.to_json method.

The json module is predeclared in the Starlark REPL environment.

See related issues:
bazelbuild/bazel#7896
https://buganizer.corp.google.com/issues/23962735
https://buganizer.corp.google.com/issues/70210417
bazelbuild/bazel#7879 (comment)
bazelbuild/bazel#5542
bazelbuild/bazel#10176
bazelbuild/starlark#83
bazelbuild/bazel#3732

Change-Id: I297ffaee9349eedeeb52f5a88f40636a4095f997
alandonovan pushed a commit to google/starlark-go that referenced this pull request Jun 11, 2020
This change defines a standard Starlark module for JSON encoding and decoding.
See json.go for documentation.

It is intended to subsume, generalize, and eventually
replace Bazel's ill-conceived struct.to_json method.

The json module is predeclared in the Starlark REPL environment.

See related issues:
bazelbuild/bazel#7896
https://buganizer.corp.google.com/issues/23962735
https://buganizer.corp.google.com/issues/70210417
bazelbuild/bazel#7879 (comment)
bazelbuild/bazel#5542
bazelbuild/bazel#10176
bazelbuild/starlark#83
bazelbuild/bazel#3732
Copy link
Contributor

@ndmitchell ndmitchell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of the JSON API, and the description here seems quite good. A few questions:

What is the diff to the spec? Does this proposal go as a brand new file in the repo? Assuming its accepted, what diff would we apply to the spec?

What is a module? How do you in a standards conforming way import the "json" module? I couldn't see any existing modules in the spec (maybe I missed them?). Or maybe you mean json is an always available name with some fields on it? If so, it feels a bit weird to introduce a whole new mechanism not present elsewhere in the spec for just two functions. Or maybe you want to use this mechanism more widely?

* JSON strings are decoded to Starlark `string` values.
* JSON numbers with no fractional component are decoded to Starlark `int` values. Starlark implementations without
arbitrary-precision integers should reject numbers that exceed their supported range.
* JSON numbers with a fractional component may be decoded to an arbitrary-precision or floating-point value, if
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it conforming for implementations without floating-point values to keep them as a string? Or do they have to error?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An implicit type conversion of float to string would be very surprising to me as a user, so I think I'd prefer the implementation to error if it can't accept a given input value.

arbitrary-precision integers should reject numbers that exceed their supported range.
* JSON numbers with a fractional component may be decoded to an arbitrary-precision or floating-point value, if
supported by the current Starlark implementation.
* Starlark implementations without arbitrary-precision numeric values should reject numbers that exceed their
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if they are within the bounds, but exceed the precision to represent them unambiguously?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess "range" is a little wrong here, but I mean numbers that the implementation can't represent. In the trivial case, a value 0.00[...]01 with sufficient zeros to overflow a 64-bit IEEE754 would be rejected by most implementations.

I hope that issues of floating point are mostly academic because most Starlark implementations don't allow floats by default, and the primary client codebase (Bazel extensions) doesn't allow floats at all.

@jmillikin-stripe
Copy link
Author

jmillikin-stripe commented Jul 16, 2020

What is the diff to the spec? Does this proposal go as a brand new file in the repo? Assuming its accepted, what diff would we apply to the spec?

No idea! Happy to format this in whatever way seems reasonable to you. It doesn't seem like there's much prior art here, so I formatted it as an RFC-type file.

What is a module? How do you in a standards conforming way import the "json" module? I couldn't see any existing modules in the spec (maybe I missed them?). Or maybe you mean json is an always available name with some fields on it?

Bazel defines a list of "global modules" (https://docs.bazel.build/versions/3.4.0/skylark/lib/skylark-overview.html#--global-modules) but doesn't go into detail about what exactly a "module" is at the language level. For this proposal I had imagined a new global symbol named json that would be present if the current implementation supported JSON. The presence of json is optional, but if present it should conform to this API.

If so, it feels a bit weird to introduce a whole new mechanism not present elsewhere in the spec for just two functions. Or maybe you want to use this mechanism more widely?

If this proposal is successful + accepted then I would like to propose more modules, specifically proto (for protobufs as used in Skycfg) and yaml (for use in Bazel, parsing lock files from the JavaScript ecosystem).

@ndmitchell
Copy link
Contributor

Bazel defines a list of "global modules"

But Starlark doesn't. Maybe we need a corresponding piece of the Starlark spec that says what a "module" is, whether it needs loading, how it behaves, whether it's a first-class value etc.

@alandonovan
Copy link
Contributor

Hi, back from vacation. Sorry this proposal has been neglected and that this has process not been a model of good open-source stewardship.

Regarding the the project: Starlark/Java was until very recently highly entangled with Bazel, though after a year of work nearly all its production dependencies on "Bazel proper" are now severed. However, the tests are still entangled, and there are numerous subtle constraints imposed on Starlark by Bazel that I plan to break soon to unblock significant optimizations (e.g. flat environments, byte code compilation). In addition we plan to move the Java classes to a different package (java.starlark.net) to emphasize the boundary between components, though we have no plans yet to host them in a separate repository; that would of course require the API to be refined, stabilized, and frozen. I believe the separation is in the long term interests of the Bazel project and its users, especially if Starlark becomes a widely used and more familiar language for configuration, but for the Bazel team it does pose immediate costs (in ceded control), and the team has yet to make any public commitments about Starlark.

Regarding JSON: In June I committed https://github.com/google/starlark-go/blob/master/starlarkjson/json.go#L26, a JSON module I wrote some years ago for the Go implementation of Starlark (go.starlark.net). I believe its interface addresses all the needs met by the one proposed here, and indeed was influenced by it. It is not too late to change it, so if you see any shortcomings of deficiencies, please let me know. Otherwise I propose that we port it directly to Java.

Regarding modules: Starlark doesn't yet have a standard library of packages, though there is clearly a recurring need for things like json, time, http, and html. (Brendan @b5 at https://github.com/qri-io/starlib has built many good candidates for standardization.) We don't even have a clear decision yet on how such packages should be imported. Sometimes we use load(), other times the names are predeclared (as I did with 'json' in the Go Starlark REPL, though I don't think it's very satisfactory). Personally I would like to extend load so that modules are first-class, e.g. something like load("foo.bzl"); foo.F(), similar to Python's import statement, as many users resort to doing this by hand using structs like os = struct(mkdir = _mkdir, ...).

Regarding proto: I have ported the Go implementation of the Starlark proto module so that it depends only on open-source proto reflection (which is new), so I should be able to open-source it. However, proto support for Starlark/Java will be tricky, because it has no means of representing byte strings, floating-point numbers, or integers outside the signed 32-bit range, and each of those poses surprisingly thorny problems.

BTW: Nick: I enjoyed both of your papers on build systems!

bazel-io pushed a commit to bazelbuild/bazel that referenced this pull request Oct 19, 2020
This CL ports the go.starlark.net/starlarkjson module from Go to Java.
The json module provides four functions:

  json.decode
  json.encode
  json.indent        (not yet implemented)
  json.encode_indent (not yet implemented)

It is tested through eval.ScriptTest, which adds the json module to
its environment, along with 'struct', a simple struct-like type.

Some tests are commented out, awaiting StarlarkFloat, or richer string support.

This is a first step towards removing Bazel's struct.to_json.

Updates bazelbuild/starlark#83

PiperOrigin-RevId: 337944489
@alandonovan
Copy link
Contributor

I just committed bazelbuild/bazel@6e47a40, which adds a json module to Starlark/Java. Please take a close look at the Starlark and Java interfaces, and try it out, and comment here if notice any problems.

Thanks to @brandjon for many very careful rounds of review.

@laurentlb
Copy link
Contributor

What's the use-case for the prefix argument in json.indent?

@alandonovan
Copy link
Contributor

What's the use-case for the prefix argument in json.indent?

Perhaps the caller wants to insert the result inside some other piece of text, or JSON header/footer, that imposes its own indentation.

bazel-io pushed a commit to bazelbuild/bazel that referenced this pull request Oct 21, 2020
This change predeclares 'json', the new Starlark module
for JSON encoding/decoding/indenting, in all Bazel Starlark
environments (alongside depset, select, etc).

The new function works for any legal value, not just struct,
and avoids polluting the struct field namespace.

struct.to_json is deprecated, along with struct.to_proto,
and will be disabled by the new flag --incompatible_struct_has_no_methods.
(The replacement for to_proto will be added shortly.)

Updates bazelbuild/starlark#83
Updates #3732
Updates #1813

RELNOTES: The Starlark json module is now available.
Use json.encode(x) to encode a Starlark value as JSON.
struct.to_json(x) is deprecated and will be disabled by
the --incompatible_struct_has_no_methods flag.
PiperOrigin-RevId: 338259618
@alandonovan
Copy link
Contributor

Both the Java and Go implementations now have compatible implementations of the json module. Closing.

@jmillikin-stripe
Copy link
Author

Since it's been implemented, shouldn't this PR be merged rather than closed? Optionally with updates to reflect whatever common API was decided upon by the Java/Go impl devs.

@alandonovan
Copy link
Contributor

Er, yes, sorry about that. I just noticed that I also have a bunch of old pending comments on your doc from months ago that some how never got flushed out; I'll do that now.

The Java implementation is here:
https://github.com/bazelbuild/bazel/blob/267bdaab2f0596b65b03c06c82e3f91d957fec3b/src/main/java/net/starlark/java/lib/json/Json.java
and it should match the behavior of the Go implementation here:
https://github.com/google/starlark-go/blob/master/starlarkjson/json.go
at least as far as the Starlark caller is concerned.

@alandonovan alandonovan reopened this Nov 11, 2020

## Abstract

This document proposes an API for encoding and decoding JSON from Starlark. If accepted, implementations of Starlark
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for sending this. I'm not sure how it escaped my attention till today, when I happened to resume work on the go.starlark.net JSON encoder/decoder (google/starlark-go#179) that you mention below.


* Bazel's [`struct`](https://docs.bazel.build/versions/master/skylark/lib/struct.html) type has a `to_json()` method
that can generate JSON documents. Issue [bazelbuild/bazel#7879](https://github.com/bazelbuild/bazel/issues/7879)
requests additional control over the behavior of this method (i.e. optional whitespace).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is addressed in my Go implementation by separating encoding and prettyprinting, in exactly the same way as these concepts are separated in the Go standard library's encoding/json package. Existing JSON strings can be prettyprinted in a single pass with very little state, without the need to decode and reencode.

```

Type conversions are:
* JSON arrays are decoded to Starlark `list` values.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add: new, unfrozen list values. Ditto for dicts.

* JSON objects are decoded to Starlark `dict` values. Keys are in the same order as the input data.
* JSON `true`, `false`, and `null` literals are decoded to Starlark `True`, `False`, and `None` respectively.
* JSON strings are decoded to Starlark `string` values.
* JSON numbers with no fractional component are decoded to Starlark `int` values. Starlark implementations without
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Go implementation now emits a string of decimal integers for a Starlark int value, no matter how large; it does not truncate big ints to uint64 or int53 or int32, nor use scientific notation. This does mean that decoder implementations with integer width limitations may not be able to read these files. However, the meaning of the JSON files is quite clear.

The Go impl also uses a %g floating point representation for a Starlark value of type float, even if the value is integral. This means round-tripping Starlark values via JSON preserves int vs float representation type, which seems desirable. (The decoder uses the presence of a decimal point to indicate 'float'.)

arbitrary-precision integers should reject numbers that exceed their supported range.
* JSON numbers with a fractional component may be decoded to an arbitrary-precision or floating-point value, if
supported by the current Starlark implementation.
* Starlark implementations without arbitrary-precision numeric values should reject numbers that exceed their
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that rejecting unrepresentable values is the best course of action.

```

Type conversions are:
* Starlark `list` and `tuple` values are encoded to JSON arrays.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Go implementation generalizes this to all iterable sequences (that are not iterable mappings).


Type conversions are:
* Starlark `list` and `tuple` values are encoded to JSON arrays.
* Starlark `dict` values are encoded to JSON objects.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Go implementation also encodes Starlark struct values as JSON objects.

If `indent` is a number, it is how many spaces to indent by. Indent levels less than 1 will only insert newlines.
If `indent` is `None` (the default), JSON will be encoded in one line with no extra spaces.

If `sort_keys` is `True`, then encoded objects' keys are sorted in lexicographical order. If `sort_keys` is `False`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this feature necessary?

(Update: During review of the Java implementation, we decided that structs and dicts should both sort their fields/keys.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants