-
Notifications
You must be signed in to change notification settings - Fork 212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
starlarkjson: a standard json module for Starlark #179
Conversation
starlark/testdata/json.star
Outdated
# Exercise JSON string coding by round-tripping a string with every 16-bit code point. | ||
def codec(x): | ||
return json.decode(json.encode(x)) | ||
codepoints = ''.join(['%c' %c for c in range(65536)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: add space between %
operator and c
identifier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Funny! Fixed.
starlarkjson/json.go
Outdated
func quote(buf *bytes.Buffer, s string) { | ||
if goQuoteIsSafe(s) { | ||
var quoteSpace [128]byte | ||
buf.Write(strconv.AppendQuote(quoteSpace[:0], s)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you trying to avoid memory allocation with this case? I'm not sure how inlining and escape analysis will interact, but it seems like quoteSpace
may escape, and you'll end up making a copy on the heap anyway.
Perhaps instead, narrow the optimized case to strings that don't contain anything that needs to be escaped (adding "
and \
to the list of bad runes). Then you could use buf.WriteString(s)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quite right; this was reduced from code in herb's JSON encoder which reused the quoteSpace variable as an object field across many calls, but it makes no sense here.
I've moved quote and quoteSpace inside encode, so the closure acts like an object and quoteSpace like a field.
I have an outstanding request with Bazel to add JSON support in the Java implementation (bazelbuild/bazel#3732). To see if we can settle on a reasonable shared JSON API, I sent bazelbuild/starlark#83 as a proposed standard JSON module in the Starlark spec. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Jay, I have finally gotten back to this CL (motivated by yet more bugs in struct.to_json). I've rewritten the decode function and added more tests. Care to take another look?
thanks
A
I'll take a look on Monday if that's all right. I'm just heading out now, and taking a vacation day tomorrow. |
Is something about this CL not giving you a sense of utmost urgency? ; ) Enjoy your vacation. |
starlarkjson/json.go
Outdated
"go.starlark.net/starlarkstruct" | ||
) | ||
|
||
// Module is a Starlark module of JSON-related functions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Module json
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
starlarkjson/json.go
Outdated
// which it converts to JSON by cases: | ||
// - A Starlark value that implements Go's standard json.Marshal | ||
// interface defines its own JSON encoding. | ||
// - None, True, and False are converted their corresponding JSON atoms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are converted to...
Though maybe just: "null, true, and false, respectively."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
starlarkjson/json.go
Outdated
} | ||
|
||
// isASCII reports whether s contains only ASCII. | ||
func isASCII(s string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: rename to isPrintableASCII
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// In principle, we could parameterize it to allow the caller to | ||
// control the returned types, but there's no compelling need yet. | ||
|
||
type failure string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A brief comment here would be useful. It's not obvious why this is needed without scrolling much further down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return starlark.String(r) | ||
|
||
case 'n': | ||
if strings.HasPrefix(s[i:], "null") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given a string like nulll
, I think this will read null
, then report an unexpected character l
after that.
Error messages would be a little clearer if you scan a run of identifier characters, then check if they are one of null
, true
, false
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it this way so that the parser never consumes more than it needs, as we may want a stream decoder in the future (like json.NewDecoder in Go), from which you can read adjacent values such as "truefalse".
starlarkjson/json.go
Outdated
i = j | ||
|
||
// Unlike most C-like languages, | ||
// JSON disallows a leading before a digit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a leading 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
starlarkjson/json.go
Outdated
|
||
// Unlike most C-like languages, | ||
// JSON disallows a leading before a digit. | ||
if z := boolint(num[0] == '-'); num[z] == '0' && z+1 < len(num) && isdigit(num[z+1]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think num == "-"
is possible here, so this might panic.
Also, the boolint
logic is a little confusing. Maybe drop boolint
and spell it out here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Fixed, simplified, and tested.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
if isASCII(s) { | ||
buf.Write(strconv.AppendQuote(quoteSpace[:0], s)) | ||
} else { | ||
data, _ := json.Marshal(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just learned about RFC 8259, which mandates UTF-8 encodings for JSON. I haven't looked at the details yet, but perhaps this would simplify the encoder here.
starlarkjson/json.go
Outdated
"go.starlark.net/starlarkstruct" | ||
) | ||
|
||
// Module is a Starlark module of JSON-related functions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
starlarkjson/json.go
Outdated
// which it converts to JSON by cases: | ||
// - A Starlark value that implements Go's standard json.Marshal | ||
// interface defines its own JSON encoding. | ||
// - None, True, and False are converted their corresponding JSON atoms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
starlarkjson/json.go
Outdated
} | ||
|
||
// isASCII reports whether s contains only ASCII. | ||
func isASCII(s string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
// In principle, we could parameterize it to allow the caller to | ||
// control the returned types, but there's no compelling need yet. | ||
|
||
type failure string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
return starlark.String(r) | ||
|
||
case 'n': | ||
if strings.HasPrefix(s[i:], "null") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it this way so that the parser never consumes more than it needs, as we may want a stream decoder in the future (like json.NewDecoder in Go), from which you can read adjacent values such as "truefalse".
starlarkjson/json.go
Outdated
i = j | ||
|
||
// Unlike most C-like languages, | ||
// JSON disallows a leading before a digit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
starlarkjson/json.go
Outdated
|
||
// Unlike most C-like languages, | ||
// JSON disallows a leading before a digit. | ||
if z := boolint(num[0] == '-'); num[z] == '0' && z+1 < len(num) && isdigit(num[z+1]) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Fixed, simplified, and tested.
This change defines a standard Starlark module for JSON encoding and decoding. See json.go for documentation. It is intended to subsume, generalize, and eventually replace Bazel's ill-conceived struct.to_json method. The json module is predeclared in the Starlark REPL environment. See related issues: bazelbuild/bazel#7896 https://buganizer.corp.google.com/issues/23962735 https://buganizer.corp.google.com/issues/70210417 bazelbuild/bazel#7879 (comment) bazelbuild/bazel#5542 bazelbuild/bazel#10176 bazelbuild/starlark#83 bazelbuild/bazel#3732 Change-Id: I297ffaee9349eedeeb52f5a88f40636a4095f997
This is a sketch of a standard module for JSON encoding/decoding for Starlark.
It is intended to subsume, generalize, and eventually
replace the ill-conceived struct.to_json method.
See related issues:
bazelbuild/bazel#7896
https://b/23962735 (Googlers only)
https://b/70210417 (Googlers only)
bazelbuild/bazel#7879 (comment)
bazelbuild/bazel#5542