Add parser options [WIP] #33

travisbrown · 2017-06-09T13:31:01Z

This is another attempt to fix #31 (see also #32, @jeremyrsmith's PR here, and this discussion).

The key issue (which I've refined a bit in my head since opening #31) is that YAML supports a number of "Language-Independent Types" that are supported by SnakeYAML but not (currently) circe-yaml.

One missing feature is merge keys (see example in #31). Another is YAML's more lenient representation of booleans, integers, etc.:

// Current behavior:
scala> io.circe.yaml.parser.parse("answer: NO").map(_.noSpaces)
res0: scala.util.Either[io.circe.ParsingFailure,String] = Left(io.circe.ParsingFailure: For input string: "NO")

// This PR:
scala> io.circe.yaml.parser.parse("answer: NO").map(_.noSpaces)
res0: scala.util.Either[io.circe.ParsingFailure,String] = Right({"answer":false})

Another (more problematic) one is timestamps. In a current Jackson-powered pipeline for working with YAML in circe, we're relying on the fact that Jackson's YAML processing transforms YAML's (kind of ridiculous) timestamp representation to epoch seconds. This is really nice in that it means that we don't have to reproduce YAML's silly rules for what counts as a timestamp for ourselves on the circe decoding side. In the current version of circe-yaml, YAML timestamps are just passed through to the JSON unchanged as strings, which means we do.

The advantage of circe-yaml's current approach is that no information (timezone, etc.) gets dropped. These YAML things are called timestamps, though, so I'm not sure I care that much about losing timezone information, and even if we only supported decoding YAML's timestamp as longs, users would still be able to put their dates in YAML strings if they really wanted the exact representation to be available in their JSON.

(We could also transform YAML timestamps into JSON strings with a single standard date format instead of epoch seconds, but the long approach has the advantage of matching what Jackson does and not making us choose the format).

What I've done in this PR is to add a new Parser case class with options for a few language-independent types so that users don't have to choose (or contort their YAML). By default all of the language-independent types mentioned above are enabled, but they can all be turned off if people want the old behavior.

I've also taken a quick stab at fixing stack safety for deep documents with EitherT. I haven't tested it, and that really should have happened in a different PR, but oh well.

This is just a sketch, and needs several things before it could be merged (if we decide to go this route):

Tests for both sides of the options.
Some documentation for the options, with links.
Tests for stack-safety (or just skip that for now).
Miscellaneous clean-up.
Support for YAML's binary type?

Note that while this PR is source compatible with 0.6.1, it's not binary compatible (unlike the simpler version in #32).

codecov-io · 2017-06-09T13:31:04Z

Codecov Report

Merging #33 into master will decrease coverage by 9.37%.
The diff coverage is 79.66%.

@@            Coverage Diff             @@
##           master      #33      +/-   ##
==========================================
- Coverage     100%   90.62%   -9.38%     
==========================================
  Files           4        5       +1     
  Lines         102      128      +26     
  Branches        6       13       +7     
==========================================
+ Hits          102      116      +14     
- Misses          0       12      +12

Impacted Files	Coverage Δ
src/main/scala/io/circe/yaml/package.scala	`100% <100%> (ø)`
src/main/scala/io/circe/yaml/Parser.scala	`79.31% <79.31%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a27c48f...971faf0. Read the comment docs.

non · 2017-06-09T16:02:26Z

@travisbrown I like your approach. Given the various impedance mismatches that are possible, I think the best option is to have defaults that support all of YAML, but allow particular callers to disable things they don't want to support. This also supports cases where there's not a good one-size-fits-all option available.

What would an alternative to this be, if any? I don't see any other options that are as good as something like this.

travisbrown · 2017-06-09T17:03:43Z

@non The alternative would be picking a standard set of language-independent types that are supported (merge, maybe bool, probably not timestamp) and making those the behavior instead of complicating things by adding options to Parser.

Since Printer already has parameters I don't necessarily see that as a big hurdle, but I can understand @jeremyrsmith's concern.

This approach moves the node handling to an object algebra. This allows for relatively easily adding configurable stuff that's able to change how various YAML types are handled as the need arises. One can even plug in their own algebra if they want to get nitty-gritty. It also allows adding optional error accumulation, which is something that's always felt missing. A nice thing here is by default we don't have to even check any configuration - only when a parser instance is configured do we change to `ConfiguredAlg`. Also should be binary compatible (and `yaml.parser.configure(...)` is a nice way to get a configured parser!). TODO: Add tests around error accumulation and such (coverage currently lacking)

travisbrown · 2017-06-10T00:35:43Z

Closing in favor of @jeremyrsmith's #33.

travisbrown-stripe added 2 commits June 9, 2017 07:58

Support merge keys

9c61c57

Sketch of Parser with options, stack-safety

971faf0

travisbrown closed this Jun 10, 2017

jeremyrsmith mentioned this pull request Oct 9, 2020

WIP - An alternative approach to #31 (see also #33) #34

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add parser options [WIP] #33

Add parser options [WIP] #33

travisbrown commented Jun 9, 2017 •

edited

Loading

codecov-io commented Jun 9, 2017 •

edited

Loading

non commented Jun 9, 2017

travisbrown commented Jun 9, 2017

travisbrown commented Jun 10, 2017

Add parser options [WIP] #33

Add parser options [WIP] #33

Conversation

travisbrown commented Jun 9, 2017 • edited Loading

codecov-io commented Jun 9, 2017 • edited Loading

Codecov Report

non commented Jun 9, 2017

travisbrown commented Jun 9, 2017

travisbrown commented Jun 10, 2017

travisbrown commented Jun 9, 2017 •

edited

Loading

codecov-io commented Jun 9, 2017 •

edited

Loading