Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parser options [WIP] #33

Closed
wants to merge 2 commits into from
Closed

Add parser options [WIP] #33

wants to merge 2 commits into from

Conversation

travisbrown
Copy link
Member

@travisbrown travisbrown commented Jun 9, 2017

This is another attempt to fix #31 (see also #32, @jeremyrsmith's PR here, and this discussion).

The key issue (which I've refined a bit in my head since opening #31) is that YAML supports a number of "Language-Independent Types" that are supported by SnakeYAML but not (currently) circe-yaml.

One missing feature is merge keys (see example in #31). Another is YAML's more lenient representation of booleans, integers, etc.:

// Current behavior:
scala> io.circe.yaml.parser.parse("answer: NO").map(_.noSpaces)
res0: scala.util.Either[io.circe.ParsingFailure,String] = Left(io.circe.ParsingFailure: For input string: "NO")

// This PR:
scala> io.circe.yaml.parser.parse("answer: NO").map(_.noSpaces)
res0: scala.util.Either[io.circe.ParsingFailure,String] = Right({"answer":false})

Another (more problematic) one is timestamps. In a current Jackson-powered pipeline for working with YAML in circe, we're relying on the fact that Jackson's YAML processing transforms YAML's (kind of ridiculous) timestamp representation to epoch seconds. This is really nice in that it means that we don't have to reproduce YAML's silly rules for what counts as a timestamp for ourselves on the circe decoding side. In the current version of circe-yaml, YAML timestamps are just passed through to the JSON unchanged as strings, which means we do.

The advantage of circe-yaml's current approach is that no information (timezone, etc.) gets dropped. These YAML things are called timestamps, though, so I'm not sure I care that much about losing timezone information, and even if we only supported decoding YAML's timestamp as longs, users would still be able to put their dates in YAML strings if they really wanted the exact representation to be available in their JSON.

(We could also transform YAML timestamps into JSON strings with a single standard date format instead of epoch seconds, but the long approach has the advantage of matching what Jackson does and not making us choose the format).

What I've done in this PR is to add a new Parser case class with options for a few language-independent types so that users don't have to choose (or contort their YAML). By default all of the language-independent types mentioned above are enabled, but they can all be turned off if people want the old behavior.

I've also taken a quick stab at fixing stack safety for deep documents with EitherT. I haven't tested it, and that really should have happened in a different PR, but oh well.

This is just a sketch, and needs several things before it could be merged (if we decide to go this route):

  1. Tests for both sides of the options.
  2. Some documentation for the options, with links.
  3. Tests for stack-safety (or just skip that for now).
  4. Miscellaneous clean-up.
  5. Support for YAML's binary type?

Note that while this PR is source compatible with 0.6.1, it's not binary compatible (unlike the simpler version in #32).

@codecov-io
Copy link

codecov-io commented Jun 9, 2017

Codecov Report

Merging #33 into master will decrease coverage by 9.37%.
The diff coverage is 79.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #33      +/-   ##
==========================================
- Coverage     100%   90.62%   -9.38%     
==========================================
  Files           4        5       +1     
  Lines         102      128      +26     
  Branches        6       13       +7     
==========================================
+ Hits          102      116      +14     
- Misses          0       12      +12
Impacted Files Coverage Δ
src/main/scala/io/circe/yaml/package.scala 100% <100%> (ø)
src/main/scala/io/circe/yaml/Parser.scala 79.31% <79.31%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a27c48f...971faf0. Read the comment docs.

@non
Copy link

non commented Jun 9, 2017

@travisbrown I like your approach. Given the various impedance mismatches that are possible, I think the best option is to have defaults that support all of YAML, but allow particular callers to disable things they don't want to support. This also supports cases where there's not a good one-size-fits-all option available.

What would an alternative to this be, if any? I don't see any other options that are as good as something like this.

@travisbrown
Copy link
Member Author

@non The alternative would be picking a standard set of language-independent types that are supported (merge, maybe bool, probably not timestamp) and making those the behavior instead of complicating things by adding options to Parser.

Since Printer already has parameters I don't necessarily see that as a big hurdle, but I can understand @jeremyrsmith's concern.

jeremyrsmith added a commit that referenced this pull request Jun 10, 2017
This approach moves the node handling to an object algebra. This allows
for relatively easily adding configurable stuff that's able to change
how various YAML types are handled as the need arises. One can even plug
in their own algebra if they want to get nitty-gritty.

It also allows adding optional error accumulation, which is something
that's always felt missing.

A nice thing here is by default we don't have to even check any
configuration - only when a parser instance is configured do we change
to `ConfiguredAlg`.

Also should be binary compatible (and `yaml.parser.configure(...)` is a
nice way to get a configured parser!).

TODO: Add tests around error accumulation and such (coverage currently
lacking)
@travisbrown
Copy link
Member Author

Closing in favor of @jeremyrsmith's #33.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support merge keys
4 participants