Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Streaming support for Data.Text #3

Open
mjepronk opened this issue Aug 21, 2018 · 7 comments
Open

Streaming support for Data.Text #3

mjepronk opened this issue Aug 21, 2018 · 7 comments

Comments

@mjepronk
Copy link

Would it be possible to add streaming support for Data.Attoparsec.Text?
If so, would you accept patches for Text support to this library?

@fosskers
Copy link
Collaborator

fosskers commented Aug 21, 2018

Caveat: I am one of the maintainers of this package but not the original author.

I imagine that ByteString was chosen for raw speed, since there isn't overhead in processing locale (i.e. which you would've had to have done to get Text to begin with).

Overall I'm not against Text support, so long as it doesn't:

  • slow down the ByteString parts
  • mangle the API (too much)
  • mangle the code

@mjepronk
Copy link
Author

mjepronk commented Aug 21, 2018

Yes, I understand that Text has an additional decoding step with regards to ByteString. However, I do not want to handle decoding of Unicode characters (UTF-8) in my parser.

I was thinking to basically copy Data.Attoparsec.ByteString.Streaming into a new module Data.Attoparsec.Text.Streaming and modify the bits needed. However, that will increase the maintenance to basically two almost identical modules. However, I think this is the same approach as taken in Attoparsec where Data.Attoparsec.ByteString and Data.Attoparsec.Text have some duplicate code.

@fosskers
Copy link
Collaborator

My guess is that the ByteString version is "easier" because we're using the ByteString m x type from streaming-bytestring (notice the types here). You might have to do more work to get that to work with Text.

@mjepronk
Copy link
Author

Aah, you're right of course. We're missing a lot of functionality to handle Text with streaming (chunking etc). It would be nice to add this once the streaming ecosystem has the functionality.

I guess I'm going to use pipes instead for my use-case, as it seems to have the required functionality.

@mjepronk
Copy link
Author

I played a little bit more with Attoparsec and it's Text parsers in combination with Streaming, and I came up with the following solution:
https://gist.github.com/mjepronk/81d2d673b80084fd1931448aa0c90371
Would you consider including something like this in streaming-attoparsec?
With this approach, it would require one additional dependency: streaming-commons.

@fosskers
Copy link
Collaborator

fosskers commented Sep 20, 2018

The code looks to do what's advertised, and the exposed API matches the ByteString use-case, so I'd say this would be a good addition.

As for the deps, here's a look at their two dep graphs:

deps
deps

Unioning the two, streaming-commons would only be pulling in async, zlib, network and random as new things, that perhaps might stick out to streaming-attoparsec users who aren't interested in Text. Those libs are pretty common though, and are likely to already exist in the dep graph of a significant application, so I don't personally think they're a problem.

Okay, if you can put a PR together, we can then talk about docstrings, tests, performance comparisons, etc. Overall I think this will be a good addition :) Thank you for sticking with it.

@mjepronk
Copy link
Author

OK, thank you for the feedback. I will have a stab at this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants