Async (non-blocking) support for CSV #287
Currently only the JSON and Smile format backends support async parsing. Some others (like CBOR) might be relatively straightforward to support. Fundamentally there is nothing preventing CSV from being supported, but someone would have to spend quite a bit of time to implement it all -- and it probably could not reuse much code from the JSON or Smile codecs, since the decoding is rather different. However: I think your question is more related to the Spring side of things, so the Spring WebFlux folks/user community can probably say more about the intervening functionality and requirements.
Hello, just checking if there is any update on this? I would very much like to use async CSV parsing if/when it becomes available. We have a data component that converts to/from a number of supported formats using Apache Arrow as the common intermediate. We can receive incoming JSON with true streaming, but for incoming CSV we have to insert a buffering stage. For large datasets we're converting CSV -> JSON in the client; if we had CSV streaming we could send large CSV files straight up, which would be a lot faster.
No update; I don't really have time to work on this, although if someone were to tackle it, I'd do my best to help. One thing to note, though, is that for just simple streaming the module already has support. The amount of buffering used by default is not much more than for JSON parsing (no requirement to decode a full line, if I recall correctly); it basically only needs one full cell.
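To illustrate the "simple streaming" support mentioned above: with `jackson-dataformat-csv` on the classpath, `MappingIterator` decodes one row at a time rather than buffering the whole document. A minimal sketch (reading each row as a `Map` keyed by the header row):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class CsvStreamingRead {
    /** Read CSV incrementally: MappingIterator decodes one row at a time. */
    public static List<Map<String, String>> readRows(String csv) throws Exception {
        CsvMapper mapper = new CsvMapper();
        // First row is treated as the header (column names)
        CsvSchema schema = CsvSchema.emptySchema().withHeader();
        List<Map<String, String>> out = new ArrayList<>();
        try (MappingIterator<Map<String, String>> rows = mapper
                .readerFor(Map.class).with(schema)
                .readValues(new StringReader(csv))) {
            while (rows.hasNext()) {
                out.add(rows.next()); // only roughly one row is buffered at a time
            }
        }
        return out;
    }
}
```

Note this is still *blocking* streaming: `readValues` pulls from a `Reader`/`InputStream`, which is exactly why it can't be dropped into a non-blocking WebFlux pipeline as-is.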
Hello, thanks for getting back to me. I can't promise if/when I'll get any time (such is life!) but let me see if I've got the shape of the problem:
Is that the shape of it, or am I way off the mark? I appreciate there's a lot more detail! I was slightly confused by the "simple streaming" bit (I saw this on the README page as well) -- is this referring to the implementation, i.e. the decoder doesn't buffer the whole content from the stream, it consumes tokens one at a time? I haven't looked at all at the object mapper level; I don't use it myself.
@martin-traverse you can take a look at how the non-blocking json parsers in jackson-core are implemented. |
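For reference, the non-blocking parser API in `jackson-core` (2.9+) that an async CSV parser would presumably mirror works on a push model: the caller feeds byte chunks, and `nextToken()` returns `NOT_AVAILABLE` instead of blocking when more input is needed. A sketch, assuming `jackson-core` on the classpath:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;
import com.fasterxml.jackson.core.async.ByteArrayFeeder;

public class AsyncJsonDemo {
    /** Push byte chunks at the parser; drain whatever tokens are decodable so far. */
    public static List<JsonToken> tokenize(byte[]... chunks) throws Exception {
        JsonParser parser = new JsonFactory().createNonBlockingByteArrayParser();
        ByteArrayFeeder feeder = (ByteArrayFeeder) parser.getNonBlockingInputFeeder();
        List<JsonToken> tokens = new ArrayList<>();
        for (byte[] chunk : chunks) {
            feeder.feedInput(chunk, 0, chunk.length);
            JsonToken t;
            // NOT_AVAILABLE means: parser needs more input, come back after feeding
            while ((t = parser.nextToken()) != null && t != JsonToken.NOT_AVAILABLE) {
                tokens.add(t);
            }
        }
        feeder.endOfInput();
        JsonToken t;
        while ((t = parser.nextToken()) != null && t != JsonToken.NOT_AVAILABLE) {
            tokens.add(t);
        }
        return tokens;
    }

    public static void main(String[] args) throws Exception {
        // Chunk boundary falls mid-document; the parser resumes across it
        byte[] a = "{\"x\":".getBytes(StandardCharsets.UTF_8);
        byte[] b = "1}".getBytes(StandardCharsets.UTF_8);
        System.out.println(tokenize(a, b));
    }
}
```

The hard part for any new format backend is everything behind `nextToken()`: a state machine that can suspend and resume at any byte boundary.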
Thanks @yawkat, just had a quick look. Seems like it requires a largely separate implementation for both the UTF-8 decoding and the JSON parser / state logic. Could the decoding bit perhaps be factored out so it can be shared between parsers? I'll try to find some time to sketch out a PR -- can't promise though, the next few weeks are pretty busy.
Right, I think an async decoder for CSV would probably be quite a bit simpler, but some aspects (UTF-8 decoding) are similar. I doubt refactoring is possible, however, partly because combining UTF-8 character decoding with tokenization is (for JSON and Smile at least) an important reason for good performance. The part that differs is the state machine, needed to keep exact state for the cases you mention (end of content within a token, or even within one UTF-8 character). And yes, trying to support encodings other than UTF-8 would be tricky with the approach I used for the JSON and Smile codecs (and Aalto XML as well). I guess the additional complexity for CSV would be the configurable escaping settings.

Alternatively, a completely different approach would be one where decoding of the character encoding is separate from tokenization. This would probably be slightly simpler; the first part in particular could probably be done a buffer at a time (with only the decoding being incremental). There'd be some more work in syncing those layers, but that could probably lead to somewhat more reusable code -- I wouldn't retrofit it onto JSON/Smile for various reasons, but it could be used for other textual formats for sure.
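To make the state-machine idea concrete, here is a deliberately tiny, stdlib-only sketch (not Jackson code) of a resumable CSV cell tokenizer: input arrives in arbitrary chunks, and the state (the partially accumulated cell, plus whether we are inside quotes) survives chunk boundaries. A real implementation would also need incremental UTF-8 decoding, configurable separators/escapes, `""` quote-escape handling, and per-row events.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal resumable CSV tokenizer: feed chunks, collect completed cells. */
public class ChunkedCsvTokenizer {
    private final StringBuilder cell = new StringBuilder(); // partial cell survives chunk ends
    private boolean inQuotes = false;                       // quote state across chunk boundaries
    private final List<String> cells = new ArrayList<>();

    /** Feed one chunk; a cell whose terminator falls in a later chunk stays pending. */
    public void feed(CharSequence chunk) {
        for (int i = 0; i < chunk.length(); i++) {
            char c = chunk.charAt(i);
            if (inQuotes) {
                if (c == '"') inQuotes = false; else cell.append(c);
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',' || c == '\n') {
                cells.add(cell.toString());  // cell complete: emit it
                cell.setLength(0);
            } else if (c != '\r') {
                cell.append(c);
            }
        }
    }

    /** Signal end of input: flush a trailing unterminated cell, if any. */
    public List<String> endOfInput() {
        if (cell.length() > 0) {
            cells.add(cell.toString());
            cell.setLength(0);
        }
        return cells;
    }
}
```

The point of the sketch: because all parse state lives in fields rather than on the call stack, `feed` can return at any byte and be resumed later -- which is exactly what the blocking parsers can't do.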
This question relates to usage of Jackson DataFormats CSV within Spring WebFlux.
I have a REST controller (Spring WebFlux) which returns a Flux from a Stream of a simple bean (MapInfo). The endpoint serves application/json (only for demo) and application/x-ndjson (for large datasets). I'd like to add support for text/csv also.
Invoking this for text/csv results in HTTP 500 with message:
"No Encoder for [com.mizuho.fo.dataservices.hc.controller.MapInfoController$MapInfo] with preset Content-Type 'null'"
I have not yet added any Jackson Dataformat libraries so this is expected.
Question: Does Jackson Dataformat CSV have the necessary non-blocking support in order for it to be used to convert the payload to CSV?
Of course I could be imperative and write code to format as CSV, returning a Flux of String for instance, but my hope is to effect CSV output without lowering the abstraction level beyond what is already there (returning a Flux from a Stream of the POJO).
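For what it's worth, that imperative fallback can stay fairly small. A stdlib-only sketch (naive RFC 4180-style quoting; the record-to-column mapping for a bean like `MapInfo` is left as plain `List<String>` rows here) of turning a `Stream` of records into CSV lines, which could then be wrapped with `Flux.fromStream`:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CsvLines {
    /** Quote a value if it contains a separator, quote, or newline (naive RFC 4180 style). */
    static String quote(String v) {
        if (v.contains(",") || v.contains("\"") || v.contains("\n")) {
            return "\"" + v.replace("\"", "\"\"") + "\"";
        }
        return v;
    }

    /** Header line followed by one CSV line per record; each line becomes one Flux element. */
    static Stream<String> toCsvLines(List<String> header, Stream<List<String>> records) {
        Stream<String> head = Stream.of(String.join(",", header));
        Stream<String> body = records.map(r ->
                r.stream().map(CsvLines::quote).collect(Collectors.joining(",")));
        return Stream.concat(head, body);
    }
}
```

Because the `Stream` is lazy, rows are only formatted as the subscriber demands them, so this keeps the non-blocking, bounded-memory behaviour on the *response* side even without a real CSV `Encoder`.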