Pondering implementing InputStream and OutputStream #5

Open
helins opened this issue Jun 3, 2021 · 4 comments
Comments

@helins
Owner

helins commented Jun 3, 2021

@sh54 Following your comment in #2.

I did consider adding support for InputStream and OutputStream some time ago. The protocols are split into different categories precisely so that a type does not have to implement everything (e.g. a stream which does not have absolute positioning). However, in the end I decided not to do it since those interfaces are indeed limited. It is not the same thing, but for some IO problems you can use memory mapping. And with NIO, ByteBuffer is first-class, so I didn't feel it was urgent to implement streams.
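For readers less familiar with the NIO route mentioned above, here is a minimal Java sketch of memory-mapping a file and using absolute positioning on the resulting ByteBuffer. The class name `MappedFileSketch`, its methods, and the fixed 64-byte mapping are assumptions made for illustration; none of this is binf code.

```java
import java.io.IOException;
import java.nio.ByteOrder;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of binf.
final class MappedFileSketch {

    // Maps the first `size` bytes of an existing file in read-write mode.
    static MappedByteBuffer map(Path path, int size) throws IOException {
        try (FileChannel ch = FileChannel.open(path,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // The mapping remains valid after the channel is closed.
            return ch.map(FileChannel.MapMode.READ_WRITE, 0, size);
        }
    }

    // Writes an int at an absolute offset and reads it back from the same
    // offset, which is exactly what a plain stream cannot offer.
    static int roundTrip(Path path, int offset, int value) throws IOException {
        MappedByteBuffer buf = map(path, 64); // 64 bytes is an arbitrary size
        buf.order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(offset, value);
        return buf.getInt(offset);
    }
}
```

The point of the sketch is only that a mapped file behaves like any other ByteBuffer, so random access comes for free.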

However, I keep an open mind. Do you envision a useful case for those interfaces, given that they could implement only a small portion of the protocols?

@sh54
Contributor

sh54 commented Jun 5, 2021

All the stuff I am writing right now has ByteBuffer and js/DataView as the optimal underlying.

I could see enough reasons to add support for stream classes:

  1. You may not have a "choice" about the underlying that you are coding against. If another library you are working against gives you a stream, then it is nice to be able to just wrap it up in a binf/view. The user can always create their own implementation, but InputStream and OutputStream feel common enough for core binf to cover them.
  2. If you can limit yourself to reading and writing sequentially, then a stream is generally going to be more memory efficient than allocating a ByteBuffer. The big counterexample is when the underlying is a MappedByteBuffer.
  3. If for some reason you are manipulating a bunch of large mmapped files on a 32-bit system, then you may run out of virtual memory. Pretty niche though!
  4. If you want to process anything that is naturally a stream where you don't have the full contents yet, e.g. processing a large body from an HTTP request as it is coming in. In this case you may or may not know the body size, and thus the limit.
  5. Writing a file when you don't know how big the result will be. With mmapped files, out of the box you are going to have to overallocate then resize the file at the end, or rebase the view as you go along.
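Points 2, 4 and 5 can be illustrated with a small Java sketch: data moves from an InputStream to an OutputStream through one fixed-size buffer, so neither the input length nor the final output size has to be known up front. The class name `SequentialCopySketch` and the 4 KiB chunk size are assumptions chosen for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helper, not part of binf.
final class SequentialCopySketch {

    // Copies everything from `in` to `out` and returns the number of bytes
    // moved. Memory use stays at one 4 KiB chunk regardless of input size.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[4096];
        long total = 0;
        int n;
        // read() returns -1 at end of stream, which is the only way to
        // learn the length when it is not known in advance.
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
            total += n;
        }
        return total;
    }
}
```

With a ByteBuffer, the equivalent code would need the whole payload allocated (or a file mapped and later resized) before the first byte is processed.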

Off the top of my head, all the *r-* functions should be a decent interface against a stream and should have very natural implementations. The *a-* functions are not a natural fit, and throwing because the relevant protocol is not implemented feels fine. limit may not be able to return an actual number if, say, the underlying stream does not have an end (or a known end, anyway). skip might not be able to skip backwards. seek might be limited or impossible. Maybe binf.protocol/IPosition could be split up?
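To make that split concrete, here is a rough Java sketch of a forward-only view over an InputStream. All names (`ForwardOnlyView`, `readU8`, `readU8At`, and so on) are hypothetical and do not mirror the actual binf protocols; the sketch only shows which operations come naturally and which have to be absent or throw.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.OptionalLong;

// Hypothetical forward-only view; none of these names come from binf.
final class ForwardOnlyView {
    private final InputStream in;
    private long position = 0;

    ForwardOnlyView(InputStream in) { this.in = in; }

    // Relative read of one unsigned byte: the natural operation on a stream.
    int readU8() throws IOException {
        int b = in.read();
        if (b < 0) throw new EOFException("end of stream at position " + position);
        position++;
        return b;
    }

    // Bytes consumed so far, the only position a plain stream can report.
    long position() { return position; }

    // A stream generally cannot tell where it ends, so the limit is absent.
    OptionalLong limit() { return OptionalLong.empty(); }

    // Skipping works forwards only.
    void skip(long n) throws IOException {
        if (n < 0) throw new UnsupportedOperationException("cannot skip backwards on a stream");
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip() may return 0 without being at the end; read one byte instead.
                if (in.read() < 0) throw new EOFException("end of stream while skipping");
                skipped = 1;
            }
            remaining -= skipped;
            position += skipped;
        }
    }

    // Absolute reads would need random access, which a stream does not have.
    int readU8At(long offset) {
        throw new UnsupportedOperationException("absolute reads are not supported on a stream");
    }
}
```

Returning an empty `OptionalLong` for the limit is one possible design; throwing, or not implementing the corresponding protocol at all, would be equally defensible.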

@helins
Owner Author

helins commented Jun 7, 2021

I guess the implementation could simply reify reading and writing, without needing all the positioning stuff, since positioning is unreliable on streams (most don't support it, and some only provide an estimate for features such as the number of available bytes).

However, there remains the problem of which class/interface to implement. Something like ObjectOutputStream seems to implement all the interesting methods for reading and writing primitives, whereas OutputStream is at the top of the hierarchy but is just a byte stream. That means re-implementing a primitive interface on top of it (all integers, floats...), which I have managed to avoid up to now.
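To give a sense of what re-implementing primitives on top of a bare byte stream involves, here is a hedged Java sketch for 32-bit integers and floats in little-endian order. The class and method names are hypothetical. Note that the JDK's DataInputStream and DataOutputStream already wrap any stream with primitive reads and writes, but only in big-endian order, so they would not cover every case.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical helpers, not part of binf.
final class StreamPrimitivesSketch {

    // Reads four bytes and assembles them as a little-endian 32-bit integer.
    static int readI32LE(InputStream in) throws IOException {
        int b0 = in.read();
        int b1 = in.read();
        int b2 = in.read();
        int b3 = in.read();
        // read() returns -1 at end of stream; OR-ing keeps the sign bit set.
        if ((b0 | b1 | b2 | b3) < 0) throw new EOFException();
        return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24);
    }

    // Writes a 32-bit integer as four bytes, least significant byte first.
    // OutputStream.write(int) only writes the low 8 bits of its argument.
    static void writeI32LE(OutputStream out, int v) throws IOException {
        out.write(v);
        out.write(v >>> 8);
        out.write(v >>> 16);
        out.write(v >>> 24);
    }

    // Floats reuse the integer path through their IEEE 754 bit pattern.
    static float readF32LE(InputStream in) throws IOException {
        return Float.intBitsToFloat(readI32LE(in));
    }

    static void writeF32LE(OutputStream out, float v) throws IOException {
        writeI32LE(out, Float.floatToRawIntBits(v));
    }
}
```

Every other width and signedness (8, 16 and 64 bits, unsigned variants, doubles) would need the same treatment, which is the cost being weighed here.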

@sh54
Contributor

sh54 commented Jun 7, 2021

Just to note, I have another branch that reads and writes Clojure data structures to/from a stream based on some layout generated by stuff in the cabi namespace. A PR should be imminent.

I was already doing something very similar to what's going on in the cabi namespace, so I figured I may as well port it over and see how it feels there.

I would think that it would be best to just implement InputStream and OutputStream. Then the underlying is always just bytes. If the user wants to use an ObjectOutputStream, it would still just be treated as a stream of bytes, which will be fine assuming ObjectOutputStream does not violate Liskov.

@sh54
Contributor

sh54 commented Jun 7, 2021

See #6.

2 participants