
XzWriter/LzmaWriter - where they are? #1

Open
dimzon opened this issue Aug 20, 2017 · 5 comments

Comments

@dimzon

dimzon commented Aug 20, 2017

XzWriter/LzmaWriter - where they are?

@djp952
Owner

djp952 commented Aug 20, 2017

There aren't XzWriter/LzmaWriter classes because the overall length of the uncompressed input data needs to be known in order to create the XZ/LZMA outputs. The other protocols lend themselves better to a "Writer" stream, as they just build the output as they go; the input length isn't important.

If you check out the XzEncoder/LzmaEncoder classes, they each ultimately work their way down to a version of ::Encode that requires the input stream length. To create a Writer class you would have to read the stream to the end and then reset it so you know how long it is, which should be fine for things like file-based streams, but if the calling code is creating the stream on the fly or the input stream doesn't support seeking, you'd end up in a pickle.

There could be better ways to use the XZ/LZMA protocols than I am aware of, of course :)
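To make the "read to the end and reset" idea concrete, here is a minimal Python sketch of measuring how much data a stream holds by seeking. It is not this project's API, which the thread doesn't show; `stream_length` and `Pipe` are hypothetical names used only for illustration. The point is that the trick only works when the stream is seekable.

```python
import io
import os


def stream_length(stream):
    """Return how many bytes remain in a seekable stream, restoring its position.

    Non-seekable streams (pipes, sockets, data generated on the fly) have no
    knowable length up front, so this raises io.UnsupportedOperation for them.
    """
    if not stream.seekable():
        raise io.UnsupportedOperation("stream is not seekable; length unknown")
    start = stream.tell()
    end = stream.seek(0, os.SEEK_END)  # seek() returns the new absolute position
    stream.seek(start, os.SEEK_SET)    # rewind so the encoder can read from here
    return end - start


class Pipe(io.RawIOBase):
    """Hypothetical stand-in for a pipe: readable but not seekable."""

    def readable(self):
        return True


print(stream_length(io.BytesIO(b"hello world")))  # 11
try:
    stream_length(Pipe())
except io.UnsupportedOperation as err:
    print(err)  # stream is not seekable; length unknown
```

File-backed streams pass the `seekable()` check; a pipe or generated source does not, which is the "pickle" described above.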

@dimzon
Author

dimzon commented Aug 20, 2017

No, you don't need to know the overall length of the uncompressed input data in order to create the XZ/LZMA outputs.

https://github.com/goldenbull/ManagedXZ
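For comparison, liblzma-based wrappers can produce .xz output chunk by chunk. A minimal sketch, assuming Python's standard `lzma` module (a liblzma binding) rather than this project's LZMA SDK-based encoders; `XzWriter` is a hypothetical class, not part of either library. `LZMACompressor.compress()` and `flush()` are the real stdlib calls, and no total input length is passed anywhere.

```python
import io
import lzma


class XzWriter:
    """Hypothetical write-only wrapper that emits .xz data as input arrives."""

    def __init__(self, sink):
        self._sink = sink  # any object with a write(bytes) method
        self._comp = lzma.LZMACompressor(format=lzma.FORMAT_XZ)

    def write(self, data):
        # compress() may keep data buffered internally and return b"" for now.
        self._sink.write(self._comp.compress(data))

    def close(self):
        # flush() ends the stream (last block, index, footer); no length needed.
        self._sink.write(self._comp.flush())

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False


buf = io.BytesIO()
with XzWriter(buf) as w:
    w.write(b"chunk one, ")
    w.write(b"chunk two")
print(lzma.decompress(buf.getvalue()))  # b'chunk one, chunk two'
```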

@djp952
Owner

djp952 commented Aug 21, 2017

You are correct; it's been a while since I looked at this :) The XZ and LZMA encoders use the LZMA SDK, which doesn't support writing compressed data in chunks like the other protocols do: you call a single Encode method and it just does its thing. A Writer class doesn't work here; you have to have all the input you're going to have available at the time of encoding (again, for the LZMA SDK implementation).

Now, you can implement your own input Stream and pass that to XzEncoder/LzmaEncoder if you'd like to provide input incrementally, but an in-built Writer class would ultimately need to cache the entire input stream and then call the underlying Encoder when it's done, and that's not feasible.
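The caching problem described above can be sketched as follows. This is a hypothetical Python illustration, not the project's code: `one_shot_encode` stands in for a single Encode call (it uses `lzma.compress` only so the example runs), and `BufferingXzWriter` shows why every byte has to be held until the writer is closed.

```python
import io
import lzma


def one_shot_encode(data):
    # Hypothetical stand-in for an encoder that takes all input in one call.
    return lzma.compress(data, format=lzma.FORMAT_XZ)


class BufferingXzWriter(io.RawIOBase):
    """Hypothetical Writer over a one-shot encoder: all input is held until close()."""

    def __init__(self, sink):
        super().__init__()
        self._sink = sink
        self._buffer = bytearray()  # grows with the entire uncompressed input

    def writable(self):
        return True

    def write(self, b):
        self._buffer += b
        return len(b)

    def close(self):
        if not self.closed:
            # Only now is the input complete, so this is the single Encode call.
            self._sink.write(one_shot_encode(bytes(self._buffer)))
            self._buffer.clear()
        super().close()


out = io.BytesIO()
with BufferingXzWriter(out) as w:
    w.write(b"all of this ")
    w.write(b"sits in memory until close")
print(lzma.decompress(out.getvalue()))  # b'all of this sits in memory until close'
```

Memory use grows with the total input size, which is the infeasibility being pointed out.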

@dimzon
Author

dimzon commented Aug 21, 2017

https://tukaani.org/xz/format.html
The .xz (LZMA2) file format your encoder produces is
Streamable: It is always possible to create and decompress .xz files in a pipe; no seeking is required.
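To illustrate the "decompress in a pipe" property quoted above, here is a minimal sketch using Python's `lzma.LZMADecompressor`, fed small chunks the way a pipe would deliver them. `decompress_chunks` is a hypothetical helper; nothing in it seeks or needs the total size.

```python
import lzma


def decompress_chunks(chunks):
    """Yield decompressed bytes as .xz chunks arrive; no seeking, no total size."""
    dec = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    for chunk in chunks:
        out = dec.decompress(chunk)
        if out:
            yield out
    if not dec.eof:
        raise EOFError("input ended before the end of the .xz stream")


data = lzma.compress(b"x" * 10_000)  # FORMAT_XZ is the default
pieces = (data[i:i + 64] for i in range(0, len(data), 64))  # simulated 64-byte pipe reads
print(len(b"".join(decompress_chunks(pieces))))  # 10000
```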

LZMA (lzma-alone) also doesn't need to cache the entire stream.

liblzma has a nice streaming API, btw...
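The lzma-alone point can be checked with liblzma through Python's `lzma` module and its `FORMAT_ALONE` option. This is a small sketch under that assumption: compressing in pieces works, and the 8-byte uncompressed-size field of the 13-byte .lzma header is written as all 0xFF bytes, meaning "size unknown, stop at the end-of-payload marker".

```python
import lzma
import struct

# Compress in two pieces using the legacy .lzma ("lzma_alone") container.
comp = lzma.LZMACompressor(format=lzma.FORMAT_ALONE)
data = comp.compress(b"first piece, ") + comp.compress(b"second piece") + comp.flush()

# 13-byte .lzma header: 1 byte of lc/lp/pb properties, a 4-byte little-endian
# dictionary size, then an 8-byte little-endian uncompressed size.
props, dict_size, uncompressed_size = struct.unpack("<BIQ", data[:13])
print(hex(uncompressed_size))  # 0xffffffffffffffff: size unknown, end marker used
print(lzma.decompress(data, format=lzma.FORMAT_ALONE))  # b'first piece, second piece'
```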

@djp952
Owner

djp952 commented Aug 21, 2017

"Streaming" in and of itself, yes; you will just have to provide your own input stream, since the API doesn't allow you to do it in chunks. There is one API function to call to create the compressed stream in its entirety. How would you propose an XzWriter/LzmaWriter class call that function just one time without storing all of the uncompressed data somewhere? And how would you trigger the call to the API: when the Writer is closed/disposed? Add a method .Write()? It's far easier to just create a task-specific input Stream implementation and pass it to the XzEncoder/LzmaEncoder. It's how I use it, and it works great.
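The suggested approach, a task-specific input Stream handed to the encoder, can be sketched in Python as a read-only stream that generates bytes on demand. Assumptions: `GeneratorStream` and `encode_from_stream` are hypothetical names; `encode_from_stream` stands in for XzEncoder/LzmaEncoder by reading until EOF, and it uses the stdlib `lzma` module only so the example runs.

```python
import io
import lzma


class GeneratorStream(io.RawIOBase):
    """Hypothetical read-only stream whose bytes are produced only when read."""

    def __init__(self, chunks):
        super().__init__()
        self._chunks = iter(chunks)
        self._pending = b""

    def readable(self):
        return True

    def readinto(self, buffer):
        # Pull the next non-empty chunk from the source only when needed.
        while not self._pending:
            try:
                self._pending = next(self._chunks)
            except StopIteration:
                return 0  # EOF
        n = min(len(buffer), len(self._pending))
        buffer[:n] = self._pending[:n]
        self._pending = self._pending[n:]
        return n


def encode_from_stream(stream, chunk_size=64 * 1024):
    """Hypothetical stand-in for a one-shot Encode(Stream): reads until EOF."""
    comp = lzma.LZMACompressor(format=lzma.FORMAT_XZ)
    parts = []
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        parts.append(comp.compress(block))
    parts.append(comp.flush())
    return b"".join(parts)


rows = (f"row {i}\n".encode() for i in range(1000))  # generated lazily, length never computed
xz = encode_from_stream(GeneratorStream(rows))
print(lzma.decompress(xz).count(b"\n"))  # 1000
```

The encoder pulls data as it needs it, so the caller never has to know the total length or hold all of the input at once.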
