gunzip: improve EOF handling #40

cyphar · 2021-02-19T08:36:14Z

This fixes #38 and #39, though I'm not entirely sure if you're happy with this approach.

To solve #39, we switch the block pool from a channel to a sync.Pool. By using sync.Pool, there is no risk of blocking on a channel send when no other goroutine is receiving from it. This does have the downside that the read-ahead goroutine can now end up allocating more blocks than the user requested. If this is not acceptable, I can try to figure out a different solution for this problem.
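To make the trade-off concrete, here is a minimal sketch (not code from this PR or from pgzip). The function names and the 1024-byte block size are made up for illustration. It compares how many blocks a consumer can obtain from a sync.Pool and from a pre-filled channel when nothing is ever returned:

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// allocsWithSyncPool takes n blocks from a fresh sync.Pool without returning
// any of them, and reports how many times the pool had to allocate.
// Hypothetical helper for illustration only.
func allocsWithSyncPool(n int) int64 {
	var allocs int64
	pool := sync.Pool{New: func() interface{} {
		atomic.AddInt64(&allocs, 1)
		return make([]byte, 1024)
	}}
	held := make([][]byte, 0, n)
	for i := 0; i < n; i++ {
		// Get never blocks: an empty pool simply calls New again.
		held = append(held, pool.Get().([]byte))
	}
	return atomic.LoadInt64(&allocs)
}

// takenFromChannelPool pre-fills a channel with limit blocks and then tries
// to take n of them without blocking. It reports how many it obtained.
func takenFromChannelPool(limit, n int) int {
	ch := make(chan []byte, limit)
	for i := 0; i < limit; i++ {
		ch <- make([]byte, 1024)
	}
	taken := 0
	for i := 0; i < n; i++ {
		select {
		case <-ch:
			taken++
		default:
			// A plain receive would wait here until a block is returned,
			// which is what bounds read-ahead in a channel-backed pool.
			return taken
		}
	}
	return taken
}

func main() {
	fmt.Println(allocsWithSyncPool(8))      // 8
	fmt.Println(takenFromChannelPool(4, 8)) // 4
}
```

The sync.Pool variant never stalls, but nothing caps the number of live blocks. The channel variant caps them at the channel's capacity.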

To solve #38, some extra io.EOF special casing was needed in both WriteTo and Read. I think that these changes are reasonable -- it seems as though z.err should never store io.EOF (and there were only a few cases where it would -- which I've now fixed), but let me know what you think.

Fixes #38
Fixes #39

Signed-off-by: Aleksa Sarai [email protected]

This matches NewWriter, and reduces the possibility of forgetting to update one of the initialisation functions when making changes. Signed-off-by: Aleksa Sarai <[email protected]>

This stops us from causing goroutine deadlocks when we re-add a buffer after the stream has been read and the read-ahead goroutine is dead. Note that with this change, it is possible for more blocks to be allocated than the user requested. Signed-off-by: Aleksa Sarai <[email protected]>

WriteTo should not return io.EOF because it is assumed to have io.Copy semantics (namely, on success you return no error -- even if there were no bytes copied). Several parts of WriteTo would return io.EOF -- all of which need to be switched to special-case io.EOF. In addition, Read would save io.EOF in z.err in some specific corner cases -- these appear to be oversights and thus are fixed to not store io.EOF in z.err. Signed-off-by: Aleksa Sarai <[email protected]>
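As a rough illustration of the io.EOF special-casing described above, here is a sketch of a WriteTo that treats io.EOF from its source as a normal end of stream. The `eofSafeWriterTo` type and the `drain` helper are hypothetical and are not pgzip's Reader. The 4-byte buffer is only there to force several reads.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// eofSafeWriterTo is a hypothetical wrapper used only for this example.
type eofSafeWriterTo struct {
	r io.Reader
}

// WriteTo drains e.r into w. Reaching io.EOF on the source is the success
// case, so it is reported as a nil error, matching io.Copy semantics.
func (e eofSafeWriterTo) WriteTo(w io.Writer) (int64, error) {
	var total int64
	buf := make([]byte, 4)
	for {
		n, rerr := e.r.Read(buf)
		if n > 0 {
			m, werr := w.Write(buf[:n])
			total += int64(m)
			if werr != nil {
				return total, werr
			}
			if m < n {
				return total, io.ErrShortWrite
			}
		}
		if rerr == io.EOF {
			return total, nil // end of stream is not an error for WriteTo
		}
		if rerr != nil {
			return total, rerr
		}
	}
}

// drain copies s through the wrapper and returns what WriteTo reported.
func drain(s string) (int64, error) {
	var out bytes.Buffer
	return eofSafeWriterTo{r: strings.NewReader(s)}.WriteTo(&out)
}

func main() {
	n, err := drain("hello, gzip")
	fmt.Println(n, err) // 11 <nil>
	n, err = drain("")
	fmt.Println(n, err) // 0 <nil>
}
```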

Before this patchset, these tests would either lock up or fail with spurious EOF errors. Signed-off-by: Aleksa Sarai <[email protected]>

klauspost · 2021-02-19T10:22:43Z

@cyphar Thank you for looking through my ancient code. I remember it being quite painful when I looked through it a year ago.

klauspost

LGTM

klauspost · 2021-02-19T10:25:24Z

> This does have the downside that the read-ahead goroutine can now end up allocating more blocks than the user requested

@cyphar Is this without limit? There must be a limit to how far it can read ahead, otherwise you will easily run the system out of memory.

cyphar · 2021-02-19T10:27:35Z

> @cyphar Is this without limit? There must be a limit to how far it can read ahead, otherwise you will easily run the system out of memory.

Yeah it's without limit -- the issue is that sync.Pool doesn't have a way of limiting the size of the pool -- so it is possible you'd end up with an endlessly growing number of blocks. I will have to come up with another way of fixing the deadlock issue -- I think the issue with the channel approach taken is that the channel is getting full for some reason. I'll take another look.
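For illustration only, one way to keep a channel-backed pool bounded while making sure that returning a block can never block is a non-blocking send. This is a sketch under assumptions, not something proposed in this thread and not pgzip's code. `boundedPool` and its methods are hypothetical names.

```go
package main

import "fmt"

// boundedPool is a hypothetical block pool that hands out at most
// cap(free) blocks at any one time.
type boundedPool struct {
	free chan []byte
}

func newBoundedPool(blocks, blockSize int) *boundedPool {
	p := &boundedPool{free: make(chan []byte, blocks)}
	for i := 0; i < blocks; i++ {
		p.free <- make([]byte, blockSize)
	}
	return p
}

// Get waits until a block is free. This wait is what limits read-ahead.
func (p *boundedPool) Get() []byte { return <-p.free }

// TryGet is a non-blocking variant, used here only by the demo.
func (p *boundedPool) TryGet() ([]byte, bool) {
	select {
	case b := <-p.free:
		return b, true
	default:
		return nil, false
	}
}

// Put returns a block without ever blocking. If the pool is already full,
// the block is dropped and left to the garbage collector.
func (p *boundedPool) Put(b []byte) bool {
	select {
	case p.free <- b:
		return true
	default:
		return false
	}
}

func main() {
	p := newBoundedPool(2, 1024)
	a, b := p.Get(), p.Get()
	_, ok := p.TryGet()
	fmt.Println(ok)                        // false
	fmt.Println(p.Put(a), p.Put(b))        // true true
	fmt.Println(p.Put(make([]byte, 1024))) // false
}
```

Whether dropping surplus blocks is acceptable depends on why the channel fills up in the first place, which this sketch does not address.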

klauspost · 2021-02-19T10:50:15Z

@cyphar It would be great if you could look at that. Unbounded decompression is an issue. Thanks!

rubenv · 2022-05-11T08:03:11Z

Got bitten by this, hard. @klauspost unbounded decompression is an issue, but currently the results are just wrong, which may lead to data loss.

Could we consider at least getting it to return data? I'd rather see my server blow up than lose data.

cyphar · 2022-05-11T13:22:26Z

I'll take another look at how to fix the deadlock issue without switching to sync.Pool...

klauspost · 2022-05-11T13:51:28Z

Details escape me, but I would think that persisting errors, so that future calls are rejected, should be enough.
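A minimal sketch of what "persisting errors" could look like, assuming a wrapper around an arbitrary io.Reader. `stickyReader`, `flakyReader`, and `errCorrupt` are hypothetical names, not pgzip code. In line with the PR description, io.EOF is deliberately not stored.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

var errCorrupt = errors.New("corrupt stream") // hypothetical error value

// flakyReader fails once and then reports io.EOF, so a caller that retries
// would see what looks like a clean end of stream.
type flakyReader struct{ calls int }

func (f *flakyReader) Read(p []byte) (int, error) {
	f.calls++
	if f.calls == 1 {
		return 0, errCorrupt
	}
	return 0, io.EOF
}

// stickyReader remembers the first non-EOF error and rejects later calls.
type stickyReader struct {
	r   io.Reader
	err error
}

func (s *stickyReader) Read(p []byte) (int, error) {
	if s.err != nil {
		return 0, s.err
	}
	n, err := s.r.Read(p)
	if err != nil && err != io.EOF {
		s.err = err // persist real failures only, never io.EOF
	}
	return n, err
}

// readTwice calls Read twice and returns both errors.
func readTwice(r io.Reader) (first, second error) {
	buf := make([]byte, 8)
	_, first = r.Read(buf)
	_, second = r.Read(buf)
	return first, second
}

func main() {
	e1, e2 := readTwice(&flakyReader{})
	fmt.Println(e1, e2) // corrupt stream EOF
	e1, e2 = readTwice(&stickyReader{r: &flakyReader{}})
	fmt.Println(e1, e2) // corrupt stream corrupt stream
}
```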

rubenv · 2022-05-11T14:21:35Z

Problem is that if you do an io.Copy() on a reader that has buffered fully, you won't get an error. It'll just, silently, copy 0 bytes and call it done.

I'd be happy with an error, even if it was just "TODO". But not silently eating data, that's dangerous.

cyphar · 2022-05-11T22:07:55Z

@klauspost The io.EOF handling is fixed in a separate commit (gunzip: handle io.EOF errors correctly in WriteTo), but the unit tests require the deadlocks to be fixed.

@rubenv

> Problem is that if you do an io.Copy() on a reader that has buffered fully, you won't get an error. It'll just, silently, copy 0 bytes and call it done.

io.Copy()'s semantics are that it returns nil when the end of the file is reached, so you won't get io.EOF if there's no data to read. I guess the issue is that the compressor hasn't generated all the data? Is this issue also fixed by this PR? Do you have a test case I can add?
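The io.Copy behaviour described here can be reproduced with the standard library alone. The sketch below uses a strings.Reader as a stand-in for an already drained stream. `copyTwice` is a hypothetical helper, not part of pgzip.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// copyTwice copies the same reader into a buffer twice and reports what
// each io.Copy call returned.
func copyTwice(s string) (first, second int64, err error) {
	r := strings.NewReader(s)
	var out bytes.Buffer
	if first, err = io.Copy(&out, r); err != nil {
		return first, 0, err
	}
	// The reader is now exhausted. io.Copy treats that as success.
	second, err = io.Copy(&out, r)
	return first, second, err
}

func main() {
	first, second, err := copyTwice("payload")
	fmt.Println(first, second, err) // 7 0 <nil>
}
```

So a zero-byte copy with a nil error cannot by itself distinguish "nothing left to read" from "data went missing". A caller that knows the expected length has to check the byte count.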

klauspost · 2023-05-04T12:17:57Z

The sync.Pool change doesn't work for us here and will cause serious regressions. As you proposed initially, please consider a solution that retains the original pool.

You are welcome to submit separate PRs, if that will simplify code review.

cyphar added 4 commits February 19, 2021 18:52

gunzip: implement NewReader and NewReaderN using Reset

278430d

This matches NewWriter, and reduces the possibility of forgetting to update one of the initialisation functions when making changes. Signed-off-by: Aleksa Sarai <[email protected]>

gunzip: add EOF-handling tests

6038332

Before this patchset, these tests would either lock up or fail with spurious EOF errors. Signed-off-by: Aleksa Sarai <[email protected]>

cyphar force-pushed the eof-handling branch from 4842137 to 6038332 on February 19, 2021 08:44

cyphar mentioned this pull request Feb 19, 2021

oci: unpack: slurp up raw layer stream before Close() opencontainers/umoci#360

Merged

klauspost approved these changes Feb 19, 2021


klauspost mentioned this pull request May 4, 2023

Please release version 1.2.6 or later #52

Open
