Skip to content
This repository has been archived by the owner on May 15, 2024. It is now read-only.

Zip: omit Disk Start Number from the Zip64 Central Directory Entry (when possible) #262

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

BhaaLseN
Copy link

@BhaaLseN BhaaLseN commented Aug 5, 2022

Clause 4.5.3 of the PKWare spec clearly states that it MUST only appear in
the Extra block when the corresponding field in the CDE (or LDE) was set to
-1 (0xFFFF in the case of the Disk Start Number.)

Previously, this would only be the case when Zip64 was presumed or the disk
number was actually outside the range for the CDE. In every other case, the
Disk Start Number would go into both locations, violating the spec.

Some tools (such as 7-Zip) show warnings in this case, but often work as
expected. Others (such as older versions of System.IO.Compression) follow
the spec more stringently and will refuse to use the values from the Zip64
Extra block when the related values do not match their expectation.

However, fixing this to always write a conformant CDE that has its Disk
Start Number as 0xFFFF with the actual value in the Extra block breaks other
tools (such as marmelroy/Zip on iOS) that do not fully support Zip64.

The spec does allow the Extra block to omit the Disk Start Number (and the
Relativ Header Offset) when they fit in the CDE, so we do just that for
general compatibility.

Tested with both 7-Zip and System.IO.Compression (which would
otherwise bail out and return 0x0000_0000_ffff_ffff as sizes,) as well
as marmelroy/Zip (which never looks at the extra field for the disk start,)
but I don't really see where (or rather: how) to add a sensible test for this.
Mainly because I'd rather not reimplement Zip reading in the test, and I don't
think pulling in 7-Zip for a test is the way to go. Nor is attempting to
use System.IO.Compression, because they since made their code more
accepting of non-conforming Zip files that are otherwise fine.

Fixes #260

@vijfhoek
Copy link

vijfhoek commented Mar 7, 2023

I'm currently running into this issue as well, as WinRAR fails to extract zip64 archives presumably because of this issue. Any chance to get this pull request merged?

}
else
{
// If reading a segmneted archive and saving to a regular archive,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// If reading a segmneted archive and saving to a regular archive,
// If reading a segmented archive and saving to a regular archive,

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I specifically did not fix this typo because it was existing code that I just indented.

I'll let a maintainer decide if they want it though.

@BhaaLseN
Copy link
Author

BhaaLseN commented Mar 7, 2023

Do note that we finally got around to trying this out with iOS on the other side (I don't remember the exact library at the moment, maybe marmelroy/Zip?) which initially got me to look at this...and it still fails. Presumably because the underlying minizip does not correctly look at the Zip64 Extra block when it contains the Disk Start Number but instead uses the -1; breaking more than before.

I have an alternate fix that simply writes a smaller Zip64 Extra block that omits the Disk Start Number completely; and that one works in all cases...but I don't have time at the moment to update my branch. So I'm setting this to Draft until I get around to do this.

@BhaaLseN BhaaLseN marked this pull request as draft March 7, 2023 18:48
Clause 4.5.3 of the PKWare spec clearly states that it MUST only appear in
the Extra block when the corresponding field in the CDE (or LDE) was set to
-1 (0xFFFF in the case of the Disk Start Number.)

Previously, this would only be the case when Zip64 was presumed or the disk
number was actually outside the range for the CDE. In every other case, the
Disk Start Number would go into both locations, violating the spec.

Some tools (such as 7-Zip) show warnings in this case, but often work as
expected. Others (such as older versions of System.IO.Compression) follow
the spec more stringently and will refuse to use the values from the Zip64
Extra block when the related values do not match their expectation.

However, fixing this to always write a conformant CDE that has its Disk
Start Number as 0xFFFF with the actual value in the Extra block breaks other
tools (such as marmelroy/Zip on iOS) that do not fully support Zip64.

The spec does allow the Extra block to omit the Disk Start Number (and the
Relativ Header Offset) when they fit in the CDE, so we do just that for
general compatibility.

Fixes haf#260
@BhaaLseN BhaaLseN changed the title Zip: force Disk Start Number to -1 in a Zip64 Central Directory Entry Zip: omit Disk Start Number from the Zip64 Central Directory Entry (when possible) Mar 10, 2023
@BhaaLseN
Copy link
Author

Leaving this as Draft for now, since I didn't have the time (yet) to fully test this with more than 2 small zip files.

Rather than fixing the CDE to use -1 for the disk start number, this just omits it (because some tools still read and use it, despite it being in the extra field as per spec.)

Note that the build fail is from ruby and unrelated to my code changes.

@vijfhoek
Copy link

vijfhoek commented Mar 13, 2023

I've just tested your code with

  • An archive with a single 4.5GB file
  • An archive with a 4.5GB and a 6.1GB file
  • An archive with 70,000 small text files

And it worked like a charm in 7zip, WinRAR and Windows Explorer.

EDIT I just tried it in our code base, but sadly it still seems to be broken when encryption is enabled. I am not familiar enough with ZIP to say if this is actually related to this issue though

EDIT 2: I am failing to reproduce my issue using the ZipIt example, I may be calling it wrong in our code base - hold on

@BhaaLseN
Copy link
Author

Thanks for the testing, but unfortunately I have no idea how encryption works in there. It is a flag somewhere, but this feels unrelated to Zip64 🤔

@vijfhoek
Copy link

vijfhoek commented Mar 16, 2023

I have narrowed my exact problem down to our repeated calling of the ZipFile.Save function, which seems to function incorrectly when Zip64 is enabled. This is unrelated to this PR, but still related to Zip64. I worked around it by.. just not calling Save repeatedly 🙂

MicrosoftTeams-image (4)

Note that this screenshot shows encryption being enabled, but the same happens when it is not.

I have taken a glance at the DotNetZip source, and it seems like the problem lies in ZipEntry.CopyThroughOneEntry. By commenting out that function, surrounding if statement and early return in ZipEntry.Write, the problem no longer occurs.

Tl;dr your PR seems to work exactly as it's supposed to, I'm having issues do to a mostly unrelated problem

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Zip64 writes incorrect central directory file header/extended information extra field
2 participants