Initial support for tokio #121

SapryWenInera · 2024-05-13T02:09:51Z

Start implementing tokio support for reading zip files, and separate the I/O operations of sync and async code behind the sync and tokio features. This starts the work to close #108.

…missing

Pr0methean · 2024-05-13T21:16:34Z

src/build.rs

@@ -4,4 +4,9 @@ fn main() {
    if var("CARGO_FEATURE_DEFLATE_MINIZ").is_ok() {
        println!("cargo:warning=Feature `deflate-miniz` is deprecated; replace it with `deflate`");
    }
+    #[cfg(not(any(feature = "sync", feature = "tokio")))]
+    compile_error!("Missing Required feature");


Update ci.yml to enable sync wherever it has --no-default-features, or else convert it to a no-sync feature (which would be unidiomatic but have the advantage of being backward-compatible for more users).

Once the tokio feature is properly working in tests I will update the ci. Currently that is broken.

Thanks for letting me know; but if it's going to take longer than about another 3 days, then I'd also greatly appreciate a CI-able update once or twice per week so we could get an idea of how the PR was progressing toward a releasable state where all further work could be deferred to follow-up PRs.

Then I will try doing it this week, after I finish some of my college exams.

@SapryWenInera Have you finished your exams? If not, when do you expect to be able to address the comments on this PR? An ETA would be helpful not only for me, but also for the authors of the other major PRs, given the likelihood of merge conflicts.

Pr0methean · 2024-05-13T21:24:07Z

src/build.rs

+    compile_error!("Missing Required feature");
+
+    #[cfg(all(feature = "sync", feature = "tokio"))]
+    compile_error!("The features sync and tokio cannot be used together")


This restriction may turn out to be a problem for some users. It's possible to import two configurations of a crate twice by renaming one, but then they won't recognize each other's struct types or traits. I can think of two solutions:

(my preference) Give the async methods different names (e.g. parse_async) instead. If they're used in the same calling code, move that code to a macro and use method-scoped type names (e.g. type Read = AsyncRead;) and macro parameters to differentiate them. See the example in my separate comment.

Make a separate crate for the Tokio features, and another for the shared core.

I'm thinking of moving the old synchronous code into a submodule of read named sync, with the async code in a tokio module. This would avoid conflicts between the two codebases. By the end of the week I'm probably going to push a PR for that, since it doesn't require any feature or CI changes.

That sounds good, but will there be a shared-core module as well?

The idea is that the structs and enums can be shared but the actual logic can be separate. This should avoid import conflicts, and code changes in the sync part should not conflict with the async part. Do you think this or the macro approach is better?

I favor sharing as much code as possible, based on the Don't Repeat Yourself principle. But the modular design should be compatible with the macro approach: define the macros in the shared core module, and invoke them in the sync and tokio modules.

@SapryWenInera Could you please prioritize this issue for a fix, so that I can run CI on your work in progress and use the results to estimate how much longer this PR will take?

Pr0methean · 2024-05-13T21:52:52Z

src/spec.rs

+#[cfg(feature = "tokio")]
+impl Zip64CentralDirectoryEndLocator {
+    pub async fn parse<T>(reader: &mut T) -> ZipResult<Self>
+    where
+        T: AsyncRead + Unpin,
+    {
+        let magic = reader.read_u32_le().await?;
+        if magic != ZIP64_CENTRAL_DIRECTORY_END_LOCATOR_SIGNATURE {
+            return Err(ZipError::InvalidArchive(
+                "Invalid zip64 locator digital signature header",
+            ));
+        }
+        let disk_with_central_directory = reader.read_u32_le().await?;
+        let end_of_central_directory_offset = reader.read_u64_le().await?;
+        let number_of_disks = reader.read_u32_le().await?;
+
+        Ok(Self {
+            disk_with_central_directory,
+            end_of_central_directory_offset,
+            number_of_disks,
+        })
+    }
+}


Suggested change

-#[cfg(feature = "tokio")]
-impl Zip64CentralDirectoryEndLocator {
-    pub async fn parse<T>(reader: &mut T) -> ZipResult<Self>
-    where
-        T: AsyncRead + Unpin,
-    {
-        let magic = reader.read_u32_le().await?;
-        if magic != ZIP64_CENTRAL_DIRECTORY_END_LOCATOR_SIGNATURE {
-            return Err(ZipError::InvalidArchive(
-                "Invalid zip64 locator digital signature header",
-            ));
-        }
-        let disk_with_central_directory = reader.read_u32_le().await?;
-        let end_of_central_directory_offset = reader.read_u64_le().await?;
-        let number_of_disks = reader.read_u32_le().await?;
-        Ok(Self {
-            disk_with_central_directory,
-            end_of_central_directory_offset,
-            number_of_disks,
-        })
-    }
-}
+macro_rules! parse {
+    ($maybe_await:ident) => {
+        let magic = maybe_await(reader.read_u32_le())?;
+        if magic != ZIP64_CENTRAL_DIRECTORY_END_LOCATOR_SIGNATURE {
+            return Err(ZipError::InvalidArchive(
+                "Invalid zip64 locator digital signature header",
+            ));
+        let disk_with_central_directory = maybe_await(reader.read_u32_le())?;
+        let end_of_central_directory_offset = maybe_await(reader.read_u64_le()?);
+        let number_of_disks = maybe_await(reader.read_u32_le())?;
+        Ok(Self {
+            disk_with_central_directory,
+            end_of_central_directory_offset,
+            number_of_disks,
+        })
+    }
+}
+#[cfg(feature = "tokio")]
+pub(crate) async fn await_identity<T: ?Sized>(operand: T) -> T {
+    T.await
+}
+impl Zip64CentralDirectoryEndLocator {
+    #[cfg(feature = "tokio")]
+    pub async fn parse<T>(reader: &mut T) -> ZipResult<Self>
+    where
+        T: AsyncRead + Unpin,
+    {
+        use std::future::Future;
+        async fn await<T>(operand: Future<T>) -> T {
+            T.await
+        }
+        parse!(await_identity)
+    }
+    #[cfg(feature = "tokio")]
+    pub fn parse<T>(reader: &mut T) -> ZipResult<Self>
+    where
+        T: Read,
+    {
+        parse!(std::convert::identity)
+    }

I've been brainstorming ways to use macros to share code between async and sync code, but I can't find a way to write a macro that handles both calls with .await and calls without it without breaking one or the other, as in the code you just showed. Is the goal of supporting async to give people a separate code path to use, or just to wrap the current sync functions?

Maybe we need an extension trait that's implemented for both Read and AsyncRead, and other for Write and AsyncWrite. For the compression and decompression themselves, I think the wrapper approach is probably adequate, since they're CPU-heavy.

I can make a demo in another branch that uses async as just a wrapper around sync, but my preferred approach would be to make all I/O calls async and to create shareable sync functions for the non-I/O operations.

I still don't understand why we can't use macros whose parameters have sync and async definitions, to share code between sync and async versions of each function (e.g. Zip64CentralDirectoryEndLocator would have its sync and async parse method bodies generated by separate calls to a parse! macro that took as arguments $reader_read_u32_le_maybe_async:expr and $reader_read_u64_le_maybe_async:expr, both of which could be shared with other methods by having other macros define them at impl-block or wider scope). Could you please explain why you don't think that's feasible?

Sorry for the long break. There are two issues with trying to share code between async and sync functions using macros. First, as long as you're using async-await syntax, calling .await inside the macro breaks sync code, and not calling .await breaks async code. Second, using .await outside an async block or function is not allowed.

    macro_rules! maybe_await {
        ($maybe_async:expr) => {
            $maybe_async.await.unwrap()
        };
    }

    fn main() {
        #[cfg(feature = "async")]
        let string = maybe_await!(async { read_to_string("/tmp/foo") }); // Code breaks due to main not being async
        #[cfg(feature = "sync")]
        let string = maybe_await!(read_to_string("/tmp/foo")); // Code breaks due to .await being called on a sync function
        println!("{}", string)
    }

I don't see a way to solve both problems while sharing any code whatsoever. If you have any idea how to do this, I'm all ears.

For tests, one fix might be to use a maybe_block_on! macro as well. But maybe this is more trouble than it's worth; let's wait and see how much duplicated code is left after rebasing against #93 and factoring out as much as possible.

It occurs to me that one solution might involve 0th-order macros called sync_defs! and async_defs! that would provide different definitions of 1st-order macros such as maybe_await! and maybe_block_on! and read_or_async_read!, and then definitions of sync and async versions of a function would invoke their different 0th-order macros and then the same 2nd-order macro (which would contain the shared code). Could that possibly work?
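A minimal sketch of the layered-macro idea, under stated assumptions: `sync_value!` and `async_value!` play the role of the proposed 1st-order macros, `parse_locator_body!` holds the shared code, and the `.await` only ever expands inside an `async fn`. The names `Locator`, `SyncLeRead`, `AsyncLeRead`, `InMemory` and `block_on` are hypothetical stand-ins rather than the crate's or tokio's API, and `async fn` in traits assumes Rust 1.75 or newer.

```rust
use std::future::Future;
use std::io::{self, Cursor, Read};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

const LOCATOR_SIGNATURE: u32 = 0x0706_4b50;

#[derive(Debug, PartialEq)]
struct Locator {
    disk_with_central_directory: u32,
    end_of_central_directory_offset: u64,
    number_of_disks: u32,
}

// "1st-order" macros: the only place where sync and async differ.
macro_rules! sync_value { ($e:expr) => { $e }; }
macro_rules! async_value { ($e:expr) => { $e.await }; }

// Shared body; `$aw` names whichever macro above applies.
macro_rules! parse_locator_body {
    ($reader:ident, $aw:ident) => {{
        if $aw!($reader.read_u32_le())? != LOCATOR_SIGNATURE {
            return Err(io::Error::new(io::ErrorKind::InvalidData, "invalid zip64 locator signature"));
        }
        Ok(Locator {
            disk_with_central_directory: $aw!($reader.read_u32_le())?,
            end_of_central_directory_offset: $aw!($reader.read_u64_le())?,
            number_of_disks: $aw!($reader.read_u32_le())?,
        })
    }};
}

// Hypothetical little-endian helpers for any blocking reader.
trait SyncLeRead: Read {
    fn read_u32_le(&mut self) -> io::Result<u32> {
        let mut buf = [0u8; 4];
        self.read_exact(&mut buf)?;
        Ok(u32::from_le_bytes(buf))
    }
    fn read_u64_le(&mut self) -> io::Result<u64> {
        let mut buf = [0u8; 8];
        self.read_exact(&mut buf)?;
        Ok(u64::from_le_bytes(buf))
    }
}
impl<R: Read + ?Sized> SyncLeRead for R {}

// Hypothetical async counterpart, standing in for an async reader trait.
trait AsyncLeRead {
    async fn read_u32_le(&mut self) -> io::Result<u32>;
    async fn read_u64_le(&mut self) -> io::Result<u64>;
}

// In-memory "async" source that completes immediately.
struct InMemory(Cursor<Vec<u8>>);
impl AsyncLeRead for InMemory {
    async fn read_u32_le(&mut self) -> io::Result<u32> { self.0.read_u32_le() }
    async fn read_u64_le(&mut self) -> io::Result<u64> { self.0.read_u64_le() }
}

impl Locator {
    fn parse<R: Read>(reader: &mut R) -> io::Result<Self> {
        parse_locator_body!(reader, sync_value)
    }
    async fn parse_async<R: AsyncLeRead>(reader: &mut R) -> io::Result<Self> {
        parse_locator_body!(reader, async_value)
    }
}

// Tiny executor for the demo: polls with a waker that does nothing.
struct NoopWake;
impl Wake for NoopWake { fn wake(self: Arc<Self>) {} }
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) { return out; }
    }
}

fn sample_bytes() -> Vec<u8> {
    let mut bytes = LOCATOR_SIGNATURE.to_le_bytes().to_vec();
    bytes.extend(0u32.to_le_bytes());
    bytes.extend(0x1234u64.to_le_bytes());
    bytes.extend(1u32.to_le_bytes());
    bytes
}

fn main() {
    let from_sync = Locator::parse(&mut Cursor::new(sample_bytes())).unwrap();
    let mut source = InMemory(Cursor::new(sample_bytes()));
    let from_async = block_on(Locator::parse_async(&mut source)).unwrap();
    assert_eq!(from_sync, from_async);
    println!("{from_sync:?}");
}
```

The sketch sidesteps both objections only because each function picks its own 1st-order macro, so the sync expansion never contains `.await`. Whether this scales to functions that call other shared functions is left open here.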

Pr0methean · 2024-05-14T01:40:26Z

PS. Beware of possible conflicts with #120 and #93, the two other large PRs currently in active development. #93 is probably the one that will be merged first.

Pr0methean · 2024-05-14T04:22:01Z

FYI: Be aware that some other major PRs are currently open.

Pr0methean · 2024-05-18T03:05:33Z

I've retargeted this PR to the same branch as #134, but we're getting complicated merge conflicts. Please address them.

Pr0methean · 2024-05-19T19:08:24Z

Please revert the fuzz/corpus directory. Once the current major PRs are merged or dormant, I will update the corpus using the output of the next cargo fuzz run, plus a run with no seed corpus, and then iteratively minimize it with cargo cmin until there's no further reduction. If you'd like to add corpus entries from other sources, please do so in a separate PR.

Pr0methean · 2024-05-19T19:09:49Z

src/read.rs

@@ -316,16 +318,4 @@ mod test {
        let mut file = reader.by_index(0).unwrap();
        assert_eq!(file.read(&mut decompressed).unwrap(), 12);
    }
-
-    #[test]


Why are you deleting this?

Pr0methean · 2024-05-19T19:11:09Z

src/lib.rs

@@ -25,6 +25,7 @@
 //! | ZipCrypto deprecated encryption | ✅ | ✅ |
 //!
 //!
+#![cfg_attr(docsrs, feature(doc_auto_cfg))]


Why this change?

Pr0methean · 2024-05-19T19:18:10Z

src/read/tokio.rs

+        .await?;
+
+        let results: Vec<Result<CentralDirectoryInfo, ZipError>> = search_results.iter().map(|(footer64, archive_offset)| {
+            let directory_start_result = footer64.central_directory_offset.checked_add(*archive_offset).ok_or(ZipError::InvalidArchive("Invalid central directory size or effect"));


Suggested change

-            let directory_start_result = footer64.central_directory_offset.checked_add(*archive_offset).ok_or(ZipError::InvalidArchive("Invalid central directory size or effect"));
+            let directory_start_result = footer64.central_directory_offset.checked_add(*archive_offset).ok_or(ZipError::InvalidArchive("Invalid central directory size or offset"));

Pr0methean · 2024-05-19T19:20:19Z

src/read/tokio.rs

+        )
+        .await?;
+
+        let results: Vec<Result<CentralDirectoryInfo, ZipError>> = search_results.iter().map(|(footer64, archive_offset)| {


Factor out lines 109-130 and their copies in the sync version to a shared method. It can be called validate_zip64_footers.
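A rough sketch of what such a shared helper could look like, assuming the validation is pure arithmetic on values that have already been read, so the sync and tokio readers can both call it without any `.await`. Only the overflow check visible in the diff above is modeled. The signature, the tuple input and the `&'static str` error type are assumptions for illustration, since lines 109-130 are not shown in this thread.

```rust
/// Hypothetical shared check: add the archive offset to the central
/// directory offset, rejecting overflow instead of wrapping.
fn directory_start(
    central_directory_offset: u64,
    archive_offset: u64,
) -> Result<u64, &'static str> {
    central_directory_offset
        .checked_add(archive_offset)
        .ok_or("Invalid central directory size or offset")
}

/// Hypothetical shape for `validate_zip64_footers`: each candidate is a
/// `(central_directory_offset, archive_offset)` pair found by the search.
fn validate_zip64_footers(candidates: &[(u64, u64)]) -> Vec<Result<u64, &'static str>> {
    candidates
        .iter()
        .map(|&(offset, archive_offset)| directory_start(offset, archive_offset))
        .collect()
}

fn main() {
    let results = validate_zip64_footers(&[(10, 5), (u64::MAX, 1)]);
    println!("{results:?}");
}
```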

Pr0methean · 2024-05-19T19:26:06Z

src/read/tokio.rs

+                    #[allow(deprecated)]
+                    let compression_method =
+                        CompressionMethod::from_u16(reader.read_u16_le().await?);
+


Factor out lines 286-313 into a shared method; it can be called decode_aes_extra_data.

Pr0methean · 2024-05-19T19:32:23Z

src/spec.rs

+#[cfg(feature = "tokio")]
+impl Zip64CentralDirectoryEndLocator {
+    pub async fn parse<T>(reader: &mut T) -> ZipResult<Self>
+    where
+        T: AsyncRead + Unpin,
+    {
+        let magic = reader.read_u32_le().await?;
+        if magic != ZIP64_CENTRAL_DIRECTORY_END_LOCATOR_SIGNATURE {
+            return Err(ZipError::InvalidArchive(
+                "Invalid zip64 locator digital signature header",
+            ));
+        }
+        let disk_with_central_directory = reader.read_u32_le().await?;
+        let end_of_central_directory_offset = reader.read_u64_le().await?;
+        let number_of_disks = reader.read_u32_le().await?;
+
+        Ok(Self {
+            disk_with_central_directory,
+            end_of_central_directory_offset,
+            number_of_disks,
+        })
+    }
+}


I still don't understand why we can't use macros whose parameters have sync and async definitions, to share code between sync and async versions of each function (e.g. Zip64CentralDirectoryEndLocator would have its sync and async parse method bodies generated by separate calls to a parse! macro that took as arguments $reader_read_u32_le_maybe_async:expr and $reader_read_u64_le_maybe_async:expr, both of which could be shared with other methods by having other macros define them at impl-block or wider scope). Could you please explain why you don't think that's feasible?

Pr0methean · 2024-05-29T21:11:31Z

.github/workflows/ci.yaml

@@ -21,7 +21,7 @@ jobs:
      matrix:
        os: [ubuntu-latest, macOS-latest, windows-latest]
        rustalias: [stable, nightly, msrv]
-        feature_flag: ["--all-features", "--no-default-features", ""]
+        feature_flag: ["--no-default-features --features sync_all", "--no-default-features --features tokio_all", "--no-default-features --features sync", "--no-default-features --features tokio", ""]


Could you please make it possible to build with both sync and tokio? To do this, you can invoke the traits with an explicit self parameter, e.g. Read::read_exact(self, buf) and AsyncRead::read_exact(self, buf) instead of self.read_exact(buf). Since methods with the same name don't conflict when they're specified in different traits, this should be the only change needed.
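To illustrate the explicit-self call style, here is a small sketch with two locally defined traits that share a method name. `SyncRead`, `TokioLikeRead` and `Source` are hypothetical stand-ins, not the std or tokio traits, and both methods are synchronous to keep the example dependency-free.

```rust
// Two hypothetical traits that both expose a method named `read_exact`.
trait SyncRead {
    fn read_exact(&mut self, buf: &mut [u8]) -> &'static str;
}
trait TokioLikeRead {
    fn read_exact(&mut self, buf: &mut [u8]) -> &'static str;
}

struct Source(Vec<u8>);

impl SyncRead for Source {
    fn read_exact(&mut self, buf: &mut [u8]) -> &'static str {
        buf.copy_from_slice(&self.0[..buf.len()]);
        "sync"
    }
}
impl TokioLikeRead for Source {
    fn read_exact(&mut self, buf: &mut [u8]) -> &'static str {
        buf.copy_from_slice(&self.0[..buf.len()]);
        "tokio-like"
    }
}

fn read_both(src: &mut Source) -> (&'static str, &'static str) {
    let mut buf = [0u8; 2];
    // `src.read_exact(&mut buf)` would be rejected as ambiguous here,
    // because both traits are in scope and both apply to `Source`.
    let a = SyncRead::read_exact(src, &mut buf);
    let b = TokioLikeRead::read_exact(src, &mut buf);
    (a, b)
}

fn main() {
    let mut src = Source(vec![1, 2, 3]);
    println!("{:?}", read_both(&mut src));
}
```

Naming the trait at each call site is what lets both implementations coexist on one type when both features are enabled.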

FYI, I found a crate that does something similar to this, but with proc macros: https://crates.io/crates/async-generic. Looks like it may still be worth layering a few regular macros on top, for when async-generic functions call other async-generic functions.

cosmicexplorer · 2024-07-16T23:49:09Z

#207 refactors readers to pass through the parameterized reader type rather than flatten it to &mut dyn Read. I'm not sure whether it will be accepted yet, but my original goal for that change was to enable async wrapping, since it means ZipFile (ZipEntry in that PR) is now Send if the parameterized reader type R is Send, so you may find it useful.
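A small sketch of why the parameterized reader matters for `Send`, assuming simplified stand-in types: `FlattenedEntry` and `ParameterizedEntry` are hypothetical and far smaller than the real `ZipFile` or the `ZipEntry` from #207. A `&mut dyn Read` carries no `Send` bound, while `&mut R` is `Send` whenever `R` is.

```rust
use std::io::{Cursor, Read};
use std::thread;

// Hypothetical entry that erases the reader type; not `Send`,
// because `dyn Read` does not promise `Send`.
struct FlattenedEntry<'a> {
    reader: &'a mut dyn Read,
}

// Hypothetical entry that keeps the reader type; `Send` when `R: Send`.
struct ParameterizedEntry<'a, R> {
    reader: &'a mut R,
}

impl<R: Read> ParameterizedEntry<'_, R> {
    fn first_byte(&mut self) -> Option<u8> {
        let mut byte = [0u8; 1];
        self.reader.read_exact(&mut byte).ok().map(|_| byte[0])
    }
}

// Moves the entry to another thread; this compiles only because
// `ParameterizedEntry<Cursor<Vec<u8>>>` is `Send`.
fn first_byte_on_thread(data: Vec<u8>) -> Option<u8> {
    let mut cursor = Cursor::new(data);
    let mut entry = ParameterizedEntry { reader: &mut cursor };
    thread::scope(|s| s.spawn(move || entry.first_byte()).join().unwrap())
}

fn main() {
    // The type-erased entry still works on the current thread.
    let mut cursor = Cursor::new(vec![7u8]);
    let flat = FlattenedEntry { reader: &mut cursor };
    let mut byte = [0u8; 1];
    flat.reader.read_exact(&mut byte).unwrap();
    println!("{} {:?}", byte[0], first_byte_on_thread(vec![42, 1]));
}
```

Passing a `FlattenedEntry` to `s.spawn` instead would be rejected by the compiler, which is the limitation the refactor is described as removing.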

SapryWenInera added 6 commits May 11, 2024 23:54

Add sync feature to Cargo.toml

2afb5a1

Add initial error message for when io logic is missing do to feature …

e0b6be7

…missing

Add sync and tokio features and sync is on default

975fbec

Add compile error for missing/conflicting features

c056382

Begin async support on spec module

8afd567

Merge remote-tracking branch 'upstream/master' into async_pr

5f548a3

Pr0methean requested changes May 13, 2024

Pr0methean reviewed May 13, 2024

Move synchronous code into sync submodule

45f918c

Pr0methean added the major label (This >~1000 SLOC PR is likely to cause hard-to-resolve conflicts with other open PRs once merged.) May 14, 2024

Pr0methean mentioned this pull request May 15, 2024

Breaking change to ZipWriter::finish() in 1.2.0 release #124

Closed

Pr0methean added the Please address review comments label (Some review comments are still open.) May 16, 2024

Merge branch 'master' into async_pr

aae1098

Signed-off-by: Chris Hennick <[email protected]>

Pr0methean added the Please fix failing tests label (Tests are failing with this change; please fix them.) May 16, 2024

Merge branch 'master' into async_pr

a14f6a7

SapryWenInera mentioned this pull request May 17, 2024

Move read sync code into sync submodule #134

Merged

sorairolake and others added 2 commits May 18, 2024 09:42

docs: Add package.metadata.docs.rs

62788e2

This is to enable `doc_auto_cfg` feature with Docs.rs.

Enable deflate-zlib as well, and keep deflate64 separate

933ccc4

deflate-zlib was an omission; deflate64 is a different, backward-incompatible algorithm. Signed-off-by: Chris Hennick <[email protected]>

Pr0methean changed the base branch from master to sync-async May 18, 2024 02:36

Pr0methean added the merge conflict label (This PR has a merge conflict with a PR that was in the merge queue when this label was applied.) May 18, 2024

Pr0methean and others added 7 commits May 17, 2024 22:40

ci(fuzz): Update seed corpora

57eaa50

Merge pull request zip-rs#135 from sorairolake/docsrs

4b295d3

docs: Enable `doc_auto_cfg` feature with Docs.rs

Merge branch 'zip-rs:master' into async_pr

dc83532

Add sync feature to Cargo.toml

af5752b

Add initial error message for when io logic is missing do to feature …

f871fff

…missing

Add sync and tokio features and sync is on default

951184a

Add compile error for missing/conflicting features

ff97edd

SapryWenInera added 3 commits May 18, 2024 12:36

Finished rebasing to upstream/sync-async branch

8129f2e

Begin async support on spec module

e5af56f

Move synchronous code into sync submodule

c95da50

Pr0methean removed the Please fix failing tests (Tests are failing with this change; please fix them.) and merge conflict (This PR has a merge conflict with a PR that was in the merge queue when this label was applied.) labels May 19, 2024

Pr0methean requested changes May 19, 2024

SapryWenInera added 2 commits May 29, 2024 15:10

Updated CI

2e6053e

Sync feature enablement

794a1d0

Pr0methean requested changes May 29, 2024
