Initial support for tokio #121

Open
wants to merge 23 commits into
base: sync-async
Changes from 6 commits
Commits
2afb5a1
Add sync feature to Cargo.toml
SapryWenInera May 12, 2024
e0b6be7
Add initial error message for when io logic is missing do to feature …
SapryWenInera May 12, 2024
975fbec
Add sync and tokio features and sync is on default
SapryWenInera May 13, 2024
c056382
Add compile error for missing/conflicting features
SapryWenInera May 13, 2024
8afd567
Begin async support on spec module
SapryWenInera May 13, 2024
5f548a3
Merge remote-tracking branch 'upstream/master' into async_pr
SapryWenInera May 13, 2024
45f918c
Move synchronous code into sync submodule
SapryWenInera May 14, 2024
aae1098
Merge branch 'master' into async_pr
Pr0methean May 16, 2024
a14f6a7
Merge branch 'master' into async_pr
Pr0methean May 17, 2024
62788e2
docs: Add `package.metadata.docs.rs`
sorairolake May 18, 2024
933ccc4
Enable deflate-zlib as well, and keep deflate64 separate
Pr0methean May 18, 2024
57eaa50
ci(fuzz): Update seed corpora
Pr0methean May 18, 2024
4b295d3
Merge pull request #135 from sorairolake/docsrs
Pr0methean May 18, 2024
dc83532
Merge branch 'zip-rs:master' into async_pr
SapryWenInera May 18, 2024
af5752b
Add sync feature to Cargo.toml
SapryWenInera May 12, 2024
f871fff
Add initial error message for when io logic is missing do to feature …
SapryWenInera May 12, 2024
951184a
Add sync and tokio features and sync is on default
SapryWenInera May 13, 2024
ff97edd
Add compile error for missing/conflicting features
SapryWenInera May 13, 2024
8129f2e
Finished rebasing to upstream/sync-async branch
SapryWenInera May 18, 2024
e5af56f
Begin async support on spec module
SapryWenInera May 13, 2024
c95da50
Move synchronous code into sync submodule
SapryWenInera May 14, 2024
2e6053e
Updated CI
SapryWenInera May 29, 2024
794a1d0
Sync feature enablement
SapryWenInera May 29, 2024
6 changes: 5 additions & 1 deletion Cargo.toml
@@ -43,6 +43,8 @@ zstd = { version = "0.13.1", optional = true, default-features = false }
zopfli = { version = "0.8.0", optional = true }
deflate64 = { version = "0.1.8", optional = true }
lzma-rs = { version = "0.3.0", default-features = false, optional = true }
true = { version = "0.1.0", optional = true }
tokio = { version = "1.37.0", optional = true }

[target.'cfg(any(all(target_arch = "arm", target_pointer_width = "32"), target_arch = "mips", target_arch = "powerpc"))'.dependencies]
crossbeam-utils = "0.8.19"
@@ -67,11 +69,12 @@ deflate = ["flate2/rust_backend", "_deflate-any"]

# DEPRECATED: previously enabled `flate2/miniz_oxide` which is equivalent to `flate2/rust_backend`
deflate-miniz = ["deflate", "_deflate-any"]

deflate-zlib = ["flate2/zlib", "_deflate-any"]
deflate-zlib-ng = ["flate2/zlib-ng", "_deflate-any"]
deflate-zopfli = ["zopfli", "_deflate-any"]
lzma = ["lzma-rs/stream"]
sync = []
tokio = ["tokio/io-util"]
unreserved = []
default = [
"aes-crypto",
@@ -81,6 +84,7 @@ default = [
"deflate-zlib-ng",
"deflate-zopfli",
"lzma",
"sync",
"time",
"zstd",
]
5 changes: 5 additions & 0 deletions src/build.rs
@@ -4,4 +4,9 @@
if var("CARGO_FEATURE_DEFLATE_MINIZ").is_ok() {
println!("cargo:warning=Feature `deflate-miniz` is deprecated; replace it with `deflate`");
}
#[cfg(not(any(feature = "sync", feature = "tokio")))]
compile_error!("Missing Required feature");

Check failure on line 8 in src/build.rs: "Missing Required feature", reported by these GitHub Actions jobs:

  • Build and test --no-default-features: ubuntu-latest, msrv
  • Build and test --no-default-features: ubuntu-latest, nightly
  • Build and test --no-default-features: ubuntu-latest, stable
  • Build and test --no-default-features: macOS-latest, msrv
  • style_and_docs (--no-default-features)

Member

Update ci.yml to enable sync wherever it has --no-default-features, or else convert it to a no-sync feature (which would be unidiomatic but have the advantage of being backward-compatible for more users).

Author

Once the tokio feature is working properly in tests, I will update the CI. Currently it is broken.

Member
@Pr0methean May 14, 2024

Thanks for letting me know; but if it's going to take longer than about another 3 days, then I'd also greatly appreciate a CI-able update once or twice per week so we could get an idea of how the PR was progressing toward a releasable state where all further work could be deferred to follow-up PRs.

Author

Then I will try to do it this week, after I finish some of my college exams.

Member

@SapryWenInera Have you finished your exams? If not, when do you expect to be able to address the comments on this PR? An ETA would be helpful not only for me, but also for the authors of the other major PRs, given the likelihood of merge conflicts.


#[cfg(all(feature = "sync", feature = "tokio"))]
compile_error!("The features sync and tokio cannot be used together")
Member
@Pr0methean May 13, 2024

This restriction may turn out to be a problem for some users. It's possible to import two configurations of a crate by renaming one of them, but then they won't recognize each other's struct types or traits. I can think of two solutions:

  • (my preference) Give the async methods different names (e.g. parse_async) instead. If they're used in the same calling code, move that code to a macro and use method-scoped type names (e.g. type Read = AsyncRead;) and macro parameters to differentiate them. See the example in my separate comment.
  • Make a separate crate for the Tokio features, and another for the shared core.

Author

I'm thinking of moving the old synchronous code into a submodule of read named sync, with the async code in a tokio module. This would avoid conflicts between the two codebases. By the end of the week I will probably push a PR for that, since it doesn't require any feature or CI changes.

Member

That sounds good, but will there be a shared-core module as well?

Author

The idea is that the structs and enums can be shared while the actual logic stays separate. This should avoid import conflicts, and code changes in the sync part should not conflict with the async part. Do you think this or the macro approach is better?

Member
@Pr0methean May 14, 2024

I favor sharing as much code as possible, based on the Don't Repeat Yourself principle. But the modular design should be compatible with the macro approach: define the macros in the shared core module, and invoke them in the sync and tokio modules.

Member

@SapryWenInera Could you please prioritize this issue for a fix, so that I can run CI on your work in progress and use the results to estimate how much longer this PR will take?

}
92 changes: 92 additions & 0 deletions src/spec.rs
@@ -1,3 +1,6 @@
#[cfg(feature = "tokio")]
use tokio::io::{AsyncRead, AsyncReadExt, AsyncSeek, AsyncSeekExt};

use crate::result::{ZipError, ZipResult};
use crate::unstable::{LittleEndianReadExt, LittleEndianWriteExt};
use core::mem::size_of_val;
@@ -112,6 +115,7 @@ pub struct Zip64CentralDirectoryEndLocator {
pub number_of_disks: u32,
}

#[cfg(feature = "sync")]
impl Zip64CentralDirectoryEndLocator {
pub fn parse<T: Read>(reader: &mut T) -> ZipResult<Zip64CentralDirectoryEndLocator> {
let magic = reader.read_u32_le()?;
@@ -140,6 +144,30 @@ impl Zip64CentralDirectoryEndLocator {
}
}

#[cfg(feature = "tokio")]
impl Zip64CentralDirectoryEndLocator {
    pub async fn parse<T>(reader: &mut T) -> ZipResult<Self>
    where
        T: AsyncRead + Unpin,
    {
        let magic = reader.read_u32_le().await?;
        if magic != ZIP64_CENTRAL_DIRECTORY_END_LOCATOR_SIGNATURE {
            return Err(ZipError::InvalidArchive(
                "Invalid zip64 locator digital signature header",
            ));
        }
        let disk_with_central_directory = reader.read_u32_le().await?;
        let end_of_central_directory_offset = reader.read_u64_le().await?;
        let number_of_disks = reader.read_u32_le().await?;

        Ok(Self {
            disk_with_central_directory,
            end_of_central_directory_offset,
            number_of_disks,
        })
    }
}
Comment on lines +147 to +169
Member
@Pr0methean May 13, 2024

Suggested change
macro_rules! parse {
    ($maybe_await:path) => {{
        let magic = $maybe_await(reader.read_u32_le())?;
        if magic != ZIP64_CENTRAL_DIRECTORY_END_LOCATOR_SIGNATURE {
            return Err(ZipError::InvalidArchive(
                "Invalid zip64 locator digital signature header",
            ));
        }
        let disk_with_central_directory = $maybe_await(reader.read_u32_le())?;
        let end_of_central_directory_offset = $maybe_await(reader.read_u64_le())?;
        let number_of_disks = $maybe_await(reader.read_u32_le())?;
        Ok(Self {
            disk_with_central_directory,
            end_of_central_directory_offset,
            number_of_disks,
        })
    }};
}
#[cfg(feature = "tokio")]
pub(crate) async fn await_identity<F: std::future::Future>(operand: F) -> F::Output {
    operand.await
}
impl Zip64CentralDirectoryEndLocator {
    #[cfg(feature = "tokio")]
    pub async fn parse<T>(reader: &mut T) -> ZipResult<Self>
    where
        T: AsyncRead + Unpin,
    {
        parse!(await_identity)
    }
    #[cfg(feature = "sync")]
    pub fn parse<T>(reader: &mut T) -> ZipResult<Self>
    where
        T: Read,
    {
        parse!(std::convert::identity)
    }
}

Author

I've been brainstorming how to use macros to share code between the async and sync paths, but I can't find a way to write a macro that handles both calls with .await and calls without it without breaking one or the other, as in the code you just showed. Is the goal of supporting async to provide a separate code path for people to use, or just a wrapper around the current sync functions?

Member

Maybe we need an extension trait that's implemented for both Read and AsyncRead, and another for Write and AsyncWrite. For the compression and decompression themselves, I think the wrapper approach is probably adequate, since they're CPU-heavy.

Author

I can make a demo in another branch that uses async as just a wrapper around sync, but my preferred approach would actually be to use async for all I/O calls and to create shareable sync functions for the non-I/O operations.
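
A minimal sketch of that split, assuming the locator's fixed 20-byte on-disk layout (signature, u32, u64, u32, all little-endian) and using std::io::Error as a stand-in for ZipError. The names Locator, from_bytes, le_u32 and le_u64 are hypothetical and not part of the crate; the only point is that the decoding step performs no I/O, so a sync and an async front end could both call it:

```rust
use std::io::{self, Read};

/// ZIP64 end-of-central-directory locator signature ("PK\x06\x07").
const LOCATOR_SIGNATURE: u32 = 0x07064b50;
/// On-disk size: 4 (signature) + 4 + 8 + 4 bytes.
const LOCATOR_LEN: usize = 20;

#[derive(Debug, PartialEq)]
pub struct Locator {
    pub disk_with_central_directory: u32,
    pub end_of_central_directory_offset: u64,
    pub number_of_disks: u32,
}

// Hypothetical helpers: decode little-endian integers from a byte slice.
fn le_u32(bytes: &[u8]) -> u32 {
    let mut raw = [0u8; 4];
    raw.copy_from_slice(bytes);
    u32::from_le_bytes(raw)
}

fn le_u64(bytes: &[u8]) -> u64 {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(bytes);
    u64::from_le_bytes(raw)
}

impl Locator {
    /// Pure decoding step with no I/O; shareable between sync and async callers.
    pub fn from_bytes(buf: &[u8; LOCATOR_LEN]) -> io::Result<Self> {
        if le_u32(&buf[0..4]) != LOCATOR_SIGNATURE {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "Invalid zip64 locator digital signature header",
            ));
        }
        Ok(Self {
            disk_with_central_directory: le_u32(&buf[4..8]),
            end_of_central_directory_offset: le_u64(&buf[8..16]),
            number_of_disks: le_u32(&buf[16..20]),
        })
    }

    /// Sync front end: the only I/O is a single `read_exact`.
    pub fn parse<R: Read>(reader: &mut R) -> io::Result<Self> {
        let mut buf = [0u8; LOCATOR_LEN];
        reader.read_exact(&mut buf)?;
        Self::from_bytes(&buf)
    }
    // An async front end would differ only in the read step, for example
    // `reader.read_exact(&mut buf).await?` via tokio's `AsyncReadExt`.
}
```

With this shape, only the one-line read step would be duplicated between the sync and tokio modules.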

Member
@Pr0methean May 19, 2024

I still don't understand why we can't use macros whose parameters have sync and async definitions, to share code between sync and async versions of each function (e.g. Zip64CentralDirectoryEndLocator would have its sync and async parse method bodies generated by separate calls to a parse! macro that took as arguments $reader_read_u32_le_maybe_async:expr and $reader_read_u64_le_maybe_async:expr, both of which could be shared with other methods by having other macros define them at impl-block or wider scope). Could you please explain why you don't think that's feasible?
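
For illustration, a hedged sketch of a macro along these lines, in which the sync and async methods pass their own read expressions (with `?`, and `.await` where needed) into one shared body. Locator, parse_body!, SyncSource, AsyncSource and block_on are hypothetical stand-ins so that the example runs with the standard library alone, and std::io::Error stands in for ZipError; in the crate the readers would be Read and tokio's AsyncRead:

```rust
use std::future::Future;
use std::io::{self, Cursor, Read};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

const LOCATOR_SIGNATURE: u32 = 0x07064b50;

#[derive(Debug, PartialEq)]
pub struct Locator {
    pub disk_with_central_directory: u32,
    pub end_of_central_directory_offset: u64,
    pub number_of_disks: u32,
}

// Shared body. Each argument is an expression yielding the next integer;
// it is evaluated again at every place it appears in the expansion.
macro_rules! parse_body {
    ($read_u32:expr, $read_u64:expr) => {{
        let magic: u32 = $read_u32;
        if magic != LOCATOR_SIGNATURE {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "Invalid zip64 locator digital signature header",
            ));
        }
        let disk_with_central_directory: u32 = $read_u32;
        let end_of_central_directory_offset: u64 = $read_u64;
        let number_of_disks: u32 = $read_u32;
        Ok(Locator {
            disk_with_central_directory,
            end_of_central_directory_offset,
            number_of_disks,
        })
    }};
}

/// Hypothetical sync reader standing in for a `Read` extension trait.
pub struct SyncSource(pub Cursor<Vec<u8>>);

impl SyncSource {
    pub fn read_u32_le(&mut self) -> io::Result<u32> {
        let mut raw = [0u8; 4];
        self.0.read_exact(&mut raw)?;
        Ok(u32::from_le_bytes(raw))
    }
    pub fn read_u64_le(&mut self) -> io::Result<u64> {
        let mut raw = [0u8; 8];
        self.0.read_exact(&mut raw)?;
        Ok(u64::from_le_bytes(raw))
    }
}

/// Hypothetical async reader; its futures complete immediately.
pub struct AsyncSource(pub SyncSource);

impl AsyncSource {
    pub async fn read_u32_le(&mut self) -> io::Result<u32> {
        self.0.read_u32_le()
    }
    pub async fn read_u64_le(&mut self) -> io::Result<u64> {
        self.0.read_u64_le()
    }
}

impl Locator {
    pub fn parse(reader: &mut SyncSource) -> io::Result<Self> {
        parse_body!(reader.read_u32_le()?, reader.read_u64_le()?)
    }
    pub async fn parse_async(reader: &mut AsyncSource) -> io::Result<Self> {
        parse_body!(reader.read_u32_le().await?, reader.read_u64_le().await?)
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Demo-only executor: polls in a loop, which is acceptable here because
/// the futures above never return `Pending`.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

Because the `.await` is written at the call site, inside the async method, the macro body itself never has to decide whether to await.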

Author

Sorry for the long break. There are two issues in trying to share code between async and sync functions using macros. First, with async/await syntax, calling .await inside the macro breaks the sync code, while not calling .await breaks the async code. Second, using .await outside an async block or function is not allowed.

use std::fs::read_to_string;

macro_rules! maybe_await {
     ($maybe_async:expr) => {
          $maybe_async.await.unwrap()
     };
}

fn main() {
     #[cfg(feature = "async")]
     let string = maybe_await!(async { read_to_string("/tmp/foo") }); // Breaks because main is not async
     #[cfg(feature = "sync")]
     let string = maybe_await!(read_to_string("/tmp/foo")); // Breaks because .await is called on a sync function's result
     println!("{}", string);
}

I don't see a way to solve both problems while sharing any code at all. If you have any idea how to do this, I'm all ears.

Member

For tests, one fix might be to use a maybe_block_on! macro as well. But maybe this is more trouble than it's worth; let's wait and see how much duplicated code is left after rebasing against #93 and factoring out as much as possible.

Member
@Pr0methean Jun 2, 2024

It occurs to me that one solution might involve 0th-order macros called sync_defs! and async_defs! that would provide different definitions of 1st-order macros such as maybe_await! and maybe_block_on! and read_or_async_read!, and then definitions of sync and async versions of a function would invoke their different 0th-order macros and then the same 2nd-order macro (which would contain the shared code). Could that possibly work?
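
A flattened variant of this idea can be sketched without macros that define other macros: two small adapter macros (named sync_call! and async_call! here) play the role of maybe_await!, and the shared-body macro receives the adapter's name as a parameter. This is a simplification under stated assumptions, not the layered sync_defs!/async_defs! design itself. All names below (Locator, parse_body!, SyncSource, AsyncSource, block_on) are hypothetical, std::io::Error stands in for ZipError, and block_on is a demo-only helper in the spirit of maybe_block_on!:

```rust
use std::future::Future;
use std::io::{self, Cursor, Read};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

const LOCATOR_SIGNATURE: u32 = 0x07064b50;

#[derive(Debug, PartialEq)]
pub struct Locator {
    pub disk_with_central_directory: u32,
    pub end_of_central_directory_offset: u64,
    pub number_of_disks: u32,
}

// Adapters: the sync one passes the call through, the async one awaits it.
macro_rules! sync_call {
    ($e:expr) => { $e };
}
macro_rules! async_call {
    ($e:expr) => { $e.await };
}

// Shared body, parameterised over the reader variable and the adapter name.
macro_rules! parse_body {
    ($reader:ident, $call:ident) => {{
        let magic = $call!($reader.read_u32_le())?;
        if magic != LOCATOR_SIGNATURE {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                "Invalid zip64 locator digital signature header",
            ));
        }
        let disk_with_central_directory = $call!($reader.read_u32_le())?;
        let end_of_central_directory_offset = $call!($reader.read_u64_le())?;
        let number_of_disks = $call!($reader.read_u32_le())?;
        Ok(Locator {
            disk_with_central_directory,
            end_of_central_directory_offset,
            number_of_disks,
        })
    }};
}

/// Hypothetical sync reader standing in for a `Read` extension trait.
pub struct SyncSource(pub Cursor<Vec<u8>>);

impl SyncSource {
    pub fn read_u32_le(&mut self) -> io::Result<u32> {
        let mut raw = [0u8; 4];
        self.0.read_exact(&mut raw)?;
        Ok(u32::from_le_bytes(raw))
    }
    pub fn read_u64_le(&mut self) -> io::Result<u64> {
        let mut raw = [0u8; 8];
        self.0.read_exact(&mut raw)?;
        Ok(u64::from_le_bytes(raw))
    }
}

/// Hypothetical async reader; its futures complete immediately.
pub struct AsyncSource(pub SyncSource);

impl AsyncSource {
    pub async fn read_u32_le(&mut self) -> io::Result<u32> {
        self.0.read_u32_le()
    }
    pub async fn read_u64_le(&mut self) -> io::Result<u64> {
        self.0.read_u64_le()
    }
}

impl Locator {
    pub fn parse(reader: &mut SyncSource) -> io::Result<Self> {
        parse_body!(reader, sync_call)
    }
    pub async fn parse_async(reader: &mut AsyncSource) -> io::Result<Self> {
        parse_body!(reader, async_call)
    }
}

struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Demo-only executor; busy-polls because these futures never return `Pending`.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```

The `.await` only ever expands inside `parse_async`, so the sync method never sees it; whether this scales to the larger functions in the PR is a separate question.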


pub struct Zip64CentralDirectoryEnd {
pub version_made_by: u16,
pub version_needed_to_extract: u16,
@@ -152,6 +180,7 @@ pub struct Zip64CentralDirectoryEnd {
//pub extensible_data_sector: Vec<u8>, <-- We don't do anything with this at the moment.
}

#[cfg(feature = "sync")]
impl Zip64CentralDirectoryEnd {
pub fn find_and_parse<T: Read + Seek>(
reader: &mut T,
@@ -227,6 +256,69 @@ impl Zip64CentralDirectoryEnd {
}
}

#[cfg(feature = "tokio")]
impl Zip64CentralDirectoryEnd {
    pub async fn find_and_parse<T>(
        reader: &mut T,
        nominal_offset: u64,
        search_upper_bound: u64,
    ) -> ZipResult<Vec<(Self, u64)>>
    where
        T: AsyncRead + AsyncSeek + Unpin,
    {
        let mut results = Vec::new();
        let mut pos = search_upper_bound;

        while pos >= nominal_offset {
            let mut have_signature = false;
            reader.seek(tokio::io::SeekFrom::Start(pos)).await?;
            if reader.read_u32_le().await? == ZIP64_CENTRAL_DIRECTORY_END_SIGNATURE {
                have_signature = true;
                let archive_offset = pos - nominal_offset;

                let _record_size = reader.read_u64_le().await?;
                let version_made_by = reader.read_u16_le().await?;
                let version_needed_to_extract = reader.read_u16_le().await?;
                let disk_number = reader.read_u32_le().await?;
                let disk_with_central_directory = reader.read_u32_le().await?;
                let number_of_files_on_this_disk = reader.read_u64_le().await?;
                let number_of_files = reader.read_u64_le().await?;
                let central_directory_size = reader.read_u64_le().await?;
                let central_directory_offset = reader.read_u64_le().await?;

                results.push((
                    Self {
                        version_made_by,
                        version_needed_to_extract,
                        disk_number,
                        disk_with_central_directory,
                        number_of_files_on_this_disk,
                        number_of_files,
                        central_directory_size,
                        central_directory_offset,
                    },
                    archive_offset,
                ));
            }
            pos = match pos.checked_sub(if have_signature {
                size_of_val(&ZIP64_CENTRAL_DIRECTORY_END_SIGNATURE) as u64
            } else {
                1
            }) {
                None => break,
                Some(p) => p,
            }
        }
        if results.is_empty() {
            Err(ZipError::InvalidArchive(
                "Could not find ZIP64 central directory end",
            ))
        } else {
            Ok(results)
        }
    }
}

/// Converts a path to the ZIP format (forward-slash-delimited and normalized).
pub(crate) fn path_to_string<T: AsRef<Path>>(path: T) -> Box<str> {
let mut maybe_original = None;