Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial support for tokio #121

Open
wants to merge 23 commits into
base: sync-async
Choose a base branch
from

Conversation

SapryWenInera
Copy link

@SapryWenInera SapryWenInera commented May 13, 2024

Start implementing tokio support for reading zip and separating the io operations of sync and async code using the features sync and tokio. Starting work to closes #108.

@@ -4,4 +4,9 @@ fn main() {
if var("CARGO_FEATURE_DEFLATE_MINIZ").is_ok() {
println!("cargo:warning=Feature `deflate-miniz` is deprecated; replace it with `deflate`");
}
#[cfg(not(any(feature = "sync", feature = "tokio")))]
compile_error!("Missing Required feature");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Update ci.yml to enable sync wherever it has --no-default-features, or else convert it to a no-sync feature (which would be unidiomatic but have the advantage of being backward-compatible for more users).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the tokio feature is properly working in tests I will update the ci. Currently that is broken.

Copy link
Member

@Pr0methean Pr0methean May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for letting me know; but if it's going to take longer than about another 3 days, then I'd also greatly appreciate a CI-able update once or twice per week so we could get an idea of how the PR was progressing toward a releasable state where all further work could be deferred to follow-up PRs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I will try doing it this week after I some of my college exams.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SapryWenInera Have you finished your exams? If not, when do you expect to be able to address the comments on this PR? An ETA would be helpful not only for me, but also for the authors of the other major PRs, given the likelihood of merge conflicts.

compile_error!("Missing Required feature");

#[cfg(all(feature = "sync", feature = "tokio"))]
compile_error!("The features sync and tokio cannot be used together")
Copy link
Member

@Pr0methean Pr0methean May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This restriction may turn out to be a problem for some users. It's possible to import two configurations of a crate twice by renaming one, but then they won't recognize each other's struct types or traits. I can think of two solutions:

  • (my preference) Give the async methods different names (e.g. parse_async) instead. If they're used in the same calling code, move that code to a macro and use method-scoped type names (e.g. type Read = AsyncRead; and macro parameters to differentiate them. See the example in my separate comment.
  • Make a separate crate for the Tokio features, and another for the shared core.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking of pushing the old synchronous code in a module of read named sync and then the async code would be in a tokio module, this would avoid conflicts between the codebases, by the end of the week i'm probably gonna push a PR for that since it doesn't require any feature or ci changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds good, but will there be a shared-core module as well?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that the structs and enums can be shared but the actual logic can be separate. This should avoid import conflicts and if there are code changes in the sync part it should not conflict with the async part. Do u think this or the macro one is better?

Copy link
Member

@Pr0methean Pr0methean May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I favor sharing as much code as possible, based on the Don't Repeat Yourself principle. But the modular design should be compatible with the macro approach: define the macros in the shared core module, and invoke them in the sync and tokio modules.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@SapryWenInera Could you please prioritize this issue for a fix, so that I can run CI on your work in progress and use the results to estimate how much longer this PR will take?

Comment on lines +147 to +169
#[cfg(feature = "tokio")]
impl Zip64CentralDirectoryEndLocator {
pub async fn parse<T>(reader: &mut T) -> ZipResult<Self>
where
T: AsyncRead + Unpin,
{
let magic = reader.read_u32_le().await?;
if magic != ZIP64_CENTRAL_DIRECTORY_END_LOCATOR_SIGNATURE {
return Err(ZipError::InvalidArchive(
"Invalid zip64 locator digital signature header",
));
}
let disk_with_central_directory = reader.read_u32_le().await?;
let end_of_central_directory_offset = reader.read_u64_le().await?;
let number_of_disks = reader.read_u32_le().await?;

Ok(Self {
disk_with_central_directory,
end_of_central_directory_offset,
number_of_disks,
})
}
}
Copy link
Member

@Pr0methean Pr0methean May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[cfg(feature = "tokio")]
impl Zip64CentralDirectoryEndLocator {
pub async fn parse<T>(reader: &mut T) -> ZipResult<Self>
where
T: AsyncRead + Unpin,
{
let magic = reader.read_u32_le().await?;
if magic != ZIP64_CENTRAL_DIRECTORY_END_LOCATOR_SIGNATURE {
return Err(ZipError::InvalidArchive(
"Invalid zip64 locator digital signature header",
));
}
let disk_with_central_directory = reader.read_u32_le().await?;
let end_of_central_directory_offset = reader.read_u64_le().await?;
let number_of_disks = reader.read_u32_le().await?;
Ok(Self {
disk_with_central_directory,
end_of_central_directory_offset,
number_of_disks,
})
}
}
macro_rules! parse {
($maybe_await:ident) => {
let magic = maybe_await(reader.read_u32_le())?;
if magic != ZIP64_CENTRAL_DIRECTORY_END_LOCATOR_SIGNATURE {
return Err(ZipError::InvalidArchive(
"Invalid zip64 locator digital signature header",
));
let disk_with_central_directory = maybe_await(reader.read_u32_le())?;
let end_of_central_directory_offset = maybe_await(reader.read_u64_le()?);
let number_of_disks = maybe_await(reader.read_u32_le())?;
Ok(Self {
disk_with_central_directory,
end_of_central_directory_offset,
number_of_disks,
})
}
}
#[cfg(feature = "tokio")]
pub(crate) async fn await_identity<T: ?Sized>(operand: T) -> T {
T.await
}
impl Zip64CentralDirectoryEndLocator {
#[cfg(feature = "tokio")]
pub async fn parse<T>(reader: &mut T) -> ZipResult<Self>
where
T: AsyncRead + Unpin,
{
use std::future::Future;
async fn await<T>(operand: Future<T>) -> T {
T.await
}
parse!(await_identity)
}
#[cfg(feature = "tokio")]
pub fn parse<T>(reader: &mut T) -> ZipResult<Self>
where
T: Read,
{
parse!(std::convert::identity)
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been brainstorming using macros for sharing code between async and sync code but i can't find a way to create a macro that can handle both .await calls and calls without .await without breaking the other, like in the code u just showed, is the goal of supporting async is to have a different code pathway for people to use or to just be a wrapper around current sync functions?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need an extension trait that's implemented for both Read and AsyncRead, and other for Write and AsyncWrite. For the compression and decompression themselves, I think the wrapper approach is probably adequate, since they're CPU-heavy.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make a demo in another branch of using async as just a wrapper around sync, but my preferred aproach would actually be using async all io calls and trying to create shareable sync functions for non io operations.

Copy link
Member

@Pr0methean Pr0methean May 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand why we can't use macros whose parameters have sync and async definitions, to share code between sync and async versions of each function (e.g. Zip64CentralDirectoryEndLocator would have its sync and async parse method bodies generated by separate calls to a parse! macro that took as arguments $reader_read_u32_le_maybe_async:expr and $reader_read_u64_le_maybe_async:expr, both of which could be shared with other methods by having other macros define them at impl-block or wider scope). Could you please explain why you don't think that's feasible?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the long break. There's two issues in trying to share code between async and sync functions using macros, the first is that as long as you're using async-await syntax calling async inside the macro would break sync code and not calling .await breaks async code and using .await outside async code block is not allowed.

macro_rules! maybe_await {
     (maybe_async:expr) => {
              $maybe_async().await.unwrap();
     }
}

fn main() {
     #[cfg!(feature = "async")]
     let string = maybe_await!(async {read_to_string("/tmp/foo")}); // Code breaks due to main not being async
     #[cfg!(feature = "sync")]
     let string = maybe_await!(read_to_string("/tmp/foo")) // Code breaks to to .await being called on sync function
     println!("{}", string)

I don't see a way to solve both problems while sharing any code whatsoever, if u can have any idea on how to due this then i'm all ears.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For tests, one fix might be to use a maybe_block_on! macro as well. But maybe this is more trouble than it's worth; let's wait and see how much duplicated code is left after rebasing against #93 and factoring out as much as possible.

Copy link
Member

@Pr0methean Pr0methean Jun 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It occurs to me that one solution might involve 0th-order macros called sync_defs! and async_defs! that would provide different definitions of 1st-order macros such as maybe_await! and maybe_block_on! and read_or_async_read!, and then definitions of sync and async versions of a function would invoke their different 0th-order macros and then the same 2nd-order macro (which would contain the shared code). Could that possibly work?

@Pr0methean
Copy link
Member

Pr0methean commented May 14, 2024

PS. Beware of possible conflicts with #120 and #93, the two other large PRs currently in active development. #93 is probably the one that will be merged first.

@Pr0methean Pr0methean added the major This >~1000 SLOC PR is likely to cause hard-to-resolve conflicts with other open PRs once merged. label May 14, 2024
@Pr0methean
Copy link
Member

FYI: Be aware that some other major PRs are currently open.

@Pr0methean Pr0methean added the Please fix failing tests Tests are failing with this change; please fix them. label May 16, 2024
sorairolake and others added 2 commits May 18, 2024 09:42
This is to enable `doc_auto_cfg` feature with Docs.rs.
deflate-zlib was an omission; deflate64 is a different, backward-incompatible algorithm.

Signed-off-by: Chris Hennick <[email protected]>
@Pr0methean Pr0methean changed the base branch from master to sync-async May 18, 2024 02:36
@Pr0methean
Copy link
Member

I've retargeted this PR to the same branch as #134, but we're getting complicated merge conflicts. Please address them.

@Pr0methean Pr0methean added the merge conflict This PR has a merge conflict with a PR that was in the merge queue when this label was applied. label May 18, 2024
@Pr0methean Pr0methean removed Please fix failing tests Tests are failing with this change; please fix them. merge conflict This PR has a merge conflict with a PR that was in the merge queue when this label was applied. labels May 19, 2024
@Pr0methean
Copy link
Member

Please revert the fuzz/corpus directory. Once the current major PRs are merged or dormant, I will update the corpus using the output of the next cargo fuzz run, plus a run with no seed corpus, and then iteratively minimize it with cargo cmin until there's no further reduction. If you'd like to add corpus entries from other sources, please do so in a separate PR.

@@ -316,16 +318,4 @@ mod test {
let mut file = reader.by_index(0).unwrap();
assert_eq!(file.read(&mut decompressed).unwrap(), 12);
}

#[test]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you deleting this?

@@ -25,6 +25,7 @@
//! | ZipCrypto deprecated encryption | ✅ | ✅ |
//!
//!
#![cfg_attr(docsrs, feature(doc_auto_cfg))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

.await?;

let results: Vec<Result<CentralDirectoryInfo, ZipError>> = search_results.iter().map(|(footer64, archive_offset)| {
let directory_start_result = footer64.central_directory_offset.checked_add(*archive_offset).ok_or(ZipError::InvalidArchive("Invalid central directory size or effect"));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let directory_start_result = footer64.central_directory_offset.checked_add(*archive_offset).ok_or(ZipError::InvalidArchive("Invalid central directory size or effect"));
let directory_start_result = footer64.central_directory_offset.checked_add(*archive_offset).ok_or(ZipError::InvalidArchive("Invalid central directory size or offset"));

)
.await?;

let results: Vec<Result<CentralDirectoryInfo, ZipError>> = search_results.iter().map(|(footer64, archive_offset)| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Factor out lines 109-130 and their copies in the sync version to a shared method. It can be called validate_zip64_footers.

#[allow(deprecated)]
let compression_method =
CompressionMethod::from_u16(reader.read_u16_le().await?);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Factor out lines 286-313 into a shared method; it can be called decode_aes_extra_data.

Comment on lines +147 to +169
#[cfg(feature = "tokio")]
impl Zip64CentralDirectoryEndLocator {
pub async fn parse<T>(reader: &mut T) -> ZipResult<Self>
where
T: AsyncRead + Unpin,
{
let magic = reader.read_u32_le().await?;
if magic != ZIP64_CENTRAL_DIRECTORY_END_LOCATOR_SIGNATURE {
return Err(ZipError::InvalidArchive(
"Invalid zip64 locator digital signature header",
));
}
let disk_with_central_directory = reader.read_u32_le().await?;
let end_of_central_directory_offset = reader.read_u64_le().await?;
let number_of_disks = reader.read_u32_le().await?;

Ok(Self {
disk_with_central_directory,
end_of_central_directory_offset,
number_of_disks,
})
}
}
Copy link
Member

@Pr0methean Pr0methean May 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't understand why we can't use macros whose parameters have sync and async definitions, to share code between sync and async versions of each function (e.g. Zip64CentralDirectoryEndLocator would have its sync and async parse method bodies generated by separate calls to a parse! macro that took as arguments $reader_read_u32_le_maybe_async:expr and $reader_read_u64_le_maybe_async:expr, both of which could be shared with other methods by having other macros define them at impl-block or wider scope). Could you please explain why you don't think that's feasible?

@@ -21,7 +21,7 @@ jobs:
matrix:
os: [ubuntu-latest, macOS-latest, windows-latest]
rustalias: [stable, nightly, msrv]
feature_flag: ["--all-features", "--no-default-features", ""]
feature_flag: ["--no-default-features --features sync_all", "--no-default-features --features tokio_all", "--no-default-features --features sync", "--no-default-features --features tokio", ""]
Copy link
Member

@Pr0methean Pr0methean May 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please make it possible to build with both sync and tokio? To do this, you can invoke the traits with an explicit self parameter, e.g. Read::read_exact(self, buf) and AsyncRead::read_exact(self, buf) instead of self.read_exact(buf). Since methods with the same name don't conflict when they're specified in different traits, this should be the only change needed.

Copy link
Member

@Pr0methean Pr0methean Jun 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, I found a crate that does something similar to this, but with proc macros: https://crates.io/crates/async-generic. Looks like it may still be worth layering a few regular macros on top, for when async-generic functions call other async-generic functions.

@cosmicexplorer
Copy link
Contributor

#207 refactors readers to pass through the parameterized reader type rather than flatten it to &mut dyn Read. I'm not sure whether it will be accepted yet, but my original goal for that change was to enable async wrapping, since it means ZipFile (ZipEntry in that PR) is now Send if the parameterized reader type R is Send, so you may find it useful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major This >~1000 SLOC PR is likely to cause hard-to-resolve conflicts with other open PRs once merged. Please address review comments Some review comments are still open.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Async support using tokio for Read/Write calls
4 participants