Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove hardcoded BufRead trait bounds #288

Merged
merged 1 commit into from
Jul 5, 2024

Conversation

cosmicexplorer
Copy link
Contributor

@cosmicexplorer cosmicexplorer commented Jul 2, 2024

Problem

Rust structs with generic type parameters are typically not declared with trait bounds on the struct itself, as this makes it difficult to compose into other generic structs. The zip crate depends on zstd, and is unable to declare a generic ZipFileReader enum (e.g. https://github.com/zip-rs/zip2/blob/b6e0a0693ba2d07f541cd9017b60df7e65817e48/src/read.rs#L190) without creating a vtable &'a mut dyn Read, since the R: BufRead bound on the Decoder struct declaration requires a concrete type implementing BufRead, whereas io::BufReader doesn't have the same R: Read bound on the struct declaration.

In essence, we currently have this:

pub(crate) enum ZipFileReader<'a> {
    NoReader,
    Raw(io::Take<&'a mut dyn Read>),
    Stored(Crc32Reader<&'a mut dyn Read>),
    #[cfg(feature = "_deflate-any")]
    Deflated(Crc32Reader<DeflateDecoder<&'a mut dyn Read>>),
    #[cfg(feature = "deflate64")]
    Deflate64(Crc32Reader<Deflate64Decoder<io::BufReader<&'a mut dyn Read>>>),
    #[cfg(feature = "bzip2")]
    Bzip2(Crc32Reader<BzDecoder<&'a mut dyn Read>>),
    #[cfg(feature = "zstd")]
    Zstd(Crc32Reader<ZstdDecoder<'a, io::BufReader<&'a mut dyn Read>>>),
    #[cfg(feature = "lzma")]
    Lzma(Crc32Reader<Box<LzmaDecoder<&'a mut dyn Read>>>),
}

but we would like to be able to have this instead:

pub(crate) enum EntryReader<R> {
    NoReader,
    Raw(R),
    Stored(Crc32Reader<R>),
    #[cfg(feature = "_deflate-any")]
    Deflated(Crc32Reader<DeflateDecoder<R>>),
    #[cfg(feature = "deflate64")]
    Deflate64(Crc32Reader<Deflate64Decoder<io::BufReader<R>>>),
    #[cfg(feature = "bzip2")]
    Bzip2(Crc32Reader<BzDecoder<R>>),
    #[cfg(feature = "zstd")]
    Zstd(Crc32Reader<ZstdDecoder<'static, io::BufReader<R>>>),
    #[cfg(feature = "lzma")]
    Lzma(Crc32Reader<Box<LzmaDecoder<R>>>),
}

This avoids the creation of a vtable indirection and unnecessary lifetime bound.

Solution

Remove the R: BufRead bounds on the Encoder and Decoder struct declarations.

Result

This is not a breaking change.

@gyscos
Copy link
Owner

gyscos commented Jul 5, 2024

Indeed - thanks for the PR!

@gyscos
Copy link
Owner

gyscos commented Jul 5, 2024

since the R: BufRead bound on the Decoder struct declaration requires a concrete type implementing BufRead

I don't fully understand the original issue though - couldn't you just forward the bound, and have pub(crate) enum EntryReader<R: Read>?

@gyscos gyscos merged commit a3738d6 into gyscos:main Jul 5, 2024
5 checks passed
@cosmicexplorer
Copy link
Contributor Author

Yes, that's totally right, but this would require forwarding the R: Read bound everywhere else in the zip crate, which would break user code!

@gyscos
Copy link
Owner

gyscos commented Jul 5, 2024

Well, moving from a lifetime parameter to a type parameter is already breaking change, right? You could start with a R: Read bound on the type definition, and later remove it, leaving it only on the impl that need it.

I agree that omitting the bound in the type definition is nice to have, but I'm not sure it was really a blocker for this?

@cosmicexplorer
Copy link
Contributor Author

Hm, you're totally right, sorry! There are a few reasons why this would have been convenient, but I think you're probably right that it's not technically a blocker (sorry again!):

  • Our ZipFile<'a> struct imposes that lifetime requirement already, in order to ensure we only hand out a single mutable reference to the underlying Read handle:
pub struct ZipFile<'a> {
    pub(crate) data: Cow<'a, ZipFileData>,
    pub(crate) crypto_reader: Option<CryptoReader<'a>>,
    pub(crate) reader: ZipFileReader<'a>,
}
  • In perf: Pipelined parallel extract  zip-rs/zip2#72 (which I am redoing in zip-rs/zip2@master...cosmicexplorer:zip:pipelining, and in particular this commit cosmicexplorer/zip@33f000d, which is what is actually unblocked by this upstream change), I am trying to make our zip extraction methods thread-parallelizable. As part of our own refactoring (not yours), it becomes significantly easier to make generic zip readers without going through the CryptoReader<'a> interface if we can avoid the need to specify a concrete type parameter. Part of the reason the zip crate internally has some very messy mutable state (separate from the necessary mutability of any Read usage) is because of the difficulty of abstraction incurred by needing to provide a concrete Read impl in our type parameters, as this "poisons" all of our intermediate Read adapters and forces them to all be concrete as well.
  • Similar to the previous point, this makes ZipFileReader and ZipFile non-Send unless we manually add &'a mut dyn Read + Send to all cases of the ZipFileReader enum (which is tiresome), or add an unsafe impl Send (which is unsound), and that is an annoyance to overcome as I work on providing parallelized zip extraction and creation operations.

Thank you for asking further, because you're absolutely right that this was not a blocker in a real sense. However, ZStd{En,De}coder is the only one of our dependencies blocking me from making the refactoring I'd like to propose in cosmicexplorer/zip@33f000d, so I broke out this upstream PR as part of my parallelization work to avoid dumping another massive monolithic PR on @Pr0methean. Please let me know if I still haven't quite made the case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants