Provide zstd compressed install only release artifacts? #304
Actually, the time savings may be <1.0 now with the stripped distributions. The size of the debug symbols and raw object files made zstd a very obvious advantage with the full archives. But maybe gzip with the stripped archives is small enough that the >5x slower decompression doesn't translate to a meaningful wall time difference?
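Back-of-the-envelope, the tradeoff is download time plus decompression time. The bandwidth, artifact sizes, and decompression throughputs below are illustrative assumptions, not measurements from this project:

```rust
// Estimate end-to-end wall time = download + decompress for each codec.
// All figures are illustrative assumptions for a stripped CPython
// archive, NOT measurements.
fn wall_time_secs(
    compressed_mb: f64,
    bandwidth_mb_per_s: f64,
    uncompressed_mb: f64,
    decomp_mb_per_s: f64,
) -> f64 {
    compressed_mb / bandwidth_mb_per_s + uncompressed_mb / decomp_mb_per_s
}

fn main() {
    let uncompressed = 60.0; // MB of extracted payload, assumed
    // gzip: slightly larger artifact, slower single-threaded decompression
    let gz = wall_time_secs(32.0, 50.0, uncompressed, 150.0);
    // zstd: smaller artifact, much faster decompression
    let zst = wall_time_secs(28.0, 50.0, uncompressed, 900.0);
    println!("gzip ~{gz:.2}s, zstd ~{zst:.2}s, saved ~{:.2}s", gz - zst);
}
```

With these assumed numbers the saving is well under a second, which is the question at hand: whether the gap still matters once the archives are stripped.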
We could try it out and benchmark it in uv? We seamlessly support gzip and zstd already, so we'd just need to generate the assets.
For reasons that I don't fully understand, using zstd here appears to be slower? Even with a basic local benchmark:
```rust
use async_compression::tokio::bufread::{GzipDecoder, ZstdDecoder};
use tokio::fs::File;
use tokio::io::{self, BufReader};
use std::time::Instant;
use tempfile::tempdir;
use tokio_tar::Archive;
use std::fs::File as SyncFile;
use std::io::BufReader as SyncBufReader;
use flate2::read::GzDecoder;
use zstd::stream::Decoder as ZstdSyncDecoder;
use tar::Archive as SyncArchive;

async fn decompress_gzip(file_path: &str, use_sync: bool) -> io::Result<()> {
    println!("Decompressing Gzip: {}", file_path);

    // Create temporary directory
    let temp_dir = tempdir()?;
    println!("Extracting to: {}", temp_dir.path().display());

    let start = Instant::now();
    if use_sync {
        // Synchronous implementation
        let file = SyncFile::open(file_path)?;
        let buf_reader = SyncBufReader::new(file);
        let decoder = GzDecoder::new(buf_reader);
        let mut archive = SyncArchive::new(decoder);
        archive.unpack(temp_dir.path())?;
    } else {
        // Asynchronous implementation
        let file = File::open(file_path).await?;
        let buf_reader = BufReader::new(file);
        let decoder = GzipDecoder::new(buf_reader);
        let mut archive = Archive::new(decoder);
        archive.unpack(temp_dir.path()).await?;
    }
    let duration = start.elapsed();
    println!("Gzip extraction complete in {:?}", duration);

    Ok(())
}

async fn decompress_zstd(file_path: &str, use_sync: bool) -> io::Result<()> {
    println!("Decompressing Zstandard: {}", file_path);

    // Create temporary directory
    let temp_dir = tempdir()?;
    println!("Extracting to: {}", temp_dir.path().display());

    let start = Instant::now();
    if use_sync {
        // Synchronous implementation
        let file = SyncFile::open(file_path)?;
        let buf_reader = SyncBufReader::new(file);
        let decoder = ZstdSyncDecoder::new(buf_reader)?;
        let mut archive = SyncArchive::new(decoder);
        archive.unpack(temp_dir.path())?;
    } else {
        // Asynchronous implementation
        let file = File::open(file_path).await?;
        let buf_reader = BufReader::new(file);
        let decoder = ZstdDecoder::new(buf_reader);
        let mut archive = Archive::new(decoder);
        archive.unpack(temp_dir.path()).await?;
    }
    let duration = start.elapsed();
    println!("Zstandard extraction complete in {:?}", duration);

    Ok(())
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let gzip_file = "cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.gz"; // Path to the .tar.gz file
    let zstd_file = "cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.zst"; // Path to the .tar.zst file
    let use_sync = true; // Set to false to use the asynchronous implementation

    println!("Starting extraction benchmarks...\n");
    if let Err(e) = decompress_gzip(gzip_file, use_sync).await {
        eprintln!("Failed to extract Gzip file: {}", e);
    }
    println!("\n------------------------\n");
    if let Err(e) = decompress_zstd(zstd_file, use_sync).await {
        eprintln!("Failed to extract Zstandard file: {}", e);
    }
    Ok(())
}
```
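For anyone reproducing this, the benchmark pulls in several crates. A sketch of the `Cargo.toml` dependencies it would need (version numbers are approximate assumptions; check current releases and feature names before relying on them):

```toml
[dependencies]
tokio = { version = "1", features = ["full"] }
async-compression = { version = "0.4", features = ["tokio", "gzip", "zstd"] }
tokio-tar = "0.3"
tempfile = "3"
flate2 = "1"
zstd = "0.13"
tar = "0.4"
```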
If I don't "unpack", though, it's much faster:
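One plausible explanation for that gap (an assumption, not verified against the benchmark above) is that tar extraction is dominated by per-file filesystem work rather than by the decoder, so gzip-vs-zstd throughput barely shows once you unpack. A std-only sketch of that effect:

```rust
use std::fs;
use std::io::Write;
use std::time::{Duration, Instant};

// Time writing `n` small files individually (as tar unpack does, one
// open/write/close per entry) versus one large sequential write of the
// same total payload.
fn compare_write_costs(n: usize, chunk: usize) -> std::io::Result<(Duration, Duration)> {
    let dir = std::env::temp_dir().join("unpack-cost-demo");
    fs::create_dir_all(&dir)?;
    let payload = vec![0u8; chunk];

    let start = Instant::now();
    for i in 0..n {
        fs::write(dir.join(format!("f{i}")), &payload)?;
    }
    let many_files = start.elapsed();

    let start = Instant::now();
    let mut f = fs::File::create(dir.join("one-big-file"))?;
    for _ in 0..n {
        f.write_all(&payload)?;
    }
    let one_file = start.elapsed();

    fs::remove_dir_all(&dir)?;
    Ok((many_files, one_file))
}

fn main() -> std::io::Result<()> {
    let (many, one) = compare_write_costs(1000, 16 * 1024)?;
    println!("1000 small files: {many:?}, one large file: {one:?}");
    Ok(())
}
```

If per-file overhead dominates on your filesystem, the two decoder speeds converge in wall time, which would be consistent with decompress-only runs being much faster than full unpacks.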
This project's release artifacts are continuing to gain popularity.
I initially only published the zstd compressed full archives for use with PyOxidizer. Then, when people discovered the utility of the distributions and wanted smaller downloadable artifacts, we made the `install_only` archive variants. I chose `.tar.gz` at the time because of the ubiquity of zlib and knew I couldn't get away with zstd only.

We still need to provide gzip archives for compatibility, I suspect. But I'm wondering if we should also provide zstd compressed archives so customers could speed up decompression by a few seconds. This could matter for things like GitHub Actions. Every second can count!
WDYT @charliermarsh? Would uv benefit from the speedup from zstd archives?