Provide zstd compressed install only release artifacts? #304

Open
indygreg opened this issue Aug 25, 2024 · 4 comments

@indygreg
Collaborator

This project's release artifacts are continuing to gain popularity.

I initially only published the zstd compressed full archives for use with PyOxidizer. Then, when people discovered the utility of the distributions and wanted smaller downloadable artifacts, we made the install_only archive variants. I chose .tar.gz at the time because of the ubiquity of zlib; I knew I couldn't get away with zstd only.

We still need to provide gzip archives for compatibility I suspect. But I'm wondering if we should provide zstd compressed archives so customers could speed up decompression by a few seconds. This could matter for things like GitHub Actions. Every second can count!

WDYT @charliermarsh? Would uv benefit from the speedup from zstd archives?

@indygreg
Collaborator Author

Actually, the time savings may be <1.0s now with the stripped distributions. The size of the debug symbols and raw object files made zstd a very obvious advantage with the full archives. But maybe the stripped gzip archive is small enough that the >5x slower decompression doesn't translate to a meaningful wall-time difference?

@charliermarsh
Member

We could try it out and benchmark it in uv? We seamlessly support gzip and zstd already, so we’d just need to generate the assets.
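
For readers wondering how a client can accept both formats, one common approach is to look at the first bytes of the download. The sketch below is only an illustration: uv's actual detection logic isn't shown in this thread, and `ArchiveFormat`, `sniff_format`, and `sniff_reader` are hypothetical names. The magic numbers themselves come from the gzip (RFC 1952) and Zstandard frame formats.

```rust
use std::io::{self, Read};

/// Compression formats discussed in this thread.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum ArchiveFormat {
    Gzip,
    Zstd,
    Unknown,
}

/// Gzip members start with the bytes 1f 8b (RFC 1952).
const GZIP_MAGIC: [u8; 2] = [0x1f, 0x8b];
/// Zstandard frames start with the magic number 0xFD2FB528, stored little-endian.
/// Skippable frames are not handled in this sketch.
const ZSTD_MAGIC: [u8; 4] = [0x28, 0xb5, 0x2f, 0xfd];

/// Hypothetical helper: classify an archive from its leading bytes.
fn sniff_format(header: &[u8]) -> ArchiveFormat {
    if header.starts_with(&ZSTD_MAGIC) {
        ArchiveFormat::Zstd
    } else if header.starts_with(&GZIP_MAGIC) {
        ArchiveFormat::Gzip
    } else {
        ArchiveFormat::Unknown
    }
}

/// Hypothetical helper: read up to four bytes from `reader` and classify them.
fn sniff_reader<R: Read>(mut reader: R) -> io::Result<ArchiveFormat> {
    let mut header = [0u8; 4];
    let mut filled = 0;
    // A single `read` may return fewer bytes than requested, so loop until
    // the buffer is full or the stream ends.
    while filled < header.len() {
        let n = reader.read(&mut header[filled..])?;
        if n == 0 {
            break;
        }
        filled += n;
    }
    Ok(sniff_format(&header[..filled]))
}

fn main() -> io::Result<()> {
    // Synthetic headers standing in for real .tar.gz / .tar.zst files.
    let gz: [u8; 4] = [0x1f, 0x8b, 0x08, 0x00];
    let zst: [u8; 5] = [0x28, 0xb5, 0x2f, 0xfd, 0x00];
    println!("{:?}", sniff_reader(&gz[..])?);
    println!("{:?}", sniff_reader(&zst[..])?);
    Ok(())
}
```

In a real client the reader would be the file or HTTP body, and the sniffed bytes would have to be replayed to the chosen decoder (for example through a buffered reader), which this sketch leaves out.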

@charliermarsh charliermarsh self-assigned this Dec 17, 2024
@charliermarsh charliermarsh added the performance Potential performance improvement label Dec 18, 2024
@charliermarsh
Member

For reasons that I don't fully understand, using zstd here appears to be slower? Even with a basic local benchmark:

Starting extraction benchmarks...

Decompressing Gzip: cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.gz
Extracting to: /var/folders/nt/6gf2v7_s3k13zq_t3944rwz40000gn/T/.tmphWYnXY
Gzip extraction complete in 308.071959ms

------------------------

Decompressing Zstandard: cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.zst
Extracting to: /var/folders/nt/6gf2v7_s3k13zq_t3944rwz40000gn/T/.tmpZsHsHK
Zstandard extraction complete in 407.046833ms

use async_compression::tokio::bufread::{GzipDecoder, ZstdDecoder};
use tokio::fs::File;
use tokio::io::{self, AsyncReadExt, BufReader};
use std::time::Instant;
use tempfile::tempdir;
use tokio_tar::Archive;
use std::fs::File as SyncFile;
use std::io::{BufReader as SyncBufReader};
use flate2::read::GzDecoder;
use zstd::stream::Decoder as ZstdSyncDecoder;
use tar::Archive as SyncArchive;

async fn decompress_gzip(file_path: &str, use_sync: bool) -> io::Result<()> {
    println!("Decompressing Gzip: {}", file_path);

    // Create temporary directory
    let temp_dir = tempdir()?;
    println!("Extracting to: {}", temp_dir.path().display());

    let start = Instant::now();

    if use_sync {
        // Synchronous implementation
        let file = SyncFile::open(file_path)?;
        let buf_reader = SyncBufReader::new(file);
        let decoder = GzDecoder::new(buf_reader);
        let mut archive = SyncArchive::new(decoder);
        archive.unpack(temp_dir.path())?;
    } else {
        // Asynchronous implementation
        let file = File::open(file_path).await?;
        let buf_reader = BufReader::new(file);
        let decoder = GzipDecoder::new(buf_reader);
        let mut archive = Archive::new(decoder);
        archive.unpack(temp_dir.path()).await?;
    }

    let duration = start.elapsed();
    println!(
        "Gzip extraction complete in {:?}",
        duration
    );
    Ok(())
}

async fn decompress_zstd(file_path: &str, use_sync: bool) -> io::Result<()> {
    println!("Decompressing Zstandard: {}", file_path);

    // Create temporary directory
    let temp_dir = tempdir()?;
    println!("Extracting to: {}", temp_dir.path().display());

    let start = Instant::now();

    if use_sync {
        // Synchronous implementation
        let file = SyncFile::open(file_path)?;
        let buf_reader = SyncBufReader::new(file);
        let decoder = ZstdSyncDecoder::new(buf_reader)?;
        let mut archive = SyncArchive::new(decoder);
        archive.unpack(temp_dir.path())?;
    } else {
        // Asynchronous implementation
        let file = File::open(file_path).await?;
        let buf_reader = BufReader::new(file);
        let decoder = ZstdDecoder::new(buf_reader);
        let mut archive = Archive::new(decoder);
        archive.unpack(temp_dir.path()).await?;
    }

    let duration = start.elapsed();
    println!(
        "Zstandard extraction complete in {:?}",
        duration
    );
    Ok(())
}

#[tokio::main]
async fn main() -> std::io::Result<()> {
    let gzip_file = "cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.gz"; // Path to the .tar.gz file
    let zstd_file = "cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.zst"; // Path to the .tar.zst file
    let use_sync = true; // Set to true to use synchronous implementation

    println!("Starting extraction benchmarks...\n");

    if let Err(e) = decompress_gzip(gzip_file, use_sync).await {
        eprintln!("Failed to extract Gzip file: {}", e);
    }

    println!("\n------------------------\n");

    if let Err(e) = decompress_zstd(zstd_file, use_sync).await {
        eprintln!("Failed to extract Zstandard file: {}", e);
    }

    Ok(())
}
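
The benchmark above times each format once, with gzip always going first, so a single run may be affected by cache state or ordering. A minimal sketch of a repeated-run harness follows, assuming the work can be wrapped in a closure; `time_runs` and `median` are hypothetical helpers, not part of the original benchmark, and the workload in `main` is a stand-in rather than a real extraction.

```rust
use std::io;
use std::time::{Duration, Instant};

/// Hypothetical helper: call `work` once as an untimed warm-up, then time it
/// `runs` more times and return the samples sorted from fastest to slowest.
fn time_runs<F>(runs: usize, mut work: F) -> io::Result<Vec<Duration>>
where
    F: FnMut() -> io::Result<()>,
{
    work()?; // warm-up run, not recorded
    let mut samples = Vec::with_capacity(runs);
    for _ in 0..runs {
        let start = Instant::now();
        work()?;
        samples.push(start.elapsed());
    }
    samples.sort();
    Ok(samples)
}

/// Median of an already sorted slice (upper median for even lengths).
/// Returns `None` for an empty slice.
fn median(sorted: &[Duration]) -> Option<Duration> {
    sorted.get(sorted.len() / 2).copied()
}

fn main() -> io::Result<()> {
    // Stand-in workload. In the benchmark above, the closure would perform
    // one synchronous extraction into a fresh temporary directory.
    let samples = time_runs(5, || {
        let mut v: Vec<u64> = (0..10_000).collect();
        v.reverse();
        std::hint::black_box(&v);
        Ok(())
    })?;
    println!("min {:?}, median {:?}", samples.first(), median(&samples));
    Ok(())
}
```

Reporting both the minimum and the median makes it easier to tell a consistent gap between formats from one-off noise.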

@charliermarsh
Member

If I don't "unpack", though, it's much faster:

Decompressing Gzip: cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.gz
Gzip extraction complete in 122.582292ms

------------------------

Decompressing Zstandard: cpython-3.13.1+20241206-aarch64-apple-darwin-install_only_stripped.tar.zst
Zstandard extraction complete in 36.125333ms
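
The code for this decompress-only variant isn't included in the thread. One way to measure it, as a sketch under that caveat, is to drain the decoder into a sink so that no tar parsing or filesystem writes take place. `time_drain` is a hypothetical helper and the input in `main` is a stand-in buffer rather than a real archive.

```rust
use std::io::{self, Read};
use std::time::{Duration, Instant};

/// Hypothetical helper: pull every byte out of `reader`, discard it, and
/// report how many bytes were produced and how long that took.
fn time_drain<R: Read>(mut reader: R) -> io::Result<(u64, Duration)> {
    let start = Instant::now();
    let bytes = io::copy(&mut reader, &mut io::sink())?;
    Ok((bytes, start.elapsed()))
}

fn main() -> io::Result<()> {
    // Stand-in input. With the crates from the benchmark above, `reader`
    // would instead be `GzDecoder::new(buf_reader)` or
    // `ZstdSyncDecoder::new(buf_reader)?`.
    let data = vec![0u8; 1 << 20];
    let (bytes, elapsed) = time_drain(data.as_slice())?;
    println!("drained {} bytes in {:?}", bytes, elapsed);
    Ok(())
}
```

Comparing a drain-only time with the full extraction time gives a rough idea of how much wall time goes to tar unpacking and filesystem writes rather than to decompression itself.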
