[Feature Request]: Is there a real-world benchmark for xz? #83

Open
svenha opened this issue Feb 24, 2024 · 6 comments

@svenha

svenha commented Feb 24, 2024

Describe the Feature

A Makefile target that downloads suitable data and runs a real-world benchmark (compression and decompression), or something similar.

Expected Complications

No response

Will I try to implement this new feature?

No

@JiaT75
Contributor

JiaT75 commented Feb 28, 2024

Hello!

Thank you for the feature request. Currently, we do not have an official benchmark framework for any of the XZ projects. When we develop new features that require benchmarking data, we tend to collect files with characteristics that best fit that feature (data type, size, entropy, etc.). Often, community members will also help us benchmark, since they may have access to machines, data, or ideas that the maintainers do not.

As such, we do not have any plans for a more official, robust, and structured benchmark framework at this time. We unfortunately have a few high-priority tasks to attend to first. Eventually, this could be a nice thing to have when we revisit encoder/decoder optimizations, since it would make it easier for the community to help us test various ideas. We would likely maintain a separate repository for this so that it could be useful for other .xz implementations.

If you have ideas on good ways to do this or pitfalls we should avoid, we are always open to suggestions :). We probably wouldn't want to host the benchmark data ourselves due to storage requirements and potential copyright complications with distributing files, but a bring-your-own-data framework could be useful for people. Such a thing may already exist, so we would need to start by surveying what solutions other projects use for something like this.
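To make the bring-your-own-data idea concrete, here is a minimal sketch, not an official XZ tool. It assumes Python's standard `lzma` module (which binds to liblzma), so it measures liblzma through Python rather than the `xz` command-line tool. The names `BenchResult`, `bench_bytes`, and `bench_file` are hypothetical and exist only for this illustration.

```python
import lzma
import time
from typing import NamedTuple


class BenchResult(NamedTuple):
    original_size: int
    compressed_size: int
    compress_mb_s: float
    decompress_mb_s: float


def bench_bytes(data: bytes, preset: int = 6, repeats: int = 3) -> BenchResult:
    """Time an .xz round trip of `data` and keep the best of `repeats` runs."""
    best_c = best_d = float("inf")
    compressed = b""
    for _ in range(repeats):
        t0 = time.perf_counter()
        compressed = lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)
        best_c = min(best_c, time.perf_counter() - t0)

        t0 = time.perf_counter()
        restored = lzma.decompress(compressed)
        best_d = min(best_d, time.perf_counter() - t0)

        if restored != data:
            raise RuntimeError("round trip mismatch")

    mb = len(data) / 1e6
    # Guard against a zero-length timing on very small inputs.
    return BenchResult(
        len(data),
        len(compressed),
        mb / max(best_c, 1e-9),
        mb / max(best_d, 1e-9),
    )


def bench_file(path: str, preset: int = 6, repeats: int = 3) -> BenchResult:
    """Benchmark a user-supplied file; no data is downloaded or hosted."""
    with open(path, "rb") as f:
        return bench_bytes(f.read(), preset=preset, repeats=repeats)
```

Calling `bench_file("my-data.bin")` on a file of your own choosing returns the sizes and the best observed throughput in MB/s. Taking the best of several runs is one common way to reduce timing noise; a real framework would likely also record the machine, build flags, and liblzma version.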

@LaurentBonnaud

Hi,
zstd has a nice integrated benchmark feature:

$ zstd -b
 3#Synthetic 50%     :  10000000 ->   3230847 (x3.095),  346.2 MB/s, 2616.6 MB/s

It is useful to have an easily reproducible test.
In xz it could help to test which variant among

  • Basic C version
  • Branchless C
  • x86-64 inline assembly

is the fastest on a given system.

It would be even better if all 3 variants could be compiled into the same binary and chosen at runtime.
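As a rough illustration of an easily reproducible test input, the sketch below generates deterministic synthetic data from a fixed seed. It is not zstd's generator and not part of xz; the function name `synthetic_data` and its `redundancy` parameter are hypothetical, and the output is only expected to be identical for the same seed on the same Python version.

```python
import lzma
import random


def synthetic_data(size: int, seed: int = 0, redundancy: float = 0.5) -> bytes:
    """Build `size` bytes mixing fresh random runs with copies of earlier runs.

    `redundancy` is the probability that the next run is copied from data
    already generated, so higher values give more compressible output.
    """
    rng = random.Random(seed)  # fixed seed makes the data repeatable
    out = bytearray()
    while len(out) < size:
        length = rng.randint(4, 32)
        if out and rng.random() < redundancy:
            # Copy a short run from earlier output (an LZ-style match).
            start = rng.randrange(len(out))
            out += out[start:start + length]
        else:
            # Emit a run of fresh random bytes (literals).
            out += bytes(rng.getrandbits(8) for _ in range(length))
    return bytes(out[:size])


if __name__ == "__main__":
    data = synthetic_data(1_000_000, seed=42, redundancy=0.5)
    packed = lzma.compress(data, preset=6)
    print(f"{len(data)} -> {len(packed)} (x{len(data) / len(packed):.3f})")
```

Because the input is regenerated from the seed, two people can compare timings on different systems without exchanging files. Comparing the three decoder variants themselves would still require separately built xz binaries, which this sketch does not cover.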

@Sur3

Sur3 commented Apr 17, 2024

The only true benchmark for any compression software is the Hutter Prize: http://prize.hutter1.net/
XZ ranks 75th in that regard, which is not bad: http://mattmahoney.net/dc/text.html
