
add script to run hash algorithm benchmark #336

Merged
merged 6 commits on Jan 15, 2025

Conversation

@spencerschrock (Contributor) commented Jan 13, 2025

Summary

Builds upon the work in #306 and starts to define individual experiments. This one is aimed specifically at hashing algorithm.

As far as hashing is concerned, bytes are bytes. By generating our own bytes, we avoid the I/O associated with reading models from disk. While we could read actual models into memory, recreating the filesystem seems unnecessary for this benchmark.

$ hatch run +py=3.11 bench:hash --repeat 5 --methods sha256 blake2
algorithm: sha256, size: 1.0 KB, best time: 2.569984644651413e-06s
algorithm: blake2, size: 1.0 KB, best time: 3.1301751732826233e-06s
algorithm: sha256, size: 1.0 MB, best time: 0.0007016910240054131s
algorithm: blake2, size: 1.0 MB, best time: 0.001357788685709238s
algorithm: sha256, size: 512.0 MB, best time: 0.41755866911262274s
algorithm: blake2, size: 512.0 MB, best time: 0.7319729649461806s
algorithm: sha256, size: 1.0 GB, best time: 0.8285107491537929s
algorithm: blake2, size: 1.0 GB, best time: 1.466184071265161s
algorithm: sha256, size: 4.0 GB, best time: 3.375688728876412s
algorithm: blake2, size: 4.0 GB, best time: 5.8439111188054085s
algorithm: sha256, size: 16.0 GB, best time: 14.146526537369937s
algorithm: blake2, size: 16.0 GB, best time: 23.482591222040355s
algorithm: sha256, size: 32.0 GB, best time: 21.890016920864582s
algorithm: blake2, size: 32.0 GB, best time: 42.17644724994898s
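The approach described above can be sketched roughly as follows. This is not the PR's actual `benchmarks/exp_hash.py` code; the function name `bench_hash` and the use of `hashlib.new` plus `timeit.repeat` are assumptions based on the summary (in-memory generated bytes, best-of-N timing). Note that `hashlib` spells the BLAKE2 variants `blake2b`/`blake2s`, whereas the output above labels it `blake2`.

```python
import hashlib
import os
import timeit


def bench_hash(algorithm: str, size: int, repeat: int = 5) -> float:
    """Return the best wall-clock time to hash `size` in-memory bytes."""
    # Bytes are bytes: random content is as good as a real model file,
    # and keeping it in memory avoids measuring disk I/O.
    data = os.urandom(size)

    def run() -> None:
        hashlib.new(algorithm).update(data)

    # Best-of-N is less noisy than the mean for this kind of benchmark.
    return min(timeit.repeat(run, number=1, repeat=repeat))


if __name__ == "__main__":
    for algo in ("sha256", "blake2b"):
        t = bench_hash(algo, 1024 * 1024)
        print(f"algorithm: {algo}, size: 1.0 MB, best time: {t}s")
```

Taking the minimum over repeats (rather than the mean) follows the usual `timeit` guidance: the fastest run is the closest estimate of the cost without scheduler or cache interference.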

Release Note

NONE

Documentation

NONE

As far as hashing is concerned, bytes are bytes. By generating our own
bytes, we avoid I/O associated with reading models from disk. While we
could read the model into memory, recreating the filesystem seems
complicated.

Signed-off-by: Spencer Schrock <[email protected]>
@spencerschrock spencerschrock marked this pull request as ready for review January 14, 2025 19:46
@spencerschrock spencerschrock requested review from a team as code owners January 14, 2025 19:46
@mihaimaruseac (Collaborator) left a comment

Thank you, it looks great. I have a few comments / discussion starters to make this useful both for humans and machines (plotting, comparing between runs).

benchmarks/exp_hash.py (outdated; thread resolved)
hasher = _get_hasher(algorithm)

def hash(hasher=hasher, size=size):
hasher.update(data[:size])
@mihaimaruseac (Collaborator):

Should we reinitialize the hasher under the measured scope too? We can make _get_hasher return just the constructor and call it here.

@spencerschrock (Contributor, Author):

Since this is in the innermost loop, it's always a new hasher. I don't think we need to reset anything.

@mihaimaruseac (Collaborator):

But each call of hash from timeit just hashes data; it doesn't measure the time it takes to initialize the hasher.
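The reviewer's point can be illustrated with a sketch. In the snippet under review, the hasher is built once outside the timed callable, so `timeit` only measures `update`. Moving construction inside the callable (here via a constructor returned by a hypothetical `_get_hasher_constructor`, mirroring the suggestion that `_get_hasher` return just the constructor) includes initialization in the measurement:

```python
import hashlib
import timeit


def _get_hasher_constructor(algorithm: str):
    # Return a zero-argument constructor instead of a hasher instance,
    # so construction can happen inside the timed scope.
    return lambda: hashlib.new(algorithm)


def timed_hash(algorithm: str, data: bytes, repeat: int = 5) -> float:
    make_hasher = _get_hasher_constructor(algorithm)

    def run() -> None:
        hasher = make_hasher()  # initialization now counted by timeit
        hasher.update(data)

    return min(timeit.repeat(run, number=1, repeat=repeat))
```

Whether this matters in practice depends on the data size: hasher construction is a fixed cost, so it is noticeable for kilobyte inputs but negligible against multi-gigabyte hashing times.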

benchmarks/exp_hash.py (four more review threads, all resolved)
@mihaimaruseac mihaimaruseac merged commit 3006f76 into sigstore:main Jan 15, 2025
33 checks passed
@spencerschrock spencerschrock deleted the benchmark branch January 15, 2025 15:43