
add script to run hash algorithm benchmark #336

Merged
merged 6 commits on Jan 15, 2025

Conversation

@spencerschrock (Contributor) commented Jan 13, 2025

Summary

Builds upon the work in #306 and starts to define individual experiments. This one is aimed specifically at hashing algorithm.

As far as hashing is concerned, bytes are bytes. By generating our own bytes, we avoid the I/O associated with reading models from disk. While we could read actual models into memory, recreating the filesystem seems unnecessary for this benchmark.

$ hatch run +py=3.11 bench:hash --repeat 5 --methods sha256 blake2
algorithm: sha256, size: 1.0 KB, best time: 2.569984644651413e-06s
algorithm: blake2, size: 1.0 KB, best time: 3.1301751732826233e-06s
algorithm: sha256, size: 1.0 MB, best time: 0.0007016910240054131s
algorithm: blake2, size: 1.0 MB, best time: 0.001357788685709238s
algorithm: sha256, size: 512.0 MB, best time: 0.41755866911262274s
algorithm: blake2, size: 512.0 MB, best time: 0.7319729649461806s
algorithm: sha256, size: 1.0 GB, best time: 0.8285107491537929s
algorithm: blake2, size: 1.0 GB, best time: 1.466184071265161s
algorithm: sha256, size: 4.0 GB, best time: 3.375688728876412s
algorithm: blake2, size: 4.0 GB, best time: 5.8439111188054085s
algorithm: sha256, size: 16.0 GB, best time: 14.146526537369937s
algorithm: blake2, size: 16.0 GB, best time: 23.482591222040355s
algorithm: sha256, size: 32.0 GB, best time: 21.890016920864582s
algorithm: blake2, size: 32.0 GB, best time: 42.17644724994898s
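The approach described above can be sketched roughly as follows. This is not the PR's actual `benchmarks/exp_hash.py` code; the function name `bench_hash` and the use of `hashlib.new` plus `timeit.repeat` are assumptions based on the summary (in-memory generated bytes, best-of-N timing). Note that `hashlib` spells the BLAKE2 variants `blake2b`/`blake2s`, whereas the output above labels it `blake2`.

```python
import hashlib
import os
import timeit


def bench_hash(algorithm: str, size: int, repeat: int = 5) -> float:
    """Return the best wall-clock time to hash `size` in-memory bytes."""
    # Bytes are bytes: random content is as good as a real model file,
    # and keeping it in memory avoids measuring disk I/O.
    data = os.urandom(size)

    def run() -> None:
        hashlib.new(algorithm).update(data)

    # Best-of-N is less noisy than the mean for this kind of benchmark.
    return min(timeit.repeat(run, number=1, repeat=repeat))


if __name__ == "__main__":
    for algo in ("sha256", "blake2b"):
        t = bench_hash(algo, 1024 * 1024)
        print(f"algorithm: {algo}, size: 1.0 MB, best time: {t}s")
```

Taking the minimum over repeats (rather than the mean) follows the usual `timeit` guidance: the fastest run is the closest estimate of the cost without scheduler or cache interference.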

Release Note

NONE

Documentation

NONE

As far as hashing is concerned, bytes are bytes. By generating our own
bytes, we avoid I/O associated with reading models from disk. While we
could read the model into memory, recreating the filesystem seems
complicated.

Signed-off-by: Spencer Schrock <[email protected]>
@spencerschrock spencerschrock marked this pull request as ready for review January 14, 2025 19:46
@spencerschrock spencerschrock requested review from a team as code owners January 14, 2025 19:46
@mihaimaruseac (Collaborator) left a comment

Thank you, it looks great. I have a few comments / discussion starters to make this useful both for humans and machines (plotting, comparing between runs).

benchmarks/exp_hash.py (outdated; thread resolved)
hasher = _get_hasher(algorithm)

def hash(hasher=hasher, size=size):
hasher.update(data[:size])
@mihaimaruseac (Collaborator):

Should we reinitialize the hasher under the measured scope too? We can make _get_hasher return just the constructor and call it here.

@spencerschrock (Contributor, Author):

Since this is in the innermost loop, it's always a new hasher. I don't think we need to reset anything.

@mihaimaruseac (Collaborator):

But each call of hash from timeit just hashes data; it doesn't measure the time it takes to initialize the hasher.
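The reviewer's point can be illustrated with a sketch. In the snippet under review, the hasher is built once outside the timed callable, so `timeit` only measures `update`. Moving construction inside the callable (here via a constructor returned by a hypothetical `_get_hasher_constructor`, mirroring the suggestion that `_get_hasher` return just the constructor) includes initialization in the measurement:

```python
import hashlib
import timeit


def _get_hasher_constructor(algorithm: str):
    # Return a zero-argument constructor instead of a hasher instance,
    # so construction can happen inside the timed scope.
    return lambda: hashlib.new(algorithm)


def timed_hash(algorithm: str, data: bytes, repeat: int = 5) -> float:
    make_hasher = _get_hasher_constructor(algorithm)

    def run() -> None:
        hasher = make_hasher()  # initialization now counted by timeit
        hasher.update(data)

    return min(timeit.repeat(run, number=1, repeat=repeat))
```

Whether this matters in practice depends on the data size: hasher construction is a fixed cost, so it is noticeable for kilobyte inputs but negligible against multi-gigabyte hashing times.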

benchmarks/exp_hash.py (four more review threads, all resolved)
@mihaimaruseac mihaimaruseac merged commit 3006f76 into sigstore:main Jan 15, 2025
33 checks passed
@spencerschrock spencerschrock deleted the benchmark branch January 15, 2025 15:43