Hyperlight uses the Criterion framework to run and analyze benchmarks. A benefit to this framework is that it doesn't require the nightly toolchain.
-
Every time a branch gets a push
- Compares the current branch benchmarking results to the "dev-latest" release (which is the most recent push to "main" branch). This is done as part of
dep_rust.yml
, which is invoked byValidatePullRequest.yml
. These benchmarks are for the developer to compare their branch to main, and the results can only be seen in the GitHub action logs, and nothing is saved.
sandboxes/create_sandbox time: [33.803 ms 34.740 ms 35.763 ms] change: [+0.7173% +3.7017% +7.1346%] (p = 0.03 < 0.05) Change within noise threshold.*
- Compares the current branch benchmarking results to the "dev-latest" release (which is the most recent push to "main" branch). This is done as part of
-
For each release
- For each release, benchmarks are ran as part of the release pipeline in
CreateRelease.yml
, which invokesBenchmarks.yml
. These benchmark results are compared to the previous release, and are uploaded as port of the "Release assets" on the GitHub release page.
- For each release, benchmarks are ran as part of the release pipeline in
Currently, benchmarks are ran on windows, linux-kvm (ubuntu), and linux-hyperv (mariner). Only release builds are benchmarked, not debug.
When running cargo bench -- --save-baseline my_baseline
, criterion runs all benchmarks defined in src/hyperlight_host/benches/
, prints the results to the stdout, as well as produces several artifacts. All artifacts can be found in target/criterion/
. For each benchmarking group, for each benchmark, a subfolder with the name of the benchmark is created. This folder in turn contains folders my_baseline
, new
and report
. When running cargo bench
, criterion always creates new
and report
, which always contains the most recent benchmark result and html report, but because we provided the --save-baseline
flag, we also have a my_baseline
folder, which is an exact copy of new
. Moreover, if this my_baseline
folder already existed before we ran cargo bench -- --save-baseline my_baseline
, criterion would also compare the benchmark results with the old my_baseline
folder, and then overwrite the folder.
The first time we run cargo bench -- --save-baseline my_baseline
(starting with a clean project), we get the following structure.
target/criterion/
|-- report
`-- sandboxes
|-- create_sandbox
| |-- my_baseline
| |-- new
| `-- report
|-- create_sandbox_and_call_context
| |-- my_baseline
| |-- new
| `-- report
`-- report
If we run the exact same command again, we get
target/criterion/
|-- report
`-- sandboxes
|-- create_sandbox
| |-- change
| |-- my_baseline
| |-- new
| `-- report
|-- create_sandbox_and_call_context
| |-- change
| |-- my_baseline
| |-- new
| `-- report
`-- report
Note that it overwrote the previous my_baseline
with the new result. But notably, there is a new change
folder, which contains the benchmarking difference between the two runs. In addition, on stdout you'll also find a comparison to our previous my_baseline
run.
time: [40.434 ms 40.777 ms 41.166 ms]
change: [+0.0506% +1.1399% +2.2775%] (p = 0.06 > 0.05)
No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
Note that Criterion does not differ between release and debug/dev benchmark results, so it's up to the developer to make sure baselines of the same config are compared.
Use just bench [debug/release]
parameter to run benchmarks. Comparing local benchmarks results to github-saved benchmarks doesn't make much sense, since you'd be using different hardware, but you can use just bench-download os hypervisor [tag]
to download and extract the GitHub release benchmarks to the correct place folder. You can then run just bench-ci main
to compare to (and overwrite) the previous release benchmarks. Note that main
is the name of the baselines stored in GitHub.