# `kani-cov`: A coverage tool for Kani (#3121)
Are we planning to release this tool or is it only for developers?
The plan is to release the tool. More details in the upcoming RFC.
Note to self: Consider highlighting uncovered spans in red as the default behavior. At the very least, discuss in the RFC.
This PR replaces the line-based coverage instrumentation we introduced in #2609 with the standard source-based code coverage instrumentation performed by the Rust compiler. As a result, we now insert code coverage checks in the `StatementKind::Coverage(..)` statements produced by the Rust compiler during compilation. These checks include coverage-relevant information[^note-internal] such as the coverage counter/expression they represent[^note-instrument]. Both the coverage metadata (`kanimap`) and the coverage results (`kaniraw`) are saved into files after the verification stage.

Unfortunately, we currently have a chicken-and-egg problem between this PR and #3121, where we introduce a tool named `kani-cov` to postprocess coverage results. As explained in #3143, `kani-cov` is expected to be an alias for the `cov` subcommand and to provide most of the postprocessing features for coverage-related purposes. However, the tool will likely be introduced after this change. Therefore, we propose to temporarily print a list of the regions in each function with their associated coverage status (i.e., `COVERED` or `UNCOVERED`).

### Source-based code coverage: An example

The main advantage of source-based coverage results is their precision with respect to the source code. The [Source-based Code Coverage](https://clang.llvm.org/docs/SourceBasedCodeCoverage.html) documentation explains the LLVM coverage workflow and its different options in more detail.
For example, let's take this Rust code:

```rust
 1  fn _other_function() {
 2      println!("Hello, world!");
 3  }
 4
 5  fn test_cov(val: u32) -> bool {
 6      if val < 3 || val == 42 {
 7          true
 8      } else {
 9          false
10      }
11  }
12
13  #[cfg_attr(kani, kani::proof)]
14  fn main() {
15      let test1 = test_cov(1);
16      let test2 = test_cov(2);
17      assert!(test1);
18      assert!(test2);
19  }
```

Compiling and running the program with `rustc` and the `-C instrument-coverage` flag, and then using the LLVM tools, can get us the following coverage result:

![Image](https://github.com/model-checking/kani/assets/73246657/9070e390-6e0b-4add-828d-d9f9caacad07)

In contrast, the `cargo kani --coverage -Zsource-coverage` command currently generates:

```
src/main.rs (main)
 * 14:1 - 19:2 COVERED

src/main.rs (test_cov)
 * 5:1 - 6:15 COVERED
 * 6:19 - 6:28 UNCOVERED
 * 7:9 - 7:13 COVERED
 * 9:9 - 9:14 UNCOVERED
 * 11:1 - 11:2 COVERED
```

which is a verification-based coverage result almost equivalent to the runtime coverage result.

### Benchmarking

We have evaluated the performance impact of the instrumentation using the `kani-perf.sh` suite (14 benchmarks). For each test, we compare the average time to run standard verification against the average time to run verification with the source-based code coverage feature enabled[^note-line-evaluation]. The evaluation was performed on an EC2 `m5a.4xlarge` instance running Ubuntu 22.04. The experimental data was obtained by running the `kani-perf.sh` script 10 times for each version (`only verification` and `verification + coverage`) and computing the average and standard deviation. We've split this data into `small` (tests taking 60s or less) and `large` (tests taking more than 60s) benchmarks and drawn the two graphs below.
#### Performance comparison - `small` benchmarks

![performance_comparison_small](https://github.com/user-attachments/assets/679cf412-0193-4b0c-a78c-2d0fb702706f)

#### Performance comparison - `large` benchmarks

![performance_comparison_large](https://github.com/user-attachments/assets/4bb5a895-7f57-49e0-86b5-5fea67fad939)

#### Comments on performance

Looking at the small tests, the performance impact of coverage seems negligible. The difference is more noticeable in the large tests, where verification with coverage can take 2x as long or more. It wouldn't be surprising if, as programs become larger, the complexity of the coverage checking grows exponentially as well. However, since most verification jobs don't take longer than 30min (1800s), it's fair to say that coverage checking represents a 100-200% slowdown in the worst case w.r.t. standard verification.

It's also worth noting a few other things:

 * The standard deviation remains similar in most cases, meaning that the coverage feature doesn't have an impact on the stability of the benchmarks.
 * We haven't tried any SAT solvers other than the ones used by default for each benchmark. It's possible that other solvers perform better/worse with the coverage feature enabled.

### Call-outs

 * The soundness issue documented in #3441.
 * The issue with saving coverage mappings for non-reachable functions documented in #3445.
 * I've modified the test cases in `tests/coverage/` to test this feature. Since this technique is simpler, we don't need as many test cases. However, it's possible I've left some test cases which don't contribute much. Please let me know if you'd like to add/remove a test case.

[^note-internal]: The coverage mappings can't be accessed through the StableMIR interface, so we retrieve them through the internal API.

[^note-instrument]: The instrumentation replaces certain counters with expressions based on other counters when possible, to avoid part of the runtime overhead.
More details can be found [here](https://github.com/rust-lang/rustc-dev-guide/blob/master/src/llvm-coverage-instrumentation.md#mir-pass-instrumentcoverage). Unfortunately, we can't avoid instrumenting expressions at the moment.

[^note-line-evaluation]: We have not compared performance against the line-based code coverage feature because it doesn't seem worthwhile: the line-based coverage feature is guaranteed to include more coverage checks than the source-based one for any given function. In addition, source-based results are more precise than line-based ones, so this change represents both a quantitative and qualitative improvement.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.
Thanks for implementing all my comments. I have one last question before approving it: why is issue #3543 now classified as a soundness issue? As far as I understood, the coverage report feature will crash in the cases you mentioned in the issue. Will it report incorrect coverage results in any case? This is important to fix before stabilization, but it seems unnecessary to me to classify it as a soundness issue.
I'd recommend making the type aliases public and incorporating them throughout `merge.rs` and `summary.rs` as well, but that's not a blocker for this PR. Excited to use this feature!
The reason #3543 was classified as a soundness issue is that we were filtering out the results that could lead to a crash. Regardless, I've changed how we handle the checks with these out-of-bounds regions so that we replicate the spurious failure instead. Since we only need to extend the line by one space to represent the check in such cases, we just do that now. In other words, we now report the same spurious failures, which may be more confusing for users but avoids soundness concerns. For example, for one of the affected tests:

Note the single region represented at the end of line 13? In the original source code, that space doesn't exist, but we can create it in this representation and report the same result that the instrumentation gives us. It's less noticeable in the terminal output, but we also "draw" it there.

In addition to this change, I've added some more documentation to the code handling this special case and updated #3543 (now labeled as "spurious failure"). Please don't hesitate to ask for more clarifications if you need them.

@carolynzech I've also implemented your suggestions to use the type aliases in other modules of `kani-cov`.
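The "extend the line by one space" workaround described above can be sketched as follows. This is a rough illustration only: the actual representation used by `kani-cov` is not shown in this conversation, so the function name and column convention here are assumptions.

```python
# Illustration of the workaround discussed above: if a coverage region's
# start column points one character past the end of the source line, extend
# the line with a single space so the region can still be drawn (and the
# spurious result reported faithfully). Columns are assumed 1-based.
def extend_for_region(line: str, region_start_col: int) -> str:
    if region_start_col == len(line) + 1:
        return line + " "
    return line

# A region starting just past the closing brace forces a one-space extension.
print(repr(extend_for_region("}", 2)))   # the line gains a trailing space
print(repr(extend_for_region("}", 1)))   # in-bounds regions leave it unchanged
```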
This PR introduces `kani-cov`, a coverage-oriented tool for Kani. This new tool will help users aggregate raw coverage results produced by Kani and generate human-readable coverage summaries and reports. `kani-cov` offers three subcommands: `merge`, `summary` and `report`. In the following, we show how each of these commands can be used.

### The `merge` subcommand

The `merge` subcommand takes raw coverage results from Kani (AKA "kaniraw" files) and generates aggregated coverage results, which are stored in another file (AKA a "kanicov" file). This new file can then be used for generating summaries or reports.

For example, assume some Rust code is in `main.rs` and we execute Kani with coverage enabled; the raw coverage results will be saved to a folder. We can now aggregate those results with the `merge` subcommand, which by default produces a `default_kanicov.json` file. We can also use the `--output <path>` option to change the name of the file.

The file we just produced is known as the "profile" or "kanicov" file, and it contains aggregated coverage results which may have originated in multiple verification sessions. This file is a required input for the other two subcommands.
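Conceptually, the merge step can be sketched as below. Note this is not `kani-cov`'s implementation, and the "kaniraw" schema used here (function name mapped to per-region hit counts) is an assumption for illustration only.

```python
import glob
import json

# Hypothetical sketch of what a merge step does conceptually: combine several
# per-session "kaniraw" files into one "kanicov" profile. The JSON layout
# (function -> {region: hit_count}) is an assumed format, not kani-cov's.
def merge_raw_files(paths, output="default_kanicov.json"):
    profile = {}
    for path in paths:
        with open(path) as f:
            raw = json.load(f)
        for function, regions in raw.items():
            merged = profile.setdefault(function, {})
            for region, count in regions.items():
                # Hit counts from different sessions accumulate; a region is
                # covered overall if any session covered it.
                merged[region] = merged.get(region, 0) + count
    with open(output, "w") as f:
        json.dump(profile, f, indent=2)
    return profile

# Hypothetical usage over a folder of raw results:
# merge_raw_files(glob.glob("coverage_results/*kaniraw.json"))
```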
### The `summary` subcommand

Now that we have both the "kanicov" file and the "kanimap" file, we are ready to produce coverage metrics with the `summary` subcommand. This outputs a table including coverage metrics for functions, lines and regions for each file referenced in the coverage mappings.

Since the default format is Markdown, the table can also be rendered as in this PR description. At present, no other formats are available, but JSON and CSV formats would be nice to have and easy to add.
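The metric computation behind such a table can be sketched like this. The input structure (file name mapped to a list of region statuses) and the table columns are assumptions for illustration; `kani-cov` additionally reports function and line metrics.

```python
# Sketch of a summary-style computation: count covered regions per file and
# emit a Markdown table, similar in spirit to `kani-cov summary` output.
def region_summary(files):
    rows = ["| File | Regions covered | Total | % |",
            "|------|-----------------|-------|---|"]
    for name, statuses in files.items():
        covered = sum(1 for s in statuses if s == "COVERED")
        total = len(statuses)
        pct = 100 * covered / total if total else 0.0
        rows.append(f"| {name} | {covered} | {total} | {pct:.1f} |")
    return "\n".join(rows)

# Region statuses taken from the example output in the companion PR.
print(region_summary({
    "src/main.rs": ["COVERED", "COVERED", "UNCOVERED", "COVERED",
                    "UNCOVERED", "COVERED"],
}))
```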
### The `report` subcommand

Coverage reports go a little further and allow users to visualize coverage information on the source code. The `report` subcommand requires the same input files as the `summary` subcommand. When calling this command from a terminal, it renders the coverage highlights directly in the terminal output.

However, if you're redirecting the output to a file, it falls back to another text-based mode which simply uses `` ``` `` and `'''` as opening and closing escapes for the coverage regions. This is another format, called `escapes`, which can also be requested explicitly.

Similar to the `summary` command, this will iterate over the files in the coverage mappings, compute coverage information and highlight any uncovered regions (except for the case in #3543).
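The `escapes` mode can be illustrated with a small sketch that wraps uncovered column ranges of a source line in the opening/closing markers. The column convention (1-based, inclusive) is an assumption here, as is the region data structure; only the marker strings come from the description above.

```python
# Illustration of the text-based `escapes` mode: wrap each uncovered region
# of a line in ``` ... ''' markers. Regions are (start_col, end_col) pairs,
# assumed 1-based and inclusive; processing right-to-left keeps earlier
# column numbers valid as markers are inserted.
def mark_uncovered(line, regions):
    for start, end in sorted(regions, reverse=True):
        line = line[:start - 1] + "```" + line[start - 1:end] + "'''" + line[end:]
    return line

# Columns 19-27 of this line ("val == 42") were UNCOVERED in the example
# from the companion PR.
print(mark_uncovered("    if val < 3 || val == 42 {", [(19, 27)]))
```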
### Testing

This PR also changes `compiletest` to automatically call the `merge` and `report` subcommands on the `coverage` test suite. The tests have been blessed using the `--fix-expected` option and a Python 3 script that removes the first line (the path to the file) and adds `\` after each line, so an exact match is expected.
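The blessing helper described above can be reconstructed roughly as follows. This is an approximation of the script's behavior as described, not the actual script from the PR.

```python
# Rough reconstruction of the blessing helper: drop the first line of the
# expected output (the path to the file) and append a `\` to every remaining
# line so compiletest requires an exact match.
def bless(text: str) -> str:
    lines = text.splitlines()[1:]  # drop the first line (file path)
    return "\n".join(line + "\\" for line in lines)

# Hypothetical expected-output snippet with a leading file path.
print(bless("src/main.rs\nfn test_cov(val: u32) -> bool {\n    if val < 3 || val == 42 {"))
```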
### Callouts / Known issues

This PR only introduces `kani-cov`. It doesn't plug `kani-cov` into `kani-driver` so that it's automatically run when using the `--coverage` option, nor does it prepare the `kani-cov` binary for distribution. These are better addressed later.

This PR also spawns a few issues. For example, #3541 is about avoiding the duplication of certain data structures that currently live in `kani-driver`. We should probably have a separate crate for these structures and import them from both `kani-driver` and `kani-cov`, but this is more of a distribution problem that doesn't need to be tackled here.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.