Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

kani-cov: A coverage tool for Kani #3121

Merged
merged 78 commits into from
Oct 10, 2024

Conversation

adpaco-aws
Copy link
Contributor

@adpaco-aws adpaco-aws commented Apr 3, 2024

This PR introduces kani-cov, a coverage-oriented tool for Kani. This new tool will help users aggregate raw coverage results produced by Kani and generate human-readable coverage summaries and reports.

kani-cov allows using three different subcommands: merge, summary and report. In the following, we will show how each one of these commands can be used.

The merge subcommand

The merge subcommand takes raw coverage results from Kani (AKA "kaniraw" files) and generates aggregated coverage results which are stored into another file (AKA a "kanicov" file). This new file can then be used for generating summaries or reports.

For example, if we assume this Rust code is in main.rs:

1 fn _other_function() {
2    println!("Hello, world!");
3 }
4
5 fn test_cov(val: u32) -> bool {
6     if val < 3 || val == 42 {
7         true
8     } else {
9         false
10    }
11 }
12
13 #[cfg_attr(kani, kani::proof)]
14 fn main() {
15    let test1 = test_cov(1);
16    let test2 = test_cov(2);
17    assert!(test1);
18    assert!(test2);
19 }

and we execute Kani with coverage enabled

kani main.rs --coverage -Zsource-coverage

the raw coverage results will be saved to a folder

[info] Coverage results saved to /home/ubuntu/coverage-experiments/src/kanicov_2024-09-23_23-49

we now can aggregate those results with the merge subcommand

kani-cov merge kanicov_2024-09-23_23-49/*kaniraw.json

which by default produces a default_kanicov.json file. We can also use the --output <path> option to change the name of the file.

The type of file we just produced is known as the "profile" or the "kanicov" file, and contains aggregated coverage results which may have been originated in multiple verification sessions. This file is a required input for the other two subcommands.

The summary subcommand

Now that we have both the "kanicov" file and the "kanimap" file, we are ready to produce coverage metrics with the summary subcommand:

kani-cov summary kanicov_2024-09-23_23-49/kanicov_2024-09-23_23-49_kanimap.json --profile default_kanicov.json

This outputs a table including coverage metrics for functions, lines and regions for each file referenced in the coverage mappings:

| Filename                                      | Function (%) | Line (%)     | Region (%)  |
| --------------------------------------------- | ------------ | ------------ | ----------- |
| /home/ubuntu/coverage-experiments/src/main.rs | 2/3 (66.67)  | 8/12 (66.67) | 4/6 (66.67) |

Since the default format is markdown, the table can also be rendered as in this PR description:

Filename Function (%) Line (%) Region (%)
/home/ubuntu/coverage-experiments/src/main.rs 2/3 (66.67) 8/12 (66.67) 4/6 (66.67)

At present, no other formats are available. But the JSON and CSV formats would be really nice and easy to have.

The report subcommand

Coverage reports go a little further and allow users to visualize coverage information on the source code.

The report requires the same input files as the summary subcommand:

kani-cov report kanicov_2024-09-23_23-49/kanicov_2024-09-23_23-49_kanimap.json --profile default_kanicov.json

When calling this command from a terminal, it will render like this:

Screenshot 2024-10-04 at 5 49 51 PM

However, if you're redirecting the output to a file, it will back off to another text-based mode which simply uses ``` and ''' to represent opening and closing escapes for the coverage regions. This is just another format called escapes which can be explicitly requested as

kani-cov report kanicov_2024-09-23_23-49/kanicov_2024-09-23_23-49_kanimap.json --profile default_kanicov.json --format=escapes

Similar to the summary command, this will iterate over the files in the coverage mappings, compute coverage information and highlight any uncovered regions (except #3543).

Testing

This PR also changes compiletest to automatically call the merge and report subcommands on the coverage test suite. The tests have been blessed using the --fix-expected option and this Python3 script that removes the first line (path to the file) and adds \ after each line so an exact match is expected.

import os

def process_files(directory):
    for filename in os.listdir(directory):
        if filename == "expected":
            file_path = os.path.join(directory, filename)
            with open(file_path, "r") as file:
                lines = file.readlines()
            
            # Remove the first line
            lines = lines[1:]
            
            # Add the character `\` after every line
            processed_lines = []
            for i, line in enumerate(lines):
                processed_lines.append(line.replace("\n", "\\\n"))
            
            # Write the processed lines back to the file
            with open(file_path, "w") as file:
                file.writelines(processed_lines)

Callouts / Known issues

This PR only introduces kani-cov. It doesn't plug kani-cov into kani-driver so it's automatically run when using the --coverage option, nor it prepares the kani-cov binary for distribution. This is better to address later.

This PR also spawns a few issues:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 and MIT licenses.

@github-actions github-actions bot added the Z-BenchCI Tag a PR to run benchmark CI label Apr 3, 2024
Copy link
Contributor

@celinval celinval left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we planning to release this tool or is it only for developers?

Cargo.toml Outdated Show resolved Hide resolved
tools/kanicov/Cargo.toml Outdated Show resolved Hide resolved
@adpaco-aws
Copy link
Contributor Author

Are we planning to release this tool or is it only for developers?

The plan is to release the tool. More details in the upcoming RFC.

@adpaco-aws adpaco-aws changed the title [Draft] kanicov: A tool to visualize coverage in Kani [Draft] kani-cov: A tool to visualize coverage in Kani Apr 9, 2024
@adpaco-aws adpaco-aws changed the title [Draft] kani-cov: A tool to visualize coverage in Kani [Draft] kani-cov: A coverage tool for Kani Apr 9, 2024
@adpaco-aws
Copy link
Contributor Author

Note to self: Consider highlighting uncovered spans in red as the default behavior. At the very least, discuss in the RFC.

github-merge-queue bot pushed a commit that referenced this pull request Aug 27, 2024
This PR replaces the line-based coverage instrumentation we introduced
in #2609 with the standard source-based code coverage instrumentation
performed by the Rust compiler.

As a result, we now insert code coverage checks in the
`StatementKind::Coverage(..)` statements produced by the Rust compiler
during compilation. These checks include coverage-relevant
information[^note-internal] such as the coverage counter/expression they
represent [^note-instrument]. Both the coverage metadata (`kanimap`) and
coverage results (`kaniraw`) are saved into files after the verification
stage.

Unfortunately, we currently have a chicken-egg problem with this PR and
#3121, where we introduce a tool named `kani-cov` to postprocess
coverage results. As explained in #3143, `kani-cov` is expected to be an
alias for the `cov` subcommand and provide most of the postprocessing
features for coverage-related purposes. But, the tool will likely be
introduced after this change. Therefore, we propose to temporarily print
a list of the regions in each function with their associated coverage
status (i.e., `COVERED` or `UNCOVERED`).

### Source-based code coverage: An example

The main advantage of source-based coverage results is their precision
with respect to the source code. The [Source-based Code
Coverage](https://clang.llvm.org/docs/SourceBasedCodeCoverage.html)
documentation explains more details about the LLVM coverage workflow and
its different options.

For example, let's take this Rust code:
```rust
1 fn _other_function() {
2    println!("Hello, world!");
3 }
4
5 fn test_cov(val: u32) -> bool {
6     if val < 3 || val == 42 {
7         true
8     } else {
9         false
10    }
11 }
12
13 #[cfg_attr(kani, kani::proof)]
14 fn main() {
15    let test1 = test_cov(1);
16    let test2 = test_cov(2);
17    assert!(test1);
18    assert!(test2);
19 }
```

Compiling and running the program with `rustc` and the `-C
instrument-coverage` flag, and using the LLVM tools can get us the
following coverage result:


![Image](https://github.com/model-checking/kani/assets/73246657/9070e390-6e0b-4add-828d-d9f9caacad07)


In contrast, the `cargo kani --coverage -Zsource-coverage` command
currently generates:

```
src/main.rs (main)
 * 14:1 - 19:2 COVERED

src/main.rs (test_cov)
 * 5:1 - 6:15 COVERED
 * 6:19 - 6:28 UNCOVERED
 * 7:9 - 7:13 COVERED
 * 9:9 - 9:14 UNCOVERED
 * 11:1 - 11:2 COVERED
```

which is a verification-based coverage result almost equivalent to the
runtime coverage results.

### Benchmarking

We have evaluated the performance impact of the instrumentation using
the `kani-perf.sh` suite (14 benchmarks). For each test, we compare the
average time to run standard verification against the average time to
run verification with the source-based code coverage feature
enabled[^note-line-evaluation].

The evaluation has been performed on an EC2 `m5a.4xlarge` instance
running Ubuntu 22.04. The experimental data has been obtained by running
the `kani-perf.sh` script 10 times for each version (`only verification`
and `verification + coverage`), computing the average and standard
deviation. We've split this data into `small` (tests taking 60s or less)
and `large` (tests taking more than 60s) and drawn the two graphs below.

#### Performance comparison - `small` benchmarks


![performance_comparison_small](https://github.com/user-attachments/assets/679cf412-0193-4b0c-a78c-2d0fb702706f)

#### Performance comparison - `large` benchmarks


![performance_comparison_large](https://github.com/user-attachments/assets/4bb5a895-7f57-49e0-86b5-5fea67fad939)

#### Comments on performance

Looking at the small tests, the performance impact seems negligible in
such cases. The difference is more noticeable in the large tests, where
the time to run verification and coverage can take 2x or even more. It
wouldn't be surprising that, as programs become larger, the complexity
of the coverage checking grows exponentially as well. However, since
most verification jobs don't take longer than 30min (1800s), it's OK to
say that coverage checking represents a 100-200% slowdown in the worst
case w.r.t. standard verification.

It's also worth noting a few other things:
* The standard deviation remains similar in most cases, meaning that the
coverage feature doesn't have an impact on their stability.
* We haven't tried any SAT solvers other than the ones used by default
for each benchmark. It's possible that other solvers perform
better/worse with the coverage feature enabled.

### Call-outs
 * The soundness issue documented in #3441.
* The issue with saving coverage mappings for non-reachable functions
documented in #3445.
* I've modified the test cases in `tests/coverage/` to test this
feature. Since this technique is simpler, we don't need that many test
cases. However, it's possible I've left some test cases which don't
contribute much. Please let me know if you want to add/remove a test
case.

[^note-internal]: The coverage mappings can't be accessed through the
StableMIR interface so we retrieve them through the internal API.

[^note-instrument]: The instrumentation replaces certain counters with
expressions based on other counters when possible to avoid a part of the
runtime overhead. More details can be found
[here](https://github.com/rust-lang/rustc-dev-guide/blob/master/src/llvm-coverage-instrumentation.md#mir-pass-instrumentcoverage).
Unfortunately, we can't avoid instrumenting expressions at the moment.

[^note-line-evaluation]: We have not compared performance against the
line-based code coverage feature because it doesn't seem worth it. The
line-based coverage feature is guaranteed to include more coverage
checks than the source-based one for any function. In addition,
source-based results are more precise than line-based ones. So this
change represents both a quantitative and qualitative improvement.

By submitting this pull request, I confirm that my contribution is made
under the terms of the Apache 2.0 and MIT licenses.
tools/kani-cov/src/merge.rs Outdated Show resolved Hide resolved
tools/kani-cov/src/coverage.rs Outdated Show resolved Hide resolved
tools/kani-cov/src/coverage.rs Show resolved Hide resolved
Copy link
Contributor

@feliperodri feliperodri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for implementing all my comments. I have one last before approving it. Why the issue #3543 is classified as a soundness issue now? As far as I understood, the coverage report feature will crash in the cases you mentioned in the issue. Will it report false coverage reports in any case? This is important to fix before stabilization, but it seems to me unnecessary to classify this as a soundness issue.

Copy link
Contributor

@carolynzech carolynzech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend making the type aliases public and incorporating them throughout merge.rs and summary.rs as well, but not a blocker for this PR.
Excited to use this feature!

tools/kani-cov/src/coverage.rs Show resolved Hide resolved
tools/kani-cov/src/coverage.rs Outdated Show resolved Hide resolved
tools/kani-cov/src/coverage.rs Outdated Show resolved Hide resolved
tools/kani-cov/src/coverage.rs Outdated Show resolved Hide resolved
tools/kani-cov/src/coverage.rs Outdated Show resolved Hide resolved
tools/kani-cov/src/coverage.rs Outdated Show resolved Hide resolved
tools/kani-cov/src/summary.rs Outdated Show resolved Hide resolved
@adpaco-aws
Copy link
Contributor Author

adpaco-aws commented Oct 7, 2024

Why the issue #3543 is classified as a soundness issue now? As far as I understood, the coverage report feature will crash in the cases you mentioned in the issue. Will it report false coverage reports in any case? This is important to fix before stabilization, but it seems to me unnecessary to classify this as a soundness issue.

The reason #3543 was classified as a soundness issue is because we were filtering out the results which could lead to a crash in kani-cov. Filtering out these results can be considered a soundness issue because, even if they're spurious failures coming from the Rust coverage instrumentation, we could be hiding other issues. It all depends on how we define soundness for coverage and the origin of these coverage artifacts.

Regardless, I've changed how we handle the checks with these out-of-bounds regions to replicate the spurious failure instead. Since we only need to extend the line by one space to represent the check in such cases, we just do that now. In other words, we are now reporting the same spurious failures, which may be more confusing for users but avoids soundness concerns. For example, for the test in tests/coverage/assert we get this report:

/home/ubuntu/kani/tests/coverage/assert/test.rs
   1|     | // Copyright Kani Contributors
   2|     | // SPDX-License-Identifier: Apache-2.0 OR MIT
   3|     | 
   4|     | #[kani::proof]
   5|    1| fn foo() {
   6|    1|     let x: i32 = kani::any();
   7|    1|     if x > 5 {
   8|     |         // fails
   9|    1|         assert!(x < 4);
  10|    1|         if x < 3 ```{'''
  11|    0| ```            // unreachable'''
  12|    0| ```            assert!(x == 2);'''
  13|    0| ```        }'''``` '''
  14|    1|     } else {
  15|    1|         // passes
  16|    1|         assert!(x <= 5);
  17|    1|     }
  18|     | }

Note the single region represented at the end of line 13? In the original source code, that space doesn't exist. But we can create that space in this representation and report the same result that the instrumentation gives us. It's less noticeable in the terminal output, but we also "draw it" there:

Screenshot 2024-10-07 at 7 06 22 PM

In addition to this change, I've added some more documentation to the code handling this special case and updated #3543 (labeled as "spurious failure" now). Please don't hesitate to ask for more clarifications if you need them. @carolynzech I've also implemented your suggestions to use the type aliases in other modules of kani-cov. Thanks!

@adpaco-aws adpaco-aws added this pull request to the merge queue Oct 10, 2024
Merged via the queue into model-checking:main with commit 53d9a52 Oct 10, 2024
27 checks passed
@adpaco-aws adpaco-aws deleted the kanicov-tool branch October 10, 2024 16:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[E] User Experience An UX enhancement for an existing feature. Including deprecation of an existing one. Z-BenchCI Tag a PR to run benchmark CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants