Enable arm coverage test for kvm-ioctl #87

MrXinWang · 2020-02-22T08:17:55Z

As kcov supports arm64 now and also refer to issue #9 , this PR will enable the arm coverage
test in the CI of kvm-ioctl repository by uploading a coverage file for aarch64 as well as update the rust-vmm-ci submodule.

Signed-off-by: Henry Wang <[email protected]>

MrXinWang · 2020-02-22T08:30:07Z

@andreeaflorescu I think the GIC related stuff (of the CI machine, or the current code) probably lead to this coverage-arm failed? (it works well on my local machine (⊙ˍ⊙) and also weird that it is a test case fail instead of the coverage related fail...) I will give it a deeper investigation now.

As the output shows:

test ioctls::device::tests::test_create_device ... FAILED
...
thread 'ioctls::device::tests::test_create_device' panicked at 'assertion failed: `(left == right)`

michael2012z · 2020-02-24T04:58:58Z

The critical error information is:
thread 'ioctls::device::tests::test_create_device' panicked at 'assertion failed: (left == right)
left: 22, right: 25', src/ioctls/device.rs:143:9

Besides creating a device, the test case includes some additional steps, including setting attribute, that is where the error was triggered. Usually we expect error 25 ENOTTY which means (reference to: http://man7.org/linux/man-pages/man2/ioctl.2.html):
fd is not associated with a character special device.
or
The specified request does not apply to the kind of object that the file descriptor fd references.
Because we didn't really create a device.

But now the error is 22 - "Invalid argument". That's interesting, considering the same test case was OK in another arm CI job.

I tried to run pytest rust-vmm-ci/integration_tests/test_coverage.py in latest rustvmm/dev:v5 container on my local machines: one is with kernel 4.16 and GICv3, the other is with kernel 5.3 and GICv2 (a RPi 4B). All tests on both machines passed well.

Would anybody like to give some configuration info of the ARM CI machine (or maybe it is a VM) ?

andreeaflorescu · 2020-02-24T09:02:13Z

Both the unit tests and the kcov test run on the same machine. The tests are running on a a1.metal instance on EC2 with Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-1045-aws aarch64).

Since they're running on the same machine, both the unit tests and the kcov tests should fail in the same way. This doesn't seem to happen, so I am suspecting it is something related to how the kcov tests are run. I will try to reproduce this on the machine where the CI is running, and I'll also double check the test.

andreeaflorescu · 2020-02-24T09:33:26Z

I just tried it on the CI machine and it fails as in this PR. It would be interesting to find out when does that ioctl return EINVAL and when does it return ENOTTY. I guess this is the only way we can find out why is the behavior different when running with kcov. I suspected it might be pytest for some reason, but running the unit tests with pytest works. So just when using kcov it fails.

michael2012z · 2020-02-24T09:59:37Z

Agree. I will check some source code for the possible scenario when EINVAL is returned.

BTW, as I remember, kcov works in this way: It inserts (saying "insert" is not precise, but we can understand in this way) break-points into every instructions of a binary. If a BP is triggered, it means this line of code is covered. When the program finish, all the BPs that were not triggered indicated the uncovered code.

So, basically I think kcov shouldn't change the logic of the binary, although it definitely slow things down.

andreeaflorescu · 2020-02-24T10:19:40Z

So, basically I think kcov shouldn't change the logic of the binary, although it definitely slow things down.

Yap, I agree. Kcov shouldn't change the logic of the binary, but it practice it seems that it does. So there must be some incorrect assumption somewhere. In any case, I wouldn't invest too much time into this issue because that test is just covering the error cases with a test device which in practice is not going to be used anyway.

michael2012z · 2020-02-25T07:41:36Z

I have a theory to connect kcov to the test case failure: we call the ioctl to set device attribute here: https://github.com/rust-vmm/kvm-ioctls/blob/master/src/ioctls/device.rs#L142, and check the error in the next line of code. Normally there is no problem, because the global variable error was not changed. But with kcov, more happens between every 2 lines of code. (break point, etc.) Maybe something in kcov changed the global error variable immediately after executing the ioctl, so when we check the error code, it is no longer the result of the ioctl.

That's just my wild guess. I tried to verify it by strace command, like strace -f -e trace=ioctl ..., which prints the result of all ioctl's. But unfortunately it doesn't work with kcov, an error was seen:

"Can't set me as ptraced: Operation not permitted".

michael2012z · 2020-02-27T07:41:56Z

Update:

We (together with @MrXinWang) did some more hack on the test case problem. We tried different ARM servers, reproduced the problem in only one of them in this way:

Start a rust-vmm/dev:v5 container
Run pytest rust-vmm-ci/integration_tests/test_coverage.py. The error can be seen.
A binary "/kcov_build/debug/kvm_ioctls-542fb5c9b91f8e97" was generated in the last step. Run the binary can also trigger the error.

Note, if I run cargo test, there was no error.

We did some extra check (though not quite necessary): We tried strace to monitor the syscalls. The result looks like:
_[pid 934] ioctl(5, _IOC(IOC_WRITE, 0xae, 0xe1, 0x18), 0xffffb48f7ef8) = -1 EINVAL (Invalid argument)
test ioctls::device::tests::test_create_device ... FAILED

So, in this binary, seemingly the ioctl DO returned EINVAL, instead of ENOTTY.

It may be worthy to examine how is the binary made.

MrXinWang · 2020-03-02T03:40:48Z

Add some little details on @michael2012z's comment:

Personally I hardly think this is a kcov derived problem, as we tried to run this case without kcov as well as the coverage test in this way (but I am not that sure as kcov and strace are both based on ptrace):

Add some print logs for identifying the fd in the failing case

        println!("-----DEVICE_FD={:?}-----", device_fd);
        println!("-----DEVICE_FD={:?}-----", device_fd);
        println!("-----DEVICE_FD={:?}-----", device_fd);


        println!("-----RAW_FD={:?}-----", raw_fd);
        println!("-----RAW_FD={:?}-----", raw_fd);
        println!("-----RAW_FD={:?}-----", raw_fd);
        // We are just creating a test device. Creating a real device would make the CI dependent
        // on host configuration (like having /dev/vfio). We expect this to fail.
        assert!(device_fd.set_device_attr(&dist_attr).is_err());
        assert_eq!(errno::Error::last().errno(), 25);

Inside the rust-vmm/dev:v5 container

# Build the test case binary, binary will be located in ./tmp
CARGO_TARGET_DIR=tmp cargo test --no-run

# Using strace to track the ioctls
strace -f -e trace=ioctl ./tmp/debug/kvm_ioctls-542fb5c9b91f8e97

Key outputs (Full log is attached as Full log.txt):

[pid   378] ioctl(16, _IOC(_IOC_WRITE, 0xae, 0xe1, 0x18), 0xffff7f768f18) = -1 EINVAL (Invalid argument)
...
...
test ioctls::device::tests::test_create_device ... FAILED

failures:
---- ioctls::device::tests::test_create_device stdout ----
-----DEVICE_FD=DeviceFd { fd: File { fd: 16, path: "/dev/kvm", read: true, write: true } }-----
-----DEVICE_FD=DeviceFd { fd: File { fd: 16, path: "/dev/kvm", read: true, write: true } }-----
-----DEVICE_FD=DeviceFd { fd: File { fd: 16, path: "/dev/kvm", read: true, write: true } }-----
-----RAW_FD=16-----
-----RAW_FD=16-----
-----RAW_FD=16-----
thread 'ioctls::device::tests::test_create_device' panicked at 'assertion failed: `(left == right)`
  left: `22`,
 right: `25`', src/ioctls/device.rs:151:9

Note that this can only be reproduced in one of our aarch64 server, both bare metal and in container, and this server is pretty special because it is the only one that implements the armv8 32bit compatibility, i.e. it can execute the 32bit binary directly under 64bit Ubuntu OS.

And also, if directly excute this binary, such failing test case will magically pass. I think this would explain why the unit test passed while the coverage test failed.

Refer to similar issues in <1> and <2>, I would say it is probably caused by language or the machine itself? (I know little about the CI machine so please forgive me if I am wrong...)

andreeaflorescu · 2020-03-02T09:02:29Z

While I do think it's an interesting problem, I would prefer to find a workaround for this problem, log an issue with the findings so far and unblock the work on this PR. What do you think?

MrXinWang · 2020-03-02T09:27:49Z

While I do think it's an interesting problem, I would prefer to find a workaround for this problem, log an issue with the findings so far and unblock the work on this PR. What do you think?

@andreeaflorescu Well the easiest workaround would be comment the last assertion:
assert_eq!(errno::Error::last().errno(), 25); or change the assertion to accept both EINVAL and ENOTTY, but I wonder if the community would accept this workaround? It seems really ugly...

dianpopa · 2020-03-02T16:48:30Z

I tried this for firecracker and the tests pass. I also made them pass for kvm-ioctls using same command as for firecracker. See results below:
cov-output.zip

I removed from the command --target aarch64-unknown-linux-musl and the test_create_device failed.
I see the coverage test is run for gnu. Is there a strong reason behind that?

michael2012z · 2020-03-03T05:35:43Z

Good finding on musl/gnu!

Indeed, the kcov test passed with musl. I did one more test with a temp commit (removed now) to confirm whether it works with rust-vmm CI framework. Coverage cases passed with some update in test_coverage.py like this: . It looks like cargo kcov didn't take the rustflags defined in .cargo/config, so I had to hard-code the RUSTFLAGS in the command.

The result is seen here: https://buildkite.com/rust-vmm/kvm-ioctls-ci/builds/438#c78914f4-7428-49de-adc8-6f2760035155

The root cause why gnu doesn't work is still not known, but now we have yet another workaround. :)

dianpopa · 2020-03-03T09:53:36Z

Good finding on musl/gnu!

Indeed, the kcov test passed with musl. I did one more test with a temp commit (removed now) to confirm whether it works with rust-vmm CI framework. Coverage cases passed with some update in test_coverage.py like this: . It looks like cargo kcov didn't take the rustflags defined in .cargo/config, so I had to hard-code the RUSTFLAGS in the command.

Yes, I had to do hardcode RUSTFLAGS too. Could not find anything on why that happens though :(

MrXinWang · 2020-03-03T10:27:25Z

It looks like cargo kcov didn't take the rustflags defined in .cargo/config, so I had to hard-code the RUSTFLAGS in the command.

I had to do hardcode RUSTFLAGS too. Could not find anything on why that happens though :(

Probably the RUSTFLAGS is set by cargo-kcov/main.rs#L315 when build the test binary?

andreeaflorescu · 2020-03-03T11:01:17Z

For rust-vmm crates we use the default Rust target (gnu). This is why kcov is run with the gnu target.

I prefer to keep the testing the same for all crates, and because of that I would say to not change the way we run kcov only for this crate. Theoretically both gnu & musl targets are tested because we run the unit tests with both. It is really interesting that this test only fails with kcov+gnu and it works with kcov+musl.

If people are okay with this, I would just delete this line from the test assert_eq(errno::Error::last().errno(), 25); because as I said in my previous comment, this is just testing the error case that can not be reproduced in a real case scenario.

Investigation of why this actually fail can continue on a separate thread to unblock this PR. Thoughts?

michael2012z · 2020-03-03T11:50:43Z

If people are okay with this, I would just delete this line from the test assert_eq(errno::Error::last().errno(), 25); because as I said in my previous comment, this is just testing the error case that can not be reproduced in a real case scenario.

Good idea. This is a good way to move on.

As kcov supports arm64 now, this commit will enable the arm coverage test in the CI of kvm-ioctl repository. This commit contains a workaround to avoid `test_create_device` failure caused by ioctl returning `EINVAL` instead of `ENOTTY` using gnu toolchain. Signed-off-by: Henry Wang <[email protected]>

MrXinWang · 2020-03-04T01:31:07Z

Hi @andreeaflorescu, followed by your comment I updated the code and created an issue #89 to track this bug. Could you please review this PR again? Thanks!

Also the workaround works btw...

MrXinWang requested review from acatangiu, alxiord, andreeaflorescu and sameo as code owners February 22, 2020 08:17

MrXinWang changed the title ~~Enable arm coverage test for kvm-ioctl~~ [WIP]: Enable arm coverage test for kvm-ioctl Feb 26, 2020

michael2012z mentioned this pull request Mar 3, 2020

Add the support for KVM_HAS_DEVICE_ATTR #88

Closed

MrXinWang changed the title ~~[WIP]: Enable arm coverage test for kvm-ioctl~~ Enable arm coverage test for kvm-ioctl Mar 4, 2020

MrXinWang mentioned this pull request Mar 4, 2020

ARM coverage test case test_create_device failure on some machines with gnu toolchain #89

Closed

andreeaflorescu approved these changes Mar 4, 2020

View reviewed changes

alxiord approved these changes Mar 5, 2020

View reviewed changes

alxiord merged commit 8303546 into rust-vmm:master Mar 5, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enable arm coverage test for kvm-ioctl #87

Enable arm coverage test for kvm-ioctl #87

MrXinWang commented Feb 22, 2020 •

edited

Loading

MrXinWang commented Feb 22, 2020 •

edited

Loading

michael2012z commented Feb 24, 2020 •

edited

Loading

andreeaflorescu commented Feb 24, 2020

andreeaflorescu commented Feb 24, 2020

michael2012z commented Feb 24, 2020 •

edited

Loading

andreeaflorescu commented Feb 24, 2020

michael2012z commented Feb 25, 2020

michael2012z commented Feb 27, 2020 •

edited

Loading

MrXinWang commented Mar 2, 2020 •

edited

Loading

andreeaflorescu commented Mar 2, 2020

MrXinWang commented Mar 2, 2020

dianpopa commented Mar 2, 2020

michael2012z commented Mar 3, 2020

dianpopa commented Mar 3, 2020

MrXinWang commented Mar 3, 2020

andreeaflorescu commented Mar 3, 2020

michael2012z commented Mar 3, 2020

MrXinWang commented Mar 4, 2020

Enable arm coverage test for kvm-ioctl #87

Enable arm coverage test for kvm-ioctl #87

Conversation

MrXinWang commented Feb 22, 2020 • edited Loading

MrXinWang commented Feb 22, 2020 • edited Loading

michael2012z commented Feb 24, 2020 • edited Loading

andreeaflorescu commented Feb 24, 2020

andreeaflorescu commented Feb 24, 2020

michael2012z commented Feb 24, 2020 • edited Loading

andreeaflorescu commented Feb 24, 2020

michael2012z commented Feb 25, 2020

michael2012z commented Feb 27, 2020 • edited Loading

MrXinWang commented Mar 2, 2020 • edited Loading

andreeaflorescu commented Mar 2, 2020

MrXinWang commented Mar 2, 2020

dianpopa commented Mar 2, 2020

michael2012z commented Mar 3, 2020

dianpopa commented Mar 3, 2020

MrXinWang commented Mar 3, 2020

andreeaflorescu commented Mar 3, 2020

michael2012z commented Mar 3, 2020

MrXinWang commented Mar 4, 2020

MrXinWang commented Feb 22, 2020 •

edited

Loading

MrXinWang commented Feb 22, 2020 •

edited

Loading

michael2012z commented Feb 24, 2020 •

edited

Loading

michael2012z commented Feb 24, 2020 •

edited

Loading

michael2012z commented Feb 27, 2020 •

edited

Loading

MrXinWang commented Mar 2, 2020 •

edited

Loading