Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable arm coverage test for kvm-ioctl #87

Merged
merged 1 commit into from
Mar 5, 2020
Merged

Enable arm coverage test for kvm-ioctl #87

merged 1 commit into from
Mar 5, 2020

Conversation

MrXinWang
Copy link
Contributor

@MrXinWang MrXinWang commented Feb 22, 2020

As kcov supports arm64 now and also refer to issue #9 , this PR will enable the arm coverage
test in the CI of kvm-ioctl repository by uploading a coverage file for aarch64 as well as update the rust-vmm-ci submodule.

Signed-off-by: Henry Wang <[email protected]>

@MrXinWang
Copy link
Contributor Author

MrXinWang commented Feb 22, 2020

@andreeaflorescu I think the GIC related stuff (of the CI machine, or the current code) probably lead to this coverage-arm failed? (it works well on my local machine (⊙ˍ⊙) and also weird that it is a test case fail instead of the coverage related fail...) I will give it a deeper investigation now.

As the output shows:

test ioctls::device::tests::test_create_device ... FAILED
...
thread 'ioctls::device::tests::test_create_device' panicked at 'assertion failed: `(left == right)`

@michael2012z
Copy link
Contributor

michael2012z commented Feb 24, 2020

The critical error information is:
thread 'ioctls::device::tests::test_create_device' panicked at 'assertion failed: (left == right)
left: 22, right: 25', src/ioctls/device.rs:143:9

Besides creating a device, the test case includes some additional steps, including setting attribute, that is where the error was triggered. Usually we expect error 25 ENOTTY which means (reference to: http://man7.org/linux/man-pages/man2/ioctl.2.html):
fd is not associated with a character special device.
or
The specified request does not apply to the kind of object that the file descriptor fd references.

Because we didn't really create a device.

But now the error is 22 - "Invalid argument". That's interesting, considering the same test case was OK in another arm CI job.

I tried to run pytest rust-vmm-ci/integration_tests/test_coverage.py in latest rustvmm/dev:v5 container on my local machines: one is with kernel 4.16 and GICv3, the other is with kernel 5.3 and GICv2 (a RPi 4B). All tests on both machines passed well.

Would anybody like to give some configuration info of the ARM CI machine (or maybe it is a VM) ?

@andreeaflorescu
Copy link
Member

Both the unit tests and the kcov test run on the same machine. The tests are running on a a1.metal instance on EC2 with Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-1045-aws aarch64).

Since they're running on the same machine, both the unit tests and the kcov tests should fail in the same way. This doesn't seem to happen, so I am suspecting it is something related to how the kcov tests are run. I will try to reproduce this on the machine where the CI is running, and I'll also double check the test.

@andreeaflorescu
Copy link
Member

I just tried it on the CI machine and it fails as in this PR. It would be interesting to find out when does that ioctl return EINVAL and when does it return ENOTTY. I guess this is the only way we can find out why is the behavior different when running with kcov. I suspected it might be pytest for some reason, but running the unit tests with pytest works. So just when using kcov it fails.

@michael2012z
Copy link
Contributor

michael2012z commented Feb 24, 2020

Agree. I will check some source code for the possible scenario when EINVAL is returned.

BTW, as I remember, kcov works in this way: It inserts (saying "insert" is not precise, but we can understand in this way) break-points into every instructions of a binary. If a BP is triggered, it means this line of code is covered. When the program finish, all the BPs that were not triggered indicated the uncovered code.

So, basically I think kcov shouldn't change the logic of the binary, although it definitely slow things down.

@andreeaflorescu
Copy link
Member

So, basically I think kcov shouldn't change the logic of the binary, although it definitely slow things down.

Yap, I agree. Kcov shouldn't change the logic of the binary, but it practice it seems that it does. So there must be some incorrect assumption somewhere. In any case, I wouldn't invest too much time into this issue because that test is just covering the error cases with a test device which in practice is not going to be used anyway.

@michael2012z
Copy link
Contributor

I have a theory to connect kcov to the test case failure: we call the ioctl to set device attribute here: https://github.com/rust-vmm/kvm-ioctls/blob/master/src/ioctls/device.rs#L142, and check the error in the next line of code. Normally there is no problem, because the global variable error was not changed. But with kcov, more happens between every 2 lines of code. (break point, etc.) Maybe something in kcov changed the global error variable immediately after executing the ioctl, so when we check the error code, it is no longer the result of the ioctl.

That's just my wild guess. I tried to verify it by strace command, like strace -f -e trace=ioctl ..., which prints the result of all ioctl's. But unfortunately it doesn't work with kcov, an error was seen:

"Can't set me as ptraced: Operation not permitted".

@MrXinWang MrXinWang changed the title Enable arm coverage test for kvm-ioctl [WIP]: Enable arm coverage test for kvm-ioctl Feb 26, 2020
@michael2012z
Copy link
Contributor

michael2012z commented Feb 27, 2020

Update:

We (together with @MrXinWang) did some more hack on the test case problem. We tried different ARM servers, reproduced the problem in only one of them in this way:

  1. Start a rust-vmm/dev:v5 container
  2. Run pytest rust-vmm-ci/integration_tests/test_coverage.py. The error can be seen.
  3. A binary "/kcov_build/debug/kvm_ioctls-542fb5c9b91f8e97" was generated in the last step. Run the binary can also trigger the error.

Note, if I run cargo test, there was no error.

We did some extra check (though not quite necessary): We tried strace to monitor the syscalls. The result looks like:
_[pid 934] ioctl(5, _IOC(IOC_WRITE, 0xae, 0xe1, 0x18), 0xffffb48f7ef8) = -1 EINVAL (Invalid argument)
test ioctls::device::tests::test_create_device ... FAILED

So, in this binary, seemingly the ioctl DO returned EINVAL, instead of ENOTTY.

It may be worthy to examine how is the binary made.

@MrXinWang
Copy link
Contributor Author

MrXinWang commented Mar 2, 2020

Add some little details on @michael2012z's comment:

Personally I hardly think this is a kcov derived problem, as we tried to run this case without kcov as well as the coverage test in this way (but I am not that sure as kcov and strace are both based on ptrace):

Add some print logs for identifying the fd in the failing case

        println!("-----DEVICE_FD={:?}-----", device_fd);
        println!("-----DEVICE_FD={:?}-----", device_fd);
        println!("-----DEVICE_FD={:?}-----", device_fd);


        println!("-----RAW_FD={:?}-----", raw_fd);
        println!("-----RAW_FD={:?}-----", raw_fd);
        println!("-----RAW_FD={:?}-----", raw_fd);
        // We are just creating a test device. Creating a real device would make the CI dependent
        // on host configuration (like having /dev/vfio). We expect this to fail.
        assert!(device_fd.set_device_attr(&dist_attr).is_err());
        assert_eq!(errno::Error::last().errno(), 25);

Inside the rust-vmm/dev:v5 container

# Build the test case binary, binary will be located in ./tmp
CARGO_TARGET_DIR=tmp cargo test --no-run

# Using strace to track the ioctls
strace -f -e trace=ioctl ./tmp/debug/kvm_ioctls-542fb5c9b91f8e97

Key outputs (Full log is attached as Full log.txt):

[pid   378] ioctl(16, _IOC(_IOC_WRITE, 0xae, 0xe1, 0x18), 0xffff7f768f18) = -1 EINVAL (Invalid argument)
...
...
test ioctls::device::tests::test_create_device ... FAILED

failures:
---- ioctls::device::tests::test_create_device stdout ----
-----DEVICE_FD=DeviceFd { fd: File { fd: 16, path: "/dev/kvm", read: true, write: true } }-----
-----DEVICE_FD=DeviceFd { fd: File { fd: 16, path: "/dev/kvm", read: true, write: true } }-----
-----DEVICE_FD=DeviceFd { fd: File { fd: 16, path: "/dev/kvm", read: true, write: true } }-----
-----RAW_FD=16-----
-----RAW_FD=16-----
-----RAW_FD=16-----
thread 'ioctls::device::tests::test_create_device' panicked at 'assertion failed: `(left == right)`
  left: `22`,
 right: `25`', src/ioctls/device.rs:151:9

Note that this can only be reproduced in one of our aarch64 server, both bare metal and in container, and this server is pretty special because it is the only one that implements the armv8 32bit compatibility, i.e. it can execute the 32bit binary directly under 64bit Ubuntu OS.

And also, if directly excute this binary, such failing test case will magically pass. I think this would explain why the unit test passed while the coverage test failed.

Refer to similar issues in <1> and <2>, I would say it is probably caused by language or the machine itself? (I know little about the CI machine so please forgive me if I am wrong...)

@andreeaflorescu
Copy link
Member

While I do think it's an interesting problem, I would prefer to find a workaround for this problem, log an issue with the findings so far and unblock the work on this PR. What do you think?

@MrXinWang
Copy link
Contributor Author

While I do think it's an interesting problem, I would prefer to find a workaround for this problem, log an issue with the findings so far and unblock the work on this PR. What do you think?

@andreeaflorescu Well the easiest workaround would be comment the last assertion:
assert_eq!(errno::Error::last().errno(), 25); or change the assertion to accept both EINVAL and ENOTTY, but I wonder if the community would accept this workaround? It seems really ugly...

@dianpopa
Copy link
Contributor

dianpopa commented Mar 2, 2020

I tried this for firecracker and the tests pass. I also made them pass for kvm-ioctls using same command as for firecracker. See results below:
cov-output.zip

I removed from the command --target aarch64-unknown-linux-musl and the test_create_device failed.
I see the coverage test is run for gnu. Is there a strong reason behind that?

@michael2012z
Copy link
Contributor

Good finding on musl/gnu!

Indeed, the kcov test passed with musl. I did one more test with a temp commit (removed now) to confirm whether it works with rust-vmm CI framework. Coverage cases passed with some update in test_coverage.py like this: . It looks like cargo kcov didn't take the rustflags defined in .cargo/config, so I had to hard-code the RUSTFLAGS in the command.

The result is seen here: https://buildkite.com/rust-vmm/kvm-ioctls-ci/builds/438#c78914f4-7428-49de-adc8-6f2760035155

The root cause why gnu doesn't work is still not known, but now we have yet another workaround. :)

@dianpopa
Copy link
Contributor

dianpopa commented Mar 3, 2020

Good finding on musl/gnu!

Indeed, the kcov test passed with musl. I did one more test with a temp commit (removed now) to confirm whether it works with rust-vmm CI framework. Coverage cases passed with some update in test_coverage.py like this: . It looks like cargo kcov didn't take the rustflags defined in .cargo/config, so I had to hard-code the RUSTFLAGS in the command.

Yes, I had to do hardcode RUSTFLAGS too. Could not find anything on why that happens though :(

@MrXinWang
Copy link
Contributor Author

It looks like cargo kcov didn't take the rustflags defined in .cargo/config, so I had to hard-code the RUSTFLAGS in the command.

I had to do hardcode RUSTFLAGS too. Could not find anything on why that happens though :(

Probably the RUSTFLAGS is set by cargo-kcov/main.rs#L315 when build the test binary?

@andreeaflorescu
Copy link
Member

For rust-vmm crates we use the default Rust target (gnu). This is why kcov is run with the gnu target.

I prefer to keep the testing the same for all crates, and because of that I would say to not change the way we run kcov only for this crate. Theoretically both gnu & musl targets are tested because we run the unit tests with both. It is really interesting that this test only fails with kcov+gnu and it works with kcov+musl.

If people are okay with this, I would just delete this line from the test assert_eq(errno::Error::last().errno(), 25); because as I said in my previous comment, this is just testing the error case that can not be reproduced in a real case scenario.

Investigation of why this actually fail can continue on a separate thread to unblock this PR. Thoughts?

@michael2012z
Copy link
Contributor

If people are okay with this, I would just delete this line from the test assert_eq(errno::Error::last().errno(), 25); because as I said in my previous comment, this is just testing the error case that can not be reproduced in a real case scenario.

Good idea. This is a good way to move on.

As kcov supports arm64 now, this commit will enable the arm coverage
test in the CI of kvm-ioctl repository.

This commit contains a workaround to avoid `test_create_device`
failure caused by ioctl returning `EINVAL` instead of `ENOTTY` using
gnu toolchain.

Signed-off-by: Henry Wang <[email protected]>
@MrXinWang MrXinWang changed the title [WIP]: Enable arm coverage test for kvm-ioctl Enable arm coverage test for kvm-ioctl Mar 4, 2020
@MrXinWang
Copy link
Contributor Author

Hi @andreeaflorescu, followed by your comment I updated the code and created an issue #89 to track this bug. Could you please review this PR again? Thanks!

Also the workaround works btw...

@alxiord alxiord merged commit 8303546 into rust-vmm:master Mar 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants