-
Notifications
You must be signed in to change notification settings - Fork 110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable arm coverage test for kvm-ioctl #87
Conversation
@andreeaflorescu I think the GIC related stuff (of the CI machine, or the current code) probably lead to this coverage-arm failed? (it works well on my local machine (⊙ˍ⊙) and also weird that it is a test case fail instead of the coverage related fail...) I will give it a deeper investigation now. As the output shows:
|
The critical error information is: Besides creating a device, the test case includes some additional steps, including setting attribute, that is where the error was triggered. Usually we expect error 25 But now the error is 22 - "Invalid argument". That's interesting, considering the same test case was OK in another arm CI job. I tried to run Would anybody like to give some configuration info of the ARM CI machine (or maybe it is a VM) ? |
Both the unit tests and the kcov test run on the same machine. The tests are running on a a1.metal instance on EC2 with Ubuntu 18.04.3 LTS (GNU/Linux 4.15.0-1045-aws aarch64). Since they're running on the same machine, both the unit tests and the kcov tests should fail in the same way. This doesn't seem to happen, so I am suspecting it is something related to how the kcov tests are run. I will try to reproduce this on the machine where the CI is running, and I'll also double check the test. |
I just tried it on the CI machine and it fails as in this PR. It would be interesting to find out when does that ioctl return EINVAL and when does it return ENOTTY. I guess this is the only way we can find out why is the behavior different when running with kcov. I suspected it might be pytest for some reason, but running the unit tests with pytest works. So just when using kcov it fails. |
Agree. I will check some source code for the possible scenario when BTW, as I remember, kcov works in this way: It inserts (saying "insert" is not precise, but we can understand in this way) break-points into every instructions of a binary. If a BP is triggered, it means this line of code is covered. When the program finish, all the BPs that were not triggered indicated the uncovered code. So, basically I think kcov shouldn't change the logic of the binary, although it definitely slow things down. |
Yap, I agree. Kcov shouldn't change the logic of the binary, but it practice it seems that it does. So there must be some incorrect assumption somewhere. In any case, I wouldn't invest too much time into this issue because that test is just covering the error cases with a test device which in practice is not going to be used anyway. |
I have a theory to connect kcov to the test case failure: we call the ioctl to set device attribute here: https://github.com/rust-vmm/kvm-ioctls/blob/master/src/ioctls/device.rs#L142, and check the error in the next line of code. Normally there is no problem, because the global variable That's just my wild guess. I tried to verify it by "Can't set me as ptraced: Operation not permitted". |
Update: We (together with @MrXinWang) did some more hack on the test case problem. We tried different ARM servers, reproduced the problem in only one of them in this way:
Note, if I run We did some extra check (though not quite necessary): We tried So, in this binary, seemingly the ioctl DO returned EINVAL, instead of ENOTTY. It may be worthy to examine how is the binary made. |
Add some little details on @michael2012z's comment: Personally I hardly think this is a kcov derived problem, as we tried to run this case without kcov as well as the coverage test in this way (but I am not that sure as kcov and strace are both based on ptrace): Add some print logs for identifying the fd in the failing case println!("-----DEVICE_FD={:?}-----", device_fd);
println!("-----DEVICE_FD={:?}-----", device_fd);
println!("-----DEVICE_FD={:?}-----", device_fd);
println!("-----RAW_FD={:?}-----", raw_fd);
println!("-----RAW_FD={:?}-----", raw_fd);
println!("-----RAW_FD={:?}-----", raw_fd);
// We are just creating a test device. Creating a real device would make the CI dependent
// on host configuration (like having /dev/vfio). We expect this to fail.
assert!(device_fd.set_device_attr(&dist_attr).is_err());
assert_eq!(errno::Error::last().errno(), 25); Inside the # Build the test case binary, binary will be located in ./tmp
CARGO_TARGET_DIR=tmp cargo test --no-run
# Using strace to track the ioctls
strace -f -e trace=ioctl ./tmp/debug/kvm_ioctls-542fb5c9b91f8e97 Key outputs (Full log is attached as Full log.txt):
Note that this can only be reproduced in one of our aarch64 server, both bare metal and in container, and this server is pretty special because it is the only one that implements the armv8 32bit compatibility, i.e. it can execute the 32bit binary directly under 64bit Ubuntu OS. And also, if directly excute this binary, such failing test case will magically pass. I think this would explain why the unit test passed while the coverage test failed. Refer to similar issues in <1> and <2>, I would say it is probably caused by language or the machine itself? (I know little about the CI machine so please forgive me if I am wrong...) |
While I do think it's an interesting problem, I would prefer to find a workaround for this problem, log an issue with the findings so far and unblock the work on this PR. What do you think? |
@andreeaflorescu Well the easiest workaround would be comment the last assertion: |
I tried this for firecracker and the tests pass. I also made them pass for kvm-ioctls using same command as for firecracker. See results below: I removed from the command |
Good finding on musl/gnu! Indeed, the kcov test passed with musl. I did one more test with a temp commit (removed now) to confirm whether it works with rust-vmm CI framework. Coverage cases passed with some update in test_coverage.py like this: . It looks like The result is seen here: https://buildkite.com/rust-vmm/kvm-ioctls-ci/builds/438#c78914f4-7428-49de-adc8-6f2760035155 The root cause why gnu doesn't work is still not known, but now we have yet another workaround. :) |
Yes, I had to do hardcode RUSTFLAGS too. Could not find anything on why that happens though :( |
Probably the RUSTFLAGS is set by cargo-kcov/main.rs#L315 when build the test binary? |
For rust-vmm crates we use the default Rust target (gnu). This is why kcov is run with the gnu target. I prefer to keep the testing the same for all crates, and because of that I would say to not change the way we run kcov only for this crate. Theoretically both gnu & musl targets are tested because we run the unit tests with both. It is really interesting that this test only fails with kcov+gnu and it works with kcov+musl. If people are okay with this, I would just delete this line from the test Investigation of why this actually fail can continue on a separate thread to unblock this PR. Thoughts? |
Good idea. This is a good way to move on. |
As kcov supports arm64 now, this commit will enable the arm coverage test in the CI of kvm-ioctl repository. This commit contains a workaround to avoid `test_create_device` failure caused by ioctl returning `EINVAL` instead of `ENOTTY` using gnu toolchain. Signed-off-by: Henry Wang <[email protected]>
Hi @andreeaflorescu, followed by your comment I updated the code and created an issue #89 to track this bug. Could you please review this PR again? Thanks! Also the workaround works btw... |
As kcov supports arm64 now and also refer to issue #9 , this PR will enable the arm coverage
test in the CI of
kvm-ioctl
repository by uploading a coverage file for aarch64 as well as update therust-vmm-ci
submodule.Signed-off-by: Henry Wang <[email protected]>