-
Notifications
You must be signed in to change notification settings - Fork 215
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the bootloader hanging during uefi exit_boot_services #457
base: main
Are you sure you want to change the base?
Conversation
Tests will now fail with a timeout error after 2 minutes. That should bee more than enough to run any test.
This also switches to the new API of the uefi crate that was introduced in version 0.31. The old api was deprecated in 0.32 and will be removed in 0.33. See https://github.com/rust-osdev/uefi-rs/blob/ac19526656953c32e8e0ef7bc703f7700b151ae4/docs/funcs_migration.md and rust-osdev/uefi-rs#893
Switch from using the "-bios" flag to using "-pflash" instead. Using "-bios" is not recomended see: https://lists.katacontainers.io/pipermail/kata-dev/2021-January/001650.html This also allows us to easily update the ovmf-prebuild binaries. This fixes rust-osdev#400. I'm not sure whether the issue was an outdated ovmf version or the bios-pflash change. I can't tell what version is published with the ovmf-prebuild crate, so I can't check. Ovmf binaries are no longer provided by the ovmf-prebuild crate. Instead we now download them from the ovmf-prebuild github releases of the same crate. Unlike the crate, those are actually updated. Fixes rust-osdev#400
The I am 99% certain that this fixes the issue in #400. However I want to implement the same fix in my own kernel to make sure. I also still need to update https://github.com/rust-osdev/bootloader/blob/main/docs/create-disk-image.md |
Ok, I can confirm that using pflash works on my machine. I don't really understand why the CI tests fails. Maybe someone else can try this fix on their machine and give me feedback. The CI log is not really helpful. |
I wonder if we should provide most of the code in |
tests/runner/src/lib.rs
Outdated
@@ -105,6 +108,8 @@ where | |||
"-display", | |||
"none", | |||
"--no-reboot", | |||
"-smp", | |||
"8", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blog os at least doesn't support more than 1 core.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why that matters here. This is just for the tests in this repo and has no impact on any kernel build with this bootloader. Also correct me if I am wrong but Blog Os should work fine on a multicore system. It will use just 1, but the other cores don't do anything unless specifically started by the kernel.
tests/runner/src/lib.rs
Outdated
.kill() | ||
.expect("Qemu could not be killed after timeout"); | ||
panic!("Test timed out after 2 minutes"); | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rust's default test runner never times out. At worst it shows a warning after 60s. A timeout means that if you suspend the testing for longer than 120s using ctrl-z, you get an unnnecessary test failure. Also depending on how complex the tests of your OS are you may genuinely need more than 120s on low-end hardware. On local systems you can always kill the test manually and on CI there should be a timeout from the CI provider anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind removing this. I don't think the crtl-z case is common, but maybe that just comes down to preference. I never use that myself.
I mostly added this so I could easily run the tests locally in the background and get notified about failures.
This bug is random in nature so I ran the tests locally in a loop so I needed some kind of failure state.
As for CI the timeout is 30minutes and that seems excessive. Especially when like here the CPU is maxed out by the hanging test.
Also depending on how complex the tests of your OS are you may genuinely need more than 120s on low-end hardware.
That might actually explain the failing CI tests. So maybe I should remove this.
sha2 = "0.10.8" | ||
tar = "0.4.41" | ||
lzma-rs = "0.3.0" | ||
tempfile = "3.12.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this downloading logic should be added to the ovmf-prebuilt repo instead. There is already a stub crate for this in that repo anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree. Maybe that should be part of a separate PR though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI this is available now: https://crates.io/crates/ovmf-prebuilt/0.2.1
|
||
// Make a copy of the OVMF vars file so that it can be used | ||
// read+write without modifying the original. Under AArch64, some | ||
// versions of OVMF won't boot if the vars file isn't writeable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe reading the file and writing it to a new file would work to avoid an explicit permission change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean. I am creating a copy of the original file. Or are you referring to this?
// Necessary, as for example on NixOS, the files are read-only inside
// the Nix store.
#[cfg(target_os = "linux")]
fs_err::set_permissions(&ovmf_vars, Permissions::from_mode(0o666))?;
I copied that from the uefi crates tests. I did not really look into this too much. I assume they had some issues that meant they needed to add this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you use fs::copy you make a copy of the file including pernissions and xattrs. It seems that here a copy of just the file contents ignoring file permissions would be a better option. That can be done by reading the file and writing it again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced that would be any cleaner though. A simple call to set_permissions
is cleaner in my opinion.
Also this way we get some possible advantages from COW file systems, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the very least make it #[cfg(unix)]
then.
use {std::fs::Permissions, std::os::unix::fs::PermissionsExt}; | ||
|
||
/// Name of the ovmf-prebuilt release tag. | ||
const OVMF_PREBUILT_TAG: &str = "edk2-stable202311-r2"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a pretty outdated version, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It's about a year old. I chose this version as it is the one the uefi crate uses in their tests.
We can probably update this. It just wasn't a priority.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just tested the latest tarball version(edk2-stable202405-r1
) released to https://github.com/rust-osdev/ovmf-prebuilt.
It breaks the uefi pxe tests. I don't really know enough about pxe to debug this, but it looks like it does not find the bootloader and tries to start a uefi terminal.
Do you know how I can get the actual logs from qemu in the CI ? Right now the tests time after 30 minutes but I can't reproduce this locally. |
|
||
[[package]] | ||
name = "windows-targets" | ||
version = "0.52.6" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now pulling in an extra copy of windows-targets. Maybe update the async-process dependency of bootloader to 2.3.0, which uses windows-targets 0.52.6 too?
Fixes #400
Switch from using the "-bios" flag to using "-pflash" instead.
Using "-bios" is not recomended see:
https://lists.katacontainers.io/pipermail/kata-dev/2021-January/001650.html
This also allows us to easily update the ovmf-prebuild binaries.
This fixes #400. I'm not sure whether the issue was an outdated ovmf
version or the bios-pflash change. I can't tell what version is
published with the ovmf-prebuild crate, so I can't check.
Ovmf binaries are no longer provided by the ovmf-prebuild crate.
Instead we now download them from the ovmf-prebuild github releases of
the same crate. Unlike the crate, those are actually updated.