Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip test_disk_usage if there's no /usr/bin/du #6479

Closed
wants to merge 1 commit into from

Conversation

hawkw
Copy link
Member

@hawkw hawkw commented Aug 29, 2024

The zone_bundle::disk_usage function in sled-agent executes du(1) to
measure disk usage for a path. Presently, the path to the du(1)
executable is hard-coded to /usr/bin/du. On some systems, such as
NixOS, the du(1) executable may not live at the path /usr/bin/du. We
could, potentially, change the disk_usage function to find a du(1)
binary on the PATH, or fall back to other locations...but, on Helios, it
is always at /usr/bin/du, so hardcoding it is a good solution for the
environment that sled-agents are actually intended to run in.

However, hardcoding the path means that this test will always fail on
systems where there is no /usr/bin/du. Therefore, this commit changes
the test to first check if /usr/bin/du exists, and return early if it
does not, printing a warning indicating that the test is skipped.

If, someday, Rust's libtest grows a mechanism for tests to mark
themselves as skipped at runtime dynamically (rather than using the
#[ignore] attribute at compile-time), we should probably switch to
using that hypothetical mechanism here. See this
internals.rust-lang.org thread.

If folks aren't happy with making the test skip itself, we could
change the implementation to worth with du(1) being at a different
location by finding it from the PATH, potentially falling back to
/usr/bin/du if it isn't on the PATH. My thought process was that it
was better to avoid complicating the production sled-agent code just to
support environments that the software will never actually run on for
test purposes, but I'm open to being convinced otherwise.

The `zone_bundle::disk_usage` function in sled-agent executes `du(1)` to
measure disk usage for a path. Presently, the path to the `du(1)`
executable is hard-coded to `/usr/bin/du`. On some systems, such as
NixOS, the `du(1)` executable may not live at the path `/usr/bin/du`. We
could, potentially, change the `disk_usage` function to find a `du(1)`
binary on the PATH, or fall back to other locations...but, on Helios, it
is always at `/usr/bin/du`, so hardcoding it is a good solution for the
environment that sled-agents are actually intended to run in.

However, hardcoding the path means that this test will always fail on
systems where there is no `/usr/bin/du`. Therefore, this commit changes
the test to first check if `/usr/bin/du` exists, and return early if it
does not, printing a warning indicating that the test is skipped.

If, someday, Rust's libtest grows a mechanism for tests to mark
themselves as skipped at runtime dynamically (rather than using the
`#[ignore]` attribute at compile-time), we should probably switch to
using that hypothetical mechanism here. See this
[internals.rust-lang.org thread][1].

If folks aren't happy with making the test skip itself, we *could*
change the implementation to worth with `du(1)` being at a different
location by finding it from the PATH, potentially falling back to
`/usr/bin/du` if it isn't on the PATH. My thought process was that it
was better to avoid complicating the production sled-agent code just to
support environments that the software will never actually run on for
test purposes, but I'm open to being convinced otherwise.

[1]: https://internals.rust-lang.org/t/pre-rfc-skippable-tests/14611
@hawkw
Copy link
Member Author

hawkw commented Aug 29, 2024

Another option is changing the dev shell environment provided by the Nix flake to use buildFHSEnv to create an isolated FHS root filesystem just for the Omicron dev shell...but that's annoying and I would prefer to avoid doing it if possible.

Copy link
Collaborator

@bnaecker bnaecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the caveat that I know zilch about NixOS, I approve!

Comment on lines +1687 to +1690
if !Utf8Path::new(DU).exists() {
eprintln!("No {DU} executable found, skipping disk usage test!");
return;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we put this behind a #[cfg(target_os = "linux")]? Super conservative, but would make me slightly more comfortable :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, although, perhaps we want #[cfg(not(target_os = "illumos"))]? Would we also want this behavior on MacOS/Windows?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just trying to preserve the existing behavior on the largest number of systems we already run the test on, that's all. Implementing du in Rust is also fine to me, just sounds like more work, but there may be a good library for it.

@davepacheco
Copy link
Collaborator

Would we consider implementing the du behavior in Rust?

@hawkw
Copy link
Member Author

hawkw commented Aug 29, 2024

Actually, here's a thought for another potential alternative: @sunshowers, is there a way to tell Nextest to skip a particular test using an environment variable? The Nix dev shell could set an env var that disables this test, so that it's only skipped on NixOS and you could get a message saying it was skipped, instead of a "PASS"...

@hawkw
Copy link
Member Author

hawkw commented Aug 29, 2024

Would we consider implementing the du behavior in Rust?

That would certainly be the ideal approach, especially given that there's already code for selecting different du args based on what OS we're on:

// Each OS implements slightly different `du` options.
//
// Linux and illumos support the "apparent" size in bytes, though using
// different options. macOS doesn't support bytes at all, and has a minimum
// block size of 512.
//
// We'll suffer the lower resolution on macOS, and get higher resolution on
// the others.
cfg_if::cfg_if! {
if #[cfg(target_os = "illumos")] {
const BLOCK_SIZE: u64 = 1;
const DU_ARG: &str = "-A";
} else if #[cfg(target_os = "linux")] {
const BLOCK_SIZE: u64 = 1;
const DU_ARG: &str = "-b";
} else if #[cfg(target_os = "macos")] {
const BLOCK_SIZE: u64 = 512;
const DU_ARG: &str = "-k";
} else {
compile_error!("unsupported target OS");
}
}

It would be a bit more work just to fix a test that doesn't work on my weird OS for insane people, but it also seems like a better solution in the long run; certainly less flaky than shelling out. On systems where there's a du binary, we could even cross-check our impl against du(1)...

@hawkw
Copy link
Member Author

hawkw commented Aug 29, 2024

I'm gonna take a crack at writing a pure-Rust du(1) and if I can get it working this morning, replace this with a PR for that instead.

@sunshowers
Copy link
Contributor

Actually, here's a thought for another potential alternative: @sunshowers, is there a way to tell Nextest to skip a particular test using an environment variable? The Nix dev shell could set an env var that disables this test, so that it's only skipped on NixOS and you could get a message saying it was skipped, instead of a "PASS"...

Long requested but not currently supported :(

hawkw added a commit that referenced this pull request Aug 29, 2024
Currently, the `zone_bundle::disk_usage` function in sled-agent shells
out to `du(1)` and parses its output. This works fine, but it's a bit
flaky, especially when running the corresponding tests on "weird"
platforms such as NixOS, where the `du(1)` binary doesn't live in
`/usr/bin/du` (see #6479).

This commit replaces the code that shells out to `/usr/bin/du` with a
pure-Rust rewrite. There's a unit test that asserts that the disk usage
calculated by the new implementation matches the value calculated by
`du(1)`, with code for handling the potential inaccuracy due to `du(1)`
operating in terms of 512B blocks on macOS.

Closes #6479
@hawkw
Copy link
Member Author

hawkw commented Aug 29, 2024

@davepacheco #6482 replaces the use of du with a Rust function. LMKWYT!

@hawkw
Copy link
Member Author

hawkw commented Aug 29, 2024

Actually, here's a thought for another potential alternative: @sunshowers, is there a way to tell Nextest to skip a particular test using an environment variable? The Nix dev shell could set an env var that disables this test, so that it's only skipped on NixOS and you could get a message saying it was skipped, instead of a "PASS"...

Long requested but not currently supported :(

FWIW, I think this might be possible by adding a "nixos" profile to the nextest config file that skips the test, and having the devshell set NEXTEST_PROFILE="nixos"? But, I think Dave's approach of just RIIRing du is probably nicer anyway.

@sunshowers
Copy link
Contributor

sunshowers commented Aug 29, 2024

Actually, here's a thought for another potential alternative: @sunshowers, is there a way to tell Nextest to skip a particular test using an environment variable? The Nix dev shell could set an env var that disables this test, so that it's only skipped on NixOS and you could get a message saying it was skipped, instead of a "PASS"...

Long requested but not currently supported :(

FWIW, I think this might be possible by adding a "nixos" profile to the nextest config file that skips the test, and having the devshell set NEXTEST_PROFILE="nixos"? But, I think Dave's approach of just RIIRing du is probably nicer anyway.

That would work but is kind of a big hammer, and the default set would need to stay up-to-date across profiles. I like the RiiR idea too.

@hawkw hawkw closed this in #6482 Sep 3, 2024
@hawkw hawkw closed this in b6a098d Sep 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants