Skip `test_disk_usage` if there's no `/usr/bin/du` #6479

hawkw · 2024-08-29T16:19:20Z

The zone_bundle::disk_usage function in sled-agent executes du(1) to
measure disk usage for a path. Presently, the path to the du(1)
executable is hard-coded to /usr/bin/du. On some systems, such as
NixOS, the du(1) executable may not live at the path /usr/bin/du. We
could, potentially, change the disk_usage function to find a du(1)
binary on the PATH, or fall back to other locations...but, on Helios, it
is always at /usr/bin/du, so hardcoding it is a good solution for the
environment that sled-agents are actually intended to run in.

However, hardcoding the path means that this test will always fail on
systems where there is no /usr/bin/du. Therefore, this commit changes
the test to first check if /usr/bin/du exists, and return early if it
does not, printing a warning indicating that the test is skipped.

If, someday, Rust's libtest grows a mechanism for tests to mark
themselves as skipped at runtime dynamically (rather than using the
#[ignore] attribute at compile-time), we should probably switch to
using that hypothetical mechanism here. See this
internals.rust-lang.org thread.

If folks aren't happy with making the test skip itself, we could
change the implementation to worth with du(1) being at a different
location by finding it from the PATH, potentially falling back to
/usr/bin/du if it isn't on the PATH. My thought process was that it
was better to avoid complicating the production sled-agent code just to
support environments that the software will never actually run on for
test purposes, but I'm open to being convinced otherwise.

The `zone_bundle::disk_usage` function in sled-agent executes `du(1)` to measure disk usage for a path. Presently, the path to the `du(1)` executable is hard-coded to `/usr/bin/du`. On some systems, such as NixOS, the `du(1)` executable may not live at the path `/usr/bin/du`. We could, potentially, change the `disk_usage` function to find a `du(1)` binary on the PATH, or fall back to other locations...but, on Helios, it is always at `/usr/bin/du`, so hardcoding it is a good solution for the environment that sled-agents are actually intended to run in. However, hardcoding the path means that this test will always fail on systems where there is no `/usr/bin/du`. Therefore, this commit changes the test to first check if `/usr/bin/du` exists, and return early if it does not, printing a warning indicating that the test is skipped. If, someday, Rust's libtest grows a mechanism for tests to mark themselves as skipped at runtime dynamically (rather than using the `#[ignore]` attribute at compile-time), we should probably switch to using that hypothetical mechanism here. See this [internals.rust-lang.org thread][1]. If folks aren't happy with making the test skip itself, we *could* change the implementation to worth with `du(1)` being at a different location by finding it from the PATH, potentially falling back to `/usr/bin/du` if it isn't on the PATH. My thought process was that it was better to avoid complicating the production sled-agent code just to support environments that the software will never actually run on for test purposes, but I'm open to being convinced otherwise. [1]: https://internals.rust-lang.org/t/pre-rfc-skippable-tests/14611

hawkw · 2024-08-29T16:26:31Z

Another option is changing the dev shell environment provided by the Nix flake to use buildFHSEnv to create an isolated FHS root filesystem just for the Omicron dev shell...but that's annoying and I would prefer to avoid doing it if possible.

bnaecker

With the caveat that I know zilch about NixOS, I approve!

bnaecker · 2024-08-29T16:29:42Z

sled-agent/src/zone_bundle.rs

+        if !Utf8Path::new(DU).exists() {
+            eprintln!("No {DU} executable found, skipping disk usage test!");
+            return;
+        }


Could we put this behind a #[cfg(target_os = "linux")]? Super conservative, but would make me slightly more comfortable :)

Sure, although, perhaps we want #[cfg(not(target_os = "illumos"))]? Would we also want this behavior on MacOS/Windows?

I was just trying to preserve the existing behavior on the largest number of systems we already run the test on, that's all. Implementing du in Rust is also fine to me, just sounds like more work, but there may be a good library for it.

davepacheco · 2024-08-29T16:41:00Z

Would we consider implementing the du behavior in Rust?

hawkw · 2024-08-29T16:41:59Z

Actually, here's a thought for another potential alternative: @sunshowers, is there a way to tell Nextest to skip a particular test using an environment variable? The Nix dev shell could set an env var that disables this test, so that it's only skipped on NixOS and you could get a message saying it was skipped, instead of a "PASS"...

hawkw · 2024-08-29T16:48:29Z

Would we consider implementing the du behavior in Rust?

That would certainly be the ideal approach, especially given that there's already code for selecting different du args based on what OS we're on:

omicron/sled-agent/src/zone_bundle.rs

Lines 1554 to 1575 in b78cbab

    
           // Each OS implements slightly different `du` options. 
        
           // 
        
           // Linux and illumos support the "apparent" size in bytes, though using 
        
           // different options. macOS doesn't support bytes at all, and has a minimum 
        
           // block size of 512. 
        
           // 
        
           // We'll suffer the lower resolution on macOS, and get higher resolution on 
        
           // the others. 
        
           cfg_if::cfg_if! { 
        
               if #[cfg(target_os = "illumos")] { 
        
                   const BLOCK_SIZE: u64 = 1; 
        
                   const DU_ARG: &str = "-A"; 
        
               } else if #[cfg(target_os = "linux")] { 
        
                   const BLOCK_SIZE: u64 = 1; 
        
                   const DU_ARG: &str = "-b"; 
        
               } else if #[cfg(target_os = "macos")] { 
        
                   const BLOCK_SIZE: u64 = 512; 
        
                   const DU_ARG: &str = "-k"; 
        
               } else { 
        
                   compile_error!("unsupported target OS"); 
        
               } 
        
           }

It would be a bit more work just to fix a test that doesn't work on my weird OS for insane people, but it also seems like a better solution in the long run; certainly less flaky than shelling out. On systems where there's a du binary, we could even cross-check our impl against du(1)...

hawkw · 2024-08-29T17:39:31Z

I'm gonna take a crack at writing a pure-Rust du(1) and if I can get it working this morning, replace this with a PR for that instead.

sunshowers · 2024-08-29T18:06:15Z

Actually, here's a thought for another potential alternative: @sunshowers, is there a way to tell Nextest to skip a particular test using an environment variable? The Nix dev shell could set an env var that disables this test, so that it's only skipped on NixOS and you could get a message saying it was skipped, instead of a "PASS"...

Long requested but not currently supported :(

Currently, the `zone_bundle::disk_usage` function in sled-agent shells out to `du(1)` and parses its output. This works fine, but it's a bit flaky, especially when running the corresponding tests on "weird" platforms such as NixOS, where the `du(1)` binary doesn't live in `/usr/bin/du` (see #6479). This commit replaces the code that shells out to `/usr/bin/du` with a pure-Rust rewrite. There's a unit test that asserts that the disk usage calculated by the new implementation matches the value calculated by `du(1)`, with code for handling the potential inaccuracy due to `du(1)` operating in terms of 512B blocks on macOS. Closes #6479

hawkw · 2024-08-29T18:39:34Z

@davepacheco #6482 replaces the use of du with a Rust function. LMKWYT!

hawkw · 2024-08-29T18:41:06Z

Actually, here's a thought for another potential alternative: @sunshowers, is there a way to tell Nextest to skip a particular test using an environment variable? The Nix dev shell could set an env var that disables this test, so that it's only skipped on NixOS and you could get a message saying it was skipped, instead of a "PASS"...

Long requested but not currently supported :(

FWIW, I think this might be possible by adding a "nixos" profile to the nextest config file that skips the test, and having the devshell set NEXTEST_PROFILE="nixos"? But, I think Dave's approach of just RIIRing du is probably nicer anyway.

sunshowers · 2024-08-29T18:56:32Z

Actually, here's a thought for another potential alternative: @sunshowers, is there a way to tell Nextest to skip a particular test using an environment variable? The Nix dev shell could set an env var that disables this test, so that it's only skipped on NixOS and you could get a message saying it was skipped, instead of a "PASS"...

Long requested but not currently supported :(

FWIW, I think this might be possible by adding a "nixos" profile to the nextest config file that skips the test, and having the devshell set NEXTEST_PROFILE="nixos"? But, I think Dave's approach of just RIIRing du is probably nicer anyway.

That would work but is kind of a big hammer, and the default set would need to stay up-to-date across profiles. I like the RiiR idea too.

hawkw requested review from bnaecker, iliana and sunshowers August 29, 2024 16:19

bnaecker approved these changes Aug 29, 2024

View reviewed changes

hawkw mentioned this pull request Aug 29, 2024

[sled-agent] Replace du(1) with a Rust function #6482

Merged

hawkw closed this in #6482 Sep 3, 2024

hawkw closed this in b6a098d Sep 3, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Skip `test_disk_usage` if there's no `/usr/bin/du` #6479

Skip `test_disk_usage` if there's no `/usr/bin/du` #6479

hawkw commented Aug 29, 2024

hawkw commented Aug 29, 2024

bnaecker left a comment

bnaecker Aug 29, 2024

hawkw Aug 29, 2024

bnaecker Aug 29, 2024

davepacheco commented Aug 29, 2024

hawkw commented Aug 29, 2024

hawkw commented Aug 29, 2024

hawkw commented Aug 29, 2024

sunshowers commented Aug 29, 2024

hawkw commented Aug 29, 2024

hawkw commented Aug 29, 2024

sunshowers commented Aug 29, 2024 •

edited

Loading

Skip test_disk_usage if there's no /usr/bin/du #6479

Skip test_disk_usage if there's no /usr/bin/du #6479

Conversation

hawkw commented Aug 29, 2024

hawkw commented Aug 29, 2024

bnaecker left a comment

Choose a reason for hiding this comment

bnaecker Aug 29, 2024

Choose a reason for hiding this comment

hawkw Aug 29, 2024

Choose a reason for hiding this comment

bnaecker Aug 29, 2024

Choose a reason for hiding this comment

davepacheco commented Aug 29, 2024

hawkw commented Aug 29, 2024

hawkw commented Aug 29, 2024

hawkw commented Aug 29, 2024

sunshowers commented Aug 29, 2024

hawkw commented Aug 29, 2024

hawkw commented Aug 29, 2024

sunshowers commented Aug 29, 2024 • edited Loading

Skip `test_disk_usage` if there's no `/usr/bin/du` #6479

Skip `test_disk_usage` if there's no `/usr/bin/du` #6479

sunshowers commented Aug 29, 2024 •

edited

Loading