Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sled-agent] Replace du(1) with a Rust function #6482

Merged
merged 8 commits into from
Sep 3, 2024
Merged
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
194 changes: 139 additions & 55 deletions sled-agent/src/zone_bundle.rs
Original file line number Diff line number Diff line change
Expand Up @@ -1545,61 +1545,54 @@ async fn compute_bundle_utilization(

// Return the number of bytes occupied by the provided directory.
//
// This returns an error if:
//
// - The "du" command fails
// - Parsing stdout fails
// - Parsing the actual size as a u64 fails
// This returns an error if reading any file or directory within the provided
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: Is there a reason that these are not doc comments?

// directory fails.
async fn disk_usage(path: &Utf8PathBuf) -> Result<u64, BundleError> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's probably worth considering a different name here, given the function appears to be assessing the apparent size of a tree of files rather than the actual disk usage. I definitely would have guessed the other way based on the name!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, that's a good point --- du(1) was actually reporting physical disk usage. We should probably change this to be more faithful to du rather than reporting logical size...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, that's a good point --- du(1) was actually reporting physical disk usage.

I don't believe it was, because we were using the -A flag. To be clear I think the name was already confusing haha

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, okay, I can just rename the function, then. Unless we want the other behavior --- I'm not totally sure. @bnaecker, this is your code, right?

Copy link
Collaborator

@bnaecker bnaecker Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, yeah. I have no preference on the name, that was chosen since it was calling out to du. Whatever you want to call it is fine by me! Josh is correct that the calling code is using that to find the size of a tree of files, during a check against the zone bundle dataset quota.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

772cd16 renames it to dir_size, but I'm happy to change the name again if there's something y'all think would be clearer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can that be &Utf8Path btw? I know you didn't touch that specific line

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can, yeah — I just didn't want to mess with the surrounding code as part of a quick fix. Happy to go and change it as well.

// Each OS implements slightly different `du` options.
//
// Linux and illumos support the "apparent" size in bytes, though using
// different options. macOS doesn't support bytes at all, and has a minimum
// block size of 512.
//
// We'll suffer the lower resolution on macOS, and get higher resolution on
// the others.
cfg_if::cfg_if! {
if #[cfg(target_os = "illumos")] {
const BLOCK_SIZE: u64 = 1;
const DU_ARG: &str = "-A";
} else if #[cfg(target_os = "linux")] {
const BLOCK_SIZE: u64 = 1;
const DU_ARG: &str = "-b";
} else if #[cfg(target_os = "macos")] {
const BLOCK_SIZE: u64 = 512;
const DU_ARG: &str = "-k";
} else {
compile_error!("unsupported target OS");
let path = path.clone();
// We could, alternatively, just implement this using `tokio::fs` to read
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have not looked at this much yet but I wonder if we should just use the walkdir crate that we already have? I feel like there are so many edge cases in doing traversal that it's worth using a crate (but I don't feel strongly).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, i didn't realize that was in our dependency tree, I'll take a look at it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

b4e8731 changes this to use walkdir. I will add that, on my machine, the walkdir implementation takes ~100 ms to calculate the size of sled-agent/src, while my previous implementation did it in ~8 ms, and du(1) only takes about 6 ms. I'm not sure if we care about performance here that much, though, so that may not matter a whole lot --- making sure we're handling edge cases is definitely important, but we're also not using all that much of walkdir's functionality (e.g. traversing symlinks).

Copy link
Contributor

@sunshowers sunshowers Aug 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at https://github.com/uutils/coreutils/blob/8fe93ded741f8257e9d0a5bce12f904e09227a59/src/uu/du/src/du.rs#L319, at least du looks pretty simple. There are definitely issues around traversing symlinks and bind mounts (loops in general) but I guess once we're not traversing symlinks, the only other check one could do is to ensure we don't cross filesystems.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also that code's recursive, huh. I'd definitely write this using a heap-allocated stack personally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also that code's recursive, huh. I'd definitely write this using a heap-allocated stack personally.

That was what my original implementation did! :)

// the directory and so on. However, the `tokio::fs` APIs are basically just
// a wrapper around `spawn_blocking`-ing code that does the `std::fs`
// functions. So, putting the whole thing in one big `spawn_blocking`
// closure that just does all the blocking fs ops lets us spend less time
// creating and destroying tiny blocking tasks.
tokio::task::spawn_blocking(move || {
let mut sum = 0;
// Rather than recursing, we use a VecDeque as a queue of paths to scan
// next. This is so that if we're walking a very deeply nested directory
// structure, we don't overflow our stack.
//
// TODO(eliza): this could, conceivably, be parallelized by spawning a
// bunch of blocking tasks to walk each sub-directory, instead of using
// a queue. But, is optimizing this really all that important? It's
// probably far from the slowest part of zone bundle collection anyway...
let mut dirs_to_scan = std::collections::VecDeque::new();
dirs_to_scan.push_back(path.to_owned());

while let Some(dir) = dirs_to_scan.pop_front() {
let mk_dir_error = |err| BundleError::ReadDirectory {
directory: dir.clone(),
err,
};
for entry in std::fs::read_dir(&dir).map_err(mk_dir_error)? {
let entry = entry.map_err(mk_dir_error)?;
let mk_error = |err| BundleError::ReadDirectory {
directory: Utf8PathBuf::try_from(entry.path())
.unwrap_or_else(|_| path.clone()),
err,
};
if entry.file_type().map_err(mk_error)?.is_dir() {
let path = entry.path().try_into()?;
dirs_to_scan.push_back(path);
} else {
sum += entry.metadata().map_err(mk_error)?.len()
hawkw marked this conversation as resolved.
Show resolved Hide resolved
}
}
}
}
const DU: &str = "/usr/bin/du";
let args = &[DU_ARG, "-s", path.as_str()];
let output = Command::new(DU).args(args).output().await.map_err(|err| {
BundleError::Command { cmd: format!("{DU} {}", args.join(" ")), err }
})?;
let err = |msg: &str| {
BundleError::Cleanup(anyhow!(
"failed to fetch disk usage for {}: {}",
path,
msg,
))
};
if !output.status.success() {
return Err(err("du command failed"));
}
let Ok(s) = std::str::from_utf8(&output.stdout) else {
return Err(err("non-UTF8 stdout"));
};
let Some(line) = s.lines().next() else {
return Err(err("no lines in du output"));
};
let Some(part) = line.trim().split_ascii_whitespace().next() else {
return Err(err("no disk usage size computed in output"));
};
part.parse()
.map(|x: u64| x.saturating_mul(BLOCK_SIZE))
.map_err(|_| err("failed to parse du output"))

Ok(sum)
})
.await
.expect("spawned blocking task should not be cancelled or panic")
}

// Return the quota for a ZFS dataset, or the available size.
Expand Down Expand Up @@ -1659,8 +1652,7 @@ async fn zfs_quota(path: &Utf8PathBuf) -> Result<u64, BundleError> {

#[cfg(test)]
mod tests {
use super::disk_usage;
use super::Utf8PathBuf;
use super::*;

#[tokio::test]
async fn test_disk_usage() {
Expand All @@ -1676,6 +1668,98 @@ mod tests {
let path = Utf8PathBuf::from("/some/nonexistent/path");
assert!(disk_usage(&path).await.is_err());
}

#[tokio::test]
#[cfg(any(
target_os = "illumos",
target_os = "linux",
target_os = "macos"
))]
async fn test_disk_usage_matches_du() {
// Each OS implements slightly different `du` options.
//
// Linux and illumos support the "apparent" size in bytes, though using
// different options. macOS doesn't support bytes at all, and has a minimum
// block size of 512.
//
// We'll suffer the lower resolution on macOS, and get higher resolution on
// the others.
cfg_if::cfg_if! {
if #[cfg(target_os = "illumos")] {
const BLOCK_SIZE: u64 = 1;
const DU_ARG: &str = "-A";
} else if #[cfg(target_os = "linux")] {
const BLOCK_SIZE: u64 = 1;
const DU_ARG: &str = "-b";
} else if #[cfg(target_os = "macos")] {
const BLOCK_SIZE: u64 = 512;
const DU_ARG: &str = "-k";
} else {
compile_error!("unsupported target OS");
}
}

async fn disk_usage_du(path: &Utf8PathBuf) -> Result<u64, BundleError> {
const DU: &str = "du";
let args = &[DU_ARG, "-s", path.as_str()];
let output =
Command::new(DU).args(args).output().await.map_err(|err| {
BundleError::Command {
cmd: format!("{DU} {}", args.join(" ")),
err,
}
})?;
let err = |msg: &str| {
BundleError::Cleanup(anyhow!(
"failed to fetch disk usage for {}: {}",
path,
msg,
))
};
if !output.status.success() {
return Err(err("du command failed"));
}
let Ok(s) = std::str::from_utf8(&output.stdout) else {
return Err(err("non-UTF8 stdout"));
};
let Some(line) = s.lines().next() else {
return Err(err("no lines in du output"));
};
let Some(part) = line.trim().split_ascii_whitespace().next() else {
return Err(err("no disk usage size computed in output"));
};
part.parse()
.map(|x: u64| x.saturating_mul(BLOCK_SIZE))
.map_err(|_| err("failed to parse du output"))
}

let path =
Utf8PathBuf::from(concat!(env!("CARGO_MANIFEST_DIR"), "/src"));
let t0 = std::time::Instant::now();
let usage = dbg!(disk_usage(&path).await)
.expect("disk usage for src dir failed");
eprintln!("disk_usage({path}) took {:?}", t0.elapsed());

let t0 = std::time::Instant::now();
let du_usage =
dbg!(disk_usage_du(&path).await).expect("running du failed!");
eprintln!("du -s {path} took {:?}", t0.elapsed());

// Round down the Rust disk usage result to `du`'s block size on this
hawkw marked this conversation as resolved.
Show resolved Hide resolved
// system.
let usage = if BLOCK_SIZE > 1 {
eprintln!("rounding up to `du(1)`'s block size of {BLOCK_SIZE}B");
dbg!(usage.div_ceil(BLOCK_SIZE) * BLOCK_SIZE)
} else {
usage
};

assert_eq!(
usage, du_usage,
"expected `disk_usage({path})` to == `du -s {path}`\n\
{usage}B (`disk_usage`) != {du_usage}B (`du`)",
);
}
}

#[cfg(all(target_os = "illumos", test))]
Expand Down
Loading