-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unit tests for DumpSetup #3788
Unit tests for DumpSetup #3788
Conversation
bc0c1b6
to
af38052
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments in-line -- I think it's great to have tests, and I'm mostly on-board with this direction, but I'm wondering if we could use fakes instead of mocks to perform these tests.
acfa80c
to
97f4c85
Compare
fa62391
to
6910e78
Compare
6910e78
to
62ad4d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests look great, thanks for adding these!
@@ -398,7 +398,7 @@ wicket-common = { path = "wicket-common" } | |||
wicketd-client = { path = "clients/wicketd-client" } | |||
zeroize = { version = "1.7.0", features = ["zeroize_derive", "std"] } | |||
zip = { version = "0.6.6", default-features = false, features = ["deflate","bzip2"] } | |||
zone = { version = "0.3", default-features = false, features = ["async"] } | |||
zone = { version = "0.3", default-features = false, features = ["async", "sync"] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do really need the sync feature here? If it's possible, it would be cool to keep the codebase async-only.
AFAICT, this is only used to call Zones::list_blocking
, which seems like it could easily be replaced with:
If we use async_trait
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Zones::list_blocking
is called a couple layers down the stack from DumpSetupWorker::archive_files
, which is called within a MutexGuard - I was hoping to avoid questions of cancellation-safety issues this way.
))); | ||
let worker_weak = Arc::downgrade(&worker); | ||
let log_poll = log.new(o!("component" => "DumpSetup-archival")); | ||
let _poller = std::thread::spawn(move || { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I know this has been moved, so, no need to block this PR, but might be worth considering "why a std::thread" vs "how about a tokio::task?"
We use a lot of tokio tasks for async work here, if poll_file_archival
could make sense as an async operation, this might make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely open to reworking it to a tokio::task in a follow-up -- now that there are tests to make such a change less risky, and the conditions that led to using a std::thread here ~7 months ago are no longer a concern.
// function also invokes `savecore(8)` to save it into that directory. | ||
// On success, returns Ok(Some(stdout)) if `savecore(8)` was invoked, or | ||
// Ok(None) if it wasn't. | ||
fn dumpadm( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm kinda struggling to see what's different from before, but this looks pretty different from just "unit tests" + "refactoring for unit tests", right?
I don't recall seeing this "check for savecore directory, use /tmp/crash
if it doesn't exist" behavior before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's here in the red half of the diff https://github.com/oxidecomputer/omicron/pull/3788/files#diff-7fb5da3f3c07c737a75656eb61b1c77ed35a9fefae4c8454128cd681e97b6408L147
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gotcha, thank you!
// if we don't have a savecore destination yet, still create and use | ||
// a tmpfs path (rather than the default location under /var/crash, | ||
// which is in the ramdisk pool), because dumpadm refuses to do what | ||
// we ask otherwise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/tmp
appears to be backed by swap -- I think that using /var/crash
also has problems (writing a crash dump of the kernel to a ramdisk is not useful) but writing a kernel crash dump to swap also seems like it's basically throwing it away.
Is using this /tmp
path better than not calling savecore at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the case where the /tmp path is used as a placeholder for the invocation to dumpadm
, we don't actually invoke savecore
. this is just there to give dumpadm
a 'safer'1 placeholder -- we need it to tell the kernel to direct its core to a specific raw partition should it crash, but it also writes the config file that savecore
reads. we only invoke savecore
later in this if savecore_dir.is_some()
-- that is, if we have a non-placeholder destination for it.
Footnotes
-
for some value of safer. in theory, sled-agent's the only thing ever invoking savecore on the host, but if we're in a situation where kernel core dumps are part of the conversation, we may want an amount of defense-in-depth against entropy -- supposing for example we're in some situation where we may not be able to enumerate the external disks, but where an operator can manually invoke savecore and retrieve the compressed dump from /tmp through some other means. ↩
verifies decision-making in different combinations of M.2/U.2 dataset and dump slice availability and occupancy & tests log/core-archiving.
62ad4d7
to
d36c61c
Compare
verifies decision-making in different combinations of M.2/U.2 dataset and dump slice availability and occupancy & tests log/core-archiving. (functionality that had been implemented for #2478)