sled-agent: Setup dump dev on M.2, savecore to U.2 #3586

Merged (4 commits) on Jul 16, 2023

Conversation

@lifning (Contributor) commented on Jul 13, 2023

Each time the sled-agent upserts a new disk, it enumerates every disk it knows about. If an M.2 is found, it runs dumpadm(8) to set it as the dump device. If a U.2 is also found, it has dumpadm(8) invoke savecore(8) to save the previous dump on the M.2, if any, to an encrypted zvol on the U.2, and mark the dump slice as empty.

This is a bit of a kludge due to savecore(8) not yet exposing a clear way to save-and-clear a dump partition other than the one configured system-wide by dumpadm. While redundant, this is for our purposes idempotent - as long as any M.2 is configured, dumps have a place to go, and as long as any U.2 is present, we will attempt to save any yet-unsaved kernel core dumps on all known M.2s to it.

(see RFD 118 for details of partition layout) (#2450, #2478)
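
For readers unfamiliar with the sled-agent internals, here is a minimal sketch of the flow described above; the types and the `dumpadm` wrapper signature are illustrative stand-ins, not this PR's actual code:

```rust
use camino::Utf8PathBuf;

#[derive(PartialEq)]
enum DiskVariant {
    M2, // boot device carrying the dump slice
    U2, // larger device carrying the encrypted crash-dump zvol
}

struct Disk {
    variant: DiskVariant,
    dump_slice: Utf8PathBuf,    // partition to hand to dumpadm (on M.2s)
    crash_dataset: Utf8PathBuf, // savecore destination directory (on U.2s)
}

// Called each time a disk is upserted; re-walks every known disk.
fn update_dumpdev_setup(
    disks: &[Disk],
    dumpadm: impl Fn(&Utf8PathBuf, Option<&Utf8PathBuf>) -> std::io::Result<()>,
) {
    // If any U.2 is present, its crash dataset becomes the savecore target.
    let savecore_dir = disks
        .iter()
        .find(|d| d.variant == DiskVariant::U2)
        .map(|d| d.crash_dataset.clone());

    for m2 in disks.iter().filter(|d| d.variant == DiskVariant::M2) {
        // Configure this M.2's slice as the dump device; if a savecore
        // directory exists, also save-and-clear any prior dump.
        if let Err(e) = dumpadm(&m2.dump_slice, savecore_dir.as_ref()) {
            eprintln!("failed to configure dump device {}: {e}", m2.dump_slice);
        }
    }
}
```

The redundancy noted above comes from re-running this for every upserted disk; the dumpadm/savecore invocations are arranged so that repeating them is harmless.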

@jclulow (Collaborator) commented on Jul 14, 2023

FYI: I have turned off the site/compliance/dump service by default as of oxidecomputer/helios@811f89c, which should mean that nothing from the OS image is doing anything with dumpadm or savecore. There should be no need to deal with the crash dump directory pilot might have created in the past, etc., as that should never end up existing on production systems.

Let me know if you need something else there!

@lifning force-pushed the dumpadm-setup branch 6 times, most recently from b915ee6 to 5572094, on July 15, 2023 05:04
@lifning marked this pull request as ready for review on July 15, 2023 05:08
@lifning requested a review from smklein on July 15, 2023 05:26
@jclulow (Collaborator) commented on Jul 15, 2023

FYI: an adjacent, relevant follow-up fix I have made to Helios: oxidecomputer/helios@6cc29f6

@lifning force-pushed the dumpadm-setup branch 2 times, most recently from ff70527 to 6fc06e7, on July 16, 2023 02:12
@smklein (Collaborator) left a comment:

The integration pieces of this make sense to me, and look good!

I have some questions that are 90% related to me trying to understand the savecore and dumpadm utilities.

Comment on lines 129 to 148
// ...but do create and use a tmpfs path (rather than the default
// location under /var/crash, which is in the ramdisk pool), because
// dumpadm refuses to do what we ask otherwise.
let tmp_crash = "/tmp/crash";
std::fs::create_dir_all(tmp_crash).map_err(DumpAdmError::Mkdir)?;
Collaborator:

... weird, so it's not possible to run -n without -s?

Contributor Author:

dumpadm doesn't like it if the configured savecore directory doesn't exist, so we're making sure it does (even though we're not using it in this case)
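
For context, a rough sketch of the kind of invocation under discussion (not the PR's exact code): the savecore directory passed via -s must exist for dumpadm to accept the configuration, even when -n tells it not to run savecore automatically on reboot.

```rust
use std::io;
use std::process::Command;

fn configure_dump_device(dump_slice: &str) -> io::Result<()> {
    // dumpadm validates the savecore directory, so create a throwaway
    // tmpfs path to satisfy it (we don't actually save cores here).
    let tmp_crash = "/tmp/crash";
    std::fs::create_dir_all(tmp_crash)?;

    let status = Command::new("dumpadm")
        .arg("-n") // do not run savecore automatically on reboot
        .arg("-d")
        .arg(dump_slice) // the M.2 dump slice being configured
        .arg("-s")
        .arg(tmp_crash) // must exist, even though it goes unused in this path
        .status()?;
    if !status.success() {
        return Err(io::Error::new(io::ErrorKind::Other, "dumpadm failed"));
    }
    Ok(())
}
```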

Comment on lines +111 to +125
// Include memory from the current process if there is one for the panic
// context, in addition to kernel memory:
cmd.arg("-c").arg("curproc");
Collaborator:

Is this a superset of the all option?

Contributor:

curproc dumps more than kernel but less than all:

    kernel    Kernel memory pages only.

    all       All memory pages.

    curproc   Kernel memory pages, and the memory pages of the process whose
              thread was currently executing on the CPU on which the crash
              dump was initiated. If the thread executing on that CPU is a
              kernel thread not associated with any user process, only kernel
              pages will be dumped.

Comment on lines +33 to +41
pub fn dump_flag_is_valid(
dump_slice: &Utf8PathBuf,
) -> Result<bool, DumpHdrError> {
Collaborator:

Does "true" here imply that there is a dump we should extract via savecore? If so: could we add that as a doc comment?

Comment on lines 142 to 163
if savecore_dir.is_some() {
if let Ok(true) = dump_flag_is_valid(dump_slice) {
return savecore();
}
}
Collaborator:

Are these conditionals basically "have we just noticed that a dump already exists on this slice, ready for extraction via savecore?"


#[derive(Default)]
pub struct DumpSetup {
savecore_lock: Arc<std::sync::Mutex<()>>,
Collaborator:

Since this lock is particularly opt-in, can we document what it's doing? Looks like "preventing us from running savecore concurrently", right?
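
A possible wording for that documentation, assuming the reading above (serializing savecore runs) is indeed what the lock is for:

```rust
use std::sync::Arc;

#[derive(Default)]
pub struct DumpSetup {
    /// Taken while invoking savecore(8): every disk upsert may independently
    /// decide a dump needs saving, and we only want one savecore at a time.
    savecore_lock: Arc<std::sync::Mutex<()>>,
}
```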

disks: &mut MutexGuard<'_, HashMap<DiskIdentity, DiskWrapper>>,
log: Logger,
) {
let mut dump_slices = Vec::new();
Collaborator:

Not a major issue, but it feels a little wasteful to call this and re-calculate all state for all disks when we're only actually inserting one disk at a time. We do have the DumpSetup struct; we could keep a cache of all the information we need?

(Not a blocker for this PR though, the only difference here is a minor efficiency difference)

Contributor Author:

yeah, that's what i'm thinking for a cleaner follow-up; the thought behind what's here now is "get dumps configured at all, these machines can probably handle counting to 10 redundantly a few times at startup for now"
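
A hypothetical shape for that follow-up (the type and field names are illustrative, not part of this PR): cache what has already been configured so each upsert only has to look at the newly inserted disk.

```rust
use std::collections::HashSet;

use camino::Utf8PathBuf;

// Sketch only: would live alongside DumpSetup's existing state.
#[derive(Default)]
pub struct DumpSetupCache {
    known_dump_slices: HashSet<Utf8PathBuf>,   // M.2 slices already handed to dumpadm
    known_savecore_dirs: HashSet<Utf8PathBuf>, // U.2 crash datasets already seen
}
```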

Comment on lines 173 to 175
// in the event that savecore(8) terminates before it finishes copying the
// dump, the incomplete dump will remain in the target directory, but the next
// invocation will overwrite it.
Collaborator:

Gotcha, so bounds is like "the last successful savecore iteration." Seems like this overwrite behavior is kinda what we want then, eh?

Comment on lines +114 to +120
Ok(false) => {
info!(log, "Dump slice {dump_slice:?} appears to have already been saved");
}
Err(err) => {
debug!(log, "Dump slice {dump_slice:?} appears to be unused: {err:?}");
}
Collaborator:

Should we be breaking out early in these cases? Or is the goal to map "dump slice" to "dump dir" even if no dump exists?

Contributor Author:

on the contrary, in these cases we definitely want to configure the system to use the dump slice, as there's nothing important there to worry about overwriting

Comment on lines +59 to +61
// TODO: a more reasonable way of deduplicating the effort.
let _guard = savecore_lock.lock();
Collaborator:

It looks like this lock exists to prevent us from running savecore concurrently, but we're configuring dumpadm in such as way that it should be running savecore on reboot, if there was a crash.

Do we risk racing between the "system-initiated savecore" and the "sled-agent-initiated savecore"?

Contributor Author:

there's no system-initiated savecore at boot time, actually! i need to change the flags here to be more clear about that. at boot-time, our system has no configured dump slice until sled-agent sets one, and so won't ever invoke savecore by itself.

Comment on lines +104 to +119
pub fn dumpadm(
dump_slice: &Utf8PathBuf,
savecore_dir: Option<&Utf8PathBuf>,
) -> Result<Option<OsString>, DumpAdmError> {
Collaborator:

Can we document -- what's the returned string here?

Comment on lines +61 to +71
let version =
f.read_u32::<LittleEndian>().map_err(DumpHdrError::ReadVersion)?;
if version != DUMP_VERSION {
return Err(DumpHdrError::InvalidVersion(version));
}
Contributor:

Is the point of checking the version that the flag semantics might change in the future? I am curious how often dump_version changes and whether this check is asking for trouble in the future.

Contributor Author:

yeah. i'm operating under the assumption that the position and semantics of the magic number and the version are stable, and that everything else may change if the version changes. ideally (and i've mentioned as much to rm) we wouldn't have any such version-sensitivity and this function would be replaced by shelling out to some kind of savecore --tell-me-if-there-is-one=/on/this/slice
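
To make that assumption concrete, a sketch of the header check being described; the constants are assumed to match illumos's <sys/dumphdr.h>, and the real code also seeks to the header's offset within the slice, elided here:

```rust
use std::fs::File;
use std::io;

use byteorder::{LittleEndian, ReadBytesExt};

const DUMP_MAGIC: u32 = 0xdefec8ed; // assumed stable across versions
const DUMP_VERSION: u32 = 10; // assumed current value; verify against dumphdr.h
const DF_VALID: u32 = 0x0000_0001; // "a dump is present and not yet saved"

fn has_unsaved_dump(dump_slice: &str) -> io::Result<bool> {
    let mut f = File::open(dump_slice)?;
    let magic = f.read_u32::<LittleEndian>()?;
    if magic != DUMP_MAGIC {
        return Ok(false); // no dump header at all
    }
    let version = f.read_u32::<LittleEndian>()?;
    if version != DUMP_VERSION {
        // Everything past this point may have moved; refuse to guess.
        return Err(io::Error::new(
            io::ErrorKind::Other,
            format!("unrecognized dumphdr version {version}"),
        ));
    }
    let flags = f.read_u32::<LittleEndian>()?;
    Ok(flags & DF_VALID != 0)
}
```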

Comment on lines 167 to 169
// invokes savecore(8) according to the system-wide config set by dumpadm.
// savecore(8) creates a file in the savecore directory called `vmdump.<n>`,
// where `<n>` is the number in the neighboring plaintext file called `bounds`.
Contributor:

probably mention that the default, if bounds is not present (or parseable), is 0
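
For instance, a minimal sketch of reading bounds with that default (a hypothetical helper, not from the PR):

```rust
use std::path::Path;

// savecore names its output vmdump.<n>; <n> comes from the `bounds` file in
// the savecore directory, defaulting to 0 if the file is missing/unparseable.
fn read_bounds(savecore_dir: &Path) -> u32 {
    std::fs::read_to_string(savecore_dir.join("bounds"))
        .ok()
        .and_then(|s| s.trim().parse::<u32>().ok())
        .unwrap_or(0)
}
```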

lifning pushed a commit to lifning/omicron that referenced this pull request Jul 16, 2023
@lifning force-pushed the dumpadm-setup branch 2 times, most recently from 54bdb28 to 69cb0f9, on July 16, 2023 07:24
@lifning enabled auto-merge (squash) on July 16, 2023 10:22
@lifning merged commit f1cc092 into oxidecomputer:main on Jul 16, 2023
@lifning deleted the dumpadm-setup branch on July 16, 2023 10:48