sled-agent performs archival of rotated logs for all zones onto U.2 debug dataset #3713
sled-agent/src/storage/dump_setup.rs
Outdated
// as we rotate them out, logadm will keep resetting to .log.0,
// so we need to maintain our own numbering in the dest dataset
I'd suggest using .<epoch seconds> instead of .N here (and adding .N in the unlikely event there is a conflict).
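To make the suggestion concrete, here is a minimal sketch of the naming rule, assuming the caller already knows the epoch seconds to use and which names exist in the destination. The function name `archived_name` and the `taken` set are hypothetical, for illustration only, and are not taken from the PR:

```rust
use std::collections::HashSet;

/// Hypothetical helper: builds an archive name for a rotated log such as
/// `foo.log.3`. The trailing `.N` assigned by logadm is replaced with
/// `.<epoch seconds>`; if that name is already taken, `.1`, `.2`, ... is
/// appended until a free name is found.
fn archived_name(
    rotated: &str,
    epoch_secs: u64,
    taken: &HashSet<String>,
) -> Option<String> {
    // Split off the trailing rotation number; reject names without one.
    let (stem, suffix) = rotated.rsplit_once('.')?;
    if suffix.is_empty() || !suffix.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }

    // Common case: `<stem>.<epoch seconds>` is free.
    let base = format!("{stem}.{epoch_secs}");
    if !taken.contains(&base) {
        return Some(base);
    }

    // Unlikely conflict: add `.N`, starting from 1.
    (1u32..)
        .map(|n| format!("{base}.{n}"))
        .find(|candidate| !taken.contains(candidate))
}
```

In a real implementation the `taken` set would presumably be replaced by a check against the destination dataset itself.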
I think we should reconfigure logadm in the zone to rotate the files using a date stamp pattern, like 2023-07-19T10:00:00Z (%FT%TZ maybe?) rather than try to work around the integer suffixes.
sled-agent/src/storage/dump_setup.rs
Outdated
) -> Result<(), RotateLogsError> {
    // pattern matching rotated logs, e.g. foo.log.3
    let pattern = logdir
        .join("*.log.*")
It is probably safer to use *.log.? and *.log.?? as patterns and combine the glob matches, just in case there is a file that has .log. in the middle of its base name (shouldn't happen, but...)
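As an illustration of the narrower match, here is a std-only sketch of a filter that roughly corresponds to combining *.log.? and *.log.??. It is deliberately stricter than the globs in two ways, both of which are assumptions on my part: the suffix must be ASCII digits (a glob `?` accepts any character), and the base name must be non-empty. The name `is_rotated_log` is hypothetical:

```rust
/// Hypothetical filter: returns true if `name` looks like a rotated log,
/// i.e. `<base>.log.<N>` where `<N>` is one or two ASCII digits.
fn is_rotated_log(name: &str) -> bool {
    // Split at the LAST `.log.` so that whatever follows it is the
    // complete suffix of the file name.
    match name.rsplit_once(".log.") {
        Some((base, n)) => {
            !base.is_empty()
                && (1..=2).contains(&n.len())
                && n.bytes().all(|b| b.is_ascii_digit())
        }
        None => false,
    }
}
```

A name such as foo.log.old.txt, which *.log.* would pick up, is rejected here because the text after the last .log. is not a one- or two-digit number.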
Rather than using a glob or a regex, perhaps we should look at creating a separate directory in the zone to hold the rotated files and have logadm move them there as part of rotation?
Just as a small terminological suggestion: I would draw a distinction between the act of rotation, which I expect we're still having logadm perform inside the zone, and the act of archival, which is lifting them out once they've been rotated and which it seems we're looking to do from outside the zone here.
(Part of #2478, continued in #3713) This configures coreadm to put all core dumps onto the M.2 'crash' dataset, and creates a thread that moves them all onto a U.2 'debug' dataset every 5 minutes. This also refactors the dumpadm/savecore code to be less redundant and more flexible, and adds an amount of arbitrary logic for e.g. picking the U.2 onto which to save cores.
(for #2478, depends on #3677 (lifning/omicron@coreadm...log-rotate))
This periodically moves logs rotated by logadm in cron (oxidecomputer/helios#107) into the crypt/debug zfs dataset on the U.2 chosen by the logic in #3677. It replaces the rotated number (*.log.0, *.log.1) with the unix epoch timestamp of the rotated log's modification time such that they don't collide when collected repeatedly (logadm will reset numbering when the previous ones are moved away).
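For readers who want the gist of the rename step, a rough std-only sketch follows. It is not the code in this PR: the function name `archive_rotated_log` is hypothetical, and the copy-and-remove fallback is an assumption about moving files between datasets, where a plain rename is expected to fail.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::time::UNIX_EPOCH;

/// Hypothetical sketch: moves a rotated log (e.g. `foo.log.0`) into
/// `dest_dir`, replacing the rotation number with the file's modification
/// time in Unix epoch seconds (e.g. `foo.log.1689760800`).
fn archive_rotated_log(src: &Path, dest_dir: &Path) -> io::Result<PathBuf> {
    // Modification time of the rotated log, as seconds since the epoch.
    let mtime = fs::metadata(src)?.modified()?;
    let secs = mtime
        .duration_since(UNIX_EPOCH)
        .map_err(|e| io::Error::new(io::ErrorKind::InvalidData, e))?
        .as_secs();

    // `file_stem` drops the final extension: `foo.log.0` -> `foo.log`.
    let stem = src.file_stem().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "path has no file name")
    })?;
    let mut name = stem.to_os_string();
    name.push(format!(".{secs}"));
    let dest = dest_dir.join(name);

    // Assumption: source and destination may be on different filesystems,
    // in which case `rename` fails and we copy, then remove the original.
    if fs::rename(src, &dest).is_err() {
        fs::copy(src, &dest)?;
        fs::remove_file(src)?;
    }
    Ok(dest)
}
```

This sketch does not handle two rotated logs that share the same modification second; a collision would overwrite the earlier archive.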
After putting kernel dumps on both M.2 dump slices, starting sled-agent, forcing logadm -p now smf_logs_daily in every zone, then running int main() { return *(int*)0; } to generate a core in an oxz_nexus_ zone and the global zone: