
umbrella: notifications on crash dump savecore #4293

Open
wesolows opened this issue Oct 18, 2023 · 0 comments
Labels
Debugging For when you want better data in debugging an issue (log messages, post mortem debugging, and more) mvp Sled Agent Related to the Per-Sled Configuration and Management

@wesolows

There are a number of umbrella and individual tickets covering collection and management of data used to debug problems with the machine and our software. A few examples of these are #2235, #2478, and #3860. The general premise of these tickets is that when a crash (of a sled, or of component user software) occurs, we will preserve data that may be useful in understanding the cause.

This ticket covers notifying Oxide (or in principle a third-party support provider) that such an event has occurred and that data is, or should be, available for retrieval. A simpler and more universal aspect of this is notifying operators. Notification of events like these is discussed in RFDs 55 and 307. While the latter was clearly intended to cover functionality available at RR, I'm unaware of any current means by which an operator can be notified when a sled has crashed and rebooted with a dump saved. The operator should also be able to query via API the state of debug data availability for a specific sled, for each sled, or for the entire machine, as well as some crash event history. There is probably also scope here for detecting sleds that are not functioning at all, tying in with #4287, and for reporting this as an event even if no automated action is taken as a matter of policy. Consider RFDs 82 and 302 here.
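To make the query requirement a little more concrete, here is a rough sketch of the kind of data model such an API might sit on top of. Everything in it is hypothetical and for illustration only: `DumpState`, `CrashEvent`, `DebugDataRegistry`, and their fields are invented names, not existing Omicron or sled-agent types, and the sketch says nothing about how events would actually be detected or stored.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum


class DumpState(Enum):
    """Hypothetical availability states for a sled's debug data."""
    NONE = "none"            # no crash event recorded for this sled
    EXPECTED = "expected"    # a crash was seen; a dump should exist but is unconfirmed
    AVAILABLE = "available"  # a dump was saved and can be retrieved


@dataclass(frozen=True)
class CrashEvent:
    """One crash event (hypothetical shape)."""
    sled_id: str
    occurred_at: datetime
    dump_state: DumpState


@dataclass
class DebugDataRegistry:
    """In-memory stand-in for whatever would back the operator API."""
    events: list = field(default_factory=list)

    def record(self, event):
        self.events.append(event)

    def history(self, sled_id=None):
        """Crash events, oldest first, for one sled or for the entire machine."""
        selected = [
            e for e in self.events
            if sled_id is None or e.sled_id == sled_id
        ]
        return sorted(selected, key=lambda e: e.occurred_at)

    def state(self, sled_id):
        """Dump state taken from the most recent event for a specific sled."""
        past = self.history(sled_id)
        return past[-1].dump_state if past else DumpState.NONE

    def machine_summary(self):
        """Dump state for each sled that has at least one recorded event."""
        sleds = sorted({e.sled_id for e in self.events})
        return {sled: self.state(sled) for sled in sleds}
```

The three methods map onto the three queries mentioned above: `state` for a specific sled, `machine_summary` for each sled across the machine, and `history` for crash event history. The `EXPECTED` state is meant to capture the "is, or should be, available" distinction.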

This is another umbrella ticket that covers what is essentially project-scope work. Additional tickets for specific pieces are likely to be desirable.

@wesolows wesolows added Sled Agent Related to the Per-Sled Configuration and Management mvp Debugging For when you want better data in debugging an issue (log messages, post mortem debugging, and more) labels Oct 18, 2023
@wesolows wesolows added this to the MVP milestone Oct 18, 2023