[sled agent] Consider setting uniform coreadm values to extract info from terminating processes? #1597
Comments
In particular, we want to enable global cores and use the …
A request: please don't hard-code …
coreadm wants the directory to exist before it will create a core. If we use …
I believe that in the case being referenced (Joyent's SmartOS), the path was really …
https://www.illumos.org/issues/2123
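A minimal Rust sketch of that ordering constraint, assuming the sled agent creates the directory itself before configuring coreadm. The function name `prepare_core_dir` and the `core.%z.%f.%p` file name are hypothetical; `%z`, `%f`, and `%p` are coreadm pattern tokens for the zone name, executable name, and process ID.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: creates `core_dir` (and any missing parents) and
/// returns a core file pattern rooted in it. coreadm itself will not create
/// the directory, so this must happen before the pattern is handed over.
fn prepare_core_dir(core_dir: &Path) -> io::Result<String> {
    // create_dir_all succeeds if the directory already exists, so this is
    // safe to call on every sled-agent start.
    fs::create_dir_all(core_dir)?;
    // %z = zone name, %f = executable name, %p = PID (expanded by coreadm).
    Ok(core_dir
        .join("core.%z.%f.%p")
        .to_string_lossy()
        .into_owned())
}
```

Calling it again after the directory exists is a no-op, which matters if the directory lives on a dataset that outlives individual zones.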
If we need to create a cores directory, we can do that in the brand code. It has hooks for installing, booting, and so on.
If we do want a …
I expect so. I'm not sure where the path was actually managed, though. Maybe, as Josh suggested, it was done in the brand code.
I'm not sure. We first have to decide where the core files will go. That will presumably be some directory on a ZFS dataset on some zpool. Who creates the pool? The dataset? The directory? My first thought is that we put all of the core files into one directory per sled (i.e., we don't create a per-zone dataset or even a per-zone directory). That's because I'm not sure what we'd gain from separate datasets or directories per zone, and this way we don't have to do anything here when zones come and go. Still, I'm not sure what storage we want to use for these, so I don't know which pool or dataset they belong on, or who's responsible for creating it.
Separate datasets per zone would allow us to set a separate quota for core files in each zone, which I suspect would be valuable: a runaway core generator in zone A should not be able to prevent zone B from writing even a single subsequent core file. We'll also want an overall quota that keeps cores from exhausting the space in the pool they're in. I think we'll want to put this on a dataset we create on one or more U.2 devices. A few thoughts: …
There is not, I suspect, a single best answer to this problem.
Much, if not all, of the work described here was completed in other PRs/issues: sled-agent archives rotated logs for all zones onto the U.2 debug dataset. I think any follow-on issues should go here: #2478
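One way to get both an overall cap and per-zone caps is a parent dataset with a quota and one child dataset per zone with its own smaller quota. A minimal sketch under that assumption: it only builds `zfs create -o quota=<size> <dataset>` argument vectors and runs nothing. The function name, the `<pool>/cores/<zone>` layout, and the sizes in the usage note are hypothetical placeholders, not decided values.

```rust
/// Hypothetical layout:
///   <pool>/cores         created once per pool, quota caps all cores
///   <pool>/cores/<zone>  created once per zone, with its own quota
fn core_dataset_commands(
    pool: &str,
    zone: &str,
    total_quota: &str,
    zone_quota: &str,
) -> Vec<Vec<String>> {
    let parent = format!("{pool}/cores");
    let child = format!("{parent}/{zone}");
    // Builds one `zfs create -o quota=<quota> <dataset>` invocation.
    let zfs_create = |quota: &str, dataset: &str| -> Vec<String> {
        vec![
            "zfs".to_string(),
            "create".to_string(),
            "-o".to_string(),
            format!("quota={quota}"),
            dataset.to_string(),
        ]
    };
    vec![
        // Parent quota: cores as a whole cannot exhaust the pool.
        zfs_create(total_quota, &parent),
        // Child quota: one runaway zone cannot starve the others.
        zfs_create(zone_quota, &child),
    ]
}
```

For example, `core_dataset_commands("oxp_example", "oxz_example", "100G", "10G")` yields a create for `oxp_example/cores` with `quota=100G` followed by one for `oxp_example/cores/oxz_example` with `quota=10G`. The trade-off raised above still applies: per-zone datasets must be created and cleaned up as zones come and go, whereas a single per-sled directory needs no per-zone bookkeeping.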
In the 8/16 control plane sync, we discussed using https://illumos.org/man/8/coreadm to set a core file pattern that extracts core files from crashing non-global zones into the global zone.
Currently, when a service in a non-global zone terminates, Sled Agent stops and deletes the underlying zone. This avoids leaking that resource (we have no further use for it at execution time), but it also limits our visibility into why the service failed.
By dumping core files into the global zone, we'd be able to inspect failures even after the zone is destroyed.
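A minimal sketch of the invocation Sled Agent might issue, assuming global core dumps (as the first comment suggests) into a single hypothetical directory. It only builds `coreadm -g <pattern> -e global` with `std::process::Command` and does not run it. The function name and directory are hypothetical. Including `%z` (zone name) in the pattern is one way to keep cores from different zones distinguishable in a shared directory.

```rust
use std::process::Command;

/// Hypothetical helper: builds, but does not spawn, a coreadm command that
/// sets the global core file name pattern (-g) and enables global core
/// dumps (-e global). The directory must already exist.
fn global_coreadm_command(core_dir: &str) -> Command {
    // %z = zone name, %f = executable name, %p = PID.
    let pattern = format!("{}/core.%z.%f.%p", core_dir.trim_end_matches('/'));
    let mut cmd = Command::new("coreadm");
    cmd.arg("-g").arg(pattern).arg("-e").arg("global");
    cmd
}
```

Calling `.status()` on the returned `Command` would actually run it; keeping construction separate from execution makes the arguments easy to check in tests without touching the system's coreadm configuration.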