Adds service bundles for zones #3388
This is a first cut at helping to resolve #1598. It provides the basic methods for creating, listing, and fetching "zone bundles" -- the state of an Oxide-managed zone at some point in time. In response to a client request (and notably, not in any automated way), it creates a tarball that contains:
I've also added a rudimentary quota to the dataset storing this information, currently 100GiB. That's a wild guess, and no special handling is done when this space fills up.
There are a few things I like about this. I think the code for packaging up the zone state is useful, even if we exercise it through a different mechanism, like an automated collection system or when zones are destroyed. I believe there are folks who would currently like the ability to take a zone bundle on demand; @askfongjojo and @gjcolombo have specifically asked for ways to do this.
There are definitely drawbacks and next steps:
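For the dataset quota mentioned above, here is a minimal sketch of how such a limit could be expressed. It only builds the arguments for the standard `zfs set quota=<size> <dataset>` command; the function name `zfs_quota_args` and the dataset name used in the usage note are hypothetical, and this is not the PR's actual implementation.

```rust
/// Bytes in one gibibyte.
const GIB: u64 = 1 << 30;

/// Build the argument list for `zfs set quota=<bytes> <dataset>`.
/// Hypothetical helper for illustration only; the dataset name is
/// supplied by the caller and nothing here comes from the PR's code.
fn zfs_quota_args(dataset: &str, quota_bytes: u64) -> Vec<String> {
    vec![
        "set".to_string(),
        // ZFS accepts a plain byte count as the quota value.
        format!("quota={}", quota_bytes),
        dataset.to_string(),
    ]
}
```

A caller could pass the result to `std::process::Command::new("zfs").args(zfs_quota_args("pool/debug", 100 * GIB))`, where `pool/debug` is a placeholder dataset name. Keeping the argument construction separate from the process spawn makes the quota value easy to check without a ZFS pool.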
There aren't a whole lot of ways I can test this code, so I'm including a bunch of testing notes. This is all from running the real sled agent on my developer machine, a non-Gimlet system.
First we can ask the sled-agent what zones are currently running:
bnaecker@shale : ~/omicron $ ./target/debug/zb --host fd00:1122:3344:101::1 list-zones
oxz_clickhouse_oxp_f4b4dc87-ab46-49fb-a4b4-d361ae214c03
oxz_cockroachdb_oxp_f4b4dc87-ab46-49fb-a4b4-d361ae214c03
oxz_crucible_oxp_d462a7f7-b628-40fe-80ff-4e4189e2d62b
oxz_crucible_oxp_e4b4dc87-ab46-49fb-a4b4-d361ae214c03
oxz_crucible_oxp_f4b4dc87-ab46-49fb-a4b4-d361ae214c03
oxz_crucible_pantry
oxz_external_dns
oxz_internal_dns
oxz_nexus
oxz_ntp
oxz_oximeter
oxz_switch
Then we can list zone bundles for a specific zone:
bnaecker@shale : ~/omicron $ ./target/debug/zb --host fd00:1122:3344:101::1 ls oxz_switch
oxz_switch/2b6c6b4d-bf8c-4e5e-b230-351e9183c631
I've taken one zone bundle already, which is what we see. We can fetch that bundle directly (which is a glorified
bnaecker@shale : ~/omicron $ ./target/debug/zb --host fd00:1122:3344:101::1 get --bundle-id 2b6c6b4d-bf8c-4e5e-b230-351e9183c631 oxz_switch
bnaecker@shale : ~/omicron $ tar tzf 2b6c6b4d-bf8c-4e5e-b230-351e9183c631.tar.gz
metadata
ptree
uptime
last
who
svcs
netstat
pfiles.20255
pstack.20255
pargs.20255
system-illumos-dendrite:default.log
pfiles.20263
pstack.20263
pargs.20263
oxide-mgs:default.log
pfiles.20281
pstack.20281
pargs.20281
system-illumos-mg-ddm:default.log
pfiles.20273
pstack.20273
pargs.20273
oxide-wicketd:default.log
We've got one file per kind of data, with a
bnaecker@shale : ~/omicron $ tar xzf 2b6c6b4d-bf8c-4e5e-b230-351e9183c631.tar.gz metadata
bnaecker@shale : ~/omicron $ cat metadata
time_created = "2023-06-20T23:24:00.986843053Z"
[id]
zone_name = "oxz_switch"
bundle_id = "2b6c6b4d-bf8c-4e5e-b230-351e9183c631"
Each command file lists the command at the top, then the output itself:
bnaecker@shale : ~/omicron $ tar xzf 2b6c6b4d-bf8c-4e5e-b230-351e9183c631.tar.gz ptree
bnaecker@shale : ~/omicron $ cat ptree
Command: ["ptree"]
19387 /sbin/init
19394 /lib/svc/bin/svc.startd
19993 /usr/lib/saf/ttymon -g -d /dev/console -l console -m ldterm,ttcompat -
19396 /lib/svc/bin/svc.configd
19436 /lib/inet/netcfgd
19439 /lib/inet/ipmgmtd
19698 /usr/lib/fm/fmd/fmd
19753 /usr/lib/pfexecd
19925 /usr/sbin/nscd
19978 /usr/lib/utmpd
19994 /usr/sbin/syslogd
20002 /usr/sbin/cron
20234 /usr/lib/inet/in.ripngd -s
20255 /opt/oxide/dendrite/bin/dpd run
20263 /opt/oxide/mgs/bin/mgs run --id-and-address-from-smf /var/svc/manifest/s
20272 ctrun -l child -o noorphan,regent /opt/oxide/wicketd/bin/wicketd run /va
20273 /opt/oxide/wicketd/bin/wicketd run /var/svc/manifest/site/wicketd/conf
20280 ctrun -l child -o noorphan,regent /opt/oxide/mg-ddm/pkg/ddm_method_scrip
20281 /opt/oxide/mg-ddm/bin/ddmd --admin-port 8000 --admin-addr :: --kind tr
20328 /sbin/dhcpagent
20622 /usr/lib/inet/in.ndpd
28242 ptree
Hopefully this lets us parse things with normal tools with the minimum of fuss. You can create a new bundle with:
bnaecker@shale : ~/omicron $ ./target/debug/zb --host fd00:1122:3344:101::1 create oxz_switch
Created zone bundle: oxz_switch/057cedd1-5fc4-422a-a110-9f3c37ef4dc5
In the sled-agent logs, we can see what's going on:
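As an illustration of the command-file layout shown in these notes (a `Command:` line followed by the raw output), here is a small sketch that splits such a file into its two parts. The `CommandFile` type and the `split_command_file` function are hypothetical names, not part of the PR, and the sketch assumes the header is always the first line of the file.

```rust
/// A command file split into its header and its raw output.
/// Hypothetical type for illustration; not taken from the PR.
#[derive(Debug, PartialEq)]
struct CommandFile<'a> {
    /// Text after the `Command: ` prefix, kept verbatim (e.g. `["ptree"]`).
    command: &'a str,
    /// Everything after the first line, untouched.
    output: &'a str,
}

/// Split `contents` at the first newline and strip the `Command: ` prefix.
/// Returns `None` when the first line does not start with that prefix.
fn split_command_file<'a>(contents: &'a str) -> Option<CommandFile<'a>> {
    // A file with no newline is treated as a header with empty output.
    let (header, output) = contents.split_once('\n').unwrap_or((contents, ""));
    let command = header.strip_prefix("Command: ")?;
    Some(CommandFile { command, output })
}
```

Applied to the `ptree` file above, `command` would hold `["ptree"]` and `output` would hold the process tree. The header is left unparsed on purpose, since its exact encoding is only known here from the one sample line.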
Truly superb work, this will be a boon to have.
I think the follow-ups you mentioned would be great next steps, with priority on:
- Auto-collection for destroyed zones
- Removal of old bundles
3894ecd to 94272a2
- Adds a dataset to the M.2s for storing debugging data.
- Adds basic mechanism for setting a ZFS quota on datasets.
- Adds HTTP endpoints for listing, creating, and fetching zone service bundles from the sled agent.
- Adds methods to `ServiceManager` for implementing the above. Zone bundles run a set of commands to get the zone-wide output and some key process-specific data for relevant processes from an Oxide service zone. These are packed into a tarball along with a simple metadata file describing the zone bundle.
- Adds some helper methods in `RunningZone` and related for listing the expected SMF service names and processes associated with them based on the zone's manifest files.
- Adds dev tool `zb` for talking to the sled agent to operate on zone bundles.
- mv zb.rs -> zone-bundle.rs
- Add TOML extension to zone bundle metadata file
- Return 404 on bad zone name
- Typos, safety notes, and link to logadm(8)
94272a2 to 41806ce