Adding automatic bundle on zone death (#3829)
- Moves zone bundle code to free functions in its own module, out of the
`ServiceManager` itself.
- Adds handling of Propolis zones, by reworking the locking around the
instance manager.
- Adds sled-agent endpoint for listing all zone bundles, even those not
corresponding to an existing zone.
- Adds a "cause" to the zone bundle metadata, indicating why it was
created.
- Some QoL improvements to `zone-bundle`, allowing listing bundles from
zones matching a filter (or all), along with parseable output.
- Improves robustness of extracting `GATEWAY_MAC` from the ARP entries
for the provided `GATEWAY_IP`, and adds a warning if the proxy-arp entries
are not provided.
- Extracts log files which may have been archived to a U.2, as well as
the M.2-local log files.
- Adds basic mechanism for running zone-specific commands. Not used yet.
bnaecker authored Aug 9, 2023
1 parent 28a6504 commit 9b1867b
Showing 17 changed files with 1,422 additions and 591 deletions.
41 changes: 22 additions & 19 deletions docs/how-to-run.adoc
@@ -172,7 +172,7 @@ The rest of these instructions assume that you're building and running Omicron o
The Sled Agent supports operation on both:

* a Gimlet (i.e., real Oxide hardware), and
* an ordinary PC that's been set up to look like a Gimlet using the `./tools_create_virtual_hardware.sh` script.
* an ordinary PC that's been set up to look like a Gimlet using the `./tools/create_virtual_hardware.sh` script.

This script also sets up a "softnpu" zone to implement Boundary Services. SoftNPU simulates the Tofino device that's used in real systems. Just like Tofino, it can implement sled-to-sled networking, but that's beyond the scope of this doc.

@@ -373,7 +373,9 @@ $ dig recovery.sys.oxide.test @192.168.1.20 +short
192.168.1.21
----

Where did 192.168.1.20 come from? That's the external address of the external DNS server. We knew that only because it's the first address in the "internal services" IP pool in config-rss.toml.
Where did 192.168.1.20 come from? That's the external address of the external
DNS server. We knew that because it's listed in the `external_dns_ips` entry of
the `config-rss.toml` file we're using.

Having looked this up, the easiest thing will be to use `http://192.168.1.21` for your URL (replacing with `https` if you used a certificate, and replacing that IP if needed). If you've set up networking right, you should be able to reach this from your web browser. You may have to instruct the browser to accept a self-signed TLS certificate. See also <<_connecting_securely_with_tls_using_the_cli>>.

@@ -392,12 +394,19 @@ An IP pool is needed to provide external connectivity to Instances. The address

[source,console]
----
$ oxide api /v1/system/ip-pools/default/ranges/add --method POST --input - <<EOF
{
"first": "192.168.1.31",
"last": "192.168.1.40"
$ oxide ip-pool range add --pool default --first 192.168.1.31 --last 192.168.1.40
success
IpPoolRange {
id: 4a61e65a-d96d-4c56-9cfd-dc1e44d9e99b,
ip_pool_id: 1b1289a7-cefe-4a7e-a8c9-d93330846301,
range: V4(
Ipv4Range {
first: 192.168.1.31,
last: 192.168.1.40,
},
),
time_created: 2023-08-02T16:31:43.679785Z,
}
EOF
----

With SoftNPU you will generally also need to configure Proxy ARP. Below, `IP_POOL_START` and `IP_POOL_END` are the first and last addresses you used in the previous command:
@@ -435,11 +444,6 @@ $ oxide api /v1/images?project=myproj --method POST --input - <<EOF
{
"name": "alpine",
"description": "boot from propolis zone blob!",
"block_size": 512,
"distribution": {
"name": "alpine",
"version": "propolis-blob"
},
"os": "linux",
"version": "1",
"source": {
@@ -457,22 +461,21 @@ $ oxide api /v1/images --method POST --input - <<EOF
{
"name": "crucible-tester-sparse",
"description": "boot from a url!",
"block_size": 512,
"distribution": {
"name": "debian",
"version": "9"
},
"os": "debian",
"version": "9",
"source": {
"type": "url",
"url": "http://[fd00:1122:3344:101::15]/crucible-tester-sparse.img"
"url": "http://[fd00:1122:3344:101::15]/crucible-tester-sparse.img",
"block_size": 512
}
}
EOF
----

=== Provision an instance using the CLI

You'll need the id `$IMAGE_ID` of the image you just created.
You'll need the id `$IMAGE_ID` of the image you just created. You can fetch that
with `oxide image view --image $IMAGE_NAME`.

Now, create a Disk from that Image. The disk size must be a multiple of 1 GiB and at least as large as the image size. The example below creates a disk using the image made from the alpine ISO that ships with propolis, and sets the size to the next 1GiB multiple of the original alpine source:
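The sizing rule in the paragraph above (a multiple of 1 GiB, at least as large as the image) is plain round-up arithmetic. The following is a minimal sketch of that calculation only, not the CLI example from the guide. The function name `disk_size_for_image` is hypothetical, and the choice to return 1 GiB for an empty image is an assumption made here so that the result is never zero.

```rust
/// One gibibyte, in bytes.
const GIB: u64 = 1 << 30;

/// Round an image size (in bytes) up to the next multiple of 1 GiB.
///
/// Hypothetical helper for illustration; it is not part of Omicron or the
/// `oxide` CLI. Assumption: a zero-byte image still gets a 1 GiB disk.
/// The multiplication can overflow only for sizes within 1 GiB of u64::MAX,
/// which this sketch does not guard against.
fn disk_size_for_image(image_size_bytes: u64) -> u64 {
    // Number of whole GiB already covered by the image.
    let whole = image_size_bytes / GIB;
    // One extra GiB if there is any remainder.
    let partial = u64::from(image_size_bytes % GIB != 0);
    (whole + partial).max(1) * GIB
}
```

For example, an image of 100 MiB rounds up to exactly 1 GiB, while an image one byte larger than 1 GiB needs a 2 GiB disk.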

11 changes: 9 additions & 2 deletions illumos-utils/src/running_zone.rs
@@ -933,11 +933,10 @@ impl RunningZone {

/// Return the names of the Oxide SMF services this zone is intended to run.
pub fn service_names(&self) -> Result<Vec<String>, ServiceError> {
const NEEDLES: [&str; 2] = ["/oxide", "/system/illumos"];
let output = self.run_cmd(&["svcs", "-H", "-o", "fmri"])?;
Ok(output
.lines()
.filter(|line| NEEDLES.iter().any(|needle| line.contains(needle)))
.filter(|line| is_oxide_smf_log_file(line))
.map(|line| line.trim().to_string())
.collect())
}
@@ -1191,3 +1190,11 @@ impl InstalledZone {
path
}
}

/// Return true if the named file appears to be a log file for an Oxide SMF
/// service.
pub fn is_oxide_smf_log_file(name: impl AsRef<str>) -> bool {
const SMF_SERVICE_PREFIXES: [&str; 2] = ["/oxide", "/system/illumos"];
let name = name.as_ref();
SMF_SERVICE_PREFIXES.iter().any(|needle| name.contains(needle))
}
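To see what the refactored filter does, here is a small sketch that applies `is_oxide_smf_log_file` (copied from the hunk above) to captured text instead of running `svcs` inside a zone. The helper `service_names_from_output` and the FMRI strings in the usage note are hypothetical stand-ins, not names taken from the repository.

```rust
/// Copied from the diff above: true if `name` contains one of the Oxide
/// SMF service path fragments.
pub fn is_oxide_smf_log_file(name: impl AsRef<str>) -> bool {
    const SMF_SERVICE_PREFIXES: [&str; 2] = ["/oxide", "/system/illumos"];
    let name = name.as_ref();
    SMF_SERVICE_PREFIXES.iter().any(|needle| name.contains(needle))
}

/// Hypothetical stand-in for `RunningZone::service_names`: it takes the
/// text that `svcs -H -o fmri` would print, rather than running the command.
fn service_names_from_output(output: &str) -> Vec<String> {
    output
        .lines()
        // Keep only lines that mention an Oxide SMF service path.
        .filter(|line| is_oxide_smf_log_file(line))
        // Strip surrounding whitespace from each kept line.
        .map(|line| line.trim().to_string())
        .collect()
}
```

Given the made-up lines `svc:/oxide/nexus:default`, `svc:/network/ssh:default`, and `svc:/system/illumos/mgs:default`, the helper keeps the first and third. Note that the check uses `contains`, so despite the constant's name it matches the fragment anywhere in the string, not only at the start.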
117 changes: 115 additions & 2 deletions openapi/sled-agent.json
@@ -10,6 +10,34 @@
"version": "0.0.1"
},
"paths": {
"/all-zone-bundles": {
"get": {
"summary": "List all zone bundles that exist, even for now-deleted zones.",
"operationId": "zone_bundle_list_all",
"responses": {
"200": {
"description": "successful operation",
"content": {
"application/json": {
"schema": {
"title": "Array_of_ZoneBundleMetadata",
"type": "array",
"items": {
"$ref": "#/components/schemas/ZoneBundleMetadata"
}
}
}
}
},
"4XX": {
"$ref": "#/components/responses/Error"
},
"5XX": {
"$ref": "#/components/responses/Error"
}
}
}
},
"/cockroachdb": {
"post": {
"summary": "Initializes a CockroachDB cluster",
@@ -528,7 +556,7 @@
},
"/zones/{zone_name}/bundles": {
"get": {
"summary": "List the zone bundles that are current available for a zone.",
"summary": "List the zone bundles that are available for a running zone.",
"operationId": "zone_bundle_list",
"parameters": [
{
@@ -639,6 +667,42 @@
"$ref": "#/components/responses/Error"
}
}
},
"delete": {
"summary": "Delete a zone bundle.",
"operationId": "zone_bundle_delete",
"parameters": [
{
"in": "path",
"name": "bundle_id",
"description": "The ID for this bundle itself.",
"required": true,
"schema": {
"type": "string",
"format": "uuid"
}
},
{
"in": "path",
"name": "zone_name",
"description": "The name of the zone this bundle is derived from.",
"required": true,
"schema": {
"type": "string"
}
}
],
"responses": {
"204": {
"description": "successful deletion"
},
"4XX": {
"$ref": "#/components/responses/Error"
},
"5XX": {
"$ref": "#/components/responses/Error"
}
}
}
},
"/zpools": {
@@ -2654,6 +2718,39 @@
"vni"
]
},
"ZoneBundleCause": {
"description": "The reason or cause for a zone bundle, i.e., why it was created.",
"oneOf": [
{
"description": "Generated in response to an explicit request to the sled agent.",
"type": "string",
"enum": [
"explicit_request"
]
},
{
"description": "A zone bundle taken when a sled agent finds a zone that it does not expect to be running.",
"type": "string",
"enum": [
"unexpected_zone"
]
},
{
"description": "An instance zone was terminated.",
"type": "string",
"enum": [
"terminated_instance"
]
},
{
"description": "Some other, unspecified reason.",
"type": "string",
"enum": [
"other"
]
}
]
},
"ZoneBundleId": {
"description": "An identifier for a zone bundle.",
"type": "object",
@@ -2677,6 +2774,14 @@
"description": "Metadata about a zone bundle.",
"type": "object",
"properties": {
"cause": {
"description": "The reason or cause a bundle was created.",
"allOf": [
{
"$ref": "#/components/schemas/ZoneBundleCause"
}
]
},
"id": {
"description": "Identifier for this zone bundle",
"allOf": [
@@ -2689,11 +2794,19 @@
"description": "The time at which this zone bundle was created.",
"type": "string",
"format": "date-time"
},
"version": {
"description": "A version number for this zone bundle.",
"type": "integer",
"format": "uint8",
"minimum": 0
}
},
"required": [
"cause",
"id",
"time_created"
"time_created",
"version"
]
},
"ZoneType": {
51 changes: 50 additions & 1 deletion schema/zone-bundle-metadata.json
@@ -4,10 +4,20 @@
"description": "Metadata about a zone bundle.",
"type": "object",
"required": [
"cause",
"id",
"time_created"
"time_created",
"version"
],
"properties": {
"cause": {
"description": "The reason or cause a bundle was created.",
"allOf": [
{
"$ref": "#/definitions/ZoneBundleCause"
}
]
},
"id": {
"description": "Identifier for this zone bundle",
"allOf": [
@@ -20,9 +30,48 @@
"description": "The time at which this zone bundle was created.",
"type": "string",
"format": "date-time"
},
"version": {
"description": "A version number for this zone bundle.",
"type": "integer",
"format": "uint8",
"minimum": 0.0
}
},
"definitions": {
"ZoneBundleCause": {
"description": "The reason or cause for a zone bundle, i.e., why it was created.",
"oneOf": [
{
"description": "Generated in response to an explicit request to the sled agent.",
"type": "string",
"enum": [
"explicit_request"
]
},
{
"description": "A zone bundle taken when a sled agent finds a zone that it does not expect to be running.",
"type": "string",
"enum": [
"unexpected_zone"
]
},
{
"description": "An instance zone was terminated.",
"type": "string",
"enum": [
"terminated_instance"
]
},
{
"description": "Some other, unspecified reason.",
"type": "string",
"enum": [
"other"
]
}
]
},
"ZoneBundleId": {
"description": "An identifier for a zone bundle.",
"type": "object",
Expand Down
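The `ZoneBundleCause` schema above is a closed set of four snake_case strings. As a rough illustration of how a consumer might model it, here is a stdlib-only sketch. It is an assumption about shape, not the definition used in the sled agent (which this page does not show), and the `as_str` method and `String` error type are choices made here for brevity.

```rust
use std::fmt;
use std::str::FromStr;

/// Why a zone bundle was created, mirroring the four enum strings in the
/// JSON schema above. Illustrative only; not the type from the repository.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ZoneBundleCause {
    ExplicitRequest,
    UnexpectedZone,
    TerminatedInstance,
    Other,
}

impl ZoneBundleCause {
    /// The wire representation listed in the schema's `enum` arrays.
    pub fn as_str(&self) -> &'static str {
        match self {
            ZoneBundleCause::ExplicitRequest => "explicit_request",
            ZoneBundleCause::UnexpectedZone => "unexpected_zone",
            ZoneBundleCause::TerminatedInstance => "terminated_instance",
            ZoneBundleCause::Other => "other",
        }
    }
}

impl fmt::Display for ZoneBundleCause {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.as_str())
    }
}

impl FromStr for ZoneBundleCause {
    type Err = String;

    /// Accept only the exact strings from the schema; anything else is an
    /// error rather than being silently mapped to `Other`.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "explicit_request" => Ok(ZoneBundleCause::ExplicitRequest),
            "unexpected_zone" => Ok(ZoneBundleCause::UnexpectedZone),
            "terminated_instance" => Ok(ZoneBundleCause::TerminatedInstance),
            "other" => Ok(ZoneBundleCause::Other),
            unknown => Err(format!("unknown zone bundle cause: {unknown}")),
        }
    }
}
```

Rejecting unknown strings is a deliberate choice in this sketch: since `cause` is listed as a required field, a reader of bundle metadata can then distinguish a real `other` from a value it does not recognize.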