Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[sled-agent] Add support for Support Bundle command execution #7011

Merged
merged 5 commits into from
Nov 8, 2024

Conversation

papertigers
Copy link
Contributor

This PR lays the groundwork for executing system commands and returning their output as raw data (both on success and failure) via the internal sled-agent HTTP API. Command execution collects both stdout and stdout in an interleaved fashion so that the command output is exactly how an operator would have seen it if they were sitting at the terminal.

@papertigers
Copy link
Contributor Author

Example output from the two endpoints below. The goal here is to have nexus collect this raw data and dump it out to a file that will end up in the final support bundle.

curl "http://[fd00:1122:3344:101::1]:12345/support/zoneadm-info"
Command executed [/usr/bin/pfexec /usr/sbin/zoneadm list -cip]:
    ==== stdio ====
0:global:running:/::ipkg:shared
15:sidecar_softnpu:running:/sidecar/sidecar_softnpu:a0a3c2bd-9fa2-4663-bfed-1a34df024e3d:omicron1:excl
16:oxz_switch:running:/zone/oxz_switch:6fe611b3-ddad-4bc8-8471-9a045ee7b4f3:omicron1:excl
17:oxz_internal_dns_cb5ce93c-61a8-499f-bd4e-f185581ec3b4:running:/pool/ext/ebba9827-8936-4c65-98c7-efdba9658631/crypt/zone/oxz_internal_dns_cb5ce93c-61a8-499f-bd4e-f185581ec3b4:ac575b3b-f929-4d1e-9a2e-a206520d584c:omicron1:excl
18:oxz_internal_dns_eadc7d5f-849f-4785-a3dc-3429a0b826a6:running:/pool/ext/6ab4058e-231f-4ac1-a830-9277511d4e87/crypt/zone/oxz_internal_dns_eadc7d5f-849f-4785-a3dc-3429a0b826a6:3858a534-53e1-461c-b187-2a059f4830e3:omicron1:excl
19:oxz_internal_dns_ac0c761d-0a75-4123-af33-9868b477c7d3:running:/pool/ext/85b82676-dcde-485c-95aa-348ec75c4429/crypt/zone/oxz_internal_dns_ac0c761d-0a75-4123-af33-9868b477c7d3:f485a901-e876-486b-87f6-85e6f2b31e84:omicron1:excl
20:oxz_ntp_01b5b04a-9a7d-466b-98e1-406573a2fe9a:running:/pool/ext/85b82676-dcde-485c-95aa-348ec75c4429/crypt/zone/oxz_ntp_01b5b04a-9a7d-466b-98e1-406573a2fe9a:aee5e1ce-cb67-402c-9627-7e58bfef1369:omicron1:excl

    ==== exit status ====
exit status: 0
curl "http://[fd00:1122:3344:101::1]:12345/support/ipadm-info"
Command executed [/usr/bin/pfexec /usr/sbin/ipadm show-if]:
    ==== stdio ====
IFNAME     CLASS     STATE    CURRENT      PERSISTENT
lo0        VIRTUAL   ok       -m-v------46 ---
igb0       IP        ok       bm--------46 ---
cxgbe0     IP        failed   bm--------46 ---
cxgbe1     IP        failed   bm--------46 ---
fake_external0 IP    ok       bm--------46 ---
net0       IP        ok       bm--------46 ---
net1       IP        ok       bm--------46 ---
bootstrap0 IP        ok       bm--------46 ---
underlay0  IP        ok       bm--------46 ---

    ==== exit status ====
exit status: 0


Command executed [/usr/bin/pfexec /usr/sbin/ipadm show-addr]:
    ==== stdio ====
ADDROBJ           TYPE     STATE        ADDR
lo0/v4            static   ok           127.0.0.1/8
igb0/dhcp         dhcp     ok           172.20.2.193/24
fake_external0/external static ok       192.168.1.199/24
lo0/v6            static   ok           ::1/128
igb0/ll           addrconf ok           fe80::eaea:6aff:fe09:8939%igb0/10
cxgbe0/ll         addrconf inaccessible fe80::aa40:25ff:fe04:1d2%cxgbe0/10
cxgbe1/ll         addrconf inaccessible fe80::aa40:25ff:fe04:1da%cxgbe1/10
net0/ll           addrconf ok           fe80::a0c1:eff:febf:64e0%net0/10
net1/ll           addrconf ok           fe80::b8b5:b9ff:fe24:3715%net1/10
bootstrap0/ll     addrconf ok           fe80::8:20ff:feb5:3c67%bootstrap0/10
bootstrap0/bootstrap6 static ok         fdb0:e8ea:6a09:8939::1/64
underlay0/ll      addrconf ok           fe80::8:20ff:fe5d:6b3b%underlay0/10
underlay0/sled6   static   ok           fd00:1122:3344:101::1/64
underlay0/internaldns1 static ok        fd00:1122:3344:2::2/64
underlay0/internaldns0 static ok        fd00:1122:3344:1::2/64
underlay0/internaldns2 static ok        fd00:1122:3344:3::2/64

    ==== exit status ====
exit status: 0


Command executed [/usr/bin/pfexec /usr/sbin/ipadm show-prop]:
    ==== stdio ====
PROTO PROPERTY              PERM CURRENT      PERSISTENT   DEFAULT      POSSIBLE
ipv4  forwarding            rw   on           on           off          on,off
ipv4  ttl                   rw   255          --           255          1-255
ipv6  forwarding            rw   on           on           off          on,off
ipv6  hoplimit              rw   255          --           255          1-255
ipv6  hostmodel             rw   weak         --           weak         strong,
                                                                        src-priority,
                                                                        weak
ipv4  hostmodel             rw   weak         --           weak         strong,
                                                                        src-priority,
                                                                        weak
icmp  max_buf               rw   262144       --           262144       65536-1073741824
icmp  recv_buf              rw   8192         --           8192         4096-262144
icmp  send_buf              rw   8192         --           8192         4096-262144
tcp   congestion_control    rw   sunreno      --           sunreno      sunreno,newreno,
                                                                        cubic
tcp   ecn                   rw   passive      --           passive      never,passive,
                                                                        active
tcp   extra_priv_ports      rw   2049,4045    --           2049,4045    1-65535
tcp   largest_anon_port     rw   65535        --           65535        1024-65535
tcp   max_buf               rw   1048576      --           1048576      8192-1073741824
tcp   recv_buf              rw   128000       --           128000       2048-1048576
tcp   sack                  rw   active       --           active       never,passive,
                                                                        active
tcp   send_buf              rw   49152        --           49152        4096-1048576
tcp   smallest_anon_port    rw   32768        --           32768        1024-65535
tcp   smallest_nonpriv_port rw   1024         --           1024         1024-32768
udp   extra_priv_ports      rw   2049,4045    --           2049,4045    1-65535
udp   largest_anon_port     rw   65535        --           65535        1024-65535
udp   max_buf               rw   2097152      --           2097152      65536-1073741824
udp   recv_buf              rw   57344        --           57344        128-2097152
PROTO PROPERTY              PERM CURRENT      PERSISTENT   DEFAULT      POSSIBLE
udp   send_buf              rw   57344        --           57344        1024-2097152
udp   smallest_anon_port    rw   32768        --           32768        1024-65535
udp   smallest_nonpriv_port rw   1024         --           1024         1024-32768
sctp  extra_priv_ports      rw   2049,4045    --           2049,4045    1-65535
sctp  largest_anon_port     rw   65535        --           65535        1024-65535
sctp  max_buf               rw   1048576      --           1048576      8192-1073741824
sctp  recv_buf              rw   102400       --           102400       8192-1048576
sctp  send_buf              rw   102400       --           102400       8192-1048576
sctp  smallest_anon_port    rw   32768        --           32768        1024-65535
sctp  smallest_nonpriv_port rw   1024         --           1024         1024-32768

    ==== exit status ====
exit status: 0

@papertigers papertigers requested a review from smklein November 7, 2024 22:26
Comment on lines +790 to +794
.into_iter()
.map(|cmd| cmd.get_output())
.collect::<Vec<_>>()
.as_slice()
.join("\n\n");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we joining these with \n\n when we aren't doing so for the zoneadm command?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the spacing between multiple commands as the ipadm variant currently runs 3 commands. I can change it to a single new line if you would like, I just thought giving the output some space was easier on the eyes.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, gotcha, didn't realize the outputs were also different here (Vec of Results vs one Result). This seems fine!

sled-agent/src/support_bundle.rs Outdated Show resolved Hide resolved
sled-agent/src/support_bundle.rs Outdated Show resolved Hide resolved
sled-agent/src/support_bundle.rs Outdated Show resolved Hide resolved
Comment on lines 239 to 242
let mut command = Command::new("sleep");
command.env_clear().arg("10");

match execute_command_with_timeout(command, Duration::from_secs(5))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can we use smaller timeouts here? just would be nice to shorten the latency of this test to < 5 seconds

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

Comment on lines +250 to +290
#[tokio::test]
async fn test_command_stdout_is_correct() {
let mut command = Command::new("echo");
command.env_clear().arg("No Cables. No Assembly. Just Cloud.");

let res = execute_command_with_timeout(command, Duration::from_secs(5))
.await
.unwrap();
let expected_output =
"No Cables. No Assembly. Just Cloud.\n".to_string();
assert_eq!(expected_output, res.stdio);
}

#[tokio::test]
async fn test_command_stderr_is_correct() {
let mut command = Command::new("bash");
command.env_clear().args(&["-c", "echo oxide computer > /dev/stderr"]);

let res = execute_command_with_timeout(command, Duration::from_secs(5))
.await
.unwrap();
let expected_output = "oxide computer\n".to_string();
assert_eq!(expected_output, res.stdio);
}

#[tokio::test]
async fn test_command_stdout_stderr_are_interleaved() {
let mut command = Command::new("bash");
command.env_clear().args(&[
"-c",
"echo one > /dev/stdout \
&& echo two > /dev/stderr \
&& echo three > /dev/stdout",
]);

let res = execute_command_with_timeout(command, Duration::from_secs(5))
.await
.unwrap();
let expected_output = "one\ntwo\nthree\n".to_string();
assert_eq!(expected_output, res.stdio);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice tests!

@papertigers papertigers requested a review from smklein November 8, 2024 20:51
sled-agent/src/support_bundle.rs Outdated Show resolved Hide resolved
}

/// Takes a given `Command` and returns a lossy representation of the program
// and it's arguments.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

teensy obnoxious copyedit: "it's" is an abbreviation for "it is", while "its" means "belonging to it)

Suggested change
// and it's arguments.
// and its arguments.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in c3b5c52

Comment on lines 61 to 69
// Grab the command arguments
let mut res = command
.get_args()
.map(|a| a.to_string_lossy().into())
.collect::<Vec<String>>();

// Grab the command itself
res.insert(0, command.get_program().to_string_lossy().into());
res.join(" ")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: performance probably does not matter here, but we are doing this basically the least efficient way possible with regards to allocations:

  • to_string_lossy() returns a Cow, because it allocates a new String only if it's invalid UTF8, which is unlikely. but calling calling Into<String> on that Cow forces an allocation unconditionally.
  • we collect those strings, allocating a Vec.
  • Vec::insert at 0 forces the whole Vec to be copied one index forward to make space for the inserted item --- if we instead used iter::chain() or something to just put the program name first, we wouldn't have to shift the entire contents of the rest of the Vec's contents
  • Vec::join allocates a new string, so all the previous string allocations that we made during this function are now deallocated --- most of them didn't really need heap allocations

personally, i might consider doing something like this:

Suggested change
// Grab the command arguments
let mut res = command
.get_args()
.map(|a| a.to_string_lossy().into())
.collect::<Vec<String>>();
// Grab the command itself
res.insert(0, command.get_program().to_string_lossy().into());
res.join(" ")
// Grab the command itself
let mut res = command.get_program().to_string_lossy().into();
// Grab the command arguments
for arg in command.get_args() {
let arg = arg.to_string_lossy();
write!(&mut res, " {arg}").expect("write! to strings never fails");
}
res

which is maybe less attractive than Vec::join, but doesn't create a temporary allocation for each argument and for a Vec that all end up getting thrown away --- instead, everything is written directly to the returned String.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation! The insert at offset 0 was going to cause everything to shift and I wasn't too worried about the allocations here but overall your suggestion is much nicer and more efficient!!

cmd: Command,
oneshot: oneshot::Sender<()>,
) -> Result<SupportBundleCmdOutput, SupportBundleCmdError> {
let cmd_string = command_to_string(&cmd);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, take it or leave it: AFAICT this is only ever called when formatting an error message, so perhaps we should call it lazily in the various map_err closures, instead of doing it unconditionally at the top of the function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I originally intended to do this but Command::from consumes the StdCommand, but in the end we store it in the SupportBundleCmdOutput struct anyways so it get's used regardless of an error.

res.join(" ")
}

/// Spawn a command asynchronously and collect it's output by interleaving
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Spawn a command asynchronously and collect it's output by interleaving
/// Spawn a command asynchronously and collect its output by interleaving

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in c3b5c52

sled-agent/src/support_bundle.rs Outdated Show resolved Hide resolved
@papertigers papertigers requested a review from hawkw November 8, 2024 21:59
Copy link
Member

@hawkw hawkw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@papertigers papertigers enabled auto-merge (squash) November 8, 2024 22:40
@papertigers papertigers merged commit 463a368 into main Nov 8, 2024
17 checks passed
@papertigers papertigers deleted the mike/support-bundle-commands branch November 8, 2024 23:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants