-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
omdb
support for showing devices visible to MGS
#4162
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! Just some small nitpicks.
let c = &mgs_client; | ||
let mut sp_infos = | ||
futures::stream::iter(sp_list_ignition.iter().filter_map(|ignition| { | ||
if matches!(ignition.details, SpIgnition::Yes { .. }) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little unsure about using ignition as a filter on which SPs to ask for state, because at least in theory, a problem with ignition could result in ignition claiming there's nothing there but we could still talk to an SP over the management network. Similarly, ignition is fetched from MGS's local sidecar SP; if something is wrong with it, we might get no data back (at which point we'd bail out above anyway, which might or might not be reasonable).
Maybe we could run sp_get
on all configured SPs, and ignore any errors for SPs where ignition also claims the SP is not present? This is also not super critical if you want to punt on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I did this was that when I didn't, it took like a minute or two for the command to complete. I think we wound up waiting for a lot of timeouts? I can see that being a useful option when things are broken but for the common case where everything's healthy and some slots are just unpopulated I thought it should finish faster. (I will double check this behavior in case that's unexpected and there was something else weird going on.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, yeah, that does sound right. I don't think it should've taken a minute or two, but I would definitely believe a few 10s of seconds. A single request to a missing SP should result in roughly 10 seconds to timeout based on our current config:
Lines 21 to 26 in 9aabe2a
# When sending UDP RPC packets to an SP, how many total attempts do we make | |
# before giving up? | |
rpc_max_attempts = 5 | |
# sleep time between UDP RPC resends (up to `rpc_max_attempts`) | |
rpc_per_attempt_timeout_millis = 2000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirmed on london
just now that it took 7s for MGS to respond to a request for an SP's state when that SP is not present. I think that's not worth trying for the common case so I'd like to punt for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Works for me 👍
dev-tools/omdb/src/bin/omdb/mgs.rs
Outdated
let msg = format!("fetching info about SP {:?}", sp_id); | ||
c.sp_get(sp_id.type_, sp_id.slot) | ||
.await | ||
.context(msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiny nit - as written, we always allocate msg
but then immediately free it if the request succeeds. Could we use
.context(msg) | |
.with_context(|| format!("...")) |
instead, or do we run into problems copying/cloning sp_id
? (That seems like it should be Copy
, I think?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. (This was a remnant of an earlier version where the same message was used in multiple places.)
dev-tools/omdb/src/bin/omdb/mgs.rs
Outdated
} | ||
|
||
/// Comparator for service processor identifiers | ||
fn sp_id_cmp(&s1: &SpIdentifier, s2: &SpIdentifier) -> Ordering { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we derive PartialCmp, Cmp
for SpIdentifier
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
Ok(()) | ||
} | ||
|
||
fn sp_type_to_str(s: &SpType) -> &'static str { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fine, but stylistically I probably would've written this as an extension trait to make the call sites nicer:
trait SpTypeExt {
fn as_str(&self) -> &'static str;
}
impl SpTypeExt for SpType { .. }
dev-tools/omdb/src/bin/omdb/mgs.rs
Outdated
.with_context(|| { | ||
format!( | ||
"get caboose for sp type {:?} sp slot {} \ | ||
component {:?} slot {}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tiny nit - some extra whitespace here, probably from cargo fmt
not realigning \
-broken strings
dev-tools/omdb/src/bin/omdb/mgs.rs
Outdated
} | ||
|
||
const COMPONENTS_WITH_CABOOSES: &'static [&'static str] = | ||
&["sp", "rot", "host-boot-flash"]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
host-boot-flash
does not have a caboose; that's a hubris image thing. Maybe we could shove a caboose (or something with similar information) into the host phase 1 images? Not sure what that would look like.
This change adds:
omicron-dev
support for running MGS with a simulated SPomdb
support for querying MGS to see what devices are visible to itI'm honestly not sure how generally useful this is but it's been helpful for me learning the MGS API and it may be useful if we ever find MGS to be reporting surprising data.
(There is also
gateway-cli
, which is capable of a lot more than this tool is. I found it's output a little too hard to read for helping me grok what was in these APIs.)For reference, here's the output on "london" (a more realistic system than what's in the test suite output, which is simulated):