initial inventory for automated update #4291

davepacheco · 2023-10-18T03:23:17Z

This PR implements the first round of hardware/software inventory for automated update. See RFD 433 for background. There's a summary of the new data model in dbinit.sql.

I'm sorry this change is so big. Here are the key pieces:

in nexus/types: type related to software inventory (used in a few places)
in schema/crdb and nexus/db-model: database schema/model described in RFD 433
in nexus/db-queries: datastore queries to insert or delete an entire inventory Collection
in nexus/inventory: new crate with Collector and builder interface. This crate only collects inventory -- it doesn't do anything with the database.
in nexus/src/app/background: a new background task that uses these other pieces to collect inventory, write it to the database, and clean up old collections
omdb support for showing inventory data from the database
implementation of all this for service processors, roots of trust, and cabooses, all fetched from MGS

What's not here (and will be in future PRs, not this one):

any other hardware/software inventory, including anything from sled agent: host OS versions, physical disks, Omicron zones, other hardware, etc.
Nothing uses any of this data yet.

Some other stuff came along for the ride here. I'm happy to separate these if that's useful but they're each pretty small:

omicron-dev run-all, as well as all tests that set up a ControlPlaneTestContext, now run a Management Gateway Service backed by the same simulated SPs used in the existing MGS tests. This was easy to do, convenient for future inventory work, and it was necessary to test the omdb changes.
I noticed while debugging some SQL queries that omdb does not call usdt::register_probes(), so we don't have (for example) the diesel-dtrace probes in omdb. I added a call in pool.rs to cover these. This isn't quite once per process, but it's close, and ensures that anybody who uses our database layer will get these probes. This was a one line change.
In datastore/mod.rs, I removed pool_authorized() because it was non-pub and there was only one caller and I found the name confusing.

Here's some example output. I used omicron-dev run-all to get everything going:

$ cargo run --bin=omicron-dev -- run-all
...
    Finished dev [unoptimized + debuginfo] target(s) in 4m 36s
     Running `target/debug/omicron-dev run-all`
omicron-dev: setting up all services ...
log file: /dangerzone/omicron_tmp/omicron-dev-omicron-dev.20844.0.log
note: configured to log to "/dangerzone/omicron_tmp/omicron-dev-omicron-dev.20844.0.log"
DB URL: postgresql://root@[::1]:49896/omicron?sslmode=disable
DB address: [::1]:49896
log file: /dangerzone/omicron_tmp/omicron-dev-omicron-dev.20844.1.log
note: configured to log to "/dangerzone/omicron_tmp/omicron-dev-omicron-dev.20844.1.log"
omicron-dev: services are running.
omicron-dev: nexus external API:    127.0.0.1:12220
omicron-dev: nexus internal API:    [::1]:12221
omicron-dev: cockroachdb pid:       21010
omicron-dev: cockroachdb URL:       postgresql://root@[::1]:49896/omicron?sslmode=disable
omicron-dev: cockroachdb directory: /dangerzone/omicron_tmp/.tmpvU7h5N
omicron-dev: internal DNS HTTP:     http://[::1]:56273
omicron-dev: internal DNS:          [::1]:37290
omicron-dev: external DNS name:     oxide-dev.test
omicron-dev: external DNS HTTP:     http://[::1]:36365
omicron-dev: external DNS:          [::1]:55005
omicron-dev:   e.g. `dig @::1 -p 55005 test-suite-silo.sys.oxide-dev.test`
omicron-dev: management gateway:    http://[::1]:57254
omicron-dev: silo name:             test-suite-silo
omicron-dev: privileged user name:  test-privileged

Here's omdb nexus background-tasks show for the new "inventory" task:

$ export OMDB_DNS_SERVER=[::1]:37290
$ omdb nexus background-tasks show
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12221
...
task: "inventory_collection"
  configured period: every 10m
  currently executing: no
  last completed activation: iter 3, triggered by an explicit signal
    started at 2023-10-18T03:12:39.479Z (139s ago) and ran for 210ms
    last collection id:      ac9edf3e-c420-4fa1-ae05-547f05d60574
    last collection started: 2023-10-18T03:12:39Z
    last collection done:    2023-10-18T03:12:39Z

Here's using omdb to poke around the inventory data:

$ omdb db inventory
Print information about collected hardware/software inventory

Usage: omdb db inventory <COMMAND>

Commands:
  baseboard-ids  list all baseboards ever found
  cabooses       list all cabooses ever found
  collections    list and show details from particular collections
  help           Print this message or the help of the given subcommand(s)

Options:
  -h, --help  Print help

$ omdb db inventory baseboard-ids
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using database URL postgresql://root@[::1]:49896/omicron?sslmode=disable
note: database schema version matches expected (7.0.0)
ID                                   PART_NUMBER      SERIAL_NUMBER 
a2418980-2188-4613-9787-df7d96ee94eb FAKE_SIM_GIMLET  SimGimlet00   
9684c118-6bae-461d-bbee-005265aee0f2 FAKE_SIM_GIMLET  SimGimlet01   
e4350b7a-49d2-4858-b4f2-74e52decfa80 FAKE_SIM_SIDECAR SimSidecar0   
c3f5ce3a-a536-44dd-b699-7354e778494d FAKE_SIM_SIDECAR SimSidecar1   

$ omdb db inventory cabooses
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using database URL postgresql://root@[::1]:49896/omicron?sslmode=disable
note: database schema version matches expected (7.0.0)
ID                                   BOARD         GIT_COMMIT NAME       VERSION 
0f14c32f-9e59-44bd-a842-d99c499ff99c SimGimletSp   ffffffff   SimGimlet  0.0.1   
1cee4409-9021-42d3-b742-cf1a21c01918 SimSidecarRot eeeeeeee   SimSidecar 0.0.1   
2e7545b4-e4b2-4060-ade7-247e6ce64a70 SimSidecarSp  ffffffff   SimSidecar 0.0.1   
95d067e3-daf6-475e-8b01-01f6b93c1dd6 SimGimletRot  eeeeeeee   SimGimlet  0.0.1   

$ omdb db inventory collections
list and show details from particular collections

Usage: omdb db inventory collections <COMMAND>

Commands:
  list  list collections
  show  show what was found in a particular collection
  help  Print this message or the help of the given subcommand(s)

Options:
  -h, --help  Print help

$ omdb db inventory collections list
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using database URL postgresql://root@[::1]:49896/omicron?sslmode=disable
note: database schema version matches expected (7.0.0)
ID                                   STARTED              TOOK  NSPS NERRORS 
ac9edf3e-c420-4fa1-ae05-547f05d60574 2023-10-18T03:12:39Z 17 ms 4    0       

$ omdb db inventory collections show ac9edf3e-c420-4fa1-ae05-547f05d60574
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using database URL postgresql://root@[::1]:49896/omicron?sslmode=disable
note: database schema version matches expected (7.0.0)
collection: ac9edf3e-c420-4fa1-ae05-547f05d60574
collector:  e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (likely a Nexus instance)
started:    2023-10-18T03:12:39.543Z
done:       2023-10-18T03:12:39.560Z
errors:     0

Sled SimGimlet00
    part number: FAKE_SIM_GIMLET
    power:    A2
    revision: 0
    MGS slot: Sled 0 (cubby 0)
    found at: 2023-10-18 03:12:39.553986 UTC from http://[::1]:57254
    cabooses:
        SLOT       BOARD        NAME      VERSION GIT_COMMIT 
         SP slot 0 SimGimletSp  SimGimlet 0.0.1   ffffffff   
         SP slot 1 SimGimletSp  SimGimlet 0.0.1   ffffffff   
        RoT slot A SimGimletRot SimGimlet 0.0.1   eeeeeeee   
        RoT slot B SimGimletRot SimGimlet 0.0.1   eeeeeeee   
    RoT: active slot: slot A
    RoT: persistent boot preference: slot A
    RoT: pending persistent boot preference: -
    RoT: transient boot preference: -
    RoT: slot A SHA3-256: -
    RoT: slot B SHA3-256: -

Sled SimGimlet01
    part number: FAKE_SIM_GIMLET
    power:    A2
    revision: 0
    MGS slot: Sled 1 (cubby 1)
    found at: 2023-10-18 03:12:39.557752 UTC from http://[::1]:57254
    cabooses:
        SLOT       BOARD        NAME      VERSION GIT_COMMIT 
         SP slot 0 SimGimletSp  SimGimlet 0.0.1   ffffffff   
         SP slot 1 SimGimletSp  SimGimlet 0.0.1   ffffffff   
        RoT slot A SimGimletRot SimGimlet 0.0.1   eeeeeeee   
        RoT slot B SimGimletRot SimGimlet 0.0.1   eeeeeeee   
    RoT: active slot: slot A
    RoT: persistent boot preference: slot A
    RoT: pending persistent boot preference: -
    RoT: transient boot preference: -
    RoT: slot A SHA3-256: -
    RoT: slot B SHA3-256: -

Switch SimSidecar0
    part number: FAKE_SIM_SIDECAR
    power:    A2
    revision: 0
    MGS slot: Switch 0
    found at: 2023-10-18 03:12:39.546153 UTC from http://[::1]:57254
    cabooses:
        SLOT       BOARD         NAME       VERSION GIT_COMMIT 
         SP slot 0 SimSidecarSp  SimSidecar 0.0.1   ffffffff   
         SP slot 1 SimSidecarSp  SimSidecar 0.0.1   ffffffff   
        RoT slot A SimSidecarRot SimSidecar 0.0.1   eeeeeeee   
        RoT slot B SimSidecarRot SimSidecar 0.0.1   eeeeeeee   
    RoT: active slot: slot A
    RoT: persistent boot preference: slot A
    RoT: pending persistent boot preference: -
    RoT: transient boot preference: -
    RoT: slot A SHA3-256: -
    RoT: slot B SHA3-256: -

Switch SimSidecar1
    part number: FAKE_SIM_SIDECAR
    power:    A2
    revision: 0
    MGS slot: Switch 1
    found at: 2023-10-18 03:12:39.550225 UTC from http://[::1]:57254
    cabooses:
        SLOT       BOARD         NAME       VERSION GIT_COMMIT 
         SP slot 0 SimSidecarSp  SimSidecar 0.0.1   ffffffff   
         SP slot 1 SimSidecarSp  SimSidecar 0.0.1   ffffffff   
        RoT slot A SimSidecarRot SimSidecar 0.0.1   eeeeeeee   
        RoT slot B SimSidecarRot SimSidecar 0.0.1   eeeeeeee   
    RoT: active slot: slot A
    RoT: persistent boot preference: slot A
    RoT: pending persistent boot preference: -
    RoT: transient boot preference: -
    RoT: slot A SHA3-256: -
    RoT: slot B SHA3-256: -

This commit matches commit 581e902 in branch dap/nexus-inventory. I've just rebased the changes onto "main" here.

andrewjstone · 2023-10-18T06:05:07Z

what chicken switches do we want to add?

I first read that as "chicken sandwiches" and was very confused. I'm not even that hungry!

jgallagher · 2023-10-18T20:52:23Z

nexus/inventory/src/collector.rs

+        // if we try to ask MGS about it, we have to wait for MGS to time out
+        // its attempt to reach it (currently several seconds).  This choice
+        // enables inventory to complete much faster, at the expense of not
+        // being able to identify this particular condition.


I wonder if we should still try to query the SPs ignition says aren't present, but on some lower frequency (maybe even a separate background task entirely? and/or in a subsystem that's more related to faults than inventory, since "I can talk to an SP that ignition says isn't there" is definitely abnormal?). I'm nervous about baking in blind spots.

I think that makes sense. I'd like to defer it for now. I don't think making this choice now makes it any harder to do that in the future.

No argument on deferring it. Maybe create an issue after this lands so we don't lose track? Seems like the kind of thing that would only happen on an already-bad day.

jgallagher · 2023-10-19T16:20:08Z

schema/crdb/dbinit.sql

+    PRIMARY KEY (inv_collection_id, hw_baseboard_id)
+);
+
+CREATE TYPE IF NOT EXISTS omicron.public.caboose_which AS ENUM (


We chatted extensively about this; I'll attempt to summarize here:

We don't love this enum; it feels a little goofy.

One alternative is to instead have caboose_slot_0 / caboose_slot_1 foreign keys in inv_sp/inv_rot that refer to rows in inv_caboose. This is a 1-to-at-most-1 relationship, but it allows us to encode in the schema that either all or none of the caboose data is available.

Currently, inv_caboose doesn't have a single primary key, so adding a foreign key to it is awkward at best. We could either add an artificial primary key to inv_caboose, or try to shift things a bit to use sw_caboose_id as a FK.

I went ahead and did this in 58c010f. Honestly, I could go either way on the result. CabooseWhich does feel janky. But it also reflects exactly what we're getting from MGS: that is, each row in this table reflects one response from the get-caboose endpoint, and that essentially represents the parameters to that request. It's kind of a nice property that no row in an inv_* table represents data from multiple collection requests:

it guides the schema design -- there's one table for each kind of observation (with maybe additional tables if there are a bunch of fields that can be present or absent together)

it makes it easy to map the collection request responses to database rows

it makes it easy to have the uniform set of (inv_collection_id, time_collected, source) fields. We do have those here, but it's arguably misleading because the source and time_collected fields on inv_service_processor and inv_root_of_trust don't apply to the caboose fields

This example sounds awfully specific but I feel like there's something general here: imagine if we wanted in the future to update the database as we collect data instead of all at once at the end. We'd have this unfortunate situation of having to either insert an inv_service_processor record and then update it later or else hang onto it (don't insert it) until we've tried to collected all the things that might go into it.

I don't think these are big deals for this particular case. Rather, I came to this after exploring much different ways to structure this -- like an inv_sled table that might include pieces of information from both the sled agent (like the current host OS) and the SP (like the current host flash contents), etc. I really disliked this because in the face of partial failures you have all these partial rows and then everything has to be NULL-able. That's how I got to the "rows should not include data from multiple sources" rule.

Put differently: I don't think this specific violation of that rule is that bad, but without that rule, I found myself spinning in circles for a long time about how to design the schema. It's pretty compelling to just say "each source of observation is a table; each observation is a row; then apply the usual database normalization rules".

All that said, I'm kind of ambivalent in the end. I think I slightly prefer the previous thing with caboose_which but I'm interested in your thoughts.

All the properties you describe about caboose_which make sense. I think I still slightly-to-moderately prefer the changes in 58c010f, but the more I look at it the more I think it's largely a superficial preference. If you start to feel more strongly that you want to go back to caboose_which I could certainly live with that, even if it's just to maintain the schema design guidance.

This feels clearest to me on this issue in particular:

imagine if we wanted in the future to update the database as we collect data instead of all at once at the end. We'd have this unfortunate situation of having to either insert an inv_service_processor record and then update it later or else hang onto it (don't insert it) until we've tried to collected all the things that might go into it.

I think it's the same either way. With caboose_which, partway through a collection insertion we could have rows in inv_sp that do not have corresponding rows in inv_caboose, which feels functionally the same as inv_sp having NULL slotN_inv_caboose_id foreign keys. The latter does mean when we do collect a caboose we have to do an insert+update instead of just an insert, but from a query / data representation point of view, either way you don't know whether the caboose is missing because we couldn't collect it or if it just hasn't been collected yet (absent other info like the presence of an error).

I think it's the same either way.

Yeah, from a representation perspective, that's true. I had in my mind an implicit rule that we wouldn't want to write a record and then update it later in the same operation. But that's somewhat arbitrary, too. (Not doing this does make it more complicated to infer anything from partially-inserted collections, and to measure progress based on what's present, but now we're talking about several layers of hypotheticals that aren't worth dealing with now.)

As I was reading through inv_root_of_trust and inv_service_processor I was wondering where the references to the cabooses were, and then reached this comment thread and the remaining tables. Thinking about this a bit, I think we should stiick with the caboose_which and inv_caboose tables as they are now rather than embedding fields in the sp and rot tables which would require a write + update.

I don't think the slight convenience or aesthetically pleasing look of the sp and rot tables is strong enough to violate the rule of "one collection per source = one row in one table". That's a really powerful thing to allow us to reason about the system and my gut is telling me we'll be happy to have that later.

jgallagher · 2023-10-19T18:23:34Z

nexus/types/src/inventory.rs

+            name: c.name,
+            // The MGS API uses an `Option` here because old SP versions did not
+            // supply it.  But modern SP versions do.  So we should never hit
+            // this `unwrap_or()`.


Should we modify MGS to remove this Option altogether before (or as a part of) this PR? I'm inclined to say "yes"; it's a trivial change in MGS's sp_component_caboose_get endpoint.

Sure, I'll take a swing at that.

jgallagher · 2023-10-19T18:24:38Z

nexus/types/src/inventory.rs

+/// with separate records, even though they might come from the same source
+/// (in this case, a single MGS request).
+///
+/// We make heavy use of maps, sets, and Arcs here because many of these things


This part of the comment makes me nervous, but I think unnecessarily so. If we actually have Arcs pointing to each other, we can end up with undroppable cycles, but after reading over the structs I don't think we do, right? The Arc<T> types in Collection are:

BaseboardId (does not contain any Arcs)

Caboose (does not contain any Arcs)

and then CabooseFound keeps an Arc<Caboose> (which is also fine).

I'll reword this a bit to clarify that no two objects point at "each other". It's more about the fact that some objects are pointed-to by many other things within the Collection.

jgallagher · 2023-10-19T19:29:07Z

nexus/db-queries/src/db/datastore/inventory.rs

+                    // `inv_service_processor` using an explicit list of columns
+                    // and values.  Without the following statement, If a new
+                    // required column were added, this would only fail at
+                    // runtime.


Big 👍 on this comment (and the solution it's describing). Very clear what's going on here.

jgallagher · 2023-10-19T19:57:10Z

nexus/db-queries/src/db/datastore/inventory.rs

+
+        opctx.authorize(authz::Action::Modify, &authz::INVENTORY).await?;
+
+        loop {


Can we get all the collection IDs to delete in a single query instead of looping and having to delete one at a time? This is grossly oversimplified (in particular, I'm putting the error count in directly as a column), but given

create table coll (id int primary key, started timestamp, nerrors int);

a query like this should return all IDs we need to prune:

select id from coll where id not in ( -- keep the 3 most recent collections... (select id from coll order by started desc limit 3) union -- and the single most recent collection that had no errors (if it wasn't -- already saved by the "3 most recent" above) (select id from coll where nerrors = 0 order by started desc limit 1) );

I think this is a promising approach. But I'm a little worried it'll take a while to make this real (first real SQL, then real Diesel code), and also that when we do, we may find the database winds up doing a table scan (or rejecting the SQL because we've configured it to disallow that). As an example: one of the stated assumptions is that the number of collections here could be huge. In that case, the highest-level subquery here will produce only 3 rows, but the query itself will be trying to select any collections not in that set, which will in turn return a very large result set. So we'll a LIMIT there. We also want to start with the oldest ones, so we'll want an ORDER BY timestamp. At this point, there are enough variables here that I'm not sure what query plan Cockroach will use. I think ideal would be to do the subquery, then scan the index (by timestamp) and just skip over any rows that are in the subquery and stop when we reach the limit. I hope it will do that but it's hard to be sure until we do that work.

I think this is probably all solvable (or else we'll find out why it's not), but if the current code is at least correct and not pathological, I'd rather defer this than spend the time now to work all this out.

jgallagher · 2023-10-19T20:05:32Z

nexus/db-queries/src/db/datastore/inventory.rs

+        // break it up if these transactions become too big.  But we'd need a
+        // way to stop other clients from discovering a collection after we
+        // start removing it and we'd also need to make sure we didn't leak a
+        // collection if we crash while deleting it.


Do we also need to prune no-longer-referenced hw_baseboard_id or sw_caboose rows? It's a little hard to imagine hw_baseboard_id getting "too big" since it only gets a row for each physical component the rack sees, but maybe in a large/long-lived multirack deployment? sw_caboose gets a few new rows for every update, so probably grows somewhat faster but still not all that quickly.

In principle, eventually, yes. I think this is not urgent and we will notice before it becomes so.

jgallagher · 2023-10-19T20:21:51Z

nexus/db-queries/src/db/datastore/inventory.rs

+}
+
+impl diesel::query_builder::QueryFragment<diesel::pg::Pg> for InvCabooseInsert {
+    fn walk_ast<'b>(


I have no useful suggestion here, but would like to register my complaint that this is much harder to read than the equivalent query written out in raw SQL would be.

nexus/inventory/src/builder.rs

This reverts commit 58c010f.

davepacheco · 2023-10-30T21:19:31Z

@jgallagher I've made a bunch of changes since your review, but hopefully no surprises. Besides the stuff that came up in your review:

added a bunch of automated tests (which required yet more database layer boilerplate)
rewrote the omdb commands to use functions I created for the automated tests
replaced the CTE with regular Diesel code when I realized I could do a join within INSERT INTO ... SELECT
added a "disable" chicken switch in the Nexus config, in case of some critical reason we need to stop inventory from running
added the database schema changes to the update path

That's all I've got planned so I think this is ready for re-review.

jgallagher · 2023-10-31T14:38:36Z

Looks like the helios deploy test failure is legit (or at least related to the changes):

error_message_internal = Failed to initialize CockroachDb:
 Error running command in zone 'oxz_cockroachdb_13a01e9b-3937-4043-b973-eef03d18a34f':
 Command [/opt/oxide/cockroachdb/bin/cockroach sql --insecure --host [fd00:1122:3344:101::7]:32221 --file /opt/oxide/cockroachdb/sql/dbinit.sql] executed and failed with status:
 exit status: 1
 stdout: ... snip snip snip ... CREATE TABLE\n  stderr: ERROR: relation "omicron.public.inv_collection" does not exist\nSQLSTATE: 42P01\nERROR: relation "omicron.public.inv_collection" does not exist\nSQLSTATE: 42P01\nFailed running "sql"\n

andrewjstone

@davepacheco This looks great. I only skimmed most of the DB transaction stuff, but the overall picture makes sense to me. I'll leave it to John to approve due to my skimming and his expertise.

andrewjstone · 2023-10-31T04:58:35Z

schema/crdb/dbinit.sql

@@ -2514,6 +2514,222 @@ CREATE TABLE IF NOT EXISTS omicron.public.bootstore_keys (
    generation INT8 NOT NULL
 );

+/*


I ❤️ this comment

andrewjstone · 2023-10-31T05:38:02Z

schema/crdb/dbinit.sql

+    PRIMARY KEY (inv_collection_id, hw_baseboard_id)
+);
+
+CREATE TYPE IF NOT EXISTS omicron.public.caboose_which AS ENUM (


As I was reading through inv_root_of_trust and inv_service_processor I was wondering where the references to the cabooses were, and then reached this comment thread and the remaining tables. Thinking about this a bit, I think we should stiick with the caboose_which and inv_caboose tables as they are now rather than embedding fields in the sp and rot tables which would require a write + update.

I don't think the slight convenience or aesthetically pleasing look of the sp and rot tables is strong enough to violate the rule of "one collection per source = one row in one table". That's a really powerful thing to allow us to reason about the system and my gut is telling me we'll be happy to have that later.

andrewjstone · 2023-10-31T17:32:31Z

nexus/db-queries/src/db/datastore/inventory.rs

+    /// Prune inventory collections stored in the database, keeping at least
+    /// `nkeep`.
+    ///
+    /// This function removes as many collections as possible while preserving


The latest nkeep are provided by timestamps, which aren't really global. Right now collection is at 10 min intervals, so as long as only 1 nexus performs a collection per interval this should be fine. However, I could see problems arising around order, although quite unlikely due to our 500ms limitation around syncing.

I don't really think this is something worth worrying about but figured I'd ask for completeness sake. Are there autoincrementing IDs we could use for collections rather than UUIDs as foreign keys and then sort by those? Would this present other issues with dueling Nexuses?

Yeah, using timestamps is definitely a little fuzzy. In this case though I think that reflects the reality that collections are not atomic and they don't have a total order. Two Nexus instances could totally run collections concurrently that have start/done times that overlap (and I think that's fine). Consumers can decide if they want the most-recently-started or most-recently-finished (or even the-one-containing-the-most-recent-collection-time-for-the-specific-item-that-I-care-about). We could potentially use a sequence to assign a total order to these but I don't think it would have a useful semantic meaning -- at best it'd be a proxy for "which one committed to the database first" and I'm not sure that's useful.

Okay so my argument is basically "report the facts (the start/done timestamp) and let consumers decide what they want". But that just punts your question to "okay, well, which ones should we keep when we're pruning them?". And I think the answer here is to tune both the frequency and nkeep such that it doesn't really matter if we choose "wrong" -- i.e., if two collections start/finish at about the same time but for some reason a consumer might reasonably want either one, we should probably just keep both. But my expectation here is that all consumers for now would probably want the same thing, which is the latest "time_started" one, and as long as "nkeep" is more than 1 then it doesn't matter which ones we keep if two overlap because there's always a newer one which is what consumers actually want.

Ah, ok. That makes sense. I was actually thinking that we could eliminate the overlapping collections to a degree by having each nexus check that there isn't a collection currently running - or rather that one hasn't started within some bound (say collection_interval / 2) before kicking off another. With that, collections should be very close to totally ordered by time if not always so.

davepacheco · 2023-10-31T18:52:01Z

Thanks for taking a look @andrewjstone!

jgallagher

This looks great! Just a handful of small nits.

jgallagher · 2023-10-31T16:57:17Z

nexus/db-model/src/inventory.rs

+    pub serial_number: String,
+}
+
+impl<'a> From<&'a BaseboardId> for HwBaseboardId {


Tiny nit / question - if we have to .clone() all the fields of BaseboardId, should this be impl From<BaseboardId> instead, and push the clone to the callsite? I'm not sure how this is used, but that might avoid some clones, if there are be cases where a caller has a BaseboardId that they want to convert into a HwBaseboardId and not use again.

Similar question about other From<&T> impls in this file.

Agreed. Fixed in a55216d. I changed this one and the SwCaboose one. I did not change the InvCollection one because in that case, the source object (a Collection) is potentially huge.

jgallagher · 2023-10-31T17:57:20Z

nexus/src/app/background/inventory_collection.rs

+    resolver: internal_dns::resolver::Resolver,
+    creator: String,
+    nkeep: u32,
+    disable: bool,


Just making sure I understand: this is a setting at the Nexus config level (i.e., the TOML file baked into the Nexus zone) and cannot change at runtime, right? If we needed to flip this switch, how would we?

That's right. To my knowledge we do not yet have a way to apply dynamic config at runtime. The intent here is that if we really needed to, we could modify the TOML file inside each Nexus zone to disable this task. Then we'd restart Nexus. It's obviously not great but I've frequently found these sorts of facilities essential in the mitigation of production incidents in the past. (A step up might be a support API for pausing any background task in a particular Nexus instance by name. But that wouldn't survive Nexus restart without storing that config somewhere.)

jgallagher · 2023-10-31T18:03:08Z

nexus/src/app/background/inventory_collection.rs

+            datastore.clone(),
+        );
+
+        // Nexus starts our very background task, so we should find a collection


Nit typo - "starts our very background"

Not a typo, but poorly written. I reworded it in a55216d.

jgallagher · 2023-10-31T18:07:01Z

nexus/src/app/background/init.rs

@@ -88,6 +96,30 @@ impl BackgroundTasks {
            (task, watcher_channel)
        };

+        // Background task: inventory collector
+        let task_inventory_collection = {
+            let watcher = inventory_collection::InventoryCollector::new(


Nit / question - is this variable misnamed? Looks like register takes watchers as its last arg, but this is the task implementation itself, right?

Yeah, fixed in a55216d.

jgallagher · 2023-10-31T18:07:32Z

nexus/inventory/tests/output/collector_errors.txt

+    RotSlotB baseboard part "FAKE_SIM_SIDECAR" serial "SimSidecar1": board "SimSidecarRot"
+
+errors:
+error: MGS "http://[100::1]:12345": listing ignition targets: Communication Error: error sending request for url (http://[100::1]:12345/ignition): error trying to connect: tcp connect error: Network is unreachable (os error <<redacted>>): error sending request for url (http://[100::1]:12345/ignition): error trying to connect: tcp connect error: Network is unreachable (os error <<redacted>>): error trying to connect: tcp connect error: Network is unreachable (os error <<redacted>>): tcp connect error: Network is unreachable (os error <<redacted>>): Network is unreachable (os error <<redacted>>)


Ug, sorry for this; I really need to clean up the duplicated: duplicated: duplicated: errors from MGS

nexus/inventory/src/collector.rs

jgallagher · 2023-10-31T19:56:00Z

nexus/db-queries/src/db/datastore/inventory.rs

+                let index = u16::try_from(i).map_err(|e| {
+                    Error::internal_error(&format!(
+                        "failed to convert error index to u16 (too \
+                                many errors in inventory collection?): {}",


Trivial nit - rustfmt won't line up split strings

Ugh. This keeps happening and I don't notice. I'm not sure why. I wonder if it happens when some other change (like a symbol rename) causes this block to be reformatted when I'm not actually working on it. Anyway, fixed in a55216d.

jgallagher · 2023-10-31T20:03:38Z

nexus/db-queries/src/db/datastore/inventory.rs

+    }
+}
+
+/// A SQL common table expression (CTE) used to insert into `inv_caboose`


I think this block comment is referencing code that no longer exists, right?

Yikes! Yes. Removed in a55216d.

davepacheco · 2023-11-01T20:50:25Z

I think I've addressed the outstanding feedback and I intend to land this once the repo re-opens after the latest customer update.

The RoT can report four different 512-byte pages (CMPA, and CFPA active/inactive/scratch). Given multiple RoT artifacts that are viable (match the right board, etc.) but are signed with different keys, these pages are required to identify which archive was signed with a key that the RoT will accept. This PR adds collection of these pages to the inventory system added in #4291. The implementation here is fairly bulky but very mechanical, and is implemented almost identically to the way we collect cabooses: there's an `rot_page_which` to identify which of the four kinds of page it is, and a table for storing the relatively small number of raw page data values. Most of the changes in this PR resulted from "find where we're doing something for cabooses, then do the analogous thing for RoT pages". There are a couple minor quibbles in the unit tests that I'll point out by leaving comments below. The RoT pages now show up when viewing a collection through omdb (note that the quite long base64 string is truncated; there's a command line flag to override the truncation and show the full string): ```console $ omdb db inventory collections show e2f84867-010d-4ac3-bbf3-bc1e865da16b > x.txt note: database URL not specified. Will search DNS. note: (override with --db-url or OMDB_DB_URL) note: using database URL postgresql://root@[::1]:43301/omicron?sslmode=disable note: database schema version matches expected (11.0.0) collection: e2f84867-010d-4ac3-bbf3-bc1e865da16b collector: e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c (likely a Nexus instance) started: 2023-11-14T18:51:54.900Z done: 2023-11-14T18:51:54.942Z errors: 0 Sled SimGimlet00 part number: FAKE_SIM_GIMLET power: A2 revision: 0 MGS slot: Sled 0 (cubby 0) found at: 2023-11-14 18:51:54.924602 UTC from http://[::1]:42341 cabooses: SLOT BOARD NAME VERSION GIT_COMMIT SpSlot0 SimGimletSp SimGimlet 0.0.1 ffffffff SpSlot1 SimGimletSp SimGimlet 0.0.1 ffffffff RotSlotA SimGimletRot SimGimlet 0.0.1 eeeeeeee RotSlotB SimGimletRot SimGimlet 0.0.1 eeeeeeee RoT pages: SLOT DATA_BASE64 Cmpa Z2ltbGV0LWNtcGEAAAAAAAAAAAAAAAAA... CfpaActive Z2ltbGV0LWNmcGEtYWN0aXZlAAAAAAAA... CfpaInactive Z2ltbGV0LWNmcGEtaW5hY3RpdmUAAAAA... CfpaScratch Z2ltbGV0LWNmcGEtc2NyYXRjaAAAAAAA... RoT: active slot: slot A RoT: persistent boot preference: slot A RoT: pending persistent boot preference: - RoT: transient boot preference: - RoT: slot A SHA3-256: - RoT: slot B SHA3-256: - Sled SimGimlet01 part number: FAKE_SIM_GIMLET power: A2 revision: 0 MGS slot: Sled 1 (cubby 1) found at: 2023-11-14 18:51:54.935038 UTC from http://[::1]:42341 cabooses: SLOT BOARD NAME VERSION GIT_COMMIT SpSlot0 SimGimletSp SimGimlet 0.0.1 ffffffff SpSlot1 SimGimletSp SimGimlet 0.0.1 ffffffff RotSlotA SimGimletRot SimGimlet 0.0.1 eeeeeeee RotSlotB SimGimletRot SimGimlet 0.0.1 eeeeeeee RoT pages: SLOT DATA_BASE64 Cmpa Z2ltbGV0LWNtcGEAAAAAAAAAAAAAAAAA... CfpaActive Z2ltbGV0LWNmcGEtYWN0aXZlAAAAAAAA... CfpaInactive Z2ltbGV0LWNmcGEtaW5hY3RpdmUAAAAA... CfpaScratch Z2ltbGV0LWNmcGEtc2NyYXRjaAAAAAAA... RoT: active slot: slot A RoT: persistent boot preference: slot A RoT: pending persistent boot preference: - RoT: transient boot preference: - RoT: slot A SHA3-256: - RoT: slot B SHA3-256: - Switch SimSidecar0 part number: FAKE_SIM_SIDECAR power: A2 revision: 0 MGS slot: Switch 0 found at: 2023-11-14 18:51:54.904 UTC from http://[::1]:42341 cabooses: SLOT BOARD NAME VERSION GIT_COMMIT SpSlot0 SimSidecarSp SimSidecar 0.0.1 ffffffff SpSlot1 SimSidecarSp SimSidecar 0.0.1 ffffffff RotSlotA SimSidecarRot SimSidecar 0.0.1 eeeeeeee RotSlotB SimSidecarRot SimSidecar 0.0.1 eeeeeeee RoT pages: SLOT DATA_BASE64 Cmpa c2lkZWNhci1jbXBhAAAAAAAAAAAAAAAA... CfpaActive c2lkZWNhci1jZnBhLWFjdGl2ZQAAAAAA... CfpaInactive c2lkZWNhci1jZnBhLWluYWN0aXZlAAAA... CfpaScratch c2lkZWNhci1jZnBhLXNjcmF0Y2gAAAAA... RoT: active slot: slot A RoT: persistent boot preference: slot A RoT: pending persistent boot preference: - RoT: transient boot preference: - RoT: slot A SHA3-256: - RoT: slot B SHA3-256: - Switch SimSidecar1 part number: FAKE_SIM_SIDECAR power: A2 revision: 0 MGS slot: Switch 1 found at: 2023-11-14 18:51:54.915680 UTC from http://[::1]:42341 cabooses: SLOT BOARD NAME VERSION GIT_COMMIT SpSlot0 SimSidecarSp SimSidecar 0.0.1 ffffffff SpSlot1 SimSidecarSp SimSidecar 0.0.1 ffffffff RotSlotA SimSidecarRot SimSidecar 0.0.1 eeeeeeee RotSlotB SimSidecarRot SimSidecar 0.0.1 eeeeeeee RoT pages: SLOT DATA_BASE64 Cmpa c2lkZWNhci1jbXBhAAAAAAAAAAAAAAAA... CfpaActive c2lkZWNhci1jZnBhLWFjdGl2ZQAAAAAA... CfpaInactive c2lkZWNhci1jZnBhLWluYWN0aXZlAAAA... CfpaScratch c2lkZWNhci1jZnBhLXNjcmF0Y2gAAAAA... RoT: active slot: slot A RoT: persistent boot preference: slot A RoT: pending persistent boot preference: - RoT: transient boot preference: - RoT: slot A SHA3-256: - RoT: slot B SHA3-256: - ``` There's also a new `omdb` subcommand to report the RoT pages (which does not truncate, but if we think it should that'd be easy to change): ```console $ omdb db inventory rot-pages note: database URL not specified. Will search DNS. note: (override with --db-url or OMDB_DB_URL) note: using database URL postgresql://root@[::1]:43301/omicron?sslmode=disable note: database schema version matches expected (11.0.0) ID DATA_BASE64 099ba572-a978-4592-ae7a-452629377904 c2lkZWNhci1jZnBhLWluYWN0aXZlAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= 0e9dc5b0-b190-43da-acb6-84450fdfdb94 c2lkZWNhci1jbXBhAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= 80923bac-fbcc-46e0-b861-9dba906c14f7 Z2ltbGV0LWNmcGEtaW5hY3RpdmUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= 98cc4225-a791-4092-99c6-81e27e8d8ffa c2lkZWNhci1jZnBhLWFjdGl2ZQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= a32eaf95-a20e-4570-8860-e0fb584a2ff1 c2lkZWNhci1jZnBhLXNjcmF0Y2gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= c941810a-1c6a-4dda-9c71-41a0caf62ace Z2ltbGV0LWNtcGEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= e96042d0-ae8a-435c-9118-1b71e8a9a651 Z2ltbGV0LWNmcGEtYWN0aXZlAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= fdc27064-4338-4cbe-bfe5-622b11a9afbc Z2ltbGV0LWNmcGEtc2NyYXRjaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=

initial inventory for automated update

0f01172

This commit matches commit 581e902 in branch dap/nexus-inventory. I've just rebased the changes onto "main" here.

davepacheco requested a review from jgallagher October 18, 2023 03:23

fix docs

54f83e0

jgallagher reviewed Oct 19, 2023

View reviewed changes

davepacheco added 4 commits October 19, 2023 15:35

replace CabooseWhich with optional fields in SP, RoT tables

58c010f

initial review feedback

eaab37f

cont: review feedback

3080700

omdb test is flaky because of the order of background tasks printed out

90a6ad5

davepacheco self-assigned this Oct 24, 2023

davepacheco added 12 commits October 24, 2023 21:11

add test for inventory builder

fc91640

rustfmt

3e79d69

add basic test for collector

515fba1

typo

f355bd3

Revert "replace CabooseWhich with optional fields in SP, RoT tables"

03abf62

This reverts commit 58c010f.

add new database test, restore and fix previous builder test

2984000

add more tests and update omdb to use functions I added for tests

64ed053

CTE is not necessary for the INSERT with two foreign keys

7b21243

add chicken switch

eac3319

Merge branch 'main' into dap/nexus-inventory-2

376a37b

update database schema for the upgrade case too

5b7152c

fix hakari

f0aa152

fix test on ubuntu

e2b25e3

fix deploy case?

79723b7

andrewjstone reviewed Oct 31, 2023

View reviewed changes

jgallagher approved these changes Oct 31, 2023

View reviewed changes

review feedback

a55216d

davepacheco enabled auto-merge (squash) November 1, 2023 21:54

davepacheco merged commit 5be4c93 into main Nov 1, 2023
22 of 23 checks passed

davepacheco deleted the dap/nexus-inventory-2 branch November 1, 2023 22:00

davepacheco mentioned this pull request Nov 3, 2023

Support for propolis-based softnpu device, fix multi-switch uplink updates. #4390

Merged

jgallagher mentioned this pull request Nov 14, 2023

Nexus inventory: Add collection of RoT CMPA and CFPA pages #4496

Merged

rcgoodfellow mentioned this pull request Jul 16, 2024

Make MGS available in nexus test suite #5780

Open


		opctx.authorize(authz::Action::Modify, &authz::INVENTORY).await?;

		loop {

initial inventory for automated update #4291

initial inventory for automated update #4291

Conversation

davepacheco commented Oct 18, 2023 • edited Loading

andrewjstone commented Oct 18, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davepacheco commented Oct 30, 2023

jgallagher commented Oct 31, 2023

andrewjstone left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davepacheco commented Oct 31, 2023

jgallagher left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

davepacheco commented Nov 1, 2023

davepacheco commented Oct 18, 2023 •

edited

Loading