-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose disk firmware in inventory #5706
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall structure makes sense to me, just a couple nitpicks!
pub struct DiskFirmware { | ||
active_slot: u8, | ||
next_active_slot: Option<u8>, | ||
slot1_read_only: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels odd that all the other fields imply an arbitrary number of slots, but this field specifically embeds the notion of "slot1"? Also, is this one or zero-indexed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Welcome to the NVMe spec! The spec states that devices have slot 1-7 (and potentially more in the future as there's unused padding at the end of the C struct). The spec also says that only slot 1 can be read only. Not every device will have all 7 slots available, for example the C bhyve nvme devices shows only one slot that is read only, while some of our WDC devices show 4 slots. And not every available slot will necessarily contain firmware. The firmware version itself is defined to be an ascii string hence the Vec<Option<String>>
.
After having chatted with @jgallagher about this we landed on NvmeSlot
being a wrapping type to address the array index being 0 but corresponding with slot1 - this can be seen here and has yet to be officially reviewed. The thought was that one could gain access to all the slots via an iterator and upstream of the library you could deal with the 0 indexed array.
I originally had this represented as an enum but was unsure what the underlying CRDB type should look like. Very open to suggestions here as I just went with something for this PR as a placeholder.
sled-hardware/src/disk.rs
Outdated
active_slot: u8, | ||
next_active_slot: Option<u8>, | ||
slot1_read_only: bool, | ||
slots: Vec<Option<String>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the value of this String? This feels a little bit like a more specific type that has been coerced into a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the above comment.
sled-hardware/src/illumos/mod.rs
Outdated
let mut removed = Vec::new(); | ||
let mut updated = Vec::new(); | ||
|
||
// Find new or udpated disks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Find new or udpated disks. | |
// Find new or updated disks. |
let controller_lock = match controller.try_read_lock() { | ||
libnvme::controller::TryLockResult::Ok(locked) => locked, | ||
// We should only hit this if something in the system has locked the | ||
// controller in question for writing. | ||
libnvme::controller::TryLockResult::Locked(_) => { | ||
warn!( | ||
log, | ||
"NVMe Controller is already locked so we will try again | ||
in the next hardware snapshot" | ||
); | ||
return Err(Error::NvmeControllerLocked); | ||
} | ||
libnvme::controller::TryLockResult::Err(err) => { | ||
return Err(Error::from(err)) | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 this makes sense, given our prior conversation
sled-storage/src/disk.rs
Outdated
// Today the only update we expect to be able to apply to | ||
// a `Disk` is firmware. | ||
pub(crate) fn update_disk(&mut self, raw_disk: &RawDisk) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could program this a little bit more defensively, in case there are other parts of these structures that can be updated. The comment indicates the true behavior ("we ignore everything but the firmware") but the calllsites don't really make that clear.
Can we either call this update_firmware
, or have some other way to make sure "we don't drop updates on the floor" if something other than the firmware field changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can fix this - I had originally left this as update_disk
as there might be more fields in the future we wish to support. But I am happy to make this an explicit update_firmware
for the time being. (side note... maybe update_firmware_metadata
is a better name).
46347bf
to
fa35f33
Compare
fa35f33
to
e492752
Compare
c6d472b
to
cae1f97
Compare
cae1f97
to
27cebc0
Compare
This is ready for review now that oxidecomputer/libnvme#4 has landed. |
@@ -300,6 +300,15 @@ impl StorageManagerTestHarness { | |||
.expect("Failed to remove vdev"); | |||
} | |||
|
|||
// XXX MTZ: Provide a vdev update aka new firmware. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we address this this "XXX" before merging?
With #5706 merged and later reenabled in #6026 we now want to store this data in CRDB so that it can be used by blueprints and reconfigurator. `nexus_types::inventory::PhysicalDisk` now has a `firmware` field that is exposed as an enum called `PhysicalDiskFirmware`. Today we only have `Unknown` and `Nvme` variants but this can later be expanded upon if we ever start using additional disk types such as SCSI/SATA etc. Finally I have included an updated version of `omdb db inventory physical-disks` that will include the currently active firmware as well as the next active firmware if a new version has been staged.
The necessary bits to track a disk's firmware version over its lifetime. This also plumbs up the firmware metadata to the inventory http endpoint, although nothing is consuming it just yet as there will be a follow up PR that actually inserts this data into CRDB.