Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose disk firmware in inventory #5706

Merged
merged 8 commits into from
Jun 27, 2024
Merged

Conversation

papertigers
Copy link
Contributor

@papertigers papertigers commented May 6, 2024

The necessary bits to track a disk's firmware version over its lifetime. This also plumbs up the firmware metadata to the inventory http endpoint, although nothing is consuming it just yet as there will be a follow up PR that actually inserts this data into CRDB.

Copy link
Collaborator

@smklein smklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall structure makes sense to me, just a couple nitpicks!

pub struct DiskFirmware {
active_slot: u8,
next_active_slot: Option<u8>,
slot1_read_only: bool,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels odd that all the other fields imply an arbitrary number of slots, but this field specifically embeds the notion of "slot1"? Also, is this one or zero-indexed?

Copy link
Contributor Author

@papertigers papertigers May 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Welcome to the NVMe spec! The spec states that devices have slot 1-7 (and potentially more in the future as there's unused padding at the end of the C struct). The spec also says that only slot 1 can be read only. Not every device will have all 7 slots available, for example the C bhyve nvme devices shows only one slot that is read only, while some of our WDC devices show 4 slots. And not every available slot will necessarily contain firmware. The firmware version itself is defined to be an ascii string hence the Vec<Option<String>>.

After having chatted with @jgallagher about this we landed on NvmeSlot being a wrapping type to address the array index being 0 but corresponding with slot1 - this can be seen here and has yet to be officially reviewed. The thought was that one could gain access to all the slots via an iterator and upstream of the library you could deal with the 0 indexed array.

I originally had this represented as an enum but was unsure what the underlying CRDB type should look like. Very open to suggestions here as I just went with something for this PR as a placeholder.

active_slot: u8,
next_active_slot: Option<u8>,
slot1_read_only: bool,
slots: Vec<Option<String>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the value of this String? This feels a little bit like a more specific type that has been coerced into a string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the above comment.

let mut removed = Vec::new();
let mut updated = Vec::new();

// Find new or udpated disks.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Find new or udpated disks.
// Find new or updated disks.

Comment on lines +539 to +560
let controller_lock = match controller.try_read_lock() {
libnvme::controller::TryLockResult::Ok(locked) => locked,
// We should only hit this if something in the system has locked the
// controller in question for writing.
libnvme::controller::TryLockResult::Locked(_) => {
warn!(
log,
"NVMe Controller is already locked so we will try again
in the next hardware snapshot"
);
return Err(Error::NvmeControllerLocked);
}
libnvme::controller::TryLockResult::Err(err) => {
return Err(Error::from(err))
}
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 this makes sense, given our prior conversation

Comment on lines 430 to 441
// Today the only update we expect to be able to apply to
// a `Disk` is firmware.
pub(crate) fn update_disk(&mut self, raw_disk: &RawDisk) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could program this a little bit more defensively, in case there are other parts of these structures that can be updated. The comment indicates the true behavior ("we ignore everything but the firmware") but the calllsites don't really make that clear.

Can we either call this update_firmware, or have some other way to make sure "we don't drop updates on the floor" if something other than the firmware field changes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can fix this - I had originally left this as update_disk as there might be more fields in the future we wish to support. But I am happy to make this an explicit update_firmware for the time being. (side note... maybe update_firmware_metadata is a better name).

@papertigers papertigers force-pushed the mike/nvme-firmware-inventory branch 2 times, most recently from 46347bf to fa35f33 Compare May 23, 2024 23:48
@papertigers papertigers force-pushed the mike/nvme-firmware-inventory branch from fa35f33 to e492752 Compare June 18, 2024 19:14
@papertigers papertigers force-pushed the mike/nvme-firmware-inventory branch from c6d472b to cae1f97 Compare June 25, 2024 15:49
@papertigers papertigers force-pushed the mike/nvme-firmware-inventory branch from cae1f97 to 27cebc0 Compare June 25, 2024 15:55
@papertigers papertigers changed the title WIP expose disk firmware in inventory Expose disk firmware in inventory Jun 26, 2024
@papertigers papertigers marked this pull request as ready for review June 26, 2024 20:58
@papertigers
Copy link
Contributor Author

This is ready for review now that oxidecomputer/libnvme#4 has landed.

@papertigers papertigers requested a review from smklein June 26, 2024 20:58
@@ -300,6 +300,15 @@ impl StorageManagerTestHarness {
.expect("Failed to remove vdev");
}

// XXX MTZ: Provide a vdev update aka new firmware.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we address this this "XXX" before merging?

@papertigers papertigers enabled auto-merge (squash) June 27, 2024 16:11
@papertigers papertigers merged commit 1e3e967 into main Jun 27, 2024
21 checks passed
@papertigers papertigers deleted the mike/nvme-firmware-inventory branch June 27, 2024 17:14
papertigers added a commit that referenced this pull request Sep 25, 2024
With #5706 merged and later reenabled in #6026 we now want to store this
data in CRDB so that it can be used by blueprints and reconfigurator.

`nexus_types::inventory::PhysicalDisk` now has a `firmware` field that
is exposed as an enum called `PhysicalDiskFirmware`. Today we only have
`Unknown` and `Nvme` variants but this can later be expanded upon if we
ever start using additional disk types such as SCSI/SATA etc.

Finally I have included an updated version of `omdb db inventory
physical-disks` that will include the currently active firmware as well
as the next active firmware if a new version has been staged.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants