Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The control plane should use libipcc #4536

Merged
merged 19 commits into from
Jan 11, 2024
Merged

The control plane should use libipcc #4536

merged 19 commits into from
Jan 11, 2024

Conversation

papertigers
Copy link
Contributor

@papertigers papertigers commented Nov 21, 2023

With oxidecomputer/illumos-gate@779fb5b now integrated we should use libipcc in omicron rather than make direct calls via ioctl. The library will take care of hiding the implementation details from upstream consumers -- this becomes important in the future when communication with the service processor from the host OS physically changes with newer board design.

Currently the only consumer in this repo is installinator but the control plane is about to start communicating with the RoT via IPCC as well.

This PR introduces new bindings around libipcc and removes the old ioctl interfaces.

ipcc/src/handle.rs Outdated Show resolved Hide resolved
ipcc/src/handle.rs Outdated Show resolved Hide resolved
ipcc/src/lib.rs Outdated Show resolved Hide resolved
@papertigers papertigers changed the title Mike/ipcc The control plane should use libipcc Nov 27, 2023
@papertigers
Copy link
Contributor Author

To test this I ran some tests on a bench gimlet.

First off let's make sure installinator can lookup the InstallinatorImageId:

gimlet # ./installinator install --from-ipcc /var/tmp/foo
Nov 27 20:59:15.708 INFO Received prefixes from ddmd, prefixes: {"fe80::2cf4:23ff:fe60:ff36": [PathVector { destination: Ipv6Prefix { addr: fd00:99::, len: 64 }, path: ["oxz_switch"] }]}, DdmAdminClient: [::1]:8000, artifact: ArtifactHashId { kind: ArtifactKind("host_phase_2"), hash: ArtifactHash("0102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20") }, file: clients/ddm-admin-client/src/lib.rs:119
Nov 27 20:59:15.708 INFO discovered 0 peers: [], artifact: ArtifactHashId { kind: ArtifactKind("host_phase_2"), hash: ArtifactHash("0102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20") }, file: installinator/src/peers.rs:167
Nov 27 20:59:15.708 WARN unable to fetch artifact from peers, retrying discovery, artifact: ArtifactHashId { kind: ArtifactKind("host_phase_2"), hash: ArtifactHash("0102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20") }, file: installinator/src/peers.rs:178

Using a butler to adjust the size of the InstallinatorImageId value (too small / too big):

$
butler $ pfexec humility writeword 0x2402b040 122 $((16#707569A3)) $((16#65746164)) $((16#5064695f)) $((16#44434241)) $((16#48474645)) $((16#4c4b4a49)) $((16#504f4e4d)) $((16#736F686C)) $((16#68705F74)) $((16#5F657361)) $((16#01205832)) $((16#05040302)) $((16#09080706)) $((16#0D0C0B0A)) $((16#11100F0E)) $((16#15141312)) $((16#19181716)) $((16#1D1C1B1A)) $((16#6D201F1E)) $((16#746E6F63)) $((16#5F6C6F72)) $((16#6E616C70)) $((16#21205865)) $((16#25242322)) $((16#29282726)) $((16#2D2C2B2A)) $((16#31302F2E)) $((16#35343332)) $((16#39383736)) $((16#3D3C3B3A)) $((16#00403F3E))
gimlet # ./installinator install --from-ipcc /var/tmp/foo
Error: error retrieving installinator image ID

Caused by:
    deserializing installinator image ID failed: i/o error reading from slice: failed to fill whole buffer
butler $ pfexec humility writeword 0x2402b040 124 $((16#707569A3)) $((16#65746164)) $((16#5064695f)) $((16#44434241)) $((16#48474645)) $((16#4c4b4a49)) $((16#504f4e4d)) $((16#736F686C)) $((16#68705F74)) $((16#5F657361)) $((16#01205832)) $((16#05040302)) $((16#09080706)) $((16#0D0C0B0A)) $((16#11100F0E)) $((16#15141312)) $((16#19181716)) $((16#1D1C1B1A)) $((16#6D201F1E)) $((16#746E6F63)) $((16#5F6C6F72)) $((16#6E616C70)) $((16#21205865)) $((16#25242322)) $((16#29282726)) $((16#2D2C2B2A)) $((16#31302F2E)) $((16#35343332)) $((16#39383736)) $((16#3D3C3B3A)) $((16#00403F3E))
gimlet # ./installinator install --from-ipcc /var/tmp/foo
Error: error retrieving installinator image ID

Caused by:
    IPCC key lookup failed for key InstallinatorImageId: buffer too small for value

Change host_phase2 to host_qhase2:

butler $ pfexec humility writeword 0x2402b040 123 $((16#707569A3)) $((16#65746164)) $((16#5064695f)) $((16#44434241)) $((16#48474645)) $((16#4c4b4a49)) $((16#504f4e4d)) $((16#736F686C)) $((16#68715F74)) $((16#5F657361)) $((16#01205832)) $((16#05040302)) $((16#09080706)) $((16#0D0C0B0A)) $((16#11100F0E)) $((16#15141312)) $((16#19181716)) $((16#1D1C1B1A)) $((16#6D201F1E)) $((16#746E6F63)) $((16#5F6C6F72)) $((16#6E616C70)) $((16#21205865)) $((16#25242322)) $((16#29282726)) $((16#2D2C2B2A)) $((16#31302F2E)) $((16#35343332)) $((16#39383736)) $((16#3D3C3B3A)) $((16#00403F3E))
gimlet # ./installinator install --from-ipcc /var/tmp/foo
Error: error retrieving installinator image ID

Caused by:
    deserializing installinator image ID failed: semantic error: missing field `host_phase_2` (offset unknown)

Crashing the SP during key lookup:

gimlet # ./installinator install --from-ipcc /var/tmp/foo
Error: error retrieving installinator image ID

Caused by:
    IPCC key lookup ioctl failed for key InstallinatorImageId: Connection timed out (os error 145)

Remove the key completely:

butler $ pfexec humility writeword 0x2402b040 0
gimlet # ./installinator install --from-ipcc /var/tmp/foo
Error: error retrieving installinator image ID

Caused by:
    IPCC key lookup failed for key InstallinatorImageId: no value for key

Restoring a valid configuration again:

gimlet # ./installinator install --from-ipcc /var/tmp/foo
Nov 27 21:18:50.283 INFO Received prefixes from ddmd, prefixes: {"fe80::2cf4:23ff:fe60:ff36": [PathVector { destination: Ipv6Prefix { addr: fd00:99::, len: 64 }, path: ["oxz_switch"] }]}, DdmAdminClient: [::1]:8000, artifact: ArtifactHashId { kind: ArtifactKind("host_phase_2"), hash: ArtifactHash("0102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20") }, file: clients/ddm-admin-client/src/lib.rs:119
Nov 27 21:18:50.283 INFO discovered 0 peers: [], artifact: ArtifactHashId { kind: ArtifactKind("host_phase_2"), hash: ArtifactHash("0102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20") }, file: installinator/src/peers.rs:167
Nov 27 21:18:50.284 WARN unable to fetch artifact from peers, retrying discovery, artifact: ArtifactHashId { kind: ArtifactKind("host_phase_2"), hash: ArtifactHash("0102030405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20") }, file: installinator/src/peers.rs:178

@papertigers papertigers marked this pull request as ready for review November 27, 2023 21:20
@@ -13,7 +13,7 @@ rustdocflags = "--document-private-items"
# things critical to correctness probably don't belong here.
[target.x86_64-unknown-illumos]
rustflags = [
"-C", "link-arg=-Wl,-znocompstrtab"
"-C", "link-arg=-Wl,-znocompstrtab,-R/usr/platform/oxide/lib/amd64"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a workaround I was using to get around a missing run path. The solution is likely somewhere in the rpath crate - I am diverting my attention to fixing this now, but if someone has pointers on how this should be done, that would be greatly appreciated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spoke to dap about this and I plan to push a commit to clean up this mess.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: opting to go with passing the oxide platform directory here after some discussion in chat.

@jordanhendricks jordanhendricks self-requested a review November 28, 2023 18:26
Copy link
Contributor

@jordanhendricks jordanhendricks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excited to see this land! Some comments and questions

// file, You can obtain one at https://mozilla.org/MPL/2.0/.

#[cfg(target_os = "illumos")]
static OXIDE_PLATFORM: &str = "/usr/platform/oxide/lib/amd64/";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

worth a comment explaining what this path is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 77bd065.

_marker: core::marker::PhantomData<(*mut u8, core::marker::PhantomPinned)>,
}

pub(crate) const LIBIPCC_ERR_OK: libipcc_err_t = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing doc comment here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 77bd065.

_marker: core::marker::PhantomData<(*mut u8, core::marker::PhantomPinned)>,
}

pub(crate) const LIBIPCC_ERR_OK: libipcc_err_t = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would prefer if there are newlines between the constants, since it helps with readability.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 77bd065.

LIBIPCC_ERR_NO_MEM => IpccError::NoMem(inner),
LIBIPCC_ERR_INVALID_PARAM => IpccError::InvalidParam(inner),
LIBIPCC_ERR_INTERNAL => IpccError::Internal(inner),
LIBIPCC_ERR_KEY_UNKNOWN => IpccError::Internal(inner),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be IpccError::KeyUnknown?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch -- dang it copy and paste!
Fixed in 77bd065.

ipcc/src/lib.rs Outdated
//! An interface to libipcc (inter-processor communications channel) which
//! currently supports looking up values stored in the SP by key. These
//! values are passed from the control plane to the SP (through MGS) or
//! are maniuplated through the `ipcc` command found on Helios.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "manipulated"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary after accepting Andy's change.

// Copyright 2023 Oxide Computer Company

use crate::IpccError;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe add a comment in here that these stubs are for not-illumos platforms?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 77bd065.

}

/// Interface to the inter-processor communications channel.
/// For more information see rfd 316.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are there docs for the illumos library as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no documentation for the library other than the header file. I thought about linking the ipcc driver man page but the point of libipcc is that users don't call the kernel driver interface directly anymore.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be man pages for libipcc in the next week or so.


impl Drop for IpccHandle {
fn drop(&mut self) {
unsafe {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For each unsafe block, it would be to have a comment explaining why the code is safe across FFI boundaries. An example of what I mean:

// Safety: CStr::from_ptr is documented as safe if:
// 1. The pointer contains a valid null terminator at the end of
// the string
// 2. The pointer is valid for reads of bytes up to and including
// the null terminator
// 3. The memory referenced by the return CStr is not mutated for
// the duration of lifetime 'a
//
// (1) is true because we initialize the buffers for ste_path as all
// 0s, and their length is long enough to include the null
// terminator for all paths on the system.
// (2) should be guaranteed by the syscall itself, and we can know
// how many entries are valid via its return value.
// (3) We aren't currently mutating the memory referenced by the
// CStr, though there's nothing here enforcing that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added safety comments to unsafe blocks that were not calls into libipcc itself since calling any C function has to be marked unsafe. So all the usage of CString and CStr list why they are safe. Let me know if you would like more safety comments elsewhere.

ipcc/src/lib.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@jordanhendricks jordanhendricks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Excited to land this (modulo buildomat worker OS versions getting updated)

Copy link
Contributor

@citrus-it citrus-it left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work on this, I'm pleased to see the last direct ipcc ioctl user quashed.
Modulo the tests passing, approved.

@papertigers papertigers enabled auto-merge (squash) January 10, 2024 22:54
@papertigers papertigers merged commit 0528456 into main Jan 11, 2024
22 of 23 checks passed
@papertigers papertigers deleted the mike/ipcc branch January 11, 2024 00:08
@davepacheco
Copy link
Collaborator

Just putting this here in case others search for this error message: this PR appears to move forward the minimum Helios that the build machine must be running to 2.0.22221 from October 12th, 2023 (thanks @citrus-it). If you're on a machine that's too old, you'll see a message like this:

error: could not compile `installinator` (bin "installinator" test) due to previous error
warning: build failed, waiting for other jobs to finish...
error: linking with `gcc` failed: exit status: 1
  |
...
  = note: ld: fatal: library -lipcc: not found
          ld: fatal: file processing errors. No output written to /home/dap/omicron-iter/target/debug/deps/installinator-40f5c7be5862bb49
          collect2: error: ld returned 1 exit status
          

error: could not compile `installinator` (lib test) due to previous error
error: command `/home/dap/.rustup/toolchains/1.74.1-x86_64-unknown-illumos/bin/cargo test --no-run --message-format json-render-diagnostics` exited with code 101

You should generally be able to update Helios machines with pkg update followed by a reboot.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants