[smf] Use new zone network config service #1096

Merged (7 commits) on Feb 20, 2024

karencfv
Contributor

oxidecomputer/omicron#4677 will implement a new zone network configuration setup service so that control plane services don't have to set this up themselves.

Once that PR is merged, I'll open up a separate one in the omicron repo that will include these changes in the sled-agent/src/services.rs file for crucible and crucible pantry which I'll have to somehow coordinate with this PR.

service_name = "crucible"
only_for_targets.image = "standard"
source.type = "composite"
source.packages = [ "crucible-svc.tar.gz", "zone-network-setup.tar.gz" ]
Contributor Author

Heya @smklein, I have a couple of questions around packages in this repo

  1. The zone-network-setup.tar.gz package will be built in omicron https://github.com/oxidecomputer/omicron/pull/4677/files#diff-3ef35f168f90144ed9f4e1c80d4e7b95e4584652bf7f22886046a11c7ef630a6R617-R627 . From my understanding, what I have here will work once deployed on the rack (and the sled-agent/src/services.rs file is updated of course), but will forever make the rbuild tests fail here?
    If all of that is wrong, will I have to recreate the zone-network-setup service here? :/
  2. I'm not sure what's going on with the packages. I have defined the composite packages here, just like I did in omicron, but for some reason it doesn't create the new composite packages when I run cargo run --bin crucible-package 🤷‍♀️

Contributor

Heya @smklein, I have a couple of questions around packages in this repo

  1. The zone-network-setup.tar.gz package will be built in omicron https://github.com/oxidecomputer/omicron/pull/4677/files#diff-3ef35f168f90144ed9f4e1c80d4e7b95e4584652bf7f22886046a11c7ef630a6R617-R627 . From my understanding, what I have here will work once deployed on the rack (and the sled-agent/src/services.rs file is updated of course), but will forever make the rbuild tests fail here?
    If all of that is wrong, will I have to recreate the zone-network-setup service here? :/

I'm not really familiar with what's going on in the rbuild tests in crucible, so I can't really comment on that -- but if this issue is basically a question of dependencies (namely: Omicron generally pulls in Crucible, but here, Crucible wants to pull in something from Omicron, in the form of the network setup), then I think there are two options:

  1. Crucible emits zones that are missing their network config. Omicron adds them in during package assembly.

Pro: I think this would work with minimal setup
Con: Crucible's zones, without this extra step, would be broken out-of-the-box, because they'd be missing network config (seems like this might be what you're alluding to with the failing tests?)

  2. We could pull the zone-network-setup out of Omicron, and have both crucible and Omicron pull it in?

I don't think we should duplicate the service, but this would be my standard for breaking out of a circular dependency. We did something similar for the Omicron packaging tools, so they could be used on both sides: https://github.com/oxidecomputer/omicron-package

  2. I'm not sure what's going on with the packages. I have defined the composite packages here, just like I did in omicron, but for some reason it doesn't create the new composite packages when I run cargo run --bin crucible-package 🤷‍♀️

This is the entirety of "crucible-package":

#[tokio::main]
async fn main() -> Result<()> {
    let cfg = config::parse("package-manifest.toml")?;
    let output_dir = Path::new("out");
    create_dir_all(output_dir)?;
    for (name, package) in cfg.packages {
        package
            .create_for_target(&Target::default(), &name, output_dir)
            .await?;
    }
    Ok(())
}

What is getting outputted when you run that command?

Contributor

Elaborating on that create_for_target call:

Contributor

@smklein smklein Jan 12, 2024

Also, just to confirm: The "omicron" repo is using the same version of omicron-package as Crucible, so this implementation of building composites should be the same code we're using in Omicron's repo.

EDIT: This is mostly true, except that crucible was missing "sort package creation by dependency order". Added that below.

Contributor Author

@karencfv karencfv Jan 15, 2024

Thanks @smklein!

Pro: I think this would work with minimal setup
Con: Crucible's zones, without this extra step, would be broken out-of-the-box, because they'd be missing network config (seems like this might be what you're alluding to with the failing tests?)

Yeah, that's what I was alluding to with the failing tests. If we choose to go this way we'd have to make significant changes to the rbuild test here.

We could pull the zone-network-setup out of Omicron, and have both crucible and Omicron pull it in?

The more I think about it, this is probably the best way to go. It's extremely likely that we'll have more shared services in the future. For example, clickhouse, ch-keeper and cockroachdb all use the internal-dns service, and a tiny service that disables ssh has just been introduced oxidecomputer/omicron#4716. It's probably best to extract shared services into their own repo it seems 🤔

Do you have any thoughts on this @leftwo ?

Contributor

It seems like moving common code to a common location is the way to go, and I don't have any attachment to it being in this repo.

Contributor

Is there any reason that the final assembly cannot be done in omicron, keeping the zone network service and all of its illumos dependencies there?

I haven't tried, but I think this could work if the service names here were changed slightly (crucible-zone, crucible-pantry-zone?). Then omicron could continue downloading those with their new names and combine them with the new zone network service to produce the final crucible and crucible-pantry.

If that's workable, I think it's simpler and keeps final zone assembly in one place.

Contributor Author

Definitely worth looking into! I'll give it a go

Contributor

smklein commented Jan 12, 2024

Ugh, okay, I think I see what's happening here. This is the parsed order of package-manifest.toml, with your changes:

Package: crucible
Package: crucible-pantry
Package: crucible-pantry-svc
Package: crucible-svc

I think the issue here is that the "crucible" package is being created before the dependencies (seems that the manifest is parsed and then sorted?).

I'm pretty sure I can change this to go in dependency-first order -- gimme a moment to try topologically sorting the packages.
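For illustration, dependency-first ordering can be sketched with Kahn's algorithm. This is a minimal sketch under stated assumptions, not the code from #1097: the function name `dependency_order` and the map shape (package name mapped to the names of the packages it is composed from) are hypothetical, and dependencies that are not themselves keys in the map are treated as external and ignored for ordering.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Returns package names ordered so that every package comes after the
/// packages it is composed from.
///
/// `deps` is a hypothetical shape (package name -> dependency names), not the
/// real manifest type. Dependencies absent from `deps` are ignored here.
pub fn dependency_order(
    deps: &BTreeMap<String, Vec<String>>,
) -> Result<Vec<String>, String> {
    // Number of not-yet-emitted dependencies for each package.
    let mut unmet: BTreeMap<&str, usize> = BTreeMap::new();
    // Reverse edges: dependency -> packages that need it.
    let mut dependents: BTreeMap<&str, Vec<&str>> = BTreeMap::new();

    for (name, sources) in deps {
        // Deduplicate, and keep only dependencies defined in this map.
        let known: BTreeSet<&str> = sources
            .iter()
            .map(String::as_str)
            .filter(|s| deps.contains_key(*s))
            .collect();
        unmet.insert(name.as_str(), known.len());
        for dep in known {
            dependents.entry(dep).or_default().push(name.as_str());
        }
    }

    // Start with every package that has no unmet dependencies.
    let mut ready: VecDeque<&str> = unmet
        .iter()
        .filter(|(_, n)| **n == 0)
        .map(|(k, _)| *k)
        .collect();

    let mut order = Vec::with_capacity(deps.len());
    while let Some(name) = ready.pop_front() {
        order.push(name.to_string());
        if let Some(children) = dependents.get(name) {
            for &child in children {
                let n = unmet.get_mut(child).expect("every package has a counter");
                *n -= 1;
                if *n == 0 {
                    ready.push_back(child);
                }
            }
        }
    }

    // Anything left over is part of a cycle.
    if order.len() != deps.len() {
        return Err("dependency cycle detected".to_string());
    }
    Ok(order)
}
```

With the four packages listed above, the two `-svc` packages would be emitted before the `crucible` and `crucible-pantry` composites that consume them.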

Contributor

smklein commented Jan 12, 2024

#1097 should provide a fix. This will be the new output from your PR, after it merges:

$ cargo run --bin crucible-package
    Finished dev [unoptimized + debuginfo] target(s) in 0.27s
     Running `target/debug/crucible-package`
Creating 'crucible-pantry-svc.tar.gz'
Creating 'crucible-svc.tar.gz'
Creating 'zone-network-setup.tar.gz'
Error: Cannot find a package to create output: 'zone-network-setup.tar.gz' 
        This can happen when building a composite package, where one of 
        the 'source.packages' has not been found.
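
The remaining error comes from a composite that lists a component no package in this manifest produces. A minimal sketch of a pre-flight check for that condition follows; the function `missing_components` and both argument shapes are hypothetical and are not part of omicron-package.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// For each composite package, lists the component archives that nothing in
/// the manifest produces.
///
/// `outputs` holds the archive names the manifest can build, and `composites`
/// maps a composite name to its `source.packages` entries. Both shapes are
/// assumptions made for this sketch.
pub fn missing_components(
    outputs: &BTreeSet<String>,
    composites: &BTreeMap<String, Vec<String>>,
) -> BTreeMap<String, Vec<String>> {
    let mut missing = BTreeMap::new();
    for (name, components) in composites {
        // Keep only the components that no package produces.
        let absent: Vec<String> = components
            .iter()
            .filter(|c| !outputs.contains(*c))
            .cloned()
            .collect();
        if !absent.is_empty() {
            missing.insert(name.clone(), absent);
        }
    }
    missing
}
```

Run against this PR's manifest, such a check would report `zone-network-setup.tar.gz` as missing for `crucible`, since that archive is built in omicron rather than here.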

Contributor Author

@smklein :

Thanks a bunch for looking into this!

This is the entirety of "crucible-package":

ha ha, that's why I was so confused! That binary just iterated over all the packages

Ugh, okay, I think I see what's happening here. This is the parsed order of package-manifest.toml, with your changes:
Package: crucible
Package: crucible-pantry
Package: crucible-pantry-svc
Package: crucible-svc
I think the issue here is that the "crucible" package is being created before the dependencies (seems that the manifest is parsed and then sorted?).

Nice catch!! and thanks for opening up that PR so quickly, very much appreciated!!

exit "$SMF_EXIT_ERR_CONFIG"
fi

# TODO remove when https://github.com/oxidecomputer/stlouis/issues/435 is addressed
Contributor

Do we no longer need these changes, or are they handled somewhere else now?

Contributor Author

This is what the new zone network config setup service does :)

I'm extracting all of this into a new SMF service (oxide/zone-network-setup), which sets up the common initial zone networking configuration for each self assembled zone.
https://github.com/oxidecomputer/omicron/pull/4677/files#diff-5fb7b70dc87176e02517181b0887ce250b6a4e4079e495990551deeca741dc8bR74-R92

service_name = "crucible"
only_for_targets.image = "standard"
source.type = "composite"
source.packages = [ "crucible-svc.tar.gz", "zone-network-setup.tar.gz" ]
Contributor

It seems like moving common code to a common location is the way to go, and I don't have any attachment to it being in this repo.

fi

# TODO remove when https://github.com/oxidecomputer/stlouis/issues/435 is addressed
ipadm delete-if "$DATALINK" || true
Contributor

Same question here. Setting the MTU was important at one time, so before we remove this, I would want to be sure it's done elsewhere.

Contributor Author

@karencfv karencfv Jan 15, 2024

Contributor Author

karencfv commented Feb 8, 2024

@citrus-it, looks like that worked, thanks for the idea!

Here is a comparison of the old (current) crucible/pantry zone packages and the new ones built from this PR in conjunction with oxidecomputer/omicron#4927. Please note that the missing root/opt/oxide/lib/svc/manifest/crucible/agent.sh and root/opt/oxide/lib/svc/manifest/crucible/pantry.sh are the files I removed in favour of running the commands directly from the XML manifests, to avoid bash scripts as much as possible.

Contents of new zone packages

$ tar -ztvf crucible-zone.tar.gz 
Decompressing 'crucible-zone.tar.gz' with '/usr/bin/gzcat'...
-rw-r--r--   0/0       61 Jul 24 01:21 2006 oxide.json
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest/site
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest/site/crucible
-rw-r--r--   0/0     2243 Jul 24 01:21 2006 root/var/svc/manifest/site/crucible/agent.xml
-rw-r--r--   0/0     1718 Jul 24 01:21 2006 root/var/svc/manifest/site/crucible/downstairs.xml
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/lib
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/lib/svc
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/lib/svc/manifest
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/lib/svc/manifest/crucible
-rwxr-xr-x   0/0      992 Jul 24 01:21 2006 root/opt/oxide/lib/svc/manifest/crucible/downstairs.sh
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/crucible
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/crucible/bin
-rwxr-xr-x   0/0   12517688 Jul 24 01:21 2006 root/opt/oxide/crucible/bin/crucible-agent
-rwxr-xr-x   0/0   21128616 Jul 24 01:21 2006 root/opt/oxide/crucible/bin/crucible-downstairs
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest/site
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest/site/zone-network-setup
-rw-r--r--   0/0     1609 Jul 24 01:21 2006 root/var/svc/manifest/site/zone-network-setup/manifest.xml
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/zone-network-setup
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/zone-network-setup/bin
-rwxr-xr-x   0/0   33315752 Jul 24 01:21 2006 root/opt/oxide/zone-network-setup/bin/zone-networking
$ tar -ztvf crucible-pantry-zone.tar.gz 
Decompressing 'crucible-pantry-zone.tar.gz' with '/usr/bin/gzcat'...
-rw-r--r--   0/0       68 Jul 24 01:21 2006 oxide.json
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest/site
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest/site/crucible
-rw-r--r--   0/0     1653 Jul 24 01:21 2006 root/var/svc/manifest/site/crucible/pantry.xml
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/pantry
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/pantry/bin
-rwxr-xr-x   0/0   18514904 Jul 24 01:21 2006 root/opt/oxide/pantry/bin/crucible-pantry
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest/site
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest/site/zone-network-setup
-rw-r--r--   0/0     1609 Jul 24 01:21 2006 root/var/svc/manifest/site/zone-network-setup/manifest.xml
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/zone-network-setup
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/zone-network-setup/bin
-rwxr-xr-x   0/0   33315752 Jul 24 01:21 2006 root/opt/oxide/zone-network-setup/bin/zone-networking

Contents of old zone packages:

$ tar -ztvf crucible.tar.gz 
Decompressing 'crucible.tar.gz' with '/usr/bin/gzcat'...
-rw-r--r--   0/0       56 Jul 24 01:21 2006 oxide.json
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest/site
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest/site/crucible
-rw-r--r--   0/0     1926 Jul 24 01:21 2006 root/var/svc/manifest/site/crucible/agent.xml
-rw-r--r--   0/0     1718 Jul 24 01:21 2006 root/var/svc/manifest/site/crucible/downstairs.xml
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/lib
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/lib/svc
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/lib/svc/manifest
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/lib/svc/manifest/crucible
-rwxr-xr-x   0/0      992 Jul 24 01:21 2006 root/opt/oxide/lib/svc/manifest/crucible/downstairs.sh
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/lib
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/lib/svc
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/lib/svc/manifest
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/lib/svc/manifest/crucible
-rwxr-xr-x   0/0     1549 Jul 24 01:21 2006 root/opt/oxide/lib/svc/manifest/crucible/agent.sh
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/crucible
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/crucible/bin
-rwxr-xr-x   0/0   27502168 Jul 24 01:21 2006 root/opt/oxide/crucible/bin/crucible-agent
-rwxr-xr-x   0/0   59436392 Jul 24 01:21 2006 root/opt/oxide/crucible/bin/crucible-downstairs
$ tar -ztvf crucible-pantry.tar.gz 
Decompressing 'crucible-pantry.tar.gz' with '/usr/bin/gzcat'...
-rw-r--r--   0/0       63 Jul 24 01:21 2006 oxide.json
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest/site
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest/site/crucible
-rw-r--r--   0/0     1549 Jul 24 01:21 2006 root/var/svc/manifest/site/crucible/pantry.xml
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/lib
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/lib/svc
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/lib/svc/manifest
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/lib/svc/manifest/crucible
-rwxr-xr-x   0/0     1135 Jul 24 01:21 2006 root/opt/oxide/lib/svc/manifest/crucible/pantry.sh
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/pantry
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/pantry/bin
-rwxr-xr-x   0/0   42846504 Jul 24 01:21 2006 root/opt/oxide/pantry/bin/crucible-pantry

@smklein, I tried deploying this manually on a Helios machine but it seems it didn't recognise my manually built crucible/pantry images. Here is a snippet of the omicron package-manifest.toml I used:

[package.crucible-zone]
service_name = "crucible-zone"
only_for_targets.image = "standard"
source.type = "composite"
source.packages = [ "crucible.tar.gz", "zone-network-setup.tar.gz" ]
output.type = "zone"


[package.crucible-pantry-zone]
service_name = "crucible_pantry_zone"
only_for_targets.image = "standard"
source.type = "composite"
source.packages = [ "crucible-pantry.tar.gz", "zone-network-setup.tar.gz" ]
output.type = "zone"

# Packages not built within Omicron, but which must be imported.

# Refer to
#   https://github.com/oxidecomputer/crucible/blob/main/package/README.md
# for instructions on building this manually.
[package.crucible]
service_name = "crucible"
only_for_targets.image = "standard"
# To manually override the package source (for example, to test a change in
# both Crucible and Omicron simultaneously):
#
# 1. Build the zone image manually
# 2. Copy the output zone image from crucible/out to omicron/out
# 3. Use source.type = "manual" instead of "prebuilt"
source.type = "manual"
source.repo = "crucible"
source.commit = "2d4bc11232d53f177c286383926fa5f8c1b2a938"
# The SHA256 digest is automatically posted to:
# https://buildomat.eng.oxide.computer/public/file/oxidecomputer/crucible/image/<commit>/crucible.sha256.txt
source.sha256 = "88ec93657a644e8f10a32d1d22cc027db901aea81027f49ce7bee58fc4a35755"
output.type = "zone"
output.intermediate_only = true

[package.crucible-pantry]
service_name = "crucible_pantry"
only_for_targets.image = "standard"
source.type = "manual"
source.repo = "crucible"
source.commit = "2d4bc11232d53f177c286383926fa5f8c1b2a938"
# The SHA256 digest is automatically posted to:
# https://buildomat.eng.oxide.computer/public/file/oxidecomputer/crucible/image/<commit>/crucible-pantry.sha256.txt
source.sha256 = "e2c3ed2d4cd6b5da3d38dd52df6d4a259280be7d45c30a363e9c71b174ecc6f8"
output.type = "zone"
output.intermediate_only = true

But it seems that even though I changed prebuilt to manual, it ignored that and went ahead and pulled the buildomat images. I can confirm this because the agent.xml file inside the zone has none of the changes I made in this PR:

root@oxz_crucible_a124c3ef:~# cat /var/svc/manifest/site/crucible/agent.xml 
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">

<service_bundle type='manifest' name='oxide-crucible-agent'>

<service name='oxide/crucible/agent' type='service' version='1'>
  <create_default_instance enabled='true' />

  <!-- Run once we hit multi-user, so that the network and file systems have
    been set up. -->
  <dependency name='multi-user' grouping='require_all' restart_on='none'
    type='service'>
    <service_fmri value='svc:/milestone/multi-user' />
  </dependency>

  <exec_method type='method' name='start'
    exec='/opt/oxide/lib/svc/manifest/crucible/agent.sh'
    timeout_seconds='30'
    />

  <exec_method type='method' name='stop' exec=':kill' timeout_seconds='30' />

  <property_group name='startd' type='framework'>
    <propval name='duration' type='astring' value='child' />
  </property_group>

  <property_group name='config' type='application'>
    <propval name='datalink' type='astring' value='unknown' />
    <propval name='gateway' type='astring' value='unknown' />
    <propval name='dataset' type='astring' value='' />
    <propval name='listen_addr' type='astring' value='127.0.0.1' />
    <propval name='listen_port' type='astring' value='17000' />
    <propval name='uuid' type='astring' value='' />
    <propval name='nexus' type='astring' value='127.0.0.1:12221' />
    <propval name='portbase' type='astring' value='19000' />
    <propval name='downstairs_prefix' type='astring' value='downstairs' />
    <propval name='snapshot_prefix' type='astring' value='snapshot' />
  </property_group>

  <stability value='Unstable' />

  <template>
    <common_name>
      <loctext xml:lang='C'>Oxide Crucible Downstairs</loctext>
    </common_name>
    <description>
      <loctext xml:lang='C'>Disk-side storage component</loctext>
    </description>
  </template>
</service>

</service_bundle>
<!-- vim: set ts=2 sts=2 sw=2 et: -->

Is there anything I'm missing here?

And finally, next up is deploying this in conjunction with oxidecomputer/omicron#4927. Am I right in assuming the steps are:

  1. Merge this PR.
  2. Quickly copy the crucible git commit and SHA from buildomat, paste it on the omicron PR and merge fast before other people do things.

If so, it's probably best to hold off on doing the merge dance until after this upcoming release is done and dusted, WDYT?

@karencfv karencfv marked this pull request as ready for review February 8, 2024 08:58
Contributor

smklein commented Feb 8, 2024

It is actually possible to get commits and SHAs from branches that haven't been merged, if you want! You should definitely be able to test this with Omicron before merging this specific branch.

For example:

https://buildomat.eng.oxide.computer/public/file/oxidecomputer/crucible/image/COMMIT/crucible.tar.gz

If you use your latest commit - 3e2bd7c5f79b59c66d848a6c19042394c28616ff - as COMMIT, that downloads the tarball created from this repo. You can update Omicron locally and use prebuilt to use this artifact.
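
As a small illustration of the URL pattern above, here is a sketch that builds the artifact and digest URLs for a given commit. The helper names `artifact_url` and `sha256_url` are hypothetical; the base path and the `<package>.sha256.txt` naming are taken from the URLs quoted in this thread.

```rust
/// Base path for Crucible images on buildomat, as quoted in this thread.
const BUILDOMAT_BASE: &str =
    "https://buildomat.eng.oxide.computer/public/file/oxidecomputer/crucible/image";

/// URL of a built artifact (for example "crucible.tar.gz") at `commit`.
/// Hypothetical helper for illustration only.
pub fn artifact_url(commit: &str, artifact: &str) -> String {
    format!("{}/{}/{}", BUILDOMAT_BASE, commit, artifact)
}

/// URL of the SHA256 digest file posted for `package` at `commit`.
/// Hypothetical helper for illustration only.
pub fn sha256_url(commit: &str, package: &str) -> String {
    format!("{}/{}/{}.sha256.txt", BUILDOMAT_BASE, commit, package)
}
```

The commit and the digest found at the second URL are the values that go into `source.commit` and `source.sha256` in Omicron's package-manifest.toml when using a prebuilt source.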

Contributor

smklein commented Feb 8, 2024

Contributor Author

karencfv commented Feb 8, 2024

Thanks @smklein ! Trying out the different approaches we mentioned in matrix as well :)

Contributor Author

karencfv commented Feb 9, 2024

Update:

I commented out source.commit and source.sha256 from package-manifest.toml and it still installed the old image 🤯

[package.crucible]
service_name = "crucible"
only_for_targets.image = "standard"
# To manually override the package source (for example, to test a change in
# both Crucible and Omicron simultaneously):
#
# 1. Build the zone image manually
# 2. Copy the output zone image from crucible/out to omicron/out
# 3. Use source.type = "manual" instead of "prebuilt"
source.type = "manual"
source.repo = "crucible"
# source.commit = "2d4bc11232d53f177c286383926fa5f8c1b2a938"
# The SHA256 digest is automatically posted to:
# https://buildomat.eng.oxide.computer/public/file/oxidecomputer/crucible/image/<commit>/crucible.sha256.txt
# source.sha256 = "88ec93657a644e8f10a32d1d22cc027db901aea81027f49ce7bee58fc4a35755"
output.type = "zone"
output.intermediate_only = true

[package.crucible-pantry]
service_name = "crucible_pantry"
only_for_targets.image = "standard"
source.type = "manual"
source.repo = "crucible"
# source.commit = "2d4bc11232d53f177c286383926fa5f8c1b2a938"
# The SHA256 digest is automatically posted to:
# https://buildomat.eng.oxide.computer/public/file/oxidecomputer/crucible/image/<commit>/crucible-pantry.sha256.txt
# source.sha256 = "e2c3ed2d4cd6b5da3d38dd52df6d4a259280be7d45c30a363e9c71b174ecc6f8"
output.type = "zone"
output.intermediate_only = true

What's even weirder is that the logs say the correct composite package I just created is being used:

$ pfexec ./target/release/omicron-package -t bench install
Feb 09 00:49:56.936 DEBG target[bench]: Target({"image": "standard", "machine": "non-gimlet", "rack-topology": "single-sled", "switch": "softnpu"})
Feb 09 00:49:56.939 INFO Installing service, dst: /opt/oxide/crucible-zone.tar.gz, src: out/crucible-zone.tar.gz
Feb 09 00:49:56.939 INFO Installing service, dst: /opt/oxide/cockroachdb.tar.gz, src: out/cockroachdb.tar.gz
Feb 09 00:49:56.939 INFO Installing service, dst: /opt/oxide/clickhouse_keeper.tar.gz, src: out/clickhouse_keeper.tar.gz
Feb 09 00:49:56.939 INFO Installing service, dst: /opt/oxide/clickhouse.tar.gz, src: out/clickhouse.tar.gz
Feb 09 00:49:56.939 INFO Installing service, dst: /opt/oxide/oxlog.tar, src: out/oxlog.tar
Feb 09 00:49:56.939 INFO Installing service, dst: /opt/oxide/overlay.tar.gz, src: out/overlay.tar.gz
Feb 09 00:49:56.939 INFO Installing service, dst: /opt/oxide/ntp.tar.gz, src: out/ntp.tar.gz
Feb 09 00:49:56.939 INFO Installing service, dst: /opt/oxide/internal_dns.tar.gz, src: out/internal-dns.tar.gz
Feb 09 00:49:56.939 INFO Installing service, dst: /opt/oxide/mgs.tar.gz, src: out/omicron-gateway-softnpu.tar.gz
Feb 09 00:49:56.940 INFO Installing service, dst: /opt/oxide/oximeter.tar.gz, src: out/oximeter.tar.gz
Feb 09 00:49:57.172 INFO Installing service, dst: /opt/oxide/mg-ddm.tar, src: out/mg-ddm-gz.tar
Feb 09 00:49:57.214 INFO Installing service, dst: /opt/oxide/nexus.tar.gz, src: out/omicron-nexus.tar.gz
Feb 09 00:49:57.409 INFO Installing service, dst: /opt/oxide/external_dns.tar.gz, src: out/external-dns.tar.gz
Feb 09 00:49:57.409 INFO Installing service, dst: /opt/oxide/propolis-server.tar.gz, src: out/propolis-server.tar.gz
Feb 09 00:49:57.804 INFO Installing service, dst: /opt/oxide/switch.tar.gz, src: out/switch-softnpu.tar.gz
Feb 09 00:49:57.822 INFO Installing service, dst: /opt/oxide/crucible_pantry_zone.tar.gz, src: out/crucible-pantry-zone.tar.gz
Feb 09 00:49:58.365 INFO Installing service, dst: /opt/oxide/sled-agent.tar, src: out/omicron-sled-agent.tar
Feb 09 00:50:00.973 INFO Unpacking service tarball, service_path: /opt/oxide/mg-ddm, tar_path: /opt/oxide/mg-ddm.tar
Feb 09 00:50:01.184 INFO Unpacking service tarball, service_path: /opt/oxide/sled-agent, tar_path: /opt/oxide/sled-agent.tar
Feb 09 00:50:01.315 INFO Unpacking service tarball, service_path: /opt/oxide/oxlog, tar_path: /opt/oxide/oxlog.tar
Feb 09 00:50:01.339 INFO Installing bootstrap service from /opt/oxide/sled-agent/pkg/manifest.xml

...and that tar file is correct

$ tar -tvf out/crucible-zone.tar.gz 
Decompressing 'out/crucible-zone.tar.gz' with '/usr/bin/gzcat'...
-rw-r--r--   0/0       61 Jul 24 01:21 2006 oxide.json
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest/site
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest/site/crucible
-rw-r--r--   0/0     2243 Jul 24 01:21 2006 root/var/svc/manifest/site/crucible/agent.xml
-rw-r--r--   0/0     1718 Jul 24 01:21 2006 root/var/svc/manifest/site/crucible/downstairs.xml
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/lib
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/lib/svc
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/lib/svc/manifest
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/lib/svc/manifest/crucible
-rwxr-xr-x   0/0      992 Jul 24 01:21 2006 root/opt/oxide/lib/svc/manifest/crucible/downstairs.sh
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/crucible
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/crucible/bin
-rwxr-xr-x   0/0   24107632 Jul 24 01:21 2006 root/opt/oxide/crucible/bin/crucible-agent
-rwxr-xr-x   0/0   50996384 Jul 24 01:21 2006 root/opt/oxide/crucible/bin/crucible-downstairs
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest/site
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/var/svc/manifest/site/zone-network-setup
-rw-r--r--   0/0     1609 Jul 24 01:21 2006 root/var/svc/manifest/site/zone-network-setup/manifest.xml
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/zone-network-setup
-rwxr-xr-x   0/0        0 Jul 24 01:21 2006 root/opt/oxide/zone-network-setup/bin
-rwxr-xr-x   0/0   33323240 Jul 24 01:21 2006 root/opt/oxide/zone-network-setup/bin/zone-networking

I thought perhaps I made a mistake adding the networking service, but the zone even has the agent.sh file I deleted here

root@oxz_crucible_cce7dd93:~# ls /opt/oxide/lib/svc/manifest/crucible/
agent.sh       downstairs.sh

🤷‍♀️

I seriously don't know what's going on here. I'll try this suggestion next and report back (this may take a while, since my little ThinkCentre tends to crash when I'm installing Omicron 😢): #1096 (comment)

@smklein
Contributor

smklein commented Feb 9, 2024

@karencfv I think I need to understand your installation workflow a little better to understand "why this failed". Are you using multiple machines? Are you using a single machine?

The two most critical commands here are package and install:

Package

So in summary:

  • manifest + target describe which packages to build
  • they get built, and put in out/
  • after running package, you should be able to inspect what you just built
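
The last point can be checked without installing anything. A minimal sketch in Python (standard library only) that lists what ended up in a built archive and searches it for a file by name. The helper names `list_package` and `find_entries` and the example path are hypothetical, and none of this is part of omicron-package itself:

```python
import tarfile


def list_package(path):
    """Return (name, size) for every entry in a gzip-compressed tar package."""
    with tarfile.open(path, "r:gz") as tar:
        return [(member.name, member.size) for member in tar.getmembers()]


def find_entries(path, needle):
    """Return the names of entries whose path contains `needle`."""
    return [name for name, _size in list_package(path) if needle in name]


if __name__ == "__main__":
    # Hypothetical path: point this at whatever the package step wrote to out/.
    for name, size in list_package("out/crucible.tar.gz"):
        print(f"{size:>10} {name}")
```

For example, if `find_entries("out/crucible.tar.gz", "agent.sh")` returns an empty list, the deleted script is not in the freshly built archive, which would suggest that the copy seen in the zone came from an older installed file.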

Install

Feb 09 00:49:56.939 INFO Installing service, dst: /opt/oxide/crucible-zone.tar.gz, src: out/crucible-zone.tar.gz

Now, it looks like that command is identifying "I want to grab the thing from out/crucible-zone.tar.gz and install it". This looks like it's working as expected.

In your log, here's what I see, filtering for crucible:

Feb 09 00:49:56.936 DEBG target[bench]: Target({"image": "standard", "machine": "non-gimlet", "rack-topology": "single-sled", "switch": "softnpu"})
Feb 09 00:49:56.939 INFO Installing service, dst: /opt/oxide/crucible-zone.tar.gz, src: out/crucible-zone.tar.gz
Feb 09 00:49:57.822 INFO Installing service, dst: /opt/oxide/crucible_pantry_zone.tar.gz, src: out/crucible-pantry-zone.tar.gz

I do not see installation of a package named crucible.tar.gz. On main, I do see this package.

Is it possible that you renamed the package, but the system is still looking for crucible.tar.gz, which was using the old installed file?
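
The log filtering above can be repeated mechanically. A rough sketch in Python, assuming install lines keep the `Installing service, dst: ..., src: ...` shape shown in the log excerpt; the function names are made up for illustration:

```python
import re

# Matches lines such as:
#   "... INFO Installing service, dst: /opt/oxide/x.tar.gz, src: out/x.tar.gz"
INSTALL_RE = re.compile(
    r"Installing service, dst: (?P<dst>[^,\s]+), src: (?P<src>[^,\s]+)"
)


def installed_services(log_text):
    """Return one (dst, src) pair per 'Installing service' line in the log."""
    return [(m.group("dst"), m.group("src")) for m in INSTALL_RE.finditer(log_text)]


def was_installed(log_text, filename):
    """True if any install line's dst or src has exactly this file name."""
    return any(
        dst.rsplit("/", 1)[-1] == filename or src.rsplit("/", 1)[-1] == filename
        for dst, src in installed_services(log_text)
    )
```

Run against the excerpt above, `was_installed(log, "crucible.tar.gz")` is false while `was_installed(log, "crucible-zone.tar.gz")` is true, which is the mismatch being pointed out.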

@karencfv
Contributor Author

Is it possible that you renamed the package, but the system is still looking for crucible.tar.gz, which was using the old installed file?

Aha! Yes, it looks like the service name must be the same as the name of the zone: oxidecomputer/omicron@2f790f4. Where would be the best place for me to document this? In the omicron-package repo, or directly in the omicron one?

@smklein Thanks for the help figuring this out!

Ok, so I got the omicron PR working temporarily with the packages from my branch (I'll update the SHAs when I merge this). The composite packages are correct https://github.com/oxidecomputer/omicron/pull/4927/checks?check_run_id=21460157839 and the services started up correctly https://github.com/oxidecomputer/omicron/pull/4927/checks?check_run_id=21460157724

Networking for crucible: https://buildomat.eng.oxide.computer/wg/0/artefact/01HPDEWZGTF2TFK1P2GJE9H6XK/sMaio44NoyL7GBt8sPDNzWGp8BVVE0rjpB4O97GqaNfetaCt/01HPDEX98DHY3BEV889Q524Z7F/01HPDJ7BB6AP601H76THPY7JF3/oxide-zone-network-setup:default.log
Crucible agent: https://buildomat.eng.oxide.computer/wg/0/artefact/01HPDEWZGTF2TFK1P2GJE9H6XK/sMaio44NoyL7GBt8sPDNzWGp8BVVE0rjpB4O97GqaNfetaCt/01HPDEX98DHY3BEV889Q524Z7F/01HPDJ7725FJRZSR2R2W52F6MD/oxide-crucible-agent:default.log
2 crucible downstairs: https://buildomat.eng.oxide.computer/wg/0/artefact/01HPDEWZGTF2TFK1P2GJE9H6XK/sMaio44NoyL7GBt8sPDNzWGp8BVVE0rjpB4O97GqaNfetaCt/01HPDEX98DHY3BEV889Q524Z7F/01HPDJ796M5EKVJFAPW92G5822/oxide-crucible-downstairs:downstairs-eb322695-b813-4b40-992d-02ce0c880a67.log?format=x-bunyan and https://buildomat.eng.oxide.computer/wg/0/artefact/01HPDEWZGTF2TFK1P2GJE9H6XK/sMaio44NoyL7GBt8sPDNzWGp8BVVE0rjpB4O97GqaNfetaCt/01HPDEX98DHY3BEV889Q524Z7F/01HPDJ7A8WK87JJG53A70V7FNT/oxide-crucible-downstairs:downstairs-f2577088-47d9-4120-a85b-d4d539d8d16f.log?format=x-bunyan (These looked odd to me but I checked other test runs from other PRs and the logs look the same)

Networking for crucible pantry: https://buildomat.eng.oxide.computer/wg/0/artefact/01HPDEWZGTF2TFK1P2GJE9H6XK/sMaio44NoyL7GBt8sPDNzWGp8BVVE0rjpB4O97GqaNfetaCt/01HPDEX98DHY3BEV889Q524Z7F/01HPDJ7DRBQR7SF4W10FQK0WPM/oxide-zone-network-setup:default.log?format=x-bunyan
Crucible pantry: https://buildomat.eng.oxide.computer/wg/0/artefact/01HPDEWZGTF2TFK1P2GJE9H6XK/sMaio44NoyL7GBt8sPDNzWGp8BVVE0rjpB4O97GqaNfetaCt/01HPDEX98DHY3BEV889Q524Z7F/01HPDJ7CP2SR21CJ8437F7MYB7/oxide-crucible-pantry:default.log?format=x-bunyan

All looks good to me unless there are any objections :)

And finally, next up is deploying this in conjunction with oxidecomputer/omicron#4927. Am I right in assuming the steps are:

  1. Merge this PR.
  2. Quickly copy the crucible git commit and SHA from buildomat, paste them into the omicron PR, and merge fast before other people do things.

If so, it's probably best to hold off on doing the merge dance until after this upcoming release is done and dusted, WDYT?

@karencfv
Contributor Author

@smklein @leftwo tiny ping :)

Now that the release is behind us, I'd like to see if we can get this merged. Also, could I get confirmation that the merge dance I described above is the process we want to follow?

@smklein
Contributor

smklein commented Feb 20, 2024

@smklein @leftwo tiny ping :)

Now that the release is behind us, I'd like to see if we can get this merged. Also, could I get confirmation that the merge dance I described above is the process we want to follow?

I think the approach of "merge in the non-Omicron repos first, then update the SHA / commits in Omicron pointing to the new repo rev" makes sense. As mentioned last week -- this should be doable with commits that haven't been merged if you want to test manually before merging.
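
When swapping in a new commit and SHA by hand, one way to double-check the value before pasting it is to hash the downloaded artifact locally. A minimal sketch in Python, assuming the SHA in question is a SHA-256 digest of the package tarball (the thread does not say which hash is used); the function names are hypothetical:

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_recorded(path, recorded_hex):
    """True if the file's digest equals the recorded hex string.

    Surrounding whitespace and letter case in `recorded_hex` are ignored.
    """
    return sha256_of_file(path) == recorded_hex.strip().lower()
```

Reading in chunks keeps memory use flat even for archives in the tens of megabytes, like the binaries listed earlier in this thread.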

Comment on lines 37 to 38
<propval name='uuid' type='astring' value='' />
<propval name='nexus' type='astring' value='127.0.0.1:12221' />
Contributor

They might not be used -- I think nexus might have been a relic from before using DNS to find the Nexus address

@karencfv
Copy link
Contributor Author

As mentioned last week -- this should be doable with commits that haven't been merged if you want to test manually before merging.

Thanks @smklein! I was able to test here: oxidecomputer/omicron#4927. I swapped the commits for these and all worked fine.

@karencfv karencfv merged commit fe0c5c7 into oxidecomputer:main Feb 20, 2024
18 checks passed
@karencfv karencfv deleted the use-nw-service branch February 20, 2024 22:57
karencfv added a commit to oxidecomputer/omicron that referenced this pull request Feb 21, 2024
Create new packages for crucible and pantry to include the zone network
config service.

Depends on oxidecomputer/crucible#1096.

These two PRs should be merged in coordination

Related: #1898

### Crucible updates

This PR also merges a few changes from Crucible:

* fe0c5c7 - [smf] Use new zone network config service  
* 3d48060 - (upstream/main) Move a few methods into downstairs 
* b01e15c - Remove extra clone in upstairs read 
* b4f37b4 - Make `crucible-downstairs` not depend on upstairs 
* 733b7f9 - Update Rust crate rusqlite to 0.31 
* 961e971 - Update Rust crate reedline to 0.29.0 
* b946a04 - Update Rust crate clap to 4.5 
* 39f1f3f - Update Rust crate indicatif to 0.17.8 
* 4ea9387 - Update progenitor to bc0bb4b 
* ace10f4 - Do not 500 on snapshot delete for deleted region 
* 4105133 - Drop jobs from Offline downstairs. 
* 43dace9 - `Mutex<Work>` → `Work` 
* a1f3207 - Added a contributing.md 
* 13b8669 - Remove ExtentFlushClose::source_downstairs 
* 9b3f366 - Remove unnecessary mutexes from Downstairs
4 participants