-
Notifications
You must be signed in to change notification settings - Fork 40
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rearrange buildomat jobs; rewrite releng process in rust and aggressively parallelize #5744
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like it means it's really easy to build a TUF repo via xtask?
client | ||
.get(format!("{}/artifact/{}", base_url, hash)) | ||
.send() | ||
.and_then(|response| response.json()) | ||
.await | ||
.with_context(|| { | ||
format!( | ||
"failed to fetch hubris artifact {} from {}", | ||
hash, base_url | ||
) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's unfortunate that permslip is still a private repo but I guess it's mostly just a wrapper around reqwest anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In an earlier revision (not pushed) I ended up copying the openapi.json out of the repo and generating a client with Progenitor, and then I realized how much that was slowing down rust-analyzer just for a single API call. So I went with this relatively-small wrapper.
(we haven't used this since starting to use permslip for hubris artifacts, anyway)
That, and running it locally re-uses all the Omicron artifacts too, so iterating on this process is a lot easier. (I think it takes about 8 minutes total if the control plane is previously built; that time is mostly building the OS images.) |
Thanks for the great description, and looking forward to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First-pass review, generally looks good!
@@ -3,24 +3,11 @@ | |||
#: name = "helios / package" | |||
#: variety = "basic" | |||
#: target = "helios-2.0" | |||
#: rust_toolchain = "1.72.1" | |||
#: rust_toolchain = "1.77.2" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this supposed to stay in sync with rust-toolchain.toml
? Can we read from that file directly instead? To be honest I didn't know this existed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes:
Lines 1 to 7 in d2ed452
[toolchain] | |
# NOTE: This toolchain is also specified in various jobs in | |
# .github/buildomat/jobs/. If you update it here, update those files too. | |
# | |
# We choose a specific toolchain (rather than "stable") for repeatability. The | |
# intent is to keep this up-to-date with recently-released stable Rust. | |
channel = "1.77.2" |
If it's not in sync it doesn't actually break anything, but the job runs a bit slower (Buildomat downloads 1.72.1 and then rustup's Cargo proxy sees you actually wanted 1.77.2 and downloads that).
Buildomat feature request: oxidecomputer/buildomat#55
dev-tools/releng/src/cmd.rs
Outdated
for (name, value) in command.get_envs() { | ||
if let Some(value) = value { | ||
write!( | ||
command_str, | ||
"{}={} ", | ||
shell_words::quote(&name.to_string_lossy()), | ||
shell_words::quote(&value.to_string_lossy()) | ||
) | ||
.unwrap(); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh good spot.
docs/releng.adoc
Outdated
also manage installation and uninstallation of these zones; see | ||
how-to-run.adoc.) | ||
. Some of the packaged artifacts are installed directly in the OS | ||
images; `cargo xtask releng` unpacks these into a temporary directory |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a bit of thinking to understand this — I assume it refers to stuff like putting the console assets directly in the file system of the control plane zone in the host image. But the line as-is is pretty cryptic; it could stand to be two or three sentences.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I rewrote this bullet, is it better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looked through again, just a few more comments but pretty happy otherwise!
I think I've addressed all outstanding review feedback; I still need to figure out a pattern for external xtasks (clap is currently making me unhappy), but I think that's the last TODO here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
love it, thanks for doing this work!
impl std::fmt::Display for Command { | ||
fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { | ||
let command = self.inner.as_std(); | ||
for (name, value) in command.get_envs() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How noisy is this? I think this would print out the entire environment, which would be a lot: on my machine, export | wc -c
returns 4923.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It only prints the environment variables explicitly set in the command builder. https://doc.rust-lang.org/stable/std/process/struct.Command.html#method.get_envs
struct Args { | ||
/// ZFS dataset to use for `helios-build` when building the host image | ||
#[clap(long, default_value_t = Self::default_dataset("host"))] | ||
host_dataset: String, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is there some validation we can do on the dataset names? Would that be worth it at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think verifying the datasets already exist (one of the preflight checks we run) is sufficient.
Despite the documentation all saying |
I forgot to remove these in #5744. These scripts were adapted into `cargo xtask releng` and aren't referenced anywhere in this repo anymore.
Note: documentation says
cargo xtask releng
but I am going to wire that up in a follow-up PR; the current equivalent iscargo run --release --bin omicron-releng
.Prior to this change we have five main "release engineering" Buildomat jobs that do operations beyond running the test suite:
This looks like:
(There are also the currently-disabled a4x2 jobs but those are independent of this particular graph.)
I think the initial idea behind this was to reuse build artifacts where possible, but this is pretty complicated and adds a lot more output upload/download overhead than expected, which slows down the time to get the end artifact we actually want.
This PR changes the graph to:
And the TUF repo job primarily runs a new releng binary, which runs all of the steps required to download and build all the components of the TUF repo in a single task, using a terrible job runner I wrote.
(thank you to @lifning for the image)
The primary goal here was to reduce the time from pushing a commit to getting a TUF repo out the other end; this drops time-to-TUF-repo from ~80 minutes to ~45. In the process this also made it much easier to build a TUF repo (and iterate on that process) locally: just run
cargo xtask releng
(TODO: soon). It also deleted a lot of Bash.One thing to note is that, in service of the mission to get time-to-TUF-repo down as much as possible, that job only uploads the TUF repo (and some logs). I also put all of the outputs for the package job into a single tarball for the deploy job to unpack. There are no longer separate uploads for the OS images and each zone; these can be extracted from the repo as we normally do.