B U I L D O M A T
a software build labour-saving device


Buildomat manages the provisioning of ephemeral UNIX systems (e.g., instances in AWS EC2) on which to run software builds. It logs job output, collects build artefacts, and reports status. The system integrates with GitHub through the Checks API, to allow build jobs to be triggered by pushes and pull requests.

Components

Buildomat is made up of a variety of crates, loosely grouped into areas of related functionality:

$ cargo xtask crates
buildomat                    /bin
buildomat-agent              /agent
buildomat-bunyan             /bunyan
buildomat-client             /client
buildomat-common             /common
buildomat-database           /database
buildomat-server             /server
buildomat-types              /types

buildomat-factory-aws        /factory/aws
buildomat-factory-lab        /factory/lab

buildomat-github-common      /github/common
buildomat-github-database    /github/database
buildomat-github-dbtool      /github/dbtool
buildomat-github-ghtool      /github/ghtool
buildomat-github-server      /github/server

xtask                        /xtask

Buildomat Core

The buildomat core is responsible for authenticating users and remote services, for managing build systems, and for running jobs and collecting output.

Server (buildomat-server, in server/)

The core buildomat API server. Coordinates the creation, tracking, and destruction of workers in which to execute jobs. This component sits at the centre of the system and is used by the GitHub integration server, the client command, the agent running within each worker for control of the job, and any factories.

Client Command (buildomat, in bin/)

A client tool that uses the client library to interface with and manipulate the core server. The tool has both administrative and user-level functions, expressed in a relatively regular hierarchy of commands; e.g., buildomat job run or buildomat user ls.

$ ./target/release/buildomat
Usage: buildomat [OPTS] COMMAND [ARGS...]

Commands:
    info                get information about server and user account
    control             server control functions
    job                 job management
    user                user management


Options:
        --help          usage information
    -p, --profile PROFILE
                        authentication and server profile

ERROR: choose a command

Client Library (buildomat-client, in client/)

An HTTP client library for accessing the core buildomat server. This client is generated at build time by progenitor, an OpenAPI client generator.

The client is generated from an OpenAPI document that is managed in the repository: the document is produced by Dropshot from the implementation of the server and then checked in. If you make changes to the API exposed by the core server, you will need to regenerate the document, client/openapi.json, using:

$ cargo xtask openapi

Agent (buildomat-agent, in agent/)

A process that is injected into each ephemeral system (e.g., an AWS EC2 instance) to allow the buildomat core server to take control and run jobs. This process receives single-use credentials at provisioning time from the core server, and connects out to receive instructions. The agent does not require a public IP address or any direct inbound connectivity, which allows agents to run inside remote NAT environments.

Factories

Buildomat jobs are specified to execute within a particular target environment. Concrete instances of those target environments (known as workers) are created, managed, and destroyed by factories. Factories are long-lived server processes that connect to the core API and offer to construct workers as needed. When a worker has finished executing the job, or when requested by an operator, the factory is also responsible for freeing any resources that were in use by the worker.

AWS Factory (buildomat-factory-aws in factory/aws/)

The AWS factory creates ephemeral AWS instances that are used to run one job and are then destroyed. The factory arranges for the agent to be installed and start automatically in each instance that is created. The factory is responsible for ensuring no stale resources are left behind, and for enforcing a cap on the concurrent use of resources at AWS. Each target provided by an AWS factory can support a different instance type (i.e., CPU and RAM capacity), a different image (AMI), and a different root disk size.

Lab Factory (buildomat-factory-lab in factory/lab/)

The lab factory uses IPMI to exert control over a set of physical lab systems. When a worker is required, a lab system is booted from a ramdisk and the agent is started, just as it would be for an AWS instance. From that point on, operation is quite similar to AWS instances: the agent communicates directly with the core API. When tearing down a lab worker, the machine is rebooted (again via IPMI) to clear out the prior ramdisk state. Each target provided by a lab factory can boot from a different ramdisk image stored on a local server.

GitHub Integration (formerly known as Wollongong)

The GitHub-specific portion of the buildomat suite sits in front of the core buildomat service. It is responsible for receiving and processing notifications of new commits and pull requests on GitHub, starting any configured build jobs, and reporting the results so that they are visible through the GitHub user interface.

Server (buildomat-github-server, in github/server/)

This server acts as a GitHub App. It is responsible for processing incoming GitHub webhooks that notify the system about commits and pull requests in authorised repositories. In addition to relaying jobs between GitHub and the buildomat core, this service provides an additional HTML presentation of job state (e.g., detailed logs) and access to any artefacts that jobs produce. This server keeps state required to manage the interaction with GitHub, but does not store job data; requests for logs or artefacts are proxied back to the core server.

Database Tool (buildomat-github-dbtool, in github/dbtool/)

This tool can be used to inspect the database state kept by the GitHub integration as it tracks GitHub pull requests and commits. Unlike the core client tool, this program directly interacts with a local SQLite database.

$ buildomat-github-dbtool
Usage: buildomat-github-dbtool COMMAND [ARGS...]

Commands:
    delivery (del)      webhook deliveries
    repository (repo)   GitHub repositories
    check               GitHub checks


Options:
    --help              usage information

ERROR: choose a command

Of particular note, the tool is useful for inspecting and replaying received webhook events; e.g.,

$ buildomat-github-dbtool del ls
SEQ   ACK RECVTIME             EVENT          ACTION
0     1   2021-10-05T01:58:32Z ping           -
1     1   2021-10-05T02:25:33Z installation   created
2     1   2021-10-05T02:26:53Z push
3     1   2021-10-05T02:26:53Z check_suite    requested
4     1   2021-10-05T02:26:56Z check_suite    completed
5     1   2021-10-05T02:26:56Z check_run      completed
6     1   2021-10-05T02:26:56Z check_run      created
7     1   2021-10-05T02:26:56Z check_run      created
8     1   2021-10-05T02:26:57Z check_run      created
...

The buildomat-github-dbtool del unack SEQ command can be used to trigger the reprocessing of an individual webhook message.
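
For example, given the listing above, the push delivery (sequence 2) could be replayed with:

$ buildomat-github-dbtool del unack 2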

Per-repository Configuration

Buildomat works as a GitHub App, which is generally "installed" at the level of an Organisation. Installing the App allows buildomat to receive notifications about events, such as git pushes and pull requests, from all repositories (public and private) within the organisation. In order to avoid accidents, buildomat requires that the service be explicitly configured for a repository before it will take any actions.

Per-repository configuration is achieved by creating a file in the default branch (e.g., main) of the repository in question, named .github/buildomat/config.toml. This file is written in TOML, with a handful of simple values. Supported properties in this file include:

  • enable (boolean)

    Must be present and have the value true in order for buildomat to consider the repository for jobs; e.g.,

    enable = true
  • org_only (boolean, defaults to true if missing)

    If set to true, or missing from the file, buildomat will not automatically run jobs in response to pull requests opened by users who are not members of the GitHub Organisation that owns the repository. If set to false, any GitHub user can cause a job to be executed.

    This property is important for security if your repository is able to create any jobs that have access to secrets, or to restricted networks.

  • allow_users (array of strings, each a GitHub login name)

    If specified, jobs will be started automatically for users in this list, regardless of whether they are a member of the Organisation that owns the repository or not, and regardless of the value of the org_only property.

    This is often useful for pre-authorising jobs driven by Pull Requests made by various automated systems; e.g.,

    allow_users = [
            "dependabot[bot]",
            "renovate[bot]",
    ]
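
Putting these properties together, a plausible complete .github/buildomat/config.toml might look like:

enable = true
org_only = true
allow_users = [
        "dependabot[bot]",
]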

Note that buildomat will only ever read this configuration file from the most recent commit in the default branch of the repository, not from the contents of another branch or pull request. This is of particular importance for security-sensitive properties like org_only, where the policy set by users with full write access to the repository must not be overridden by changes from potentially untrusted users. If a pull request with a malicious policy change is merged, it will then be in the default branch and active for subsequent pull requests; maintainers must carefully review pull requests that change this file.

Specifying Jobs

Once you have configured buildomat at the repository level, you can specify some number of jobs to execute automatically in response to pushes and pull requests. While per-repository configuration is read from the default branch, jobs are read from the commit under test.

Jobs are specified as bash programs with some configuration directives embedded in comments. These job files must be named .github/buildomat/jobs/*.sh. Unexpected additional files in .github/buildomat/jobs will result in an error.

Job files should begin with an interpreter line, followed by TOML-formatted configuration prefixed with #: so that they will be identified as configuration by buildomat, but ignored by the shell. For example, a minimal job that would just execute uname -a:

#!/bin/bash
#:
#: name = "build"
#: variety = "basic"
#:
uname -a

The minimum set of properties that must always appear in the TOML frontmatter is:

  • name (string)

    Must be present in all jobs. This name is used for at least two things: as the name of the Check Run in the GitHub user interface, and when specifying that some other job depends on this job. The job name must be unique amongst all jobs within the commit under test.

    In general, it is probably best to keep these short, lower-case, and without spaces. It is conventional to use the same name for the job file and the job, e.g., name = "build" in file .github/buildomat/jobs/build.sh.

  • variety (string)

    To allow the system to evolve over time, a job must specify a variety, which defines some things about the way a job executes and what additional configuration options are required or available.

These properties are optional, but not variety-specific:

  • enable (boolean)

    To exclude a particular job file from processing, set this to false. If not specified, this property defaults to true. This allows a job to be temporarily disabled without needing to be removed from the repository.

The rest of the configuration is variety-specific.
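
For instance, a sketch of the frontmatter for a job that has been temporarily disabled (the name here is illustrative):

#: name = "expensive-tests"
#: variety = "basic"
#: enable = false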

Variety: Basic

Each basic variety job (selected by specifying variety = "basic" in the frontmatter) takes a single bash program and runs it in an ephemeral environment. The composition of that environment, such as compute and memory capacity or the availability of specific toolchains and other software, depends on the target option.

Basic variety jobs can produce output files (see the configuration options output_rules and publish). They can also depend on the successful completion of other jobs, gaining access to any output files from the upstream job (see the dependencies option). Jobs are generally executed in parallel, unless they are waiting for a dependency or for capacity to become available.

Execution Environment

By default, an ephemeral system (generally a virtual machine) will be provisioned for each job. The system will be discarded at the end of the job, so no detritus is left behind. Once the environment is provisioned, the bash program in the job file is executed as-is.

Jobs are executed as an unprivileged user, build, with home directory /home/build. If required, this user is able to escalate to root privileges through the use of pfexec(1). Systems that do not have a native pfexec will be furnished with a compatible wrapper around a native escalation facility, to ease the construction of cross-platform jobs.

By default, the working directory for the job is based on the name of the repository; e.g., for https://github.com/oxidecomputer/buildomat, the working directory would be /work/oxidecomputer/buildomat. The system will arrange for the repository to be cloned at that location with the commit under test checked out. A simple job could directly invoke some build tool like gmake or cargo build, and the build would occur at the root of the clone. The skip_clone configuration option can disable this behaviour.

Most targets provide toolchains from common metapackages like build-essential; e.g., gmake and gcc. If a Rust toolchain is required, one can be requested through the rust_toolchain configuration option. This will be installed using rustup.

Environment Variables

While the complete set of environment variables is generally target-specific, the common minimum for all targets includes:

  • BUILDOMAT_JOB_ID will be set to the unique ID of this job
  • CI will be set to true
  • GITHUB_REPOSITORY set to owner/repository; e.g., oxidecomputer/buildomat
  • GITHUB_SHA set to the commit ID of the commit under test
  • If the commit under test is part of a branch, then GITHUB_BRANCH will be set to the branch name (e.g., main) and GITHUB_REF will be set to the ref name; e.g., refs/heads/main.
  • HOME, set to the home directory of the build user
  • USER and LOGNAME, set to the username of the build user
  • PATH set to include relevant directories for toolchains and other useful software
  • TZ will be set to UTC
  • LANG and LC_ALL will be set to en_US.UTF-8
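
These variables can be used to make job steps conditional; e.g., a sketch of a fragment that only runs for pushes to the default branch (deploy.sh here is a hypothetical script in the repository under test):

# GITHUB_BRANCH is unset for commits that are not part of a branch,
# so default it to the empty string before comparing.
if [[ "${GITHUB_BRANCH:-}" == "main" ]]; then
        ./deploy.sh "$GITHUB_SHA"
fi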
Available Commands

Cross-platform shell programming can be challenging due to differences between operating systems. To make this a little easier, we ensure that each buildomat target provides a basic suite of tools that are helpful in constructing succinct jobs:

  • pfexec(1) allows escalation from the unprivileged build user to root; e.g., pfexec id -a.
  • ptime(1) runs a program and provides (with -m, detailed) timing information; e.g., ptime -m cargo test.
  • banner(1) prints its arguments in large letters on standard output, and is useful for producing headings in job log output; e.g., banner testing.
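
For example, a job fragment might use all three together to produce readable, timed log output:

banner build
ptime -m gmake all        # detailed timing for the build step
banner install
pfexec gmake install      # installation typically requires root privileges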

Configuration

Configuration properties supported for basic jobs include:

  • access_repos (array of strings)

    Jobs can be created in both public and private repositories. Public repositories are available to everybody, but private repositories require appropriate credentials. By default, an ephemeral, read-only token is injected into the execution environment (in the $HOME/.netrc file) that is able to access only the repository directly under test.

    If a job requires access to additional private repositories beyond the direct repository, they may be specified in this list, in the form owner/repository; e.g.,

    #: access_repos = [
    #:	"oxidecomputer/clandestine-project",
    #:	"oxidecomputer/secret-plans",
    #: ]

    Note that this option only works for repositories within the same organisation as the direct repository. Using the option will trigger a requirement for job-level authorisation by a member of the organisation.

  • dependencies (table)

    A job may depend on the successful completion of one or more other jobs from the same commit under test. If the dependency is cancelled or fails to complete successfully for some other reason, that failure will be propagated forward as a failure of this job.

    Each entry in the dependencies table is itself a table with a name for the dependency, and the following per-dependency properties:

    • job (string)

      Specifies the job that this job should wait on for execution. The job value must exactly match the name property of some other basic variety job available in the same commit.

    Any artefacts output by the job named in the dependency will be made available automatically under /input/$dependency using the dependency name. For example, consider this dependency directive:

    #: [dependencies.otherjob]
    #: job = "the-other-job!"

    If the job with the name the-other-job! produces an output file, /tmp/output.zip, then it will be made available within this job as the file /input/otherjob/tmp/output.zip.

    Using this facility, one can easily split a job into a "build" phase that runs under a target with access to toolchains, and one or more "test" phases that take the build output and run it under another target that might not have a toolchain, or that may have access to resources of limited availability, such as test hardware.

    Jobs can also depend on more than one other job, allowing a job to aggregate artefacts from several other jobs in one place. This might be useful when building binaries for more than one operating system, with a final step that publishes multi-OS packages if all the other builds were successful.

    Cycles in the dependency graph are not allowed.
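
    As a sketch, a hypothetical pair of job files might be split along these lines (file and archive names are illustrative). The build job declares an output:

    #: name = "build"
    #: variety = "basic"
    #: output_rules = [
    #:	"/work/pkg.tar.gz",
    #: ]

    The test job then depends on it, and finds the archive under /input:

    #: name = "test"
    #: variety = "basic"
    #:
    #: [dependencies.build]
    #: job = "build"

    tar xzf /input/build/work/pkg.tar.gz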

  • output_rules (array of strings)

    Jobs may produce artefacts that we wish to survive beyond the lifetime of the ephemeral build environment. A job may specify one or more files for preservation by the system; e.g., a build job may produce binaries or packages that can then be downloaded and installed, or a test job may produce JUnit XML files or other diagnostic logs that can be inspected by engineers.

    The output_rules property is a list of /-anchored glob patterns that match files in the ephemeral machine; e.g., /tmp/*.txt would match /tmp/something.txt but not /tmp/nothing.png. As in the shell, a single asterisk (*) will not descend into a hierarchy of directories. If you want to match recursively, a double asterisk (**) pattern will match the current directory or any directory under it, but not files. You can combine these to get a recursive match; e.g., /tmp/**/*.txt would match /tmp/a.txt, /tmp/dir/a.txt, and /tmp/dir/dir/a.txt.

    By default, it is not an error to specify a pattern that does not match any files. Provided the job is not cancelled, matching files are uploaded whether the job program exits with a zero status (denoting success) or a non-zero status (denoting failure). These behaviours can be used to upload diagnostic logs that are left behind by unexpected test failures but cleaned up on success; e.g.,

    #: output_rules = [
    #:	"/tmp/test_output/*",
    #: ]

    If the success of a job requires that a particular artefact is produced, the = prefix may be used to signify "this rule must match at least one file". If the rule does not match at least one output file, the job is marked as failed even if the job program otherwise succeeded. This can be used to make sure that, say, a release binary build job produces an archive with the expected name; e.g.,

    #: output_rules = [
    #:	"=/work/pkg/important.tar.gz",
    #:	"=/work/pkg/important.sha256.txt",
    #: ]

    By default, the system attempts to ensure that a job has not accidentally left background processes running that continue to modify the output artefacts. If the size or modified time of a file changes while it is being uploaded, the job will fail. To relax this restriction, the % prefix may be used to signify that "this file is allowed to change while it is being uploaded". The % prefix will also cause a file to be ignored if it is completely removed by a background process before it can be uploaded. This can be used to make best-effort uploads of diagnostic log files for background processes that may continue running even though the job is nominally complete; e.g.,

    #: output_rules = [
    #:	"%/var/svc/log/*.log",
    #: ]

    To exclude specific files from upload, the ! prefix can be used to signify that "any file that matches this pattern should be ignored, even if it was nominally included by another pattern". Order in the array is not important; a match of any exclusion rule will prevent that file from being uploaded. For example, to upload anything left in /tmp except for pictures:

    #: output_rules = [
    #:	"/tmp/*",
    #:	"!/tmp/*.jpg",
    #: ]

    The must-match (=) and allow-change (%) prefixes may be combined in a single output rule. The exclusion prefix (!) may not be combined with any other prefix. For example, to require at least one log file (which may still be growing) that is not big-and-useless.log:

    #: output_rules = [
    #:	"=%/tmp/*.log",
    #:	"!/tmp/big-and-useless.log",
    #: ]
  • publish (array of tables)

    Some jobs may wish to publish a specific subset of their output artefacts at a predictable URL based on the commit ID of the commit under test, for reference by other jobs from other repositories, or end user tools.

    Each table in the publish array of tables must contain these properties:

    • from_output (string)

      Specify the full path of the output artefact to be published without using any wildcard syntax. The output rule that provides this artefact should be specified using a must-match (=) prefix so that the job fails if it is not produced. Each publish entry can specify exactly one output artefact.

    • series (string)

      Specify a series name to group a set of uploads together. This is useful to group related files together in the URL space, even if they are produced by several different jobs. This value should be short and URL-safe.

    • name (string)

      Specify the publicly visible name of this file, which must be unique within the series for this commit for this repository. This value should be short and URL-safe.

    Each file published this way will be available at a predictable URL of the form:

    https://buildomat.eng.oxide.computer/public/file/OWNER/REPO/SERIES/VERSION/NAME
    

    The VERSION value is the commit ID (full SHA) of the commit under test, and the SERIES and NAME come from the publish entry.

    For example, if commit e65aace9237833ec775253cfde97f59a0af5bc3d from repository oxidecomputer/software included this publish directive:

    #: [[publish]]
    #: from_output = "/work/important-packaged-files.tar.gz"
    #: series = "packages"
    #: name = "files.tar.gz"

    A published file would be available at the URL:

    https://buildomat.eng.oxide.computer/public/file/oxidecomputer/software/packages/e65aace9237833ec775253cfde97f59a0af5bc3d/files.tar.gz
    

    Note that files published this way from private repositories will be available without authentication.

  • rust_toolchain (string or boolean)

    If specified, rustup will be installed in the environment and the nominated toolchain will be available as the default toolchain. Any toolchain specification that rustup accepts should work here; e.g., something general like stable or nightly, or a specific nightly date, like nightly-2022-04-27.

    #: rust_toolchain = "stable"

    It is also possible to use the boolean value true here, at which point the system will interpret the contents of the rust-toolchain.toml file in the root of the repository to decide what to install. The file must contain a valid channel value, and may also contain a valid profile value. Neither the legacy (pre-TOML) file format, nor TOML files which contain the path directive, are supported.

    #: rust_toolchain = true
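
    In that case, a minimal rust-toolchain.toml in the repository root might look like this (channel is required; profile is optional):

    [toolchain]
    channel = "nightly-2022-04-27"
    profile = "default"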
  • skip_clone (boolean)

    By default, a basic job will clone the repository and check out the commit under test. The working directory for the job will be named for the GitHub repository; e.g., for https://github.com/oxidecomputer/buildomat, the directory would be /work/oxidecomputer/buildomat.

    If this option is specified with the value true, no clone will be performed. The working directory for the job will be /work, without subdirectories. This is useful in targets that do not provide toolchains or git, or where no source files from the repository (beyond the job program itself) are required for correct execution.

    #: skip_clone = true
  • target (string)

    The target for a job, which specifies the composition of the execution environment, can be specified by name. Some targets (e.g., lab) are privileged, and not available to all repositories.

    The list of unrestricted targets available for all jobs includes:

    • helios-latest; an illumos execution environment (Oxide Helios distribution) running in an ephemeral virtual machine, with a reasonable set of build tools. 32GB of RAM and 200GB of disk should be available.
    • omnios-r151038; an illumos execution environment (OmniOS r151038 LTS) running in an ephemeral virtual machine, with a reasonable set of build tools. 32GB of RAM and 200GB of disk should be available.
    • ubuntu-18.04, ubuntu-20.04, and ubuntu-22.04; an Ubuntu execution environment running in an ephemeral virtual machine, with a reasonable set of build tools. 32GB of RAM and 200GB of disk should be available.
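
    For example, to select one of the unrestricted targets above:

    #: target = "helios-latest"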

Licence

Unless otherwise noted, all components are licensed under the Mozilla Public License Version 2.0.