From 8559ab3c054ecf1833a254b5b6d3b6222591b356 Mon Sep 17 00:00:00 2001 From: Austin Seipp Date: Tue, 25 Jun 2024 00:03:06 -0500 Subject: [PATCH] docs: add buck2 docs Signed-off-by: Austin Seipp --- docs/buck2.md | 504 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 504 insertions(+) create mode 100644 docs/buck2.md diff --git a/docs/buck2.md b/docs/buck2.md new file mode 100644 index 0000000000..9b3e085ba0 --- /dev/null +++ b/docs/buck2.md @@ -0,0 +1,504 @@ +# Buck2 builds + +> [!TIP] +> This document is primarily of interest to developers. See also [Contributing] +> for more information on how to contribute in general. + +There is experimental support for building `jj` with [`buck2`][Buck2] as an +alternative to `cargo`. Buck2 is a hermetic and reproducible build system +designed for multiple programming languages. + +- If you're wondering "Why?", please read the section below titled "[Why + Buck2](#why-buck2)" +- If you're interested in using Buck2 for development, please read the section + below titled "[Step 1: Please please please install + Dotslash](#step-1-please-please-please-install-dotslash)" + +> [!WARNING] +> Buck2 support is a work in progress, and is not yet complete; writing patches +> still requires `cargo` in practice, and so it is not recommended for primary +> development use. + +## Current support & feature parity + +Some notes about build compatibility are included below. + +Legend: + +- ✅: Supported +- ⚠️: Partial support/WIP +- ❌: Not supported +- ❓: Status unknown/needs testing +- ⛔: Unsupported + +### Overall status + +| Feature | Status | +|------------------------|--------| +| `rust-analyzer` | ⚠️ | +| CI setup (GHA) | ✅ | +| Cargo (re)synchronization | ✅1 | + +1. `Cargo.toml` files remain the source of truth for Rust dependency info, and a +tool to resynchronize `BUILD` files with `Cargo.toml` is provided. 
+ +| Unique features | Status | +|------------------------|--------| +| Hermetic toolchain | ⚠️ | +| RBE/GHA `ActionCache` | ❌ | +| Auto `gen-protos` | ✅ 1 | +| [OpenSSL-on-Win32](https://github.com/martinvonz/jj/pull/3554) | ❓ 2 | + +1. `gen-protos` rebuilds `.proto` files automatically if they change, so there +is no need to use the committed `.rs` files. +2. `libssh2` in this setup always uses `bssl`, but this particular issue + (ed25519 key support) has not been actually tested. + +### vs Cargo + +| Feature | Cargo | Buck2 | +|------------------------|-------|-------| +| `rust-analyzer` | ✅ | ⚠️ | +| Fully working build | ✅ | ✅ | +| Debug/Release configs | ✅ | ✅ | +| Full test suite | ✅ |️ ❌ | +| Release-able binaries | ✅ |️ ❌1,2 | +| Supports Nix devShell | ✅ |️ ⚠️3,4 | + +1. macOS and Windows binaries are theoretically usable and distributable (no 3rd +party shared object dependencies), except for being untested. +2. Linux binaries are fully supported but `musl` builds are not yet supported, +which makes them less useful for distribution. +3. Works fine on Linux, not macOS. +4. It is unclear whether Nix+Buck2 will be a supported combination in the long run + +### Platform support + +| OS | Architecture | Status | +|---------|--------------|--------| +| Linux | x86_64 | ✅ | +| | aarch64 | ⚠️1 | +| macOS | x86_64 | ❌ | +| | aarch64 | ✅ | +| Windows | x86_64 | ✅ | +| | aarch64 | ❌2 | + +1. Actually works fine, but there aren't premade `bindgen` binaries for aarch64, + so `bssl-sys` will currently fail. This can easily be worked around with a + minor tweak to the `BUILD` file for `bssl-sys` for now. +2. Entirely theoretical at this point because many other tools need to support + it, but a logical conclusion to all the other supported builds. + +## Step 1: Please please please install Dotslash + +Hermetic builds require using consistent tools across developers, and this is a +big selling point of solutions like Nix, Bazel, or Buck2. 
These tools can +download and manage consistent versions on our behalf. But then how do we +"bootstrap" the world with a consistent version of Buck2 to start the process? + +Answer: We use [Dotslash] to manage Buck2 versions in a way that's consistent +across all developers and amenable to version control. In short, a Dotslash file +is merely a JSON file that lists a binary that should be downloaded and run, +e.g. download binary `example.com/aarch64.tar.gz` on `aarch64-linux`, and run +the binary `bin/foo` inside. + +By marking these JSON files as `+x` executable, and using Dotslash as the +"interpreter" for them, we can transparently download and run the correct +version of Buck2 for the current platform. Most importantly, these JSON files +are very small, easy to read, and can be recorded in version control history. +That means you'll always get a consistent build even when checking out +historical versions or when working on a different machine. + +You can install Dotslash binaries by following the instructions at: + +- + +Or, if you have Rust installed, you can install Dotslash by running: + +```sh +cargo install dotslash +``` + +Or, if you have Nix, you can install that way as well: + +```sh +nix profile install 'nixpkgs#dotslash' +``` + +> [!TIP] +> Check out the [Dotslash documentation](https://dotslash-cli.com/docs/), +> including the "Motivation" section, for more information about the design and +> use of Dotslash. + +## Step 2: Building `jj` with Buck2 + +After installing `dotslash` into your `$PATH`, you can build `jj` with the +included `buck2` file under `./tools/bin`: + +```sh +# Linux/macOS +export PATH="$(jj root)/tools/bin:$PATH" +buck2 run cli -- version +``` + +```powershell +# Windows +dotslash ./tools/bin/buck2 run cli -- version +``` + +Dotslash will transparently run the correct version of `buck2`, and `buck2` will +build and run `jj version` on your behalf. 
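To make the Dotslash mechanism from Step 1 more concrete, here is a minimal sketch of what such a file could look like, written out from a shell session. Everything in it is a placeholder: the tool name `foo`, the URL, the size, and the digest are hypothetical, and the field names follow my reading of the Dotslash documentation, so check them against the real `./tools/bin/buck2` file before relying on them.

```sh
# Write a hypothetical Dotslash file into a scratch directory.
# All values below (name, size, digest, URL) are placeholders.
dir=$(mktemp -d)
cat > "$dir/foo" <<'EOF'
#!/usr/bin/env dotslash

{
  "name": "foo",
  "platforms": {
    "linux-aarch64": {
      "size": 123456,
      "hash": "blake3",
      "digest": "<blake3 digest of the archive>",
      "format": "tar.gz",
      "path": "bin/foo",
      "providers": [
        { "url": "https://example.com/aarch64.tar.gz" }
      ]
    }
  }
}
EOF

# Marking the file executable lets the shebang hand it to Dotslash.
chmod +x "$dir/foo"

# Show the shebang line that selects Dotslash as the "interpreter".
head -n 1 "$dir/foo"
```

Because the digest and URL are placeholders, actually running this file would fail. The point is only the shape: one small, reviewable JSON file per tool, with one entry per platform.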
+ +--- + +## Why Buck2 + +Currently Cargo suits the needs of the Jujutsu developers quite well, as the +repository is almost entirely written in Rust. Despite that, certain limitations +exist, and as we look to grow and expand the project with new functionality some +of those become more apparent and difficult to handle. + +### Multi-language and project support + +The most glaring limitation of Cargo is that, like all other language-specific +build tools, its view of a build graph has dependencies between "targets", but +it has a limited language-specific notion of what a "target" is. Extending its +dependency graph beyond that is nearly impossible, resulting in extra tools +needed that express only coarse-grained dependencies between multiple large +tools. + +This has practical and pragmatic consequences. For Jujutsu, three of them in +particular are relevant to the developers today: usage of C, usage of +JavaScript, and usage of Python. + +
+Case 1: C dependencies +
+
+
+Case 2: JavaScript usage + +We would like to implement an equivalent to Sapling's `sl web` command, and +perhaps even share the code for this with a project like [`gg`][gg] and package +Tauri apps. Usage of tech like Svelte or React will effectively require us to +integrate tools like `pnpm` or `yarn` into the build process. Even without those +tools, it will require our build graph to ultimately have knowledge of +JavaScript in some way. +
+
+
+Case 3: Python usage +
+
+ +The common refrain at this point is to use something like `cargo xtask` or +`make` in order to represent the dependency graph of the entire project. A +common belief is that doing so is low-cost because it does not "introduce new +dependencies" due to their ubiquity. However, the cost of a solution has to +consider not just adoption costs but ongoing costs to the system, and `xtask` +and `make` are not needed today, and so they really *are* new dependencies, +which means we need to support and understand them like any other. + +Ultimately, the solutions that arise from tacking `xtask` or `make` onto an +existing group of tools all run afoul of the same fundamental problems described +in Peter Miller's important 1997 paper ["Recursive Make Considered +Harmful"](https://aegis.sourceforge.io/auug97.pdf), including build graphs that +are too conservative (because finer dependencies can't be expressed, so you must +be safe) and are fundamentally incomplete (because the build system can't see +the whole picture of input and output files). + +[gg]: https://github.com/gulbanana/gg + +### Hermetic, safe, scalable builds + +> [!IMPORTANT] +> Buck2 builds of `jj` are not yet hermetically sound. In particular, +> unrestricted access to the filesystem is allowed at build time and we do not +> yet provide hermetic toolchains. + +Buck ultimately wants the build process to be a *pure function* of its inputs, +including all the compilers and tools and source code needed. Given the same +inputs, you always get the same outputs, down to the same binary bits. As a +result of this, the build graph that Buck constructs will be "complete" and will +fully capture the relationships between all inputs, commands, and outputs. That +means the build is fundamentally *hermetic*. 
+ +Hermetic builds are essential as any project grows in size, because the +probability of introducing arbitrary untracked side effects into the build +process approaches 1 as time goes on, often with unintended consequences. The +best outcome of this is a simple failure to compile; more dangerous +results are flaky builds, silent miscompilations, and non-deterministic build +outputs based on internal implementation details (e.g. an unstable sort). + +Furthermore, hermeticity is an essential feature for *build performance* at +scale because it is required to avoid miscompilation and overly conservative +rules, and required for implementing caching safely. The relationships Buck +captures are ultimately as fine grained as desired, down to individual files and +commands, across any language. Such fine detail can only be achieved with a very +complete understanding of the inputs. + +### Remote cache support + +Because Buck can see the entire build graph, and the input/output relationship +between every file and command, it is possible to cache the results of every +build command and every file that is produced, and then download them +transparently on another (compatible) machine. + +The most common case of this is between the CI system and the developer; every +change must pass CI to be merged, and when a change is merged the results of +that build are put in a public cache. A developer may go to sleep for the night +and something gets merged during their slumber. When they wake up, they can +update to the new `main` branch, run `buck2 build`, and will instantly get a +cached build instead of recompiling. + +### Early movers advantage + +NIH. + +## Buck2 crash course + +The following is an extremely minimal crash course in Buck2 concepts and how to +use it. + +### Targets + +Buck2 is used to build **targets** that exist in **packages**, which are part +of a **cell**. 
The most explicit syntax for referring to a target is the +following: + +```text +cell//path/to/package:target-name +``` + +A cell is a short name that maps to a directory in the code repository. A +package is a subdirectory underneath the cell that contains the build rules for +the targets. A target is a buildable unit of code, like a binary or a library, +named in the `BUILD` file inside that package. + +`buck2 build` works by giving it a target name, like the one above. The above is +an example of a "fully qualified" target name which is an unambiguous reference. + +A fully-qualified reference to a target works anywhere in the source code tree, +so you can build or test any component no matter what directory you're in. + +So, given a cell named `foobar//` located underneath `code/foobar`, and a +package `bar/baz` in that cell, the build rules live in the file + +```text +code/foobar/bar/baz/BUILD +``` + +which contains the targets that can be built. + +There are several shorthands for a target: + +- NIH. + +### Abstract targets & action graphs + +NIH. + +### Target visibility + +### Package files + +In a package, there can exist a `PACKAGE` file alongside every `BUILD` file. The +package file can specify metadata about the package, and also control the +default visibility of targets in the package. + +### Mode files + +In order to support concepts like debug and release builds, we use the concept +of "mode files" in Buck2. These are files that contain a list of command line +options to apply to a build to achieve the desired effect. + +For example, to build in debug mode, you can simply add the contents of the +file `mode//debug` (using cell syntax) to the command line. This can +conveniently be done with "at-file" syntax when invoking `buck2`: + +```sh +buck2 build cli @mode//debug +buck2 build cli @mode//release +``` + +Where `@path/to/file` is the at-file syntax for including the contents of a file +on the command line. 
This syntax supports `cell//` references to Buck cells, as +well. + +In short, `buck2 build @mode//file` will apply the contents of `file` to your +invocation. We keep a convenient set of these files maintained under the +`mode//` cell, located under [`./buck/mode`](../buck/mode). + +#### At-file syntax + +The `buck2` CLI supports a convenient modern feature called "at-file" syntax, +where the invocation `buck2 @path/to/file` is effectively equivalent to the +bash-ism `buck2 $(cat path/to/file)`, where each line of the file is a single +command line entry, in a consistent and portable way that doesn't have any limit +to the size of the underlying file. + +For example, assuming the file `foo/bar` contained the contents + +```text +--foo=1 +--bar=false +``` + +Then `buck2 --test @foo/bar` and `buck2 --test --foo=1 --bar=false` are +equivalent. + +### Buck Extension Language (BXL) + +NIH. + +## Examples + +Some examples are included below. + +
+Run the jj CLI + +The following shorthand is equivalent to the full target `root//cli:cli`: + +```sh +buck2 run //cli +``` + +This works anywhere in the source tree. It can be shortened to `buck2 run cli` +if you're already in the root of the repository. +
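As a rough illustration of the cell, package, and target naming described in the crash course, the following shell sketch maps a fully qualified target to the `BUILD` file that defines it. The `cell_dir` table and the `build_file_for` helper are hypothetical names made up for this example. Buck2 resolves cells from its own configuration, not from a shell function like this one.

```sh
# Hypothetical cell table: maps a cell name to its directory.
# Only the `foobar` cell from the crash course example is known here.
cell_dir() {
  case "$1" in
    foobar) echo "code/foobar" ;;
    *) return 1 ;;
  esac
}

# Print the BUILD file path for a label of the form cell//package:target.
build_file_for() {
  label=$1
  cell=${label%%//*}      # text before the first "//"
  rest=${label#*//}       # text after the first "//"
  package=${rest%%:*}     # text before the ":" that starts the target name
  dir=$(cell_dir "$cell") || return 1
  printf '%s/%s/BUILD\n' "$dir" "$package"
}

build_file_for foobar//bar/baz:target-name
```

With the cell table above, the last line prints `code/foobar/bar/baz/BUILD`, matching the path given in the "Targets" section.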
+ +
+Run BoringSSL bssl speed tests + +```sh +buck2 run third-party//bssl @mode//release -- speed +``` + +
+ +
+Build all Rust dependencies + +```sh +buck2 build third-party//rust +``` + +
+ +
+Download all http_archive dependencies + +Useful for downloading all dependencies, then testing clean build times afterwards. + +```sh +buck2 build $(buck2 uquery "kind('http_archive', deps('//...'))" | grep third-party//) +``` + +
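One more example, tied to the "At-file syntax" section: the sketch below emulates in plain shell how an `@path/to/file` argument turns into one command line entry per line of the file. `expand_at_files` is a hypothetical helper written for this illustration, not part of `buck2`, and it does not understand `cell//` paths such as `@mode//debug`, which only `buck2` itself can resolve.

```sh
# Print the arguments after at-file expansion, one per line.
expand_at_files() {
  for arg in "$@"; do
    case "$arg" in
      @*)
        # Each line of the referenced file becomes exactly one argument.
        # The `|| [ -n "$line" ]` part keeps a final line without a newline.
        while IFS= read -r line || [ -n "$line" ]; do
          printf '%s\n' "$line"
        done < "${arg#@}"
        ;;
      *)
        # Ordinary arguments pass through unchanged.
        printf '%s\n' "$arg"
        ;;
    esac
  done
}

# Recreate the `foo/bar` file from the at-file section in a scratch directory.
tmp=$(mktemp -d)
mkdir -p "$tmp/foo"
printf '%s\n' '--foo=1' '--bar=false' > "$tmp/foo/bar"

expand_at_files --test "@$tmp/foo/bar"
```

This prints `--test`, `--foo=1`, and `--bar=false` on separate lines, which is the argument list the at-file section describes as equivalent to `buck2 --test --foo=1 --bar=false`.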
+ +## Development notes + +Notes for `jj` developers using Buck2. + +### Build mode reference + +You can pass these to any `build` or `run` invocation. + +- `@mode//debug` +- `@mode//release` + +### Cargo dependency management + +Although Buck2 downloads and runs `rustc` on its own to build crate +dependencies, our `Cargo.toml` build files act as the source of truth for +dependency information in both Cargo and Buck2. + +Updating the dependency graph for Cargo-based projects typically comes in one of +two forms: + +- Updating a dependency version in the top-level workspace `Cargo.toml` file +- Adding a newly required dependency to `[dependencies]` in the `Cargo.toml` + file for a crate + +After doing either of these actions, you can synchronize the Buck2 dependencies +with the Cargo dependencies with the following command: + +```bash +buck2 -v0 run third-party//rust:sync.py +``` + +This must be run from the root of the repository. Eyeball the output of `jj +diff` and make sure it looks fine, then test, before committing the changes. + +This step will re-synchronize all `third-party//rust` crates with the versions +in the workspace Cargo file, and then also update the `BUILD` files in the +source code with any newly added build dependencies that were added or removed +(not just updated). + +### `rust-analyzer` support + +Coming soon. 
+ +## TODO + known Buck2 bugs + +TODO list: + +- [ ] Build time improvements + - Clean from scratch build is still about 2x slower than `cargo` + - Incremental rebuilds are quite comparable, though +- [ ] Investigate `rust-analyzer` support + - nightly `rust-analyzer` with + required +- [ ] hermetic toolchain + - [ ] system bootstrap python via + - [ ] clang/lld + - [ ] rustc +- [ ] remote caching +- [ ] remote execution +- macOS: + - [ ] x86_64: get build working + - mostly due to lack of an available x86_64 macOS machine + - [ ] get working in nix devShell, somehow + - linking `libiconv` is an issue, as usual + - requires the right shell settings, I assume +- Linux + - [ ] aarch64: get `bssl-sys` working with bindgen + - workaround: use `bindgen` from `$PATH` for now? +- Windows + - [ ] Is hermetic MSVC possible? + - [ ] [windows_shim for DotSlash](https://dotslash-cli.com/docs/windows/), + improving Windows ergonomics + - Requires committing binary `.exe` files to the repo, so optimized size is + critical + - Currently does not exist upstream; TBA + +Miscellaneous things: + +- [ ] Why does `buck2 build @mode//release` and then `buck2 build @mode//debug` + cause a redownload of `.crate` files? + - Only happens when switching modes; incremental builds with the same mode + are fine + - Early cutoff kicks in so this only incurs a few seconds of extra time + typically, because once Buck sees that the `.crate` files haven't actually + changed it can quit early. 
+ +Upstream buck2 bugs: + +- [ ] `buck2` Linux binaries don't work on 16k page sizes +- [ ] Aggressive and annoying download warnings +- RE support: + - [ ] Missing `ActionCache` support + - [ ] File size logic bugs + - [ ] Buggy concurrency limiter + - [ ] Failure diagnostics +- `rust-analyzer` + - [ ] Unbreak OSS usage of `rust-project` +- Miscellaneous + - [ ] Distributing log files + + + +[Contributing]: https://martinvonz.github.io/jj/latest/contributing/ +[Buck2]: https://buck2.build/ +[Dotslash]: https://dotslash-cli.com/