diff --git a/docs/buck2.md b/docs/buck2.md
new file mode 100644
index 0000000000..fb4bd7a005
--- /dev/null
+++ b/docs/buck2.md
@@ -0,0 +1,725 @@
+# Buck2 builds
+
+> [!TIP]
+> This document is primarily of interest to developers. See also
+> [Contributing] for more information on how to contribute in general.
+
+There is experimental support for building `jj` with [`buck2`][Buck2] as an
+alternative to `cargo`. Buck2 is a hermetic and reproducible build system
+designed for multiple programming languages.
+
+- If you're wondering "Why?", please read the section below titled
+  "[Why Buck2](#why-buck2)"
+- If you're interested in using Buck2 for development, please read the section
+  below titled
+  "[Step 1: Please please please install
+  Dotslash](#step-1-please-please-please-install-dotslash)"
+
+> [!WARNING]
+> Buck2 support is a work in progress, and is not yet complete;
+> writing patches still requires `cargo` in practice, and so it is not
+> recommended for primary development use. It may never be recommended for
+> primary development use or merged into the main tree.
+
+## Current support & feature parity
+
+Some notes about build compatibility are included below.
+
+Legend:
+
+- ✅: Supported
+- ⚠️: Partial support/WIP
+- ❌: Not supported
+- ❓: Status unknown/needs testing
+- ⛔: Unsupported
+
+### Overall status
+
+| Feature                   | Status |
+| ------------------------- | ------ |
+| `rust-analyzer`           | ⚠️     |
+| CI setup (GHA)            | ✅     |
+| Cargo (re)synchronization | ✅ (1) |
+
+1. `Cargo.toml` files remain the source of truth for Rust dependency info, and a
+   tool to resynchronize `BUILD` files with `Cargo.toml` is provided.
+
+| Unique features       | Status |
+| --------------------- | ------ |
+| Hermetic toolchain    | ⚠️     |
+| RBE/GHA `ActionCache` | ❌     |
+| Auto `gen-protos`     | ✅ (1) |
+
+1. `gen-protos` rebuilds `.proto` files automatically if they change, so there
+   is no need to use the committed `.rs` files.
+
+### vs Cargo
+
+| Feature               | Cargo | Buck2     |
+| --------------------- | ----- | --------- |
+| `rust-analyzer`       | ✅    | ⚠️        |
+| Fully working build   | ✅    | ✅        |
+| Debug/Release configs | ✅    | ✅        |
+| Full test suite       | ✅    | ❌        |
+| Release-able binaries | ✅    | ❌ (1, 2) |
+| Supports Nix devShell | ✅    | ⚠️ (3, 4) |
+
+1. macOS and Windows binaries are theoretically usable and distributable (no 3rd
+   party shared object dependencies), except for being untested.
+2. Linux binaries are working but we can't yet produce `musl` builds, which
+   makes them less useful for distribution. However, glibc builds will often be
+   faster (faster malloc and faster memcpy/string routines), so it may be good
+   to support both.
+3. Works fine on Linux, not macOS.
+4. It is unclear whether Nix+Buck2 will be a supported combination in the long
+   run
+
+### Platform support
+
+| OS      | Architecture | Status |
+| ------- | ------------ | ------ |
+| Linux   | x86_64       | ✅     |
+|         | aarch64      | ✅ (1) |
+| macOS   | x86_64       | ❌     |
+|         | aarch64      | ✅     |
+| Windows | x86_64       | ✅     |
+|         | aarch64      | ❌ (2) |
+
+1. `aarch64-linux` requires [`bindgen`][bindgen] in `$PATH`
+2. Entirely theoretical at this point because many other tools need to support
+   it, but a logical conclusion to all the other supported builds.
+
+[bindgen]: https://rust-lang.github.io/rust-bindgen/command-line-usage.html
+
+### Fixed and related bugs
+
+The Buck2 build is known to fix at least the following bugs, though they may all
+have alternative solutions to varying degrees:
+
+- https://github.com/martinvonz/jj/issues/3984
+  - libssh2 is built correctly by Buck2 on a fresh Windows system
+- https://github.com/martinvonz/jj/issues/3322
+  - BoringSSL enables ed25519 keys on all platforms in all builds
+- https://github.com/martinvonz/jj/pull/3554
+  - BoringSSL builds do not require perl/make
+- https://github.com/martinvonz/jj/issues/4005
+  - Buck2-built `jj` binaries have a statically built CRT on Windows
+  - Fixed in `main` by https://github.com/martinvonz/jj/pull/4096
+
+## Step 1: Please please please install Dotslash
+
+Hermetic builds require using consistent build tools across developers; a major
+selling point of solutions like Nix, Bazel, or Buck2 is that they do this for
+us. But then how do we "bootstrap" the world with a consistent version of Buck2
+to start the process?
+
+Answer: We use [Dotslash] to manage Buck2 versions in a way that's consistent
+across all developers and amenable to version control. In short, `dotslash` is
+an interpreter for "dotslash files", and a Dotslash file is merely a JSON file
+that lists a binary that should be downloaded and run, e.g. download binary
+`example.com/aarch64.tar.gz` on `aarch64-linux`, and run the binary `bin/foo`
+inside.
+
+By marking these JSON files as `+x` executable, and using Dotslash as the
+"interpreter" for them, we can transparently download and run the correct
+version of Buck2 for the current platform. Most importantly, these JSON files
+are very small, easy to read, and can be recorded in version control history.
+That means you'll always get a consistent build even when checking out
+historical versions or when working on a different machine.
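To make the idea concrete, here is a minimal Python sketch of what a Dotslash-style launcher conceptually does: read a small JSON manifest, pick the entry for the current platform, and report which archive to fetch and which binary inside it to run. The manifest layout, the `select_entry` helper, and the `foo` tool are hypothetical and simplified for illustration; consult the Dotslash documentation for the real file format.

```python
import json

# Hypothetical, simplified manifest. The real Dotslash schema carries more
# fields (for example sizes and hashes) and may use different key names.
MANIFEST = json.loads("""
{
  "name": "foo",
  "platforms": {
    "linux-aarch64": {"url": "https://example.com/aarch64.tar.gz", "path": "bin/foo"},
    "linux-x86_64": {"url": "https://example.com/x86_64.tar.gz", "path": "bin/foo"}
  }
}
""")


def select_entry(manifest, os_name, arch):
    """Return the download entry for the given platform, or raise if unsupported."""
    key = f"{os_name}-{arch}"
    try:
        return manifest["platforms"][key]
    except KeyError:
        raise RuntimeError(f"{manifest['name']}: no binary listed for {key}") from None


entry = select_entry(MANIFEST, "linux", "aarch64")
print(f"download {entry['url']}, then run {entry['path']}")
```

Because the manifest is plain text, a change of tool version shows up as an ordinary, reviewable diff in version control.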
+
+You can install Dotslash binaries by following the instructions at:
+
+-
+
+Or, if you have Rust installed, you can install Dotslash by running:
+
+```sh
+cargo install dotslash
+```
+
+Or, if you have Nix, you can install that way as well:
+
+```sh
+nix profile install 'nixpkgs#dotslash'
+```
+
+> [!TIP]
+> Check out the [Dotslash documentation](https://dotslash-cli.com/docs/),
+> including the "Motivation" section, for more information about the design and
+> use of Dotslash.
+
+## Step 2: Building `jj` with Buck2
+
+After installing `dotslash` into your `$PATH`, you can build `jj` with the
+included `buck2` file under `./tools/bin`:
+
+```sh
+# Linux/macOS
+# Note: assign to PATH, not $PATH; `export $PATH=...` is a shell error.
+export PATH="$(jj root)/tools/bin:$PATH"
+buck2 run cli -- version
+```
+
+```powershell
+# Windows
+dotslash ./tools/bin/buck2 run cli -- version
+```
+
+Dotslash will transparently run the correct version of `buck2`, and `buck2` will
+build and run `jj version` on your behalf.
+
+---
+
+## Why Buck2
+
+Today, Cargo suits Jujutsu developers quite well, as the entire project is
+written in Rust. But as time goes on and we look to grow, certain limitations
+will start to be felt acutely.
+
+### Multi-language and project support
+
+The most glaring limitation of Cargo is that, like all other language-specific
+build tools, its build graph has dependencies between "targets", but it only has
+a Rust-specific notion of what a "target" is. Extending its dependency graph
+beyond that is nearly impossible, resulting in the need for extra tools that can
+only express coarse-grained dependencies between multiple larger targets.
+
+This has practical and pragmatic consequences. For Jujutsu, three of them in
+particular are relevant to the developers today: usage of C, usage of
+JavaScript, and usage of Python.
+
+
+Case 1: C dependencies
+
+Jujutsu is written in Rust, but it currently has 3 major C libraries as
+dependencies:
+
+- `libgit2` for Git support, which needs
+- `libssh2` for SSH support, which needs
+- `openssl` for cryptography (e.g. TLS transport for `https` clones and ed25519
+  support in libssh2)
+
+Currently, all of these are managed on each platform by `cargo` through the use
+of `build.rs` scripts that are opaque and have effectively unlimited power.
+However, this has some unfortunate consequences for multi-platform support.
+
+The most notable is that `openssl` is complicated to handle on Windows because
+building it requires Make and Perl; that means it isn't enough to just have the
+source code, MSVC, and the Rust compiler, but often you will need a third party
+toolchain like vcpkg to provide prebuilt `.lib` files, or you must use other
+tools like `msys` to provide a Bash shell with working `perl`/`make`. (On Linux
+and macOS, OpenSSL support is often easily available in some form provided by
+the operating system.)
+
+To make this simpler, we _do_ have the option of building `libssh2` without
+OpenSSL on Windows, instead using the Windows Cryptography API: Next Generation
+(CNG) library. This is the default when compiling from source with `cargo`.
+
+But this gives a poorer user experience for our Windows users who compile from
+source to report bugs upstream, or fix issues. For example,
+[#3322](https://github.com/martinvonz/jj/issues/3322) describes a bug where a
+user can no longer clone a repository because CNG does not support Ed25519 host
+keys, which are offered by GitHub (requiring an extra `ssh-keyscan` step to
+fix).
+A fix to always use OpenSSL on Windows was proposed in
+[#3554](https://github.com/martinvonz/jj/pull/3554), which "vendors" OpenSSL as
+part of building the `openssl` Rust crate, but returns us to the world of Make
+and Perl, which is not a great experience for users to figure out, and
+seemingly requires a significant amount of platform-specific details to use the
+right tools from MSYS2 or vcpkg.
+
+This also results in a poor feedback loop: Windows users may build binaries that
+are silently different from the ones they install from upstream, e.g. a user
+installs an `.exe` from our release page, then builds a copy of `jj` on their
+own computer, then finds the two behave differently. There is also no clear way
+to know what the differences are or alert the user to them, as of today.
+
+In contrast, the Buck2 build of Jujutsu builds exactly one version of each of
+its C dependencies, and has chosen [BoringSSL] as its cryptography library on
+all platforms, by shimming it into the Rust build process. BoringSSL is built
+manually with our own `BUILD` files. This results in a build of `libssh2` with
+identical cryptographic support, including Ed25519 keys, for all users on all
+platforms. This means that Windows users can build Jujutsu with nothing more
+than MSVC, the Rust compiler, and Buck2, and everything will work.
+
+In the future, it may be possible to replace all these libraries with Rust
+equivalents, removing the complexity of the C build process. But whether C or
+Rust, this is an example of how a dependency you rely on and ship is ultimately
+your responsibility. Even if the problem doesn't exist immediately in your own
+codebase, it can still be a major source of confusion and frustration for your
+users.
+
+
+
+
+Case 2: JavaScript usage
+
+We would like to implement an equivalent to Sapling's `sl web` command, and
+perhaps even share the code for this with a project like [`gg`][gg] and package
+Tauri apps inside the main repository. There has also been discussion of
+extensions for VSCode. These all require use of JavaScript, and in practice,
+without extreme diligence, will effectively require us to integrate tools like
+`pnpm` or `yarn` into the build process. Even without those tools, it will
+require our build graph to ultimately have knowledge of JavaScript in some way.
+
+A concrete example of this problem is in the [`diffedit3`][diffedit3] package by
+Jujutsu contributor Ilya Grigoriev. We may even integrate `diffedit3` into the
+Jujutsu repository in the future. The source code repository currently is a
+mixture of Cargo and npm packages, and because changes between them cannot be
+tracked accurately, developers are expected to run `cargo build` and
+`npm run build` in sequence and then commit the output `.js` file to the
+repository (under `./webapp/dist/assets`). Not only does this bloat repository
+sizes, it's unauditable too because there's no clear way to know what exactly
+produced the `.js` file. Even doing such updates automatically with trusted
+infrastructure (e.g. CI tools) would require further bespoke tooling to be
+written, so the problem still exists.
+
+
+
+
+Case 3: Python usage
+
+TODO: Currently Python is used to build our website. Explain how this is another
+manifestation of the same problems above.
+
+
+
+
+The common refrain at this point is to use something like `cargo xtask` or
+`make` in order to represent the dependency graph of the entire project. A
+common belief is that doing so is low-cost because it does not "introduce new
+dependencies" due to their ubiquity; for instance, `make` is probably installed
+on Unix-like systems while `xtask` is already common in Rust. However, the cost
+of a solution has to consider adoption as well as ongoing costs. And we do not
+need `xtask` and `make` today, so adding them really is adding a new dependency,
+even if they're common.
+
+Ultimately, the solutions that arise from tacking `xtask` or `make` onto an
+existing group of tools all run afoul of the same fundamental problems described
+in Peter Miller's important 1997 paper
+["Recursive Make Considered Harmful"][rmch], including build graphs that are too
+conservative (because finer dependencies can't be expressed, so you must be
+safe) and are fundamentally incomplete (because the build system can't see the
+whole picture of input and output files).
+
+[BoringSSL]: https://github.com/google/boringssl
+[gg]: https://github.com/gulbanana/gg
+[diffedit3]: https://github.com/ilyagr/diffedit3/
+[rmch]: https://aegis.sourceforge.io/auug97.pdf
+
+### Hermetic, safe, scalable builds
+
+> [!IMPORTANT]
+> Buck2 builds of `jj` are not yet hermetically sound. In particular,
+> unrestricted access to the filesystem is allowed at build time and we do not
+> yet provide hermetic toolchains.
+
+Buck ultimately wants the build process to be a _pure function_ of its inputs,
+including all the compilers and tools and source code needed. Given the same
+inputs, you always get the same outputs, down to the same binary bits. As a
+result of this, the build graph that Buck constructs will be "complete" and will
+fully capture the relationships between all inputs, commands, and outputs. That
+means the build is fundamentally _hermetic_.
+
+Hermetic builds are essential as any project grows in size, because the
+probability of introducing arbitrary untracked side effects into the build
+process approaches 1 as time goes on, often with unintended consequences. The
+best outcome of this is a simple failure to compile; more dangerous results are
+flaky builds, silent miscompilations, and non-deterministic build outputs based
+on internal implementation details (e.g. an unstable sort).
+
+Hermetic builds are also essential for _security_, because they help ensure that
+builds are repeatable given a known "ground truth". Scenarios like the
+[xz utils backdoor] have many complex factors involved, but an easy to
+understand one is that the backdoor relied on the build process being
+non-hermetic; the backdoor was inserted under a specific set of trigger criteria
+that modified the build system actions, which could have been detected more
+easily had there been a known reproducible output to compare against. Hermetic
+builds derived from source code mean that backdoors often have to be inserted
+in-band _into the code itself_ and cannot be inserted out-of-band into the build
+process so easily.
+
+Finally, hermeticity is an essential feature for _build performance_ at scale
+because it is required to allow sensible remote execution, avoid overly
+conservative rules, and enable safe caching. The relationships Buck captures are
+ultimately as fine grained as desired, down to individual files and commands,
+across any language. Such fine detail can only be achieved with a very complete
+understanding of the inputs.
+
+[xz utils backdoor]: https://en.wikipedia.org/wiki/XZ_Utils_backdoor
+
+### Remote cache support
+
+Because Buck can see the entire build graph, and the input/output relationship
+between every file and command, it is possible to cache the results of every
+build command and every file that is produced, and then download them
+transparently on another (compatible) machine.
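The caching idea can be sketched in a few lines of Python. This is a toy, in-memory model and not Buck2's actual implementation or wire protocol: the `action_key` and `run_cached` helpers, the dictionary cache, and the fake compiler are all hypothetical. The point is only that when every input of a command is known, a digest of the command plus its inputs can serve as a lookup key for its outputs.

```python
import hashlib


def action_key(command, inputs):
    """Digest a command line together with the contents of all declared inputs."""
    h = hashlib.sha256()
    for arg in command:
        h.update(arg.encode() + b"\0")
    for path in sorted(inputs):  # sort so the key does not depend on dict order
        h.update(path.encode() + b"\0")
        h.update(hashlib.sha256(inputs[path]).digest())
    return h.hexdigest()


def run_cached(cache, command, inputs, execute):
    """Return (output, was_cache_hit), running `execute` only on a miss."""
    key = action_key(command, inputs)
    if key in cache:
        return cache[key], True
    output = execute(command, inputs)
    cache[key] = output
    return output, False


def fake_compile(command, inputs):
    """A stand-in "compiler" that just upper-cases its single input."""
    return inputs["src/main.rs"].upper()


shared_cache = {}  # stands in for a cache shared between CI and developers
cmd = ["rustc", "src/main.rs"]
src = {"src/main.rs": b"fn main() {}"}

print(run_cached(shared_cache, cmd, src, fake_compile))  # first run: executes
print(run_cached(shared_cache, cmd, src, fake_compile))  # same inputs: cache hit
```

If an input is missing from `inputs`, the key no longer reflects what the command really read, which is why this kind of caching is only safe when builds are hermetic.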
+
+The most common case of this is between the CI system and the developer; every
+change must pass CI to be merged, and when a change is merged the results of
+that build are put in a public cache. A developer may go to sleep for the night
+and something gets merged during their slumber. When they wake up, they can
+update to the new `main` branch, run `buck2 build`, and will instantly get a
+cached build instead of recompiling.
+
+### Early movers advantage
+
+Given the fact that Cargo currently works well for our needs, why should we
+investigate Buck2 now? Wouldn't it be better to wait until much later on when
+it's really needed? Are there any major benefits now?
+
+Most people think of large-scale build tools like Nix or Bazel as necessary only
+once the project has begun growing out of control. But by that point, there is
+often [strong inertia against such a change][xkcd1172] and large amounts of
+technical debt in the way, making such migration difficult and costly.
+
+[xkcd1172]: https://xkcd.com/1172/
+
+The reality is that the easiest time to adopt hermetic and scalable build
+systems is _early on_ in a project's lifecycle, even when the benefits are not
+fully realized, because adoption costs are lowest then and it helps prevent
+impedance mismatches from being introduced later on.
+
+Furthermore, executing early on this means that we are not blocked on
+compromises like handling JavaScript in the case of `diffedit3`, meaning we may
+be able to execute on certain plans _earlier_ than we otherwise would have been
+able to had we stuck with Cargo. This is not the same as a free lunch; rather
+the _path_ to achieving difficult things is unblocked, even if the road to get
+there still requires work. For more information on this, see the section below
+titled "[Future endeavours](#future-endeavours)".
+
+---
+
+## Buck2 crash course
+
+The following is an extremely minimal crash course in Buck2 concepts and how to
+use it.
+
+### Target names
+
+Buck2 builds **targets**, which exist in **packages**, which are part of a
+**cell**. The most explicit syntax for referring to a target is the following:
+
+```text
+cell//path/to/package:target-name
+```
+
+A cell is a short name that maps to a directory in the code repository. A
+package is a subdirectory underneath the cell that contains the build rules for
+the targets. A target is a buildable unit of code, like a binary or a library,
+named in the `BUILD` file inside that package (more on that soon).
+
+`buck2 build` requires at least one target name, like the one above. The above
+is an example of a "fully qualified" target name which is an unambiguous
+reference.
+
+A fully-qualified reference to a target works anywhere in the source code tree,
+so you can build or test any component no matter what directory you're in.
+
+So, a cell named `foobar//` located underneath `code/foobar`, and a package
+`bar/baz` in that cell, lead to the file
+
+```text
+code/foobar/bar/baz/BUILD
+```
+
+which contains the targets that can be built.
+
+There are several shorthands for a target:
+
+- NIH.
+
+### `BUILD` files
+
+If we consider tools like `make` and `cargo` to exist at different points in a
+spectrum, then Buck is actually much closer to `make` in spirit than most of the
+others.
+
+A `BUILD` file (also sometimes named `BUCK` or `TARGETS`) for a package lists
+targets, which specify their dependencies as other targets. Therefore the
+`BUILD` files in a project collectively form a directed acyclic graph (DAG) much
+like a set of (non-recursive) Makefiles might. This is called the target graph.
+
+A `BUILD` file generally looks like this:
+
+```bazel
+cxx_rule(name = 'foo', ...)
+
+rust_rule(name = 'bar', deps = [ ":foo" ], ...)
+
+java_rule(name = 'baz', deps = [ ":foo", ":bar" ], ...)
+```
+
+In this example, `foo` is a C++ binary, `bar` is a Rust binary that depends on
+`foo`, and `baz` is a Java binary that depends on both `foo` and `bar`. (It is
+easy to see how this is somewhat spiritually similar to a Makefile.)
+
+A target is created by applying a rule, such as `cxx_rule` or `rust_rule`, and
+assigning it a `name`. There can only be one target with a given name in a
+package, but you can use the same rule multiple times with different names.
+
+Unlike Make, Buck requires that the body of a rule, its "implementation", must
+be defined separately from where the rule is used. A rule cannot be defined in
+`BUILD` files, but only applied to arguments and bound to a name.
+
+It is important to note that these rules have no evaluation order defined. You
+are allowed to write `cxx_rule` at the bottom of the file in the above example.
+The name of the target is what matters, not the order in which the targets are
+written. `BUILD` files only describe a graph, not a sequence of operations.
+
+More generally, a rule is just a function, a target is just the application of a
+function to arguments, and the `name` field is a special argument that defines a
+"bound name" for the result of the function call. So a `BUILD` file is just a
+series of function calls that might depend on one another. In a more "ordinary"
+language, the above example might look like this:
+
+```bazel
+bar = rust_rule(deps = [ foo ], ...)
+
+baz = java_rule(deps = [ foo, bar ], ...)
+
+foo = cxx_rule(...)
+```
+
+While this is a deeper topic, ultimately, the syntax Buck2 uses is a pragmatic
+compromise, given the semantics of existing `BUILD` files. We can't change that,
+but the "function application" metaphor is a very useful one to keep in mind.
+
+### Abstract targets & action graphs
+
+NIH.
+
+### Target visibility
+
+Every target can have an associated _visibility list_, which restricts who is
+capable of depending on the target.
+There are two types of visibility:
+
+- `visibility` - The list of targets that can see and depend on this target.
+- `within_view` - The list of targets that this target can see and depend on.
+
+Visibility is a practical and powerful tool for avoiding accidental
+dependencies. For example, an experimental crate can have its `visibility`
+prevent general usage, except by specific other targets that are testing it
+before committing to a full migration.
+
+### Package files
+
+In a package, there can exist a `PACKAGE` file alongside every `BUILD` file. The
+package file can specify metadata about the package, and also control the
+default visibility of targets in the package.
+
+### Mode files
+
+In order to support concepts like debug and release builds, we use the concept
+of "mode files" in Buck2. These are files that contain a list of command line
+options to apply to a build to achieve the desired effect.
+
+For example, to build in debug mode, you can simply include the contents of the
+file `mode//debug` (using cell syntax) on the command line. This can
+conveniently be done with "at-file" syntax when invoking `buck2`:
+
+```sh
+buck2 build cli @mode//debug
+buck2 build cli @mode//release
+```
+
+Where `@path/to/file` is the at-file syntax for including the contents of a file
+on the command line. This syntax supports `cell//` references to Buck cells, as
+well.
+
+In short, `buck2 build @mode//file` will apply the contents of `file` to your
+invocation. We keep a convenient set of these files maintained under the
+`mode//` cell, located under [`./buck/mode`](../buck/mode).
+
+#### At-file syntax
+
+The `buck2` CLI supports a convenient modern feature called "at-file" syntax,
+where the invocation `buck2 @path/to/file` is effectively equivalent to the
+bash-ism `buck2 $(cat path/to/file)`, where each line of the file is a single
+command line entry, in a consistent and portable way that doesn't have any limit
+to the size of the underlying file.
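The expansion rule can be modeled in Python as follows. This is only a sketch of the semantics described here (each line of the file becomes one argument); the `expand_at_files` helper is hypothetical, does not resolve `cell//` paths, and is not how `buck2` itself is implemented.

```python
import tempfile
from pathlib import Path


def expand_at_files(argv):
    """Replace each `@path` argument with the lines of the file at `path`."""
    expanded = []
    for arg in argv:
        if arg.startswith("@"):
            # One command line entry per line of the file.
            expanded.extend(Path(arg[1:]).read_text().splitlines())
        else:
            expanded.append(arg)
    return expanded


# Write a small at-file to a temporary directory and expand it.
with tempfile.TemporaryDirectory() as tmp:
    at_file = Path(tmp) / "bar"
    at_file.write_text("--foo=1\n--bar=false\n")
    print(expand_at_files(["--test", f"@{at_file}"]))
```

Unlike `$(cat ...)`, this kind of expansion does not go through shell word splitting, so an argument containing spaces stays a single entry.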
+
+For example, assuming the file `foo/bar` contained the contents
+
+```text
+--foo=1
+--bar=false
+```
+
+Then `buck2 --test @foo/bar` and `buck2 --test --foo=1 --bar=false` are
+equivalent.
+
+### Buck Extension Language (BXL)
+
+NIH.
+
+## Examples
+
+Some examples are included below.
+
+
+Run the jj CLI
+
+The following shorthand is equivalent to the full target `root//cli:cli`:
+
+```sh
+buck2 run //cli
+```
+
+This works anywhere in the source tree. It can be shortened to `buck2 run cli`
+if you're already in the root of the repository.
+
+
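The shorthand used in this example can be sketched as a small normalization function. This is an illustrative model, not Buck2's real target-pattern parser: the `normalize_target` helper and its parameters are hypothetical, and it assumes the default cell is named `root` and that a bare path is resolved relative to the current directory's package.

```python
def normalize_target(pattern, default_cell="root", cwd_package=""):
    """Expand a shorthand target into `cell//package:name` form."""
    if "//" in pattern:
        cell, rest = pattern.split("//", 1)
        cell = cell or default_cell  # `//cli` has an empty cell part
    else:
        # A bare `cli` or `:name` is resolved relative to the current package.
        cell = default_cell
        sep = "/" if cwd_package and not pattern.startswith(":") else ""
        rest = f"{cwd_package}{sep}{pattern}"
    package, colon, name = rest.partition(":")
    if not colon:
        # No explicit target name: use the last path component of the package.
        name = package.rsplit("/", 1)[-1]
    return f"{cell}//{package}:{name}"


for shorthand in ["//cli", "cli", "root//cli:cli", "third-party//bssl"]:
    print(shorthand, "->", normalize_target(shorthand))
```

Under these assumptions, `//cli`, `cli` (from the repository root), and `root//cli:cli` all normalize to the same fully qualified target.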
+ +
+Run BoringSSL bssl speed tests
+
+```sh
+buck2 run third-party//bssl @mode//release -- speed
+```
+
+
+ +
+Build all Rust dependencies
+
+```sh
+buck2 build third-party//rust
+```
+
+
+ +
+Download all http_archive dependencies
+
+Useful for downloading all dependencies, then testing clean build times
+afterwards.
+
+```sh
+buck2 build $(buck2 uquery "kind('http_archive', deps('//...'))" | grep third-party//)
+```
+
+
+
+---
+
+## Future endeavours
+
+NIH
+
+---
+
+## Development notes
+
+Notes for `jj` developers using Buck2.
+
+### Build mode reference
+
+You can pass these to any `build` or `run` invocation.
+
+- `@mode//debug`
+- `@mode//release`
+
+### Cargo dependency management
+
+Although Buck2 downloads and runs `rustc` on its own to build crate
+dependencies, our `Cargo.toml` build files act as the source of truth for
+dependency information in both Cargo and Buck2.
+
+Updating the dependency graph for Cargo-based projects typically comes in one of
+two forms:
+
+- Updating a dependency version in the top-level workspace `Cargo.toml` file
+- Adding a newly required dependency to `[dependencies]` in the `Cargo.toml`
+  file for a crate
+
+After doing either of these actions, you can synchronize the Buck2 dependencies
+with the Cargo dependencies with the following command:
+
+```bash
+buck2 -v0 run third-party//rust:sync.py
+```
+
+This must be run from the root of the repository. Eyeball the output of
+`jj diff` and make sure it looks fine, then test, before committing the changes.
+
+This step will re-synchronize all `third-party//rust` crates with the versions
+in the workspace Cargo file, and then also update the `BUILD` files in the
+source code to reflect any build dependencies that were added or removed (not
+just updated).
+
+### `rust-analyzer` support
+
+Coming soon.
+
+---
+
+## TODO + known Buck2 bugs
+
+TODO list:
+
+- [ ] Build time improvements
+  - Clean from scratch build is still about 2x slower than `cargo`
+  - Incremental rebuilds are quite comparable, though
+- [ ] Investigate `rust-analyzer` support
+  - nightly `rust-analyzer` with
+    required
+  - some experiments have worked, and support is relatively close
+- [ ] hermetic toolchain
+  - [x] ~~system bootstrap python via
+    ~~
+  - [ ] rustc
+  - [ ] clang/lld
+- [ ] remote caching
+- [ ] remote execution
+- macOS:
+  - [ ] x86_64: get build working
+    - mostly due to lack of an available x86_64 macOS machine
+    - GHA x86_64 runners seem to be slow and have limited availability?
+  - [ ] get working in nix devShell, somehow
+    - linking `libiconv` is an issue, as usual
+    - requires the right shell settings, I assume
+- Linux
+  - [x] ~~aarch64-linux: get `bssl-sys` working with bindgen~~
+  - [ ] remove workaround: aarch64-linux requires `bindgen` in `$PATH`, for now
+- Windows
+  - [ ] Is hermetic MSVC possible?
+  - [ ] [windows_shim for DotSlash](https://dotslash-cli.com/docs/windows/),
+    improving Windows ergonomics
+    - Requires committing binary `.exe` files to the repo, so optimized size is
+      critical
+    - Currently does not exist upstream; TBA
+
+Miscellaneous things:
+
+- [ ] Why does `buck2 build @mode//release` and then `buck2 build @mode//debug`
+  cause a redownload of `.crate` files?
+  - Only happens when switching modes; incremental builds with the same mode are
+    fine
+  - Early cutoff kicks in so this only incurs a few seconds of extra time
+    typically, because once Buck sees that the `.crate` files haven't actually
+    changed it can quit early.
+
+Upstream buck2 bugs:
+
+- [x] ~~`buck2` aarch64-linux binaries don't work with 16k page size
+  ~~
+- [ ] Aggressively annoying download warnings
+
+- RE/AC support:
+  - [ ] Missing `ActionCache` support
+
+  - [x] File size logic bugs
+  - [x] Buggy concurrency limiter
+  - [ ] Failure diagnostics
+- `rust-analyzer`
+  - [x] Unbreak OSS usage of `rust-project`
+
+- Miscellaneous
+  - [ ] Distributing log files
+    - Buck2 logs are included in CI artifacts, but not published anywhere
+
+
+
+[Contributing]: https://martinvonz.github.io/jj/latest/contributing/
+[Buck2]: https://buck2.build/
+[Dotslash]: https://dotslash-cli.com/