From 9ee6fc570a4c685b6f41855556dfb2784e405935 Mon Sep 17 00:00:00 2001
From: Philip Metzger
Date: Wed, 6 Sep 2023 00:49:59 +0200
Subject: [PATCH] docs/design: Move the `run` doc to github.

This ticks another box in #1869.

Co-Authored-By: arxanas
Co-Authored-By: hooper
Co-Authored-By: martinvonz
---
 docs/design/run.md | 268 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 268 insertions(+)
 create mode 100644 docs/design/run.md

diff --git a/docs/design/run.md b/docs/design/run.md
new file mode 100644
index 00000000000..b3c9b195b44
--- /dev/null
+++ b/docs/design/run.md
+# Introducing JJ run
+
+Authors: [Philip Metzger](philipmetzger@bluewin.ch), [Martin von Zweigberk](martinvonz@google.com), [Danny Hooper](hooper@google.com), [Waleed Khan](me@waleedkhan.name)
+
+Initial Version, 10.12.2022 [^1]
+
+
+**Summary:** This document describes the design of a new `run` command for
+Jujutsu which will be used to seamlessly integrate with build systems, linters
+and formatters. This is achieved by running a user-provided command or script
+across multiple revisions. For more details, read the
+[Use-Cases of jj run.](#Use-Cases-of-jj-run)
+
+## Preface
+
+The goal of this Design Document is to specify the correct behavior of `jj run`.
+The points we decide on here I (Philip Metzger) will try to implement. There
+exists some prior work in other DVCS:
+* `git test`: part of git-branchless. Similar to this proposal for `jj run`.
+* `hg run`: Google's internal Mercurial extension. Similar to this proposal for
+`jj run`.
+Details not available.
+* `hg fix`: Google's open source Mercurial extension: [source code][fix-src]. A
+more specialized approach to rewriting file content without full context of the
+working directory.
+* `git rebase -x`: runs commands opportunistically as part of rebase.
+* `git bisect run`: run a command to determine which commit introduced a bug.
+
+## Context and Scope:
+
+The initial need for some kind of command runner integrated in the VCS surfaced
+in a [GitHub discussion.][pre-commit] In a discussion on Discord about the
+git-hook model, there was consensus about not repeating their mistakes.
+
+For `jj run` there is prior art in Mercurial, git-branchless and Google's
+internal Mercurial. Currently git-branchless `git test` and `hg fix` implement
+some kind of command runner. The Google internal `hg run`, in contrast, works in
+conjunction with CitC (Clients in the Cloud), which allows it to lazily apply
+the current command to any affected file. The base Jujutsu backend does not
+have a fancy virtual filesystem supporting it, so we can't apply this
+optimization.
+
+## Goals and Non-Goals:
+
+### Goals:
+
+We should be able to apply the command to any revision, published or unpublished.
+We should be able to parallelize running the actual command, while preserving a
+good console output.
+The run command should be able to work in the working copy.
+There should exist some way to signal hard failure.
+The command should build enough infrastructure for `jj test`, `jj fix` and
+`jj format`.
+The main goal is to be good enough, as we can always expand the functionality
+in the future.
+
+### Non-Goals:
+
+While we should build a base for `jj test`, `jj format` and `jj fix`, we
+shouldn't mash their use-cases into `jj run`.
+The command shouldn't be too smart, as too many assumptions about workflows
+make the command confusing for users.
+The smart caching of outputs, as user input commands can be unpredictable.
+Fine grained user facing configuration, as it's unwarranted complexity.
+A `fix` subcommand as it cuts too much design space.
+
+## Use-Cases of jj run:
+
+**Linting and Formatting:**
+- `jj run 'pre-commit run' -r $revset`
+- `jj run 'cargo clippy' -r $revset`
+- `jj run 'cargo +nightly fmt'`
+**Large scale changes across repositories, local and remote:**
+- `jj run 'sed s/some/test/' -r 'draft() & ~remote_branches(exact:"origin")'`
+- `jj run '$rewrite-tool' -r '$revset'`
+**Build systems:**
+- `jj run 'bazel build //some/target:somewhere'`
+- `jj run 'ninja check-lld'`
+
+Some of these use-cases should get a specialized command, as this allows
+further optimization. A command could be `jj format`, which runs a list of
+formatters over a subset of a file in a revision. Another command could be
+`jj fix`, which runs a command like `rustfmt --fix` or `cargo clippy --fix` over
+a subset of a file in a revision.
+
+## Design:
+
+### Base Design:
+
+All the work will be done in the `.jj/` directory. This allows us to hide all
+complexity from the users, while preserving the user's current workspace.
+
+We will copy the approach from git-branchless's `git test` of creating a
+temporary working copy for each parallel command. The working copies will be
+reused between `jj run` invocations. They will also be reused within a `jj run`
+invocation if there are more commits to run on than there are parallel jobs.
+
+We will leave ignored files in the temporary directory between runs. That
+enables incremental builds (e.g. by letting cargo reuse its `target/` directory).
+However, it also means that runs potentially become less reproducible. We will
+provide a flag for removing ignored files from the temporary working copies to
+address that.
+
+Another problem with leaving ignored files in the temporary directories is that
+they take up space. That is especially problematic in the case of cargo (the
+`target/` directory often takes up tens of GBs). The same flag for cleaning up
+ignored files can be used to address that.
+We may want to also have a flag for
+cleaning up temporary working copies *after* running the command.
+
+An early version of the command will directly use [Treestate][treestate] to
+manage the temporary working copies. That means that running `jj` inside the
+temporary working copies will not work. We can later extend that to use a full
+[Workspace][workspace]. To prevent operations in the working copies from
+impacting the repo, we can use a separate [OpHeadsStore][opheads] for it.
+
+### Modifying the Working Copy:
+
+Since the subprocesses will run in temporary working copies by default, they
+won't interfere with the user's working copy. The user can therefore continue
+to work in it while `jj run` is running.
+
+We want subprocesses to be able to make changes to the repo by updating their
+assigned working copy. Let's say the user runs `jj run` on just commits A and
+B, where B's parent is A. Any changes made on top of A would be squashed into
+A, forming A'. Similarly B' would be formed by squashing the changes made on
+top of B into B. We can then
+either do a normal rebase of B' onto A', or we can simply update its parent to
+A'. The former is useful, e.g. when the subprocess only makes a partial update
+of the tree based on the parent commit. In addition to these two modes, we may
+want to have an option to ignore any changes made in the subprocess's working
+copy.
+
+### Modifying the Repo:
+
+Once we give the subprocess access to a fork of the repo via a separate
+[OpHeadsStore][opheads], it will be able to create new operations in its fork.
+If the user runs `jj run -r foo` and the subprocess checks out another commit,
+it's not clear what that should do. We should probably just verify that the
+working-copy commit's parents are unchanged after the subprocess returns. Any
+operations created by the subprocess will be ignored.
+
+### Rewriting the revisions:
+
+We should handle public and private revisions differently. We choose to operate
+on an immutable history by default.
+
+### Public revisions:
+
+For published revisions, we will not allow `jj run` to modify them and will
+instead error out immediately, as published history should be immutable. We may
+want to
+support a `--force` flag for an override but it won't be available in the first
+iteration of the command.
+
+### Private/Draft revisions:
+
+For private/draft revisions, we just amend the changes, as Jujutsu usually does.
+We also expose the actual behavior as a command option.
+
+## Execution order/parallelism:
+
+It may be useful to execute commands in topological order. For example,
+commands with costs proportional to incremental changes, like build systems.
+There may also be other relevant heuristics, but topological order is an easy
+and effective way to start.
+
+Parallel execution of commands on different commits may still schedule
+commits so as to reduce incremental changes in the working copy used by each
+execution slot/"thread". However, running the command on all commits
+concurrently should be possible if desired.
+
+Executing commands in topological order allows for more meaningful use of any
+potential features that stop execution "at the first failure". For example,
+when running tests on a chain of commits, it might be useful to proceed in
+topological/chronological order, and stop on the first failure, because it
+might imply that the remaining executions will be undesirable because they will
+also fail.
+
+## Dealing with failure:
+
+It will be useful to have multiple strategies to deal with failures on a single
+or multiple revisions. The reason for these strategies is to allow customized
+conflict handling. These strategies then can be exposed in the UI with a
+matching command.
+
+**Continue:** If any subprocess fails, we will continue the work on child
+revisions. Notify the user on exit about the failed revisions.
+
+**Stop:** Signal a fatal failure and cancel any scheduled work that has not
+yet started running, but let any already started subprocess finish. Notify the
+user about the failed command and display the generated error from the
+subprocess.
+
+**Fatal:** Signal a fatal failure and immediately stop processing and kill any
+running processes. Notify the user that we failed to apply the command to the
+specific revision.
+
+We will leave any affected revision in its current state, if any subprocess
+fails. This
+allows us to provide a better user experience, as leaving revisions in an
+undesirable state, e.g. partially formatted, may confuse users.
+
+## Resource constraints:
+
+It will be useful to constrain the execution to prevent resource exhaustion.
+Relevant resources could include:
+- CPU and memory available on the machine running the commands. `jj run` can
+provide some simple mitigations like limiting parallelism to "number of CPUs"
+by default, and limiting parallelism by dividing "available memory" by some
+estimate or measurement of per-invocation memory use of the commands.
+- External resources that are not immediately known to jj. For example,
+commands run in parallel may wish to limit the total number of connections
+to a server. We might choose to defer any handling of this to the
+implementation of the command being invoked, instead of trying to
+communicate that information to jj.
+
+
+## Command Options:
+
+The base command of any jj command should be usable. By default `jj run` works
+on `@`, the current working copy.
+* --command, explicit name of the first argument
+* -x, for git compatibility (may alias another command)
+* -j, --jobs, the amount of parallelism to use
+* -k, --keep-going, continue on failure (may alias another command)
+* --show, display the diff for an affected revision
+* --dry-run, do the command execution without doing any work, logging all
+intended files and arguments
+* --rebase, rebase all parents on the resulting diff (may alias another
+command)
+* --reparent, change the parent of an affected revision to the new change
+(may alias another command)
+* --clean, remove existing workspaces and remove the ignored files
+* --readonly, ignore changes across multiple run invocations
+* --error-strategy=`continue|stop|fatal`, see [Dealing with failure](#Dealing-with-failure)
+
+### Integrating with other commands:
+
+`jj log`: No special handling needed
+`jj diff`: No special handling needed
+`jj st`: For now reprint the final output of `jj run`
+`jj op log`: No special handling needed, but awaits further discussion in
+[#963][963]
+`jj undo/jj op undo`: No special handling needed
+
+
+## Open Points:
+
+Should the command be backend specific?
+How do we manage the processes which the command will spawn?
+Configuration options, User and Repository Wide?
+
+## Future possibilities:
+
+We could rewrite the file in memory, which is a neat optimization
+Exposing some internal state, to allow more precise resource constraints
+Integration options for virtual filesystems, which allow them to cache the
+needed working copies.
+A Jujutsu wide concept for a cached working copy, as they could be expensive
+to materialize.
+Customized failure messages, this may be useful for bots, it could be similar
+to Bazel's `select(..., message = "arch not supported for $project")`.
+Make `jj run` asynchronous by spawning a `main` process, directly returning to
+the user and incrementally updating the output of `jj st`.
+
+
+[^1]: You can find the full history [here](https://docs.google.com/document/d/14BiAoEEy_e-BRPHYpXRFjvHMfgYVKh-pKWzzTDi-v-g/edit).
+[963]: https://github.com/martinvonz/jj/issues/963
+[treestate]: https://github.com/martinvonz/jj/blob/af85f552b676d66ed0e9ae0d401cd0c4ffbbeb21/lib/src/working_copy.rs#L117
+[opheads]: https://github.com/martinvonz/jj/blob/main/lib/src/op_heads_store.rs
+[workspace]: https://github.com/martinvonz/jj/blob/af85f552b676d66ed0e9ae0d401cd0c4ffbbeb21/lib/src/workspace.rs#L54
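
A minimal sketch of the "Continue" and "Stop" strategies from the "Dealing with failure" section of the added document, combined with the topological ordering it proposes. This is not jj code and not how the authors intend to implement it: `run_in_topological_order`, the `parents` mapping and the `command` callable are hypothetical names chosen for illustration, the subprocess is replaced by a plain Python callable, and execution is sequential, so the "Fatal" strategy (which kills running processes) is not modeled.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_in_topological_order(parents, command, strategy="continue"):
    """Run `command` on every revision, parents before children.

    parents:  dict mapping a revision id to a list of its parent ids
              (hypothetical stand-in for the commit graph).
    command:  callable taking a revision id and returning True on success
              (hypothetical stand-in for the user-provided subprocess).
    strategy: "continue" keeps going after a failure,
              "stop" cancels all work that has not started yet.

    Returns three lists: (succeeded, failed, skipped).
    """
    if strategy not in ("continue", "stop"):
        raise ValueError(f"unsupported strategy: {strategy!r}")

    # TopologicalSorter takes a mapping of node -> predecessors, so parents
    # are always yielded before their children.
    order = list(TopologicalSorter(parents).static_order())

    succeeded, failed, skipped = [], [], []
    for index, revision in enumerate(order):
        if command(revision):
            succeeded.append(revision)
            continue
        failed.append(revision)
        if strategy == "stop":
            # Everything not yet started is cancelled and reported as skipped.
            skipped = order[index + 1:]
            break
    return succeeded, failed, skipped


if __name__ == "__main__":
    # A linear chain A <- B <- C where the command fails on B.
    chain = {"A": [], "B": ["A"], "C": ["B"]}
    print(run_in_topological_order(chain, lambda rev: rev != "B", "continue"))
    print(run_in_topological_order(chain, lambda rev: rev != "B", "stop"))
```

With "stop", the child of the failing revision is never run, which matches the document's argument that later executions in a chain are likely to fail as well.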