From 9ee6fc570a4c685b6f41855556dfb2784e405935 Mon Sep 17 00:00:00 2001
From: Philip Metzger <philipmetzger@bluewin.ch>
Date: Wed, 6 Sep 2023 00:49:59 +0200
Subject: [PATCH] docs/design: Move the `run` doc to github.

This ticks another box in #1869.

Co-Authored-By: arxanas <me@waleedkhan.name>
Co-Authored-By: hooper <hooper@google.com>
Co-Authored-By: martinvonz <martinvonz@google.com>
---
 docs/design/run.md | 268 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 268 insertions(+)
 create mode 100644 docs/design/run.md

diff --git a/docs/design/run.md b/docs/design/run.md
new file mode 100644
index 00000000000..b3c9b195b44
--- /dev/null
+++ b/docs/design/run.md
@@ -0,0 +1,268 @@
+# Introducing JJ run
+
+Authors: [Philip Metzger](philipmetzger@bluewin.ch), [Martin von Zweigberk](martinvonz@google.com), [Danny Hooper](hooper@google.com), [Waleed Khan](me@waleedkhan.name)
+
+Initial Version, 10.12.2022 [^1]
+
+
+**Summary:** This Document documents the design of a new `run` command for 
+Jujutsu which will be used to seamlessly integrate with build systems, linters
+and formatters. This is achieved by running a user-provided command or script 
+across multiple revisions. For more details, read the 
+[Use-Cases of jj run.](#Use-Cases-of-jj-run)
+
+## Preface
+
+The goal of this Design Document is to specify the correct behavior of `jj run`.
+The points we decide on here I (Philip Metzger) will try to implement. There 
+exists some prior work in other DVCS:
+* `git test`: part of git-branchless. Similar to this proposal for `jj run? . 
+* `hg run`: Google's internal Mercurial extension. Similar to this proposal for
+`jj run`.
+Details not available. 
+* `hg fix`_ Google's open source Mercurial extension: [source code][fix-src]. A
+more specialized approach to rewriting file content without full context of the
+working directory. 
+* `git rebase -x`: runs commands opportunistically as part of rebase. 
+* `git bisect run`: run a command to determine which commit introduced a bug.
+
+## Context and Scope:
+
+The initial need for some kind of command runner integrated in the VCS, surfaced
+in a [github discussion.][pre-commit]  In a discussion on discord about the 
+git-hook model, there was consensus about not repeating their mistakes.
+
+For `jj run? there is prior art in Mercurial, git branchless and Google's 
+internal Mercurial. Currently git-branchless `git test` and `hg fix` implement
+some kind of command runner. While the Google internal `hg run` works in 
+conjunction with CitC (Clients in the Cloud) which allows it to lazily apply
+the current command to any affected fgile. The base Jujutsu backend does not
+have a fancy virtual filesystem suppporting it, so we can't apply this 
+optimization. 
+
+## Goals and Non-Goals: 
+
+### Goals:
+
+We should be able to apply the command to any revision, published or unpublished.
+We should be able to parallelize running the actual command, while preserving a
+good console output.
+The run command should be able to work in the working copy. 
+There should exist some way to signal hard failure. 
+The command should build enough infrastructure for `jj test`, `jj fix` and 
+`jj format`.
+The main goal is to be good enough, as we can always expand the functionality 
+in the future.
+
+### Non-Goals:
+
+While we should build a base for `jj test`, `jj format` and `jj fix`, we 
+shouldn't mash their use-cases into `jj run`.
+The command shouldn't be too smart, as too many assumptions about workflows 
+makes the command confusing for users. 
+The smart caching of outputs, as user input commands can be unpredictable.
+Fine grained user facing configuration, as it's unwarranted complexity.
+A `fix` subcommand as it cuts too much design space.
+
+## Use-Cases of jj run:
+
+**Linting and Formatting:**
+- `jj run 'pre-commit run' -r $revset`
+- `jj run 'cargo clippy' -r $revset`
+- `jj run 'cargo +nightly fmt'`
+**Large scale changess across repositories, local and remote:**
+- `jj run 'sed s/some/test' -r 'draft() & ~remote_branches(exact:"origin")'`
+- `jj run '$rewrite-tool' -r '$revset'`
+**Build systems:**
+- `jj run 'bazel build //some/target:somewhere'`
+- `jj run 'ninja check-lld
+
+Some of these use-cases should get a specialized command, as this allows 
+further optimization. A command could be `jj format`, which runs a list of 
+formatters over a subset of a file in a revision. Another command could be 
+`jj fix`, which runsa command like `rustfmt --fix` or `cargo clippy --fix` over
+a subset of a file in a revision.
+
+## Design:
+
+### Base Design: 
+
+All the work will be done in the `.jj/` directory. This allows us to hide all 
+complexity from the users, while preserving the user's current workspace.
+
+We will copy the approach from git-branchless's `git test` of creating a 
+temporary working copy for each parallel command. The working copies will be 
+reused between `jj run` invocations. They will also be reused within `jj run` 
+invocation if there are more commits to run on than there are parallel jobs.
+
+We will leave ignored files in the temporary directory between runs. That 
+enables incremental builds (e.g by letting cargo reuse its `target/` directory).
+However, it also means that runs potentially become less reproducible. We will 
+provide a flag for removing ignroed files from the temporary working copies to
+adress that. 
+
+Another problem with leaving ignored files in the temporary directories is that
+they take up space. That is especially problematic in the case of cargo (the 
+`target/` directory often takes up tens of GBs). The same flag for cleaning up
+ignored files can be used to adress that. We may want to also have a flag for 
+cleaning up temporary working copies *after* running the command. 
+
+An early version of the command will directly use [Treestate][treestate] to 
+to manage the temporary working copies. That means that running `jj` inside the 
+temporary working copies will not work . We can later extend that to use a full
+[Workspace][workspace]. To prevent operations in the working copies from 
+impacting the repo, we can use a separate [OpHeadsStore][opheads] for it.
+
+### Modifying the Working Copy:
+
+Since the subprocesses will run in temporary working copies by default, they 
+won't interfere with the user's working copy. The user can therefore continue
+to work in it while `jj run` is running. 
+
+We want subprocesses to be able to make changes to the repo by updating their
+assigned working copy. Let's say the user runs `jj run` on just commits A and 
+B, where B's parent is A. Any changes made on top of A would be squashed into 
+A, forming A'. Similarly B' would be formed by squasing it into B. We can then
+either do a normal rebase of B' onto A', or we can simply update its parent to
+A'. The former is useful, e.g when the subprocess only makes a partial update
+of the tree based on the parent commit. In addition to these two modes, we may 
+want to have an option to ignore any changes made in the subprocess's working 
+copy.
+
+### Modifying the Repo:
+
+Once we give the subprocess access to a fork of the repo via separate 
+[OpHeadsStore][opheads], it will be able to create new operateions in its fork.
+If the user runs `jj run -r foo` and the subprocess checks out another commit,
+it's not clear what that should do. We should probably just verify that the 
+working-copy commit's parents are unchanged after the subprocess returns. Any
+operations creeatd by the subprocess will be ignored. 
+
+### Rewriting the revisions. 
+
+We should handle public and private revisions differently. We choose to operate
+on an immutable history by default.
+
+### Public revisions:
+
+For published revisions, we will not allow `jj run?  to modify them and then 
+immediately error out, as published history should be immutable. We may want to
+support a `--force` flag for an override but it won't be available in the first
+iteration of the command. 
+
+### Private/Draft revisions:
+
+For private/draft revisions, we just amend the changes, as Jujutsu usually does. 
+We also expose the actual behavior as a command option.
+
+## Exectuion order/parallelism:
+
+It may be useful to execute commands in topological order. For example, 
+commands with costs proportional to incremental changes, like build systems. 
+There may also be other revelant heuristics, but topological order is an easy
+and effective way to start. 
+
+Parallel execution of commands on different commits may choose to schedule 
+commits to still reduce incremental changes in the working copy used by each
+execution slot/"thread". However, running the command on all commits 
+concurrently should be possible if desired. 
+
+Executing commands in topological order allows for more meaningful use of any 
+potential features that stop execution "at the first failure". For example, 
+when running tests on a chain of commits, it might be useful to proceed in 
+topological/chronological order, and stop on the first failure, because it 
+might imply that the remaining executions will be undesirable because they will
+also fail.
+
+## Dealing with failure:
+
+It will be useful to have multiple strategies to deal with failures on a single
+or multiple revisions. The reason for these strategies is to allow customized
+conflict handling. These strategies then can be exposed in the ui with a 
+matching command.
+
+**Continue:** If any subprocess fails, we will continue the work on child 
+revisions. Notify the user on exit about the failed revisions. 
+
+**Stop:** Signal a fatal failure and cancel any scheduled work that has not
+yet started running, but let any already started subprocess finish. Notify the
+user about the failed command and display the generated error from the 
+subprocess. 
+
+**Fatal:** Signal a fatal failure and immediately stop processing and kill any 
+runnign processes. Notify the user that we failed to apply the command to the 
+specific revision. 
+
+We will leave any affected in its current state, if any subprocess fails. This
+allows us provide a better user experience, as leaving revisions in an 
+undesirable state, e.g partially formatted, may confuse users.
+
+## Resource constraints:
+
+It will be useful to constrain the execution to prevent resource exhaustion. 
+Relevant resources could include:
+- CPU and memory available on the machine running the commands. `jj run` can
+provide some simple mitigations like limiting parallelism to "number of CPUs" 
+by default, and limiting parallelism by dividing "available memory" by some 
+estimate or measurement of per-invocation memory use of the commands.
+- External resourec that are not immediately known to jj. For example, 
+commands run in parallel may wish to limit the toal number of connections
+to a server. We might choose to defer any handling of this to the 
+implementation of the command being invoked, instead of trying to 
+communicate that information to jj.
+
+
+## Command Options:
+
+The base command of any jj command should be usable. By default `jj run` works 
+on the `@` the current working copy.
+* --command, explicit name of the first argument
+* -x, for git compatibility (may alias another command)
+* -j, --jobs, the amount of parallelism to use
+* -k, --keep-going, continue on failure (may alias another command)
+* --show, display the diff for an affected revision
+* --dry-run, do the command execution without doing any work, logging all 
+intended files and arguments
+* --rebase, rebase all parents on the consulitng diff (may alias another 
+command)
+* --reparent, change the parent of an effected revision to the new change 
+(may alias another command)
+* --clean, remove existing workspaces and remove the ignored files
+* --readonly, ignore changes across multiple run invocations
+* --error-strategy=`continue|stop|fatal`, see [Dealing with failure](#Dealing-with-failure)
+
+### Integrating with other commands:
+
+`jj log`: No special handling needed
+`jj diff`: No special handling needed
+`jj st`: For now reprint the final output of `jj run`
+`jj op log`: No special handling needed, but awaits further discussion in 
+[#963][963]
+`jj undo/jj op undo`: No special handling needed
+
+
+## Open Points:
+
+Should the command be backend specific?  
+How do we maange the Processes which the command will spawn?  
+Configuration options, User and Repository Wide?
+
+## Future possibilites:
+
+We could rewrite the file in memory, which is a neat optimization  
+Exposing some internal state, to allow preciser resource constraints  
+Integration options for virtual filesystems, which allow them to cache the 
+needed working copies.  
+A Jujutsu wide concept for a cached working copy, as they could be expensive
+to materialize.  
+Customized failure messages, this maybe useful for bots, it could be similar 
+to Bazel's `select(..., message = "arch not supported for $project").
+Make `jj run` asynchronous by spawning a `main` process, directly return to the
+user and incrementally updating the output of `jj st`. 
+
+
+[^1]: You can find the full history [here](https://docs.google.com/document/d/14BiAoEEy_e-BRPHYpXRFjvHMfgYVKh-pKWzzTDi-v-g/edit).
+[963]: https://github.com/martinvonz/jj/issues/963 
+[treestate]: https://github.com/martinvonz/jj/blob/af85f552b676d66ed0e9ae0d401cd0c4ffbbeb21/lib/src/working_copy.rs#L117
+[opheads]: https://github.com/martinvonz/jj/blob/main/lib/src/op_heads_store.rs
+[workspace]: https://github.com/martinvonz/jj/blob/af85f552b676d66ed0e9ae0d401cd0c4ffbbeb21/lib/src/workspace.rs#L54