From a77433e067f2340e2591dfa1820016b6cb4610ab Mon Sep 17 00:00:00 2001
From: Daniel Ploch
Date: Tue, 23 Jan 2024 12:53:07 -0500
Subject: [PATCH] sparse-v2: design doc proposition for Sparse Patterns refactoring

Kicks off work on issue #1896
---
 docs/design/sparse-v2.md | 308 +++++++++++++++++++++++++++++++++++++++
 mkdocs.yml               |   1 +
 2 files changed, 309 insertions(+)
 create mode 100644 docs/design/sparse-v2.md

diff --git a/docs/design/sparse-v2.md b/docs/design/sparse-v2.md
new file mode 100644
index 0000000000..4bf4f2b3c9
--- /dev/null
+++ b/docs/design/sparse-v2.md
@@ -0,0 +1,308 @@
+# Sparse Patterns v2 redesign
+
+Authors: [Daniel Ploch](mailto:dploch@google.com)
+
+**Summary:** This document describes a redesign of the sparse command and
+its internal storage format in jj, in order to facilitate several desirable
+improvements for large repos. It covers both the migration path and the planned
+end state.
+
+## Objective
+
+Redesign Sparse Patterns to accommodate more advanced features for native
+and custom implementations. This includes three main goals:
+
+1. Sparse Patterns should be versioned with the working copy
+1. Sparse Patterns should support more [flexible matching rules](https://github.com/martinvonz/jj/issues/1896)
+1. Sparse Patterns should support [client path remapping](https://github.com/martinvonz/jj/issues/2288)
+
+## Current State (as of jj 0.13.0)
+
+Sparse patterns are an effectively unordered list of prefix strings:
+
+```txt
+path/one
+path/to/dir/two
+```
+
+The _set_ of files identified by the Sparse Patterns is all paths which match
+any provided prefix. This governs what gets materialized in the working copy on
+checkout, and what is updated on snapshot. The set is stored in working copy
+state files which are not versioned in the Op Store.
+
+Because all paths are bare strings with no escaping or higher-level formatting,
+the current design makes it difficult to add new features like exclusions or
+path remappings.
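As a rough illustration of the current behavior, the prefix rule can be sketched as below. This is not jj's implementation: `matches_prefix` and `is_materialized` are hypothetical names, paths are plain `&str` values rather than `RepoPath`, and the sketch assumes that prefixes match on whole path components.

```rust
/// Hypothetical helper: true if `path` equals `prefix` or lies beneath it.
/// Assumption: a prefix matches whole path components only, so "path/one"
/// does not match "path/one_more/file.txt".
pub fn matches_prefix(prefix: &str, path: &str) -> bool {
    if prefix.is_empty() {
        // In this sketch the empty prefix stands for the repo root.
        return true;
    }
    match path.strip_prefix(prefix) {
        Some(rest) => rest.is_empty() || rest.starts_with('/'),
        None => false,
    }
}

/// A path is part of the sparse set if any prefix matches it.
/// The order of `patterns` is irrelevant, which is why the list is
/// "effectively unordered".
pub fn is_materialized(patterns: &[&str], path: &str) -> bool {
    patterns.iter().any(|prefix| matches_prefix(prefix, path))
}
```

With the two patterns from the example above, `path/one/file.txt` would be materialized while `path/to/dir/three/file.txt` would not. Because a match can only ever add a path to the set, there is no room in this model for exclusions.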
+
+## Proposed State (Sparse Patterns v2)
+
+Sparse Patterns v2 will be stored as objects in the Op Store, referenced
+by a `WorkingCopyPatternsId` from the active `View`. They will have a new,
+ordered structure which can fully represent previous patterns.
+
+```rust
+/// Analogues of RepoPath, specifically describing paths in the working copy.
+struct WorkingCopyPathBuf {
+    String
+}
+struct WorkingCopyPath {
+    str
+}
+
+pub enum SparsePatternsPathType {
+    Dir,    // Everything under <path>/...
+    Files,  // Files under <path>/*
+    Exact,  // <path> exactly
+}
+
+pub struct SparsePatternsPath {
+    path_type: SparsePatternsPathType,
+    include: bool,  // True if included, false if excluded.
+    path: RepoPathBuf,
+}
+
+pub struct WorkingCopyMapping {
+    src_path: RepoPathBuf,
+    dst_path: WorkingCopyPathBuf,
+    recursive: bool,  // If false, only immediate children of src_path (files) are renamed.
+}
+
+pub struct WorkingCopyPatterns {
+    sparse_paths: Vec<SparsePatternsPath>,
+    mappings: Vec<WorkingCopyMapping>,
+}
+
+pub trait OpStore {
+    ...
+    pub fn read_working_copy_patterns(&self, id: &WorkingCopyPatternsId) -> OpStoreResult<WorkingCopyPatterns> { ... }
+    pub fn write_working_copy_patterns(&self, sparse_patterns: &WorkingCopyPatterns) -> OpStoreResult<WorkingCopyPatternsId> { ... }
+}
+```
+
+To support these more complex behaviors, a new `WorkingCopyPatterns` type will
+be introduced, initially only as a thin wrapper around the existing prefix
+format, but soon to be expanded with richer types and functionality.
+
+```rust
+impl WorkingCopyPatterns {
+    pub fn to_matcher(&self) -> Box<dyn Matcher> {
+        ...
+    }
+
+    ...
+}
+```
+
+### Command Syntax
+
+`SparsePatternsPath` rules can be specified on the CLI and in an editor via a
+compact syntax:
+
+```txt
+(include|exclude):(dir|files|exact):<path>
+```
+
+If both prefix terms are omitted, then `include:dir:` is assumed. If any prefix
+is specified, both must be specified. The editor and CLI will both accept path
+rules in either format going forward.
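The defaulting rule for the compact syntax can be sketched as a small parser. This is only an illustration under stated assumptions: `parse_rule` is a hypothetical function, `String` stands in for `RepoPathBuf`, and paths are assumed never to contain `:` (escaping is left out entirely).

```rust
/// Mirrors the `SparsePatternsPathType` enum proposed in this document.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SparsePatternsPathType {
    Dir,
    Files,
    Exact,
}

/// Mirrors `SparsePatternsPath`; `String` stands in for `RepoPathBuf` here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SparsePatternsPath {
    pub path_type: SparsePatternsPathType,
    pub include: bool,
    pub path: String,
}

/// Hypothetical parser for `(include|exclude):(dir|files|exact):<path>`.
/// Assumption: paths never contain ':'; escaping is out of scope here.
pub fn parse_rule(text: &str) -> Result<SparsePatternsPath, String> {
    let parts: Vec<&str> = text.splitn(3, ':').collect();
    match parts.as_slice() {
        // No prefix terms at all: `include:dir:` is assumed.
        [path] => Ok(SparsePatternsPath {
            path_type: SparsePatternsPathType::Dir,
            include: true,
            path: path.to_string(),
        }),
        [action, kind, path] => {
            let include = match *action {
                "include" => true,
                "exclude" => false,
                other => return Err(format!("unknown action: {}", other)),
            };
            let path_type = match *kind {
                "dir" => SparsePatternsPathType::Dir,
                "files" => SparsePatternsPathType::Files,
                "exact" => SparsePatternsPathType::Exact,
                other => return Err(format!("unknown path type: {}", other)),
            };
            Ok(SparsePatternsPath { path_type, include, path: path.to_string() })
        }
        // Exactly one ':' means only one of the two prefix terms was given.
        _ => Err(format!("both prefix terms are required: {}", text)),
    }
}
```

Under these assumptions, `parse_rule("foo/bar")` and `parse_rule("include:dir:foo/bar")` yield the same rule, while `parse_rule("exclude:foo/bar")` is rejected because only one prefix term is present.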
+
+- `jj sparse set --add foo/bar` is equal to `jj sparse set --add include:dir:foo/bar`
+- `jj sparse set --add exclude:dir:foo/bar` adds a new `Dir` type rule with `include = false`
+- `jj sparse set --exclude foo/bar` is a possible shorthand for the above
+- `jj sparse list` will print the explicit rules
+
+Paths will be stored in an ordered, canonical form which unambiguously describes
+the set of files to be included. Every `--add` command will append to the end of
+this list before the patterns are canonicalized. Whether a file is included is
+determined by the first matching rule in reverse order (the last match wins).
+
+For example:
+
+```txt
+include:dir:foo
+exclude:dir:foo/bar
+include:dir:foo/bar/baz
+exclude:dir:foo/bar/baz/qux
+```
+
+Produces a rule set which includes "foo/file.txt", excludes "foo/bar/file.txt",
+includes "foo/bar/baz/file.txt", and excludes "foo/bar/baz/qux/file.txt".
+
+If the rules are subtly re-ordered, they become canonicalized to a smaller, but
+functionally equivalent form:
+
+```txt
+# Before
+include:dir:foo
+exclude:dir:foo/bar/baz/qux
+include:dir:foo/bar/baz
+exclude:dir:foo/bar
+
+# Canonicalized
+include:dir:foo
+exclude:dir:foo/bar
+```
+
+#### Canonicalization
+
+There are many ways to represent functionally equivalent `WorkingCopyPatterns`.
+For instance, the following 4 rule sets are all functionally equivalent:
+
+```txt
+# Set 1
+include:dir:bar
+include:dir:foo
+
+# Set 2
+include:dir:foo
+include:dir:bar
+
+# Set 3
+include:dir:bar
+include:dir:bar/baz/qux
+include:dir:foo
+
+# Set 4
+include:dir:foo
+exclude:dir:foo/baz
+include:dir:bar
+include:dir:foo/baz
+```
+
+Because these patterns are stored in the Op Store now, it is useful for all of
+these representations to be rewritten into a minimal, canonical form before
+serialization. In this case, `Set 1` will be the canonical set. The canonical
+form of a `WorkingCopyPatterns` is defined as the form such that:
+
+- Every rule affects the functionality (there are no redundant rules)
+- Rules are sorted lexicographically, but with '/' sorted before all else
+  - This special sorting order is useful for constructing path tries
+
+### Working Copy Map
+
+WARNING: This section is intentionally lacking; more research is needed.
+
+All `WorkingCopyPatterns` will come equipped with a default no-op mapping.
+These mappings are inspired by and similar to [Perforce client views](https://www.perforce.com/manuals/cmdref/Content/CmdRef/views.html).
+
+```rust
+vec![WorkingCopyMapping {
+    src_path: RepoPathBuf::root(),
+    dst_path: WorkingCopyPathBuf::root(),
+    recursive: true,
+}]
+```
+
+`WorkingCopyPatterns` will provide an interface to map working copy paths into
+repo paths and vice versa. The `WorkingCopy` trait will apply this mapping to
+all snapshot and checkout operations, and jj commands which accept relative
+paths will need to be updated to perform working copy path -> repo path
+translations as needed. It's not clear at this time _which_ commands will need
+changing, as some are more likely to refer to repo paths rather than working
+copy paths.
+
+TODO: Expand this section.
+
+In particular, the path rules for sparse patterns will _always_ be repo paths,
+not working copy paths. Thus, if a user wants the working copy to track "foo"
+and rename it to "subdir/bar", they must run `jj sparse set --add foo` and
+`jj map set --from foo --to subdir/bar`. In other words, the mapping operation
+can be thought of as always happening _after_ the sparse operation.
+
+#### Command Syntax
+
+New commands will enable editing of the `WorkingCopyMapping`s:
+
+TODO: Maybe this should be `jj workspace map ...`?
+
+- `jj map list` will print all mapping pairs.
+- `jj map add --from foo --to bar` will add a new mapping to the end of the list.
+- `jj map remove --from foo` will remove a specific mapping rule.
+- `jj map edit` will pull up a text editor for manual editing.
+
+Like sparse paths, mappings will have a compact text syntax for editing in file
+form, or for adding a rule textually on the CLI:
+
+```txt
+"<from>" -> "<to>" [nonrecursive]
+```
+
+Like sparse paths, mapping rules are defined to apply in _order_ and on any
+save operation will be modified to a minimal canonical form. Thus,
+`jj map set --from "" --to ""` will always completely wipe the map.
+The first matching rule in reverse list order determines how a particular
+repo path should be mapped into the working copy, and likewise how a particular
+working copy path should be mapped into the repo. For simplicity, the
+'last rule wins' principle applies to both repo->WC conversions and WC->repo
+conversions, using the same ordering.
+
+If a working copy mapping places the same repo file at two distinct working
+copy paths, snapshotting will fail unless these files are identical. Some
+specialized filesystems may even treat these as the 'same' file, allowing this
+to work in some cases.
+
+If a working copy mapping places two distinct repo files at the same working
+copy path, checkout will fail with an error regardless of equivalence.
+
+### Versioning and Storage
+
+Updating the active `WorkingCopyPatterns` for a particular working copy will now
+take place in two separate steps: one transaction which updates the op store,
+and a separate `LockedWorkingCopy` operation which actually updates the working
+copy. The working copy proto will no longer store `WorkingCopyPatterns`
+directly, instead storing only a `WorkingCopyPatternsId`. On mismatch with the
+current op head, the user will be prompted to run `jj workspace update-stale`.
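The two-step update and the resulting staleness check can be sketched as follows. Everything here is a simplified assumption for illustration: `OpHeadState`, `WorkingCopyState`, `set_patterns`, `update_working_copy`, and `is_stale` are hypothetical names, and a `String` stands in for whatever the real `WorkingCopyPatternsId` turns out to be.

```rust
/// Stand-in for the proposed `WorkingCopyPatternsId`; the real id type is
/// not specified in this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WorkingCopyPatternsId(pub String);

/// Hypothetical slice of the op head's state for one workspace.
pub struct OpHeadState {
    pub wc_patterns_id: WorkingCopyPatternsId,
}

/// Hypothetical slice of the working copy state: it records only the id of
/// the patterns it last checked out, not the patterns themselves.
pub struct WorkingCopyState {
    pub wc_patterns_id: WorkingCopyPatternsId,
}

/// Step 1: a transaction records the new patterns id in the op store.
/// The working copy is not touched.
pub fn set_patterns(op_head: &mut OpHeadState, new_id: WorkingCopyPatternsId) {
    op_head.wc_patterns_id = new_id;
}

/// Step 2: a separate working copy operation catches up to the op head.
pub fn update_working_copy(wc: &mut WorkingCopyState, op_head: &OpHeadState) {
    // A real implementation would materialize or remove files here.
    wc.wc_patterns_id = op_head.wc_patterns_id.clone();
}

/// True when the ids disagree, i.e. when the user would be prompted to run
/// `jj workspace update-stale`.
pub fn is_stale(wc: &WorkingCopyState, op_head: &OpHeadState) -> bool {
    wc.wc_patterns_id != op_head.wc_patterns_id
}
```

In this model the working copy is stale between step 1 and step 2, and undoing step 1 in the op store makes the ids agree again without any working copy operation having run.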
+
+This gives the user the ability to update the active `WorkingCopyPatterns`
+whilst not interacting with the local working copy, which is useful for custom
+integrations which may not be _able_ to check out particular working copy
+patterns due to problems with the backend (encoding, permission errors, etc.). A
+bad `jj sparse set --add oops` command can thus be undone, even via `jj op undo`
+if desired.
+
+#### View Updates
+
+The View object will be migrated to store working copy patterns via id. The
+indirection will save on storage since working copy patterns are not expected to
+change very frequently.
+
+```rust
+// Before:
+pub wc_commit_ids: HashMap<WorkspaceId, CommitId>,
+
+// After:
+pub struct WorkingCopyInfo {
+    pub commit_id: CommitId,
+    pub wc_patterns_id: WorkingCopyPatternsId,
+}
+...
+pub wc_info: HashMap<WorkspaceId, WorkingCopyInfo>,
+```
+
+A View object with no stored working copy patterns will be modified at read
+time to include the current working copy patterns; thus all `read_view`
+operations will need to pass in the current working copy patterns for a
+migration period of at least 6 months. After that, we may choose to auto-fill
+missing working copy infos with a default `WorkingCopyPatterns` as needed.
+
+### Appendix
+
+#### Related Work
+
+[Perforce client maps](https://www.perforce.com/manuals/cmdref/Content/CmdRef/views.html)
+are very similar in concept to the entirety of `WorkingCopyPatterns`, and this
+design aims to achieve similar functionality.
+
+The [Josh Project](https://github.com/josh-project/josh) implements partial git
+clones in a way similar to how sparse patterns try to work.
+
+#### Patterns via configuration
+
+There may be some scenarios where it is valuable to configure working copy
+patterns via a configuration file, rather than through explicit commands.
+Generally this only makes sense for automated repos, with the configuration
+coming from outside the repo - there are too many caveats and edge cases if the
+configuration comes from inside the repo and/or conflicts with a human's edits.
+
+No configuration syntax is planned at this time but if we add any, we should
+probably reuse the compact line syntaxes as much as possible for consistency.
diff --git a/mkdocs.yml b/mkdocs.yml
index 1d19f34fd5..c801ef2651 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -115,6 +115,7 @@ nav:
       - 'git-submodules': 'design/git-submodules.md'
       - 'git-submodule-storage': 'design/git-submodule-storage.md'
       - 'JJ run': 'design/run.md'
+      - 'Sparse Patterns v2': 'design/sparse-v2.md'
       - 'Tracking branches': 'design/tracking-branches.md'