copy-info: initial design doc
torquestomp committed May 3, 2024
1 parent 0d630ac commit cfd7e64
Showing 1 changed file with 189 additions and 0 deletions.
189 changes: 189 additions & 0 deletions docs/design/copy-info.md
@@ -0,0 +1,189 @@
# Copy Info Design

Authors: [Daniel Ploch](mailto:[email protected])

**Summary:** This document describes an approach to tracking and detecting copy
information in jj repos, in a way that is compatible with both Git's detection
model and with custom backends that have more complicated tracking of copy
information. This design affects the output of diff commands as well as the
results of rebasing across renames and copies.

## Objective

Implement extensible APIs for recording and retrieving copy info for the
purposes of diffing and rebasing across renames and copies more accurately.
This should be performant both for Git, which synthesizes copy info on the fly
between arbitrary trees, and for custom extensions which may explicitly record
and re-serve copy info over arbitrarily large commit ranges.

The APIs should be defined in a way that makes it easy for custom backends to
ignore copy info entirely until they are ready to implement it.

### Read API

Copy information will be served both by a new Backend trait method described
below, as well as a new field on Commit objects for backends that support copy
tracking:

```rust
use std::pin::Pin;

use futures::Stream;

// `RepoPathBuf`, `FileId`, `CommitId`, `Merge`, and `BackendResult` are
// existing jj-lib types.

/// An individual copy source.
pub struct CopySource {
    /// The source path a target was copied from.
    /// It is not required that the source path differ from the target
    /// path. A custom backend may choose to represent 'rollbacks' as copies
    /// from a file onto itself, from a specific prior commit.
    path: RepoPathBuf,
    file: FileId,
    /// The source commit the target was copied from. If not specified, then the
    /// parent of the target commit is the source commit. Backends may use this
    /// field to implement 'integration' logic, where a source may be
    /// periodically merged into a target, similar to a branch, but the
    /// branching occurs at the file level rather than the repository level. It
    /// also follows naturally that any copy source targeted to a specific
    /// commit should avoid copy propagation on rebasing, which is desirable
    /// for 'fork' style copies.
    commit: Option<CommitId>,
}

pub type MergedCopySource = Merge<Option<CopySource>>;

/// An individual copy event, from file A -> B.
pub struct CopyRecord {
    /// The destination of the copy, B.
    target: RepoPathBuf,
    /// The CommitId where the copy took place.
    id: CommitId,
    /// The source of the copy, A.
    source: MergedCopySource,
}

/// Backend options for fetching copy records.
pub struct CopyRecordOpts {
    /// If true, follow transitive copy records. That is, if file A is copied to
    /// B and then to C, a request for copy records of file C should also return
    /// the copy from A to B. These two copies will be returned in separate
    /// CopyRecords.
    transitive: bool,
    // TODO: Probably something for git similarity detection
}

pub type CopyRecordStream = Pin<Box<dyn Stream<Item = BackendResult<CopyRecord>>>>;

pub trait Backend {
    /// Get all copy records for `paths` in the dag range `roots..heads`.
    ///
    /// The exact order these are returned is unspecified, but it is guaranteed
    /// to be topological. That is, for any two copy records with different
    /// commit ids A and B, if A is an ancestor of B, A is streamed after B.
    async fn get_copy_records(
        &self,
        paths: &[RepoPathBuf],
        roots: &[CommitId],
        heads: &[CommitId],
    ) -> CopyRecordStream;
}
```

Obtaining copy records for a single commit requires first computing the file
list for that commit, then calling `get_copy_records` with `heads = [id]` and
`roots = parents()`. This enables commands like `jj diff` to produce better
diffs that take copy sources into account.
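
The single-commit lookup described above can be sketched with a toy, synchronous, in-memory backend. Everything here is an illustrative assumption rather than jj's actual implementation: `ToyBackend`, the string-based `CommitId` and `RepoPath` aliases, the plain-path `source` field, and the `copy_records_for_commit` helper are all hypothetical, and `roots..heads` is interpreted as "ancestors of `heads` that are not ancestors of `roots`".

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-ins for jj's `CommitId` and `RepoPathBuf`.
pub type CommitId = &'static str;
pub type RepoPath = &'static str;

/// Simplified copy record: `target` was copied from `source` in commit `id`.
#[derive(Clone, Debug, PartialEq)]
pub struct CopyRecord {
    pub target: RepoPath,
    pub id: CommitId,
    pub source: RepoPath,
}

/// Toy in-memory backend that stores explicit copy records per commit.
#[derive(Default)]
pub struct ToyBackend {
    pub parents: HashMap<CommitId, Vec<CommitId>>,
    pub records: HashMap<CommitId, Vec<CopyRecord>>,
}

impl ToyBackend {
    /// Collects `starts` and all of their ancestors.
    fn ancestors(&self, starts: &[CommitId]) -> HashSet<CommitId> {
        let mut seen = HashSet::new();
        let mut stack = starts.to_vec();
        while let Some(id) = stack.pop() {
            if seen.insert(id) {
                stack.extend(self.parents.get(id).into_iter().flatten().copied());
            }
        }
        seen
    }

    /// Synchronous stand-in for `get_copy_records` over `roots..heads`.
    /// Unlike the proposed API, this toy walk makes no ordering guarantee.
    pub fn get_copy_records(
        &self,
        paths: &[RepoPath],
        roots: &[CommitId],
        heads: &[CommitId],
    ) -> Vec<CopyRecord> {
        let excluded = self.ancestors(roots);
        let mut out = Vec::new();
        for id in self.ancestors(heads) {
            if excluded.contains(id) {
                continue;
            }
            // Keep only records whose target is one of the requested paths.
            for rec in self.records.get(id).into_iter().flatten() {
                if paths.contains(&rec.target) {
                    out.push(rec.clone());
                }
            }
        }
        out
    }

    /// Copy records for one commit: `heads = [id]`, `roots = parents(id)`.
    pub fn copy_records_for_commit(&self, id: CommitId, files: &[RepoPath]) -> Vec<CopyRecord> {
        let roots = self.parents.get(id).cloned().unwrap_or_default();
        self.get_copy_records(files, &roots, &[id])
    }
}
```

Because the roots are the commit's own parents, only records attached to the commit itself are returned; copies recorded in ancestors are excluded unless a wider range is requested.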

### Write API

Backends that support tracking copy records at the commit level will do so
through a new field on `backend::Commit` objects:

```rust
pub struct Commit {
    // ... (existing fields elided)
    copies: Option<HashMap<RepoPathBuf, MergedCopySource>>,
}

pub trait Backend {
    /// Whether this backend supports storing explicit copy records on write.
    fn supports_copy_tracking(&self) -> bool;
}
```

This field will be ignored by backends that do not support copy tracking, and
will always be `None` when read from such backends. Backends that do support
copy tracking must always preserve the field value.
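
As a rough illustration of this read/write contract, the sketch below models what a commit looks like when read back from each kind of backend. The `round_trip` helper, the two backend structs, and the simplified `Commit` (with copy sources reduced to plain path strings) are hypothetical and exist only for this example.

```rust
use std::collections::HashMap;

/// Simplified stand-in for `backend::Commit`; copy sources are plain paths.
#[derive(Clone, Debug, PartialEq)]
pub struct Commit {
    pub description: String,
    pub copies: Option<HashMap<String, String>>,
}

pub trait Backend {
    /// Whether this backend supports storing explicit copy records on write.
    fn supports_copy_tracking(&self) -> bool;

    /// Hypothetical helper: the commit as it would read back after a write.
    fn round_trip(&self, commit: &Commit) -> Commit {
        let mut stored = commit.clone();
        if !self.supports_copy_tracking() {
            // Non-tracking backends ignore the field, so it reads back as `None`.
            stored.copies = None;
        }
        stored
    }
}

/// A backend without copy tracking (hypothetical).
pub struct NonTrackingBackend;
/// A backend with copy tracking (hypothetical).
pub struct TrackingBackend;

impl Backend for NonTrackingBackend {
    fn supports_copy_tracking(&self) -> bool {
        false
    }
}

impl Backend for TrackingBackend {
    fn supports_copy_tracking(&self) -> bool {
        true
    }
}
```

Giving `supports_copy_tracking()` no default keeps the choice explicit for each backend, though a default of `false` would be another way to let backends ignore copy info until they are ready.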

This API will enable the creation of new `jj` commands for recording copies:

```shell
jj cp $SRC $DEST [OPTIONS]
jj mv $SRC $DEST [OPTIONS]
```

These commands will rewrite the target commit so that its tree reflects the
given move/copy instructions, and will record the copies on the Commit
object itself for backends that support it (for backends that do not,
these copy records will be silently discarded).

Flags for these two commands will include:

```
-r/--revision
    perform the copy or move at the specified revision
    defaults to the working copy commit if unspecified
-f
    force overwrite the destination path
--after
    record the copy retroactively, without modifying the targeted commit
--resolve
    overwrite all previous copy intents for this $DEST
--allow-ignore-copy
    don't error if the backend doesn't support copy tracking
--from REV
    specify a commit id for the copy source that isn't the parent commit
```

For backends that do not support copy tracking, using `--after` will be an
error: the flag would have no effect there, and the user should be told so.
The `supports_copy_tracking()` trait method is used to determine this.
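
The `--after` rule can be sketched as a small validation step run before the command does any work. `CopyFlags`, `CopyCommandError`, and `validate_flags` are hypothetical names invented for this example, and only the `--after` check stated above is modeled.

```rust
/// Hypothetical error type for `jj cp` / `jj mv` flag validation.
#[derive(Debug, PartialEq)]
pub enum CopyCommandError {
    /// `--after` was given, but the backend cannot store copy records.
    AfterRequiresCopyTracking,
}

/// Hypothetical parsed flags; only `--after` matters for this check.
pub struct CopyFlags {
    pub after: bool,
}

/// Rejects `--after` when the backend reports no copy-tracking support.
/// `supports_copy_tracking` is the result of `Backend::supports_copy_tracking()`.
pub fn validate_flags(
    flags: &CopyFlags,
    supports_copy_tracking: bool,
) -> Result<(), CopyCommandError> {
    if flags.after && !supports_copy_tracking {
        return Err(CopyCommandError::AfterRequiresCopyTracking);
    }
    Ok(())
}
```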

An additional command is provided to deliberately discard copy info for a
destination path, possibly as a means of resolving a conflict.

```shell
jj forget-cp $DEST [-r REV]
```

### Rebase Changes

A well-known and thorny problem in Mercurial occurs in the following scenario:

1. Create a new file A
1. Create new commits on top that make changes to file A
1. Whoops, I should rename file A to B. Do so, amend the first commit.
1. Because the first commit created file A, there is no rename to record; the amended commit now simply creates file B instead.
1. All child commits get sad on evolve

In jj, we have an opportunity to fix this because all rebasing occurs atomically
and transactionally in memory. The exact implementation is yet to be
determined, but conceptually the following should produce desirable results:

1. Rebase commit A from parents [B] to parents [C]
1. Get copy records from [D]->[B] and [D]->[C], where [D] are the common ancestors of [B] and [C]
1. DescendantRebaser maintains an in-memory map of commits to extra copy info, which it may inject into (2). When squashing a rename of a newly created file into the commit that creates that file, DescendantRebaser will return this rename for all rebases of descendants of the newly modified commit. The rename lives ephemerally in memory and has no persistence after the rebase completes.
1. A to-be-determined algorithm diffs the copy records between [D]->[B] and [D]->[C] in order to make changes to the rebased commit. This propagates edits made to a renamed file's source onto its new path and avoids conflicts on the deletion of the source. A copy/move may also be undone in this way; abandoning a commit which renames A->B should move all descendant edits of B back into A.

In general, we want to support the following use cases:

- A rebase of an edited file A across a rename of A->B should transparently move the edits to B.
- A rebase of an edited file A across a copy from A->B should _optionally_ copy the edits to B. A configuration option should be defined to enable/disable this behavior.
- TODO: Others?
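
The first two use cases can be illustrated with a deliberately small path-retargeting sketch. This is not the to-be-determined algorithm. It assumes each source path has at most one rename or copy target, and `PathChange`, `retarget_edits`, and the `propagate_to_copies` switch (standing in for the proposed configuration option) are hypothetical names.

```rust
use std::collections::BTreeMap;

/// What happened to a source path on the rebase destination (hypothetical).
pub enum PathChange {
    /// The path was renamed to the given target.
    Renamed(String),
    /// The path was copied to the given target and still exists.
    Copied(String),
}

/// Returns the paths that edits should land on after rebasing across `changes`.
pub fn retarget_edits(
    edited_paths: &[&str],
    changes: &BTreeMap<String, PathChange>,
    propagate_to_copies: bool,
) -> Vec<String> {
    let mut out = Vec::new();
    for &path in edited_paths {
        match changes.get(path) {
            // Rename A->B: the edit moves to B transparently.
            Some(PathChange::Renamed(target)) => out.push(target.clone()),
            // Copy A->B: the edit stays on A and optionally also applies to B.
            Some(PathChange::Copied(target)) => {
                out.push(path.to_string());
                if propagate_to_copies {
                    out.push(target.clone());
                }
            }
            // Untouched paths keep their edits in place.
            None => out.push(path.to_string()),
        }
    }
    out
}
```

A real implementation would also have to handle conflicting copy sources and copies pinned to a specific commit, which this sketch leaves out.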

## Non-goals

### Tracking copies in Git

Git uses rename detection rather than copy tracking, generating copy info on
the fly between two arbitrary trees. It does not have any place for explicit
copy info that _exchanges_ with other users of the same Git repo, so any
enhancements jj adds here would be local only and could potentially introduce
confusion when collaborating with other users.

### Directory copies/moves

All copy/move information will be read and written at the file level. While
`jj cp|mv` may accept directory paths as a convenience and perform the
appropriate filesystem operations, the renames will be recorded at the file
level, one for each copied/moved file.
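
As a minimal sketch of that expansion, the function below turns one directory-level request into per-file source/target pairs. The name `expand_directory_copy` and the use of plain `/`-separated strings in place of `RepoPathBuf` are assumptions made for this example.

```rust
/// Expands a directory-level copy/move into per-file `(source, target)` pairs.
/// `files` is the list of tracked file paths; only those under `src_dir` match.
pub fn expand_directory_copy(
    files: &[&str],
    src_dir: &str,
    dest_dir: &str,
) -> Vec<(String, String)> {
    // The trailing slash stops `src` from matching a sibling such as `srcx`.
    let prefix = format!("{}/", src_dir.trim_end_matches('/'));
    let dest = dest_dir.trim_end_matches('/');
    files
        .iter()
        .filter_map(|file| {
            file.strip_prefix(prefix.as_str())
                .map(|rest| ((*file).to_string(), format!("{}/{}", dest, rest)))
        })
        .collect()
}
```

Each returned pair would then be recorded as its own file-level copy, so backends never need to represent a directory as a copy source.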
