Redesign open/closed concept? #321
Replies: 10 comments 19 replies
-
We (@hooper, @rdamazio, @spectral54, @torquestomp, and I) talked about this for quite a long time at work today. I think we all agreed that removing the concept of open commits is the best solution (i.e. the second solution above). In case it gets annoying to have to specify that you want to go back to editing a commit after you temporarily switched away, we could implement a hybrid model where we keep the open/editing commits in the repo view (suggested by @torquestomp). It's obviously simpler to not have do that, so I prefer to at least start without that feature and we can see how it goes. Someone (@spectral54, I think) suggested adding a @torquestomp suggested using We talked about how the new model (and the current model, actually) would interact with multiple workspaces and with concurrency. It is currently possible to have an open commit that has descendants. If you do, any descendants will be automatically rebased every time you run a command. So it works, but it's admittedly a bit weird to be in that state. The same situation could occur with the new model if the user did Any thoughts from anyone else? Does the above sound good to you? I know @arxanas had some ideas on Discord about some kind of higher-level (even hierarchical?) commit groups. |
Beta Was this translation helpful? Give feedback.
-
Oh, I forgot the bit about when to check that a commit has a description. We talked about having a config for that too. There would be one config value for prompting the user for a description when checking out away from a commit without a description. There would be another config value for not doing anything about it (but we should still prevent pushing them, as noted earlier). Maybe there would also be a config option for printing a warning. |
Beta Was this translation helpful? Give feedback.
-
Commit evolutionFirst, a digression on the topic of commit evolution. I'm sure most of this is familiar to you, but I'm not sure if we agree on the ideas around syncing. Under most current mainstream VCSes, a commit is a combination of some number of edges to parent commits and a snapshot (or patch, depending on the VCS), and some other metadata like the commit message and timestamp, which aren't important here. This handles a notion of history along the codebase dimension, where the functionality of the codebase changes over time. Mercurial (and also Jujutsu and git-branchless) have a concept of tracking the history of commits themselves as they're modified. In this model, a commit can have another kind of parent edge indicating its historical predecessor along the history dimension. Under Jujutsu, commits which are linked across the history dimension have the same change ID. One question is this: now that we're tracking the history of the commit graph, do we want to track one further level of changes, i.e. the history of the history of the commit graph? In practice, yes: we want to track the repository history so that we can undo operations. Jujutsu and git-branchless both accomplish this with an operation log, which track the commit nodes/edges we added. If you want to undo stuff, then you add even more commit nodes/edges which cancel out the effects of the previous operations. There's also the question of merging repository-level changes. Can you just apply someone else's operations to your repository and get a useful result? SyncThis model of commits allows the repository to form a natural CRDT: you can combine the states of two repositories by unioning together both their sets of commits (and edges). Any commits which have incoming history edges should be considered hidden. They've been superseded by a different commit — maybe even someone else's commit! Furthermore, the repository is in a state of conflict if there are any hidden commits which have non-hidden descendants (along the codebase dimension). In Jujutsu, this is less of a concern, since it's always safe to rebase a descendant onto the new, non-hidden version of the commit. But what if there are multiple non-hidden versions? In particular, what if we've modified an open commit? I'm assuming that Jujutsu doesn't actually properly handle this (correct me if I'm wrong). We'll come back to this shortly. IllustrationFor illustration, here are some relevant scenarios demonstrating the codebase vs. history dimension edges. To make a new commit, you create a new node and parent commit edge. Starting here: flowchart RL
B --> A
we can create a new commit flowchart RL
C--> B --> A
To amend commit flowchart RL
C -.-> B --> A
C --> A
To rebase commit flowchart RL
D -.-> B --> A
D --> C
There's not any situation where you wouldn't want to take an existing commit and preserve both its parent commit edge and parent history edge. Even when modifying commit metadata, we typically create a new commit along the history dimension which supersedes that commit. To illustrate the above sync problem for Jujutsu, suppose you start with this commit graph: flowchart RL
B ---> A
Then locally, you create a new commit flowchart RL
C ---> B ---> A
D -.-> B
E -.-> B
(AFAICT, Mermaid has no hints about how to lay out these nodes; I would have liked to have drawn the codebase axis horizontally and the history axis vertically, but it instead renders all child commits to the right.) Logically, what is the resolution? In my opinion, it would be to create two new auto-rebased commits, so that the graph looks like this: flowchart RL
C ---> B ---> A
D -.-> B
E -.-> B
F -.-> C
F ---> D
G ---> E
G -.-> C
I don't know if Jujutsu does this today. Given the above graph, how do we know that the repository is in a state of conflict, and how do we know what to do to repair it? Multi-head syncingBranchesJujutsu today allows a branch to have multiple associated heads. A branch is considered conflicted if it has more than one associated head. For example, more than one party tries to push the same branch to a remote server, that server can end up with a branch in a conflicted state. The Monotone VCS also thinks about branches in this way. However, a branch is a set of commits, rather than just a set of heads. The logical value of the branch is the unique head of that set of commits, and a branch is conflicted if there is more than one head. To push to a remote branch, you send all of your commits reachable from your repository's version of that branch. To resolve a conflicting branch, you have to create a merge commit with all of the previous heads as parents. Thus, commit resolution in Monotone is accomplished by adding more edges, not by removing some of the heads from the branch. This is elegant from a CRDT-style syncing standpoint. If you have another race, then you just end up with a branch (commit set) which still has multiple heads. Inferred branchesI said above that commit evolution and the history dimension are used to track changes to the commit graph over time. There's another interpretation: under Monotone-style branching, history edges let us infer logical branches, without having to name those branches explicitly. Instead, the branches are determined by the operations you do on the commits. I didn't figure out the exact criteria, but they're something like this:
If you would push a commit to a remote Git-style branch, you would also push all of the commits reachable from that commit (via both codebase and history edges) and union them into the commit set represented by the remote branch. If people push to the remote repository's local branch such that you end up with this graph from above: flowchart RL
C ---> B ---> A
D -.-> B
E -.-> B
then you know that the branch is in conflict because there are multiple heads for this commit set. To resolve this, first restore the abandoned-commit invariant by rebasing flowchart RL
C ---> B ---> A
D -.-> B
E -.-> B
F -.-> C
F ---> D
G ---> E
G -.-> C
Then merge the heads together in some order: flowchart RL
C ---> B ---> A
D -.-> B
E -.-> B
F -.-> C
F ---> D
G ---> E
G -.-> C
H --> F
H --> G
(Maybe the merging edges should be history edges and not parent commit edges?) Now the conflict has been reduced from a branch-level conflict to a commit-level conflict. (One other question is: can we model commit-level and branch-level conflicts in the same way or using the same algorithms?) Commit groupsMerge commits are a sham. Practically every workflow in use is essentially a patch-based workflow. For example, the Git repository merges changes through a series of progressively-more-stable branches, but they also cherry-pick individual commits into branches for backporting. They use the three-way diff functionality that merge commits give you, but really, they just want to apply all unapplied patches onto a given branch. There is no "canonical" form of a merge commit under Git; you can introduce "evil" merges which don't really reflect the underlying composition of both branches. The order of the parents is also significant. Here's another common complaint: when you merge a branch into another branch, you lose information, such as the historical branch name. We can generalize the open/closed commit idea into patch-based hierarchical commit groups and get rid of merge commits, while preserving the logical structure of the commit graph. IllustrationSuppose we start with this commit graph, and that flowchart RL
B --> A
There's now two notions of committing. A "closed" commit is as usual: flowchart RL
C --> B --> A
An "open" commit adds an extra internal commit to the list of commits, so the graph would look kind of like this instead. Note that both flowchart RL
subgraph X
direction RL
B2[B]
C --> B2
end
X --> A
X -.-> B --> A
A rebase looks like this: flowchart RL
subgraph X
direction RL
B2[B]
C --> B2
end
X --> D
X -.-> B --> A
The advantage is that you can group arbitrary levels of commits and preserve the groups as they move around the commit graph. Suppose you have an offshoot branch flowchart RL
subgraph foo
direction RL
D --> C
end
foo --> A
B --> A
You can rebase it like this: flowchart RL
subgraph foo
direction RL
D --> C
end
subgraph foo'
direction RL
D2[D] --> C2[C]
end
foo' -.-> foo
foo' --> B
foo --> A
B --> A
And the fact that there was a grouping of commits is preserved. FormalizationDefine Composition is total, so The There's an identity patch object Patches are idempotent: Then define The first element is the parent commit (or some object Let The interesting operations are something like this: There are no merge commits; you have to rebase one commit onto another. By patch idempotency, if This structure also means that the commit graph is independent of the conflict-resolution mechanism, so you can change the conflict resolution algorithm later and re-evaluate the commit graph. So the answer to the question
is that open/closed commits are only leaf nodes in the current implementation, but you should be able to nest commits arbitrarily. Then, to check out a commit, you want to specify whether you want an inner or outer commit (which would have different commit hashes). SyncRegarding this question:
The above commentary on commit evolution provides a(n apparently-sound?) semantics for syncing. In particular, it's independent of the type of the snapshot associated with a commit, as long as you can rebase one commit onto another. That means that you can swap in the commit group concept here where we have a list of commits instead of a single snapshot and still have meaningful syncing (except that above, when we merged to resolve conflicts, we should have instead rebased, because commit groups don't support merging). |
Beta Was this translation helpful? Give feedback.
-
What about And if one types (Sorry I didn't read all the discussion above, just skimmed 1/5 of it. And I don't know much about how jj works.)
Then Sorry again for not having read everything. I did search for ( Does |
Beta Was this translation helpful? Give feedback.
-
A bit late to the discussion, but my 2c as a user: Though I agree that having to learn the open/closed concept is an extra burden for users, in practice I really like the workflow where I can just check-out a commit, start working on it and, for the most part, |
Beta Was this translation helpful? Give feedback.
-
From the first message in this discussion:
Here are some case additional cases where we rely on the current default behavior for updating the working copy (creating a new commit on top iff the new commit is closed). In some cases, we will have to change the code to preserve a behavior similar to today, and in some cases we can't do that and we instead get a different behavior.
|
Beta Was this translation helpful? Give feedback.
-
With PR #390 merged, if you want to experiment with the new workflow, add this to your
That will make I've used this workflow for only a day or two and it's worked fine so far. I haven't wanted to use the new |
Beta Was this translation helpful? Give feedback.
-
I started using jj a week ago, when open commits were already gone, and I somehow missed the existence of I was mostly looking for this when I'd be using I would suggest that, when open commits are disabled, Other than this, not having open commits seems fine. Nothing was such an annoyance as to motivate me to enable them and learn how to use them. |
Beta Was this translation helpful? Give feedback.
-
This is unlikely to work well with submodules. To illustrate why, consider a naive implementation of checkout with submodules where we create a working copy commit on every submodule. Since each submodule is represented in the superproject by their commit SHA, we've marked every submodule as dirty relative to the parent! In that situation, it would be much more desirable to lazily create the submodule working copy commit upon a snapshot. An alternative set of checkout and open/closed semantics might be useful here:
With this scheme, we can make
And since This does complicate some matters though, since |
Beta Was this translation helpful? Give feedback.
-
I had been thinking that we would not create working-copy commits for submodules. When snapshotting the working copy and there had been changes in a submodule, we could then create a new commit on top of the current commit. If there are further changes in the working copy and we snapshot again, we could decide that the commit should be amended (instead of creating a new commit on top) by noticing that the submodule commit is already different from what the superproject's parent commit says. Do you think that would work? |
Beta Was this translation helpful? Give feedback.
-
Background
Because we automatically commit the working copy when we detect changes, we have the concept of "open" and "closed" commits. If you check out an open commit, any changes you make in the working copy will be automatically amended into it. If you check out a closed commit, jj will create a new open commit (with a new change ID) on top and check out that instead, so any changes in the working copy will always be amended into the current checkout. The way we tell open and closed commits apart is by a flag set in the commit. When closing a commit (by
jj close/commit
), jj will create a new commit with theopen
flag cleared.Problem
Perhaps the open/closed bit should not be embedded in the commit.
jj close
(and some others) sometimes crashes on first attempt with Git backend #27. However, I consider this a very weak argument against having the flag in the commit, because there are more general solutions for the bug.Solutions
I think we want to preserve how changes to the working copy will always result in amending the working copy commit, as opposed to creating a new commit on top when the first changes are made, for example. Given that, I think that means that we need to still have some way of indicating at the time of checkout whether to check out the specified commit or to create a new commit on top of. Note that checkouts can be a result of not only
jj checkout
, but also ofjj abandon
orjj git fetch
, for example.Store set of open commits in repo view
We could let the repo's "view" object keep track of which commits are considered open. The view concept didn't exist when I added the open/closed concept, so that's why I didn't do it that way to start with.
This proposal doesn't have much user-visible impact other than fixing the problems described above.
Make only the current checkout "open"
We could remove the per-commit open/closed concept completely by considering it to be a property of the working copy, so only the current checkout is "open". Instead of a having
jj checkout
behave differently depending on whether the commit is open or closed, we would then have to let the user decide if they wanted to check out the commit in order to edit it or in order to build on top. Perhaps that could bejj edit <commit>
to say that you want to check out a commit for editing (ignoring the fact that the command already exists and does something else) andjj checkout <commit>
to say that you want to check out a commit for building on top (or just run tests or whatever). Actually,jj new
is pretty much already that command (we should probably make it always update to the newly created commit, regardless of what else we decide about open/closed commits).This model is perhaps more intuitive - the user typically knows whether they're checking out a commit for editing or not, so it's probably not too demanding to have them say which. Removing the concept of open/closed commits seems like a good simplification.
An argument against this model is that knowing that a commit is open is a useful signal that it is unfinished.
If we decide to remove the concept of open/closed commits, we'll want to also rename or remove
jj close
. While the command no longer changes theopen
flag on the commit, it will still ask the user for a commit message (if there isn't already one), and create a new working-copy commit on top. So it might be useful as a convenience command for checking that the current working-copy commit has a description and creating a new one on top. Perhaps we should keep thejj commit
alias for the command (since that's familiar to git/hg users) and remove thejj close
name for it.If closing a commit is not necessary (or even possible), perhaps users will start using
jj new
more over time, thus bypassing the check for empty description. I don't know if there's a better place to remind the user to provide a description. Maybe we don't need to remind them at all? We will probably at least want to prevent pushing commits without a description (which we should do regardless of our decision here).Note that
jj commit
andjj new
will be very similar. The main difference would be thatjj commit -m foo
sets the description on the old commit (parent) whilejj new -m foo
sets the description on the new commit (child). Interestingly, if we makejj checkout
a synonym forjj new
, thenjj checkout
will also be a near-synonym forjj commit
.The automatic checkout we do when commits get rewritten and abandoned needs some thought. When the current checkout is abandoned, we should create a new commit for it. When the current checkout has been rewritten, we should just update to it without creating a new commit on top. Once we have support for tracking of public/draft commits (like hg's phases), we'll need to make sure that we create a new commit on top if a draft commit was rewritten and became public (actually, maybe the condition is unrelated to rewrites - we also want to create a new commit on top if current checkout became public without a rewrite).
Beta Was this translation helpful? Give feedback.
All reactions