-
Notifications
You must be signed in to change notification settings - Fork 381
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: docs/design: Topics #4496
base: main
Are you sure you want to change the base?
WIP: docs/design: Topics #4496
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
@@ -0,0 +1,168 @@ | ||||||
# Topics | ||||||
|
||||||
Authors: [Philip Metzger](mailto:[email protected]), [Noah Mayr](mailto:[email protected]) | ||||||
[Anton Bulakh](mailto:[email protected]) | ||||||
|
||||||
## Summary | ||||||
|
||||||
Introduce Topics as a truly Jujutsu native way for topological branches, which | ||||||
also replace the current bookmark concept for Git interop. As they have been | ||||||
documented to be confusing users coming from Git. They also supersede the | ||||||
`[experimental-advance-branches]` config for those who currently use it, as | ||||||
such a behavior will be built-in for Topics. | ||||||
|
||||||
Topics have been discussed heavily since their appearance in | ||||||
[this Discussion][gh-discuss]. As Noah, Anton and I had a long | ||||||
[Discord discussion][dc-thread] about them, which then also poured into the | ||||||
[Topic issue][issue]. | ||||||
|
||||||
## Prior work | ||||||
|
||||||
Currently there only is Mercurial which has a implementation of | ||||||
[Topics][hg-topics]. There also is the [Topic feature][gerrit-topics] in Gerrit, | ||||||
which groups commits with a single identifier. | ||||||
|
||||||
|
||||||
## Goals and non-goals | ||||||
|
||||||
### Goals | ||||||
|
||||||
The goals for this Project are small, see below. | ||||||
|
||||||
* Introduce the concept of native topological branches for Jujutsu. | ||||||
* Simplify Git interop by reducing the burden on Bookmarks. | ||||||
* Add Change metadata as a storage concept. | ||||||
* Remove the awkward `bookmark` to Git `branch` mapping. | ||||||
|
||||||
### Non-Goals | ||||||
|
||||||
* Making bookmarks unnecessary. | ||||||
|
||||||
## Overview | ||||||
|
||||||
Until now, Jujutsu had no native set of topological branches, just | ||||||
[Bookmarks][bm] which interact poorly with Git's expectation of branches. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure if it makes sense to refer to "Git's expectation" separately from the expectations of Git users, but if we do want to call out poor interaction between bookmarks and branches, then I think we should be more specific about where the tooling actually diverges in behavior. Git has a concept of a "current branch" that
Suggested change
|
||||||
Topics on the otherhand are can be made to represent Git branches as users | ||||||
expect them, see [Julia Evans poll][jvns-poll]. They also allow us to | ||||||
seamlessly take over the [tracking-branches][tb] concept. | ||||||
|
||||||
Other use-cases they're useful for are representing a set of | ||||||
[archived commits][archived] or even a [checkout history][checkout]. | ||||||
|
||||||
|
||||||
#### Behavior | ||||||
|
||||||
A Topic is a set of Changes marked with a name, which infectiously moves | ||||||
forward with each descendant you create. All Changes without a named topic are | ||||||
implicitly in a anonymous topic, which gets separately tracked as soon as you | ||||||
send it in for review (so materialize it as a Git branch). A Topic may have | ||||||
multiple heads. A single Change may belong to multiple Topics, which means | ||||||
that they also can get exported as overlapping Git branches. | ||||||
|
||||||
|
||||||
TODO: Example here | ||||||
|
||||||
#### Major Use-Cases | ||||||
|
||||||
|
||||||
##### Obsoletion of `[experimental.auto-advance-bookmarks]` | ||||||
|
||||||
Since Topics are infectious by nature, they can perfectly map to a Git branch. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. how would a topic with multiple heads map to a single Git branch (or maybe multiple branches)? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This sentence is unclear to me as well. I'm guessing what it's supposed to mean is that topics "perfectly map to" what people expect from Git branches, rather than that they'd actually be implemented as a 1:1 equivalent for Git branches. But since Git branches don't actually work the way Topics are described (i.e. they are not merely "infectious" but actually implicitly apply to all ancestor commits, and they cannot have multiple heads), they don't actually "map perfectly" to Git branches as understood by people who deeply understand the Git model (which is probably the minority of Git users, but certainly a nontrivial contingent). Certainly this sentence cannot mean that topics are always represented in the git storage backend as branches, since that would be incompatible with both of the differences cited above. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Each separate head will get its own Git branch. There's also the variant of connecting them, but I do not think that is desirable, as the different heads should represent different kind of logical work.
That is correct. But sharing the conceptual behavior never meant that we as Jujutsu VCS need to represent them the same internally. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Pushing the multiple heads of a topic as
I'm not sure why I would ever want to push a topic and get branches I think forges like Gerrit would support pushing multiple heads in the same topic, but I would expect I also don't know how creating multiple branches for a topic works as the graph changes over time. Suppose I had commits B and C in a single topic, both descending from A, and I pushed, creating branches
|
||||||
This alleviates the need for the auto-advancing bookmarks. It also makes the | ||||||
use-case of the solo developer working on Git `main` easier, as you just mark | ||||||
all descendants of `trunk()` belonging to the `main` topic, which then gets | ||||||
translated to the `main` branch on `jj git push`. | ||||||
|
||||||
|
||||||
##### Change Archival | ||||||
|
||||||
For the archival use-case the infectious property of a Topic isn't as | ||||||
important. So having a | ||||||
`jj topic create -r 'trunk()..@ & description("archive:")' should mark them all | ||||||
revisions with a `archive:` description as archived. For this use-case the | ||||||
non-contiguous property of Topics really shine. | ||||||
|
||||||
##### Checkout history | ||||||
|
||||||
### Detailed Design | ||||||
|
||||||
#### Git Interop | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There's a part of this that seems to be missing: where does a topic "end"? I'm guessing that you would export one git branch for each head commit in the topic, rather than one git branch for every commit in the topic (since that wouldn't really have much advantage compared to I'm guessing the answer is something like: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes, this is also correct.
This is @avamsi's use-case from #4180. This means that you locally add a description like
Yes, that is correct and thanks for the formulation.
Yes, I'll write it down. |
||||||
|
||||||
For all continuous Topics we can just simply export them as Git Branches. For | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Guessing this is a typo, especially since it's contrasted with "non-contiguous":
Suggested change
|
||||||
Topics which contain non-contiguous parts, we should allow exporting them | ||||||
one-by-one or with a generated name, such as `<topic-name>-1` for each part. | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What would "one-by-one" mean, and why is it distinct from having a generated name? I'm guessing it means that the CLI would let the user decide to export just one contiguous "section" of a topic without exporting the others, but [a] what would the actual benefit to the user be, and [b] wouldn't there still need to be an auto-generated name in order to prevent future conflict when exporting another "section" of the same topic? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
The distinction is mostly manual.
On point A, we give the user the choice of when they want to export another slice of a topic and on B the previous slice's branch may have already been deleted, so we'd be free to choose the name again. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. nit: This is something to handle whenever the topic has multiple heads, not just if the topic is disconnected. For example, the following topography is connected but can't be exported as one git branch:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There will be export restrictions on Topics, as we cannot represent this as a Git Branch. This will hit the auto-generated name I mentioned previously. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. FWIW, we currently skip export of bookmarks when they're conflicted, so it wouldn't be that different to skip exporting topics when they have multiple heads. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes, that would also would be a variant of dealing with the issue. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It looks like this section now disagrees with the backend-implementation section below:
Or am I misunderstanding, and topics will be exported to Git as notes and branches? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I kinda like the idea of exporting them as both. Notes will avoid modifying the commit hashes while supporting faithfully exporting discontiguous topics. We could perhaps use notes as the source of truth for syncing and treat branches as purely for compatibiltiy? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
These are options beside exporting them as branches, which is the desired implementation for Git interop. We should enumerate all options to fully consider the whole design space.
That's a good idea. I just don't know what that encompasses. |
||||||
|
||||||
#### Storage | ||||||
|
||||||
We should store `Topics` as metadata on the serialized proto, without | ||||||
considering the resulting Gencode. | ||||||
|
||||||
|
||||||
```protobuf | ||||||
// A simple Key-Value pair. | ||||||
message StringPair { | ||||||
string key = 1; | ||||||
string value = 2; | ||||||
// Could be extended by a protobuf.Any see the future possibilities section. | ||||||
} | ||||||
|
||||||
message Commit { | ||||||
//... | ||||||
repeated StringPair metadata = N; | ||||||
} | ||||||
``` | ||||||
|
||||||
while the actual code should look like this: | ||||||
|
||||||
```rust | ||||||
#[derive(ContentHash, ...)] | ||||||
struct Commit { | ||||||
//... | ||||||
// | ||||||
// This avoids rewriting the Change-ID, but must be implemented. | ||||||
#[ContentHash(ignore)] | ||||||
topics: HashMap<String, String> | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What do FWIW, I assume topics are commit metadata, hence the commit id changes when topics are updated. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
How would a hashset implement a Topic on two different remotes then? It may also be worth it to just rename it to metadata.
I choose this option because of #3613 (comment), as you should be able to rewrite those lightweight tags easily. If we can change the underlying commit-id w/o rewriting the Change-ID it should be unnecessary. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Oh, are you saying that topics will not be stored in the commit metadata? I think that needs to be clarified then. It wasn't clear at all to me. (The doc says that topics will be stored in a metadata map, but it doesn't say where that map is stored.) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Where did I say that?
On the Commit which makes this dance I think necessary. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I was just asking if you did because it wasn't clear what you meant. You said "I choose this option" without saying which option you meant. It was in reply to Yuya's "I assume topics are commit metadata", which made me think it would be stored in the commit, but then you linked to a comment that said "After some discussion with @PhilipMetzger we came to the conclusion that we can't actually store topics on commits", which made me think that you meant that it should not be stored on commits.
I see. I think that information needs to go in the doc too. Thanks. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Oh, so |
||||||
} | ||||||
``` | ||||||
|
||||||
#### Backend implications | ||||||
|
||||||
If Topics were stored as commit metadata, it would allow backends to drop | ||||||
them if necessary. This property can be useful to mark tests as passing | ||||||
on a specific client or avoiding a field entirely in database backed backends. | ||||||
|
||||||
For the Git backend, we could either embed them in the message, like Arcanist | ||||||
or Gerrit do or store them as Git Notes, if necessary. | ||||||
|
||||||
## Alternatives considered | ||||||
|
||||||
### Storing Topics out-of-band | ||||||
|
||||||
See [Noah's prototype][prototype] for the variant of keeping them out of band. | ||||||
While this works it falls short of having the metadata synced by multiple | ||||||
clients, which is something desirable. The prototype thus also avoids rewriting | ||||||
the Change-ID which is a good thing, but makes them only locally available. | ||||||
Comment on lines
+142
to
+144
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
This Design Doc contains two features which I desire for
And we always should design for the native future and something which is compatible with the Google backend. I only remembered the drawback of keeping them out of band from an earlier Design Doc, but I am sadly unable to find that conversation with Torquestomp.
Yes, we agreed on that for Topics. But for pure Commit Metadata we never decided on it and it is feasible without to much cost, in my opinion.
Agreed.
Where did you get that from, I never mentioned something like it? AFAIK, Gerrit Topics are once again some special refs on the Gerrit/Git Server. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Well the confusing part is then that the title says "design for topics", while as you just clarified it's actually for this generic maybe-shareable metadata that topics could be an instance of. I think combining the two into one doc was a mistake - there were a lot of discussions and some sort of vision for the out-of-band topics, but this is first time I'm hearing about the other thing and the doc doesn't seem like everything was discussed and decided on because of it, at least at the moment.
Honestly don't remember 😂 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I skimmed through the doc again - you mostly talk about topics specifically in it and a high-level results of the discussions that happened - but then starting on the storage paragraph you suddently derail into this generic commit metadata protobuf thing and it gets confusing - right after saying they could be exported to git as branches, no less
And it was probably from the "Prior work" section that mentioned gerrit topics :) |
||||||
|
||||||
|
||||||
### Single Head Topics | ||||||
|
||||||
While these are conceptually simpler, they wouldn't help with Git interop where | ||||||
it is useful to map a single underlying Topic to multiple Git branches. This | ||||||
also worsens the `jj`-`Git` interop story. | ||||||
|
||||||
## Future Possibilities | ||||||
|
||||||
In the future we could attach a `google.protobuf.Any` to the Change metadata, | ||||||
which would allow specific clients, such as testrunners to directly attach test | ||||||
results to a Change which could be neat. | ||||||
|
||||||
[archived]: https://github.com/martinvonz/jj/discussions/4180 | ||||||
[bm]: ../bookmarks.md | ||||||
[checkout]: https://github.com/martinvonz/jj/issues/3713 | ||||||
[dc-thread]: https://discord.com/channels/968932220549103686/1224085912464527502 | ||||||
[gerrit-topics]: https://gerrit-review.googlesource.com/Documentation/cross-repository-changes.html | ||||||
[gh-discuss]: https://github.com/martinvonz/jj/discussions/2425#discussioncomment-7376935 | ||||||
[hg-topics]: https://www.mercurial-scm.org/doc/evolution/tutorials/topic-tutorial.html#topic-basics | ||||||
[issue]: https://github.com/martinvonz/jj/discussions/2425#discussioncomment-7376935 | ||||||
[jvns-poll]: https://social.jvns.ca/@b0rk/111709458396281239 | ||||||
[prototype]: https://github.com/martinvonz/jj/pull/3613 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scott2000
The intent of this work is to remove this behavior and take over the necessary parts from Tracking Branches. Maybe some parts will be left intact to have something equivalent to a Git Tag, but I haven't thought deeply about it.
No, see the first answer.
Ditto.
Yes, otherwise this feature would not help with Git interop.
This will continue to use Git tooling as a implementation, so it will be the merge-base.
I don't understand your question, please elaborate.
Thanks for this question, this will need some thought. Since the intended design is to allow free rewriting of commits, they can be preserved by the respective backend/server implementation.
I don't understand your question here, since I have no base of your expectations here. Can you elaborate?
See my first answer.
You'll hopefully only be able to fetch a Git Branch as a Topic, so this shouldn't occur.
See my first answer for my current plan.
Maybe? This is to be determined on how they reflect on a backend.
This behavior is to be determined here.
I don't think so, since introducing conflicting metadata seems to be a problem I don't understand.
As above.
I expect the first behavior I think, but that depends on how we want to integrate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since a Git branch is just a pointer to a head commit, I believe it's unclear which commits are logically part of a branch unless you're comparing against some base branch/commit, so for fetching new remote branches from Git, I was just wondering how
jj
will determine which commits to include in the corresponding topic and whether it's user-configurable. It seems like there's a lot of options, such as:..branch_head
trunk()..branch_head
git_topic_base()
, then topic containsgit_topic_base()..branch_head
git_topic_revisions(branch_head)
, then topic just containsgit_topic_revisions(branch_head)
base_commit..branch_head
My concern for options 2 and 3 was that it's possible for the imported topic to be resolved to an empty set (if the branch head is an ancestor of the base branch/commit), which would need to be resolved somehow.
My concern for option 2 is that some repos have multiple long-lived branches (e.g. branches A and B), and then there are branches off of those branches (e.g. C branches off from A, and D branches off from B). Ideally, it would be nice to be able to import C as containing
A..C
and D as containingB..D
, but for option 2 they would be imported astrunk()..C
andtrunk()..D
respectively. Iftrunk()
is allowed to contain multiple revisions, maybe this wouldn't be an issue though since you could set"trunk()" = "main@origin | A@origin | B@origin"
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I share the concern about branch pointers not being sufficient to indicate which commits are part of a topic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I've been thinking about it more, and it seems like there could be another problem with the current approach. Currently,
trunk()
is generally defined in terms of remote bookmarks. If we get rid of remote bookmarks and replace them with remote topics, thentrunk()
would have to be defined based on a remote topic instead. However, from what I've heard, the plan is to exclude commits from a topic if they are present intrunk()
. This leads to a circular definition, wheretrunk()
depends on remote topics, but remote topics depend ontrunk()
.Maybe it would be better to keep remote bookmarks as-is, but just allow tracking a remote bookmark as a topic with a command like
jj topic track <bookmark@remote>
. Thentrunk()
could continue to be defined in terms of remote bookmarks, and there would be no circular dependency. This would also allow a command likejj topic track <bookmark@remote> --base <revset>
to specify that the topic should include only commits in<revset>..<bookmark@remote>
instead of the defaulttrunk()..<bookmark@remote>
. I don't think it would even be necessary to store<revset>
anywhere, because for future fetches, the local topic could just be updated toroots(local_topic)-..updated_remote_bookmark
.Remote topics still could be supported, but they just wouldn't be used for the Git backend. It's conceivable that a native backend might want to support both remote bookmarks and remote topics, just like Git has both tags and branches.