lib: Add a `to_wc_name()` function for `MergedTreeId`. #2679

PhilipMetzger · 2023-12-07T20:07:17Z

To be used in a follow-up for jj run, as we use them for directory names. Documented as permanently unstable, as the representation will change in the future.

Probably missing some tests.

Checklist

If applicable:

I have added tests to cover my changes

yuja · 2023-12-08T14:28:21Z

lib/src/backend.rs

@@ -201,6 +202,33 @@ impl MergedTreeId {
            MergedTreeId::Merge(tree_ids) => tree_ids.clone(),
        }
    }
+
+    /// Represent a `MergedTreeId` in a user-friendly way. This makes no
+    /// stability guarantee, as the format may change at any time.


IIRC, the tree id isn't shown to the user in any form, so I don't think the readability of the tree id matters. If we need a short unique id, we can use the ContentHash.

Perhaps a commit id would be better, but I don't know if that makes sense in jj run.

IIRC, the tree id isn't shown to the user in any form, so I don't think the readability of the tree id matters. If we need a short unique id, we can use the ContentHash.

Not directly, but as soon as run starts to create wc's in .jj/run/, I expect curious users to "(ab)use" them. So we need an opaque identifier which we internally map to a commit. See my questions from Discord and the suggestion from Martin.

Perhaps a commit id would be better, but I don't know if that makes sense in jj run.

I'm not quite sure.

Not directly, but as soon as run starts to create wc's in .jj/run/, I expect curious users to "(ab)use" them. So we need an opaque identifier which we internally map to a commit.

I think abusing that is okay if the user knows what he's doing.

That said, do we need a way to reconstruct the original tree ids from the "opaque identifier"? If that's not needed, we won't need a chained +{id}-{id}+... string, which could be quite long and cause max path/argument length problems.

I think abusing that is okay if the user knows what he's doing.

I don't consider that worth it. You also don't clear your browser's cache by running rm -rf on the cache directory; you use the provided UI for it.

If a user wants to blast the wc's away, they have jj run --clean.

That said, do we need a way to reconstruct the original tree ids from the "opaque identifier"?

Yes, that's a requirement.

If that's not needed, we won't need a chained +{id}-{id}+... string, which could be quite long and cause max path/argument length problems.

The caller should truncate it at some point anyway.

I think abusing that is okay if the user knows what he's doing.

I don't consider that worth it. You also don't clear your browser's cache by running rm -rf on the cache directory; you use the provided UI for it.

Maybe I'm missing the point. I was wondering why it's not a commit id. You said "we need an opaque identifier", so I thought the point of using tree ids is to obfuscate the .jj/run/{tree_ids} structure.

I think .jj/run/{commit_id} is okay. The user might be able to abuse the data, but we don't have to care about that.

If a user wants to blast the wc's away, they have jj run --clean.

That said, do we need a way to reconstruct the original tree ids from the "opaque identifier"?

Yes, that's a requirement.

If that's not needed, we won't need a chained +{id}-{id}+... string, which could be quite long and cause max path/argument length problems.

The caller should truncate it at some point anyway.

Hmm, if it's supposed to be truncated, maybe it's better to let the caller generate a string from [TreeId]?
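A minimal sketch of what caller-side generation with truncation could look like. The helper name `name_from_tree_ids`, the `-` separator, and the use of plain hex strings instead of jj's `TreeId` type are all assumptions made for illustration; none of them come from the PR.

```rust
/// Joins hex tree ids with '-' and keeps at most `max_len` characters.
/// Hypothetical helper for illustration only; not part of jj's API.
fn name_from_tree_ids(hex_ids: &[String], max_len: usize) -> String {
    // Hex ids are ASCII, but iterating by `char` keeps this safe for any input.
    hex_ids.join("-").chars().take(max_len).collect()
}
```

Note that a truncated name can no longer be mapped back to the original tree ids on its own, which matters given the reconstruction requirement stated earlier in this thread.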

I don't follow the exact problem, but both the commit id and the tree id will change if the tree is updated. There may be multiple commits that share the tree, and we'll have to materialize the tree for each commit if .jj/run is keyed by commit_id, but I think that's practically okay?

Basically, if we have a commit range of 200 to work through with 8/16/32 cores, we'll at some point reuse the existing directories without renaming the wc's directory name (the tree will still change). If there's another opaque id instead of a commit id, we won't mislead users if they nuke the directory.

That said, commit_id can't be used if we need to check out an auto-merge parent. I don't know if that can be a problem. Another possible problem is checking out only a subtree.

I don't understand enough of the problem, can you elaborate?

if we have a commit range of 200 to work through with 8/16/32 cores, we'll at some point reuse the existing directories without renaming the wc's directory name (the tree will still change).

Ok, so the directory name (or the opaque id) doesn't represent the directory content, and we don't have to use commit id or tree ids. It could be anything like {random}-{worker_thread_id} for example.
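To make the idea concrete, here is a rough sketch of directories named `{random}-{worker_thread_id}` that are reused across commits, with an internal map from the opaque name to the commit currently checked out there. `WcPool`, its methods, and the name format are hypothetical and are not taken from the jj codebase; the "random" part is passed in as a token so the example stays dependency-free.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical bookkeeping for reusable working-copy directories.
struct WcPool {
    root: PathBuf,
    /// Opaque directory name -> hex commit id currently checked out there.
    assigned: HashMap<String, String>,
}

impl WcPool {
    fn new(root: impl Into<PathBuf>) -> Self {
        WcPool { root: root.into(), assigned: HashMap::new() }
    }

    /// Builds the opaque name from a per-run token and a worker index.
    fn dir_name(run_token: u64, worker: usize) -> String {
        format!("{run_token:08x}-{worker}")
    }

    /// Records that the worker's directory now holds `commit_id` and returns
    /// its path. The directory name stays the same when it is reused.
    fn assign(&mut self, run_token: u64, worker: usize, commit_id: &str) -> PathBuf {
        let name = Self::dir_name(run_token, worker);
        self.assigned.insert(name.clone(), commit_id.to_owned());
        self.root.join(name)
    }

    /// Looks up which commit an opaque directory name currently maps to.
    fn commit_for(&self, name: &str) -> Option<&str> {
        self.assigned.get(name).map(String::as_str)
    }
}
```

With this shape, the name says nothing about the directory's contents, so reusing a directory for the next commit only updates the map entry.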

fwiw, the external merge tools (cli/src/merge_tools/external.rs) just use $temp_dir/left/right for the checked-out contents, and left_state/right_state for the tree metadata.

Ok, so the directory name (or the opaque id) doesn't represent the directory content, and we don't have to use commit id or tree ids. It could be anything like {random}-{worker_thread_id} for example.

Yes, that's also an option, but I like the arbitrary string we generate at the moment.

fwiw, the external merge tools (cli/src/merge_tools/external.rs) just use $temp_dir/left/right for the checked-out contents, and left_state/right_state for the tree metadata.

I'm aware but don't think that scales to jj run's use-case.

That said, commit_id can't be used if we need to check out an auto-merge parent. I don't know if that can be a problem. Another possible problem is checking out only a subtree.

I don't understand enough of the problem, can you elaborate?

I think I understand the problem now: if external tools automatically create a merge, we would have an unknown (please correct me if I'm wrong) commit_id, which definitely stands in the way of just running jj run -r 'my-old-pr..main' -j8 for a user. Yeah, that disqualifies the commit_id as a wc-directory name.

Ok, so the directory name (or the opaque id) doesn't represent the directory content, and we don't have to use commit id or tree ids. It could be anything like {random}-{worker_thread_id} for example.

Yes, that's also an option, but I like the arbitrary string we generate at the moment.

I don't know why {tree_ids} is better, but let's see.

fwiw, the external merge tools (cli/src/merge_tools/external.rs) just use $temp_dir/left/right for the checked-out contents, and left_state/right_state for the tree metadata.

I'm aware but don't think that scales to jj run's use-case.

I meant jj run would do that in a better or more abstracted way. For example, left/right above will be some opaque ids, and these directories will be reused.

That said, commit_id can't be used if we need to check out an auto-merge parent. I don't know if that can be a problem. Another possible problem is checking out only a subtree.

Oops, my point was that a merge commit is usually diff-ed against the tree of auto-merge parents, but we don't have a commit id for that tree. I thought this could be a problem if the opaque id represents the directory contents.

I meant jj run would do that in a better or more abstracted way. For example, left/right above will be some opaque ids, and these directories will be reused.

👍 , see the next patch in this stack which started doing this, but there's no method to reinitialize a working-copy from a commit yet.

Oops, my point was that a merge commit is usually diff-ed against the tree of auto-merge parents, but we don't have a commit id for that tree. I thought this could be a problem if the opaque id represents the directory contents.

Thanks for explaining it. That property makes a commit id a no-go for jj run.

yuja · 2023-12-08T14:34:09Z

lib/src/backend.rs

+            MergedTreeId::Legacy(tree_id) => tree_id.hex(),
+            MergedTreeId::Merge(tree_ids) => {
+                let ids = tree_ids
+                    .map(|id| id.hex())


match on self.to_merge().as_resolved() instead?

MergedTreeId::Merge(_) doesn't mean the tree has conflicts, and is equivalent to Legacy(tree_id) if tree_ids.len() == 1.

match on self.to_merge().as_resolved() instead?

Why does the tree have to be resolved for a working-copy name?

MergedTreeId::Merge(_) doesn't mean the tree has conflicts, and is equivalent to Legacy(tree_id) if tree_ids.len() == 1.

Makes sense.

match on self.to_merge().as_resolved() instead?

Why does the tree have to be resolved for a working-copy name?

It doesn't have to be. I just thought you would want just {id} (without + prefix) in that case.
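The suggestion above can be sketched as follows: a merge with a single term yields a bare `{id}`, and anything else yields the chained `+{id}-{id}+...` form. `TreeIdMerge` is a simplified stand-in written for this example, not jj's actual `Merge` type, and the assumption that terms alternate add, remove, add is made here only to produce the signs.

```rust
/// Simplified stand-in for a merge of tree ids (hypothetical type).
/// Terms are hex ids assumed to alternate add, remove, add, ...
struct TreeIdMerge {
    terms: Vec<String>,
}

impl TreeIdMerge {
    /// Returns the single id if the merge consists of exactly one term.
    fn as_resolved(&self) -> Option<&str> {
        match self.terms.as_slice() {
            [only] => Some(only.as_str()),
            _ => None,
        }
    }
}

/// Builds a name: a bare id when resolved, signed and chained ids otherwise.
fn wc_name(merge: &TreeIdMerge) -> String {
    match merge.as_resolved() {
        Some(id) => id.to_owned(),
        None => merge
            .terms
            .iter()
            .enumerate()
            .map(|(i, id)| {
                // Even positions are treated as adds, odd positions as removes.
                let sign = if i % 2 == 0 { '+' } else { '-' };
                format!("{sign}{id}")
            })
            .collect(),
    }
}
```

Matching on the resolved case first means a one-element merge and a legacy single tree id would produce the same name.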

martinvonz · 2023-12-09T03:22:44Z

lib/src/backend.rs

+    #[allow(clippy::inherent_to_string)]
+    pub fn to_string(&self) -> String {


I think it's better to call it something else instead of allowing the clippy lint. But I also agree with Yuya that we probably don't need this method at all, so...

Renamed for now.

I still think it's better to implement this in the module where it's used, at least until we see that there's a common need for it.

I propose moving the conversation to #2686 and closing this PR, after implementing the above.
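For context on the lint discussed in this thread: clippy's `inherent_to_string` fires when a type defines its own `to_string(&self) -> String` method. Besides renaming the method, as was done here, the usual alternative is to implement `Display`, which provides `to_string()` through the standard `ToString` blanket impl. The sketch below uses a hypothetical `TreeIds` wrapper and a `-` separator, neither of which is taken from the PR.

```rust
use std::fmt;

/// Hypothetical wrapper around hex tree ids, used only for this example.
struct TreeIds(Vec<String>);

impl fmt::Display for TreeIds {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Write the ids separated by '-', without a trailing separator.
        for (i, id) in self.0.iter().enumerate() {
            if i > 0 {
                f.write_str("-")?;
            }
            f.write_str(id)?;
        }
        Ok(())
    }
}
```

Implementing `Display` would, however, signal a stable, general-purpose textual form, which may be at odds with the "permanently unstable" intent stated in the PR description.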


PhilipMetzger · 2023-12-09T18:29:14Z

You can find its usage in #2686.

PhilipMetzger · 2024-02-04T21:08:19Z

Closing this in favor of #2686, as its next update contains this change.

PhilipMetzger force-pushed the push-ytvunlzsxzvx branch from f259495 to ef9a8fb Compare December 7, 2023 20:16

yuja reviewed Dec 8, 2023


martinvonz reviewed Dec 9, 2023


PhilipMetzger force-pushed the push-ytvunlzsxzvx branch from ef9a8fb to e689991 Compare December 9, 2023 18:11

PhilipMetzger changed the title ~~lib: Add a to_string() function for MergedTreeId.~~ lib: Add a to_wc_name() function for MergedTreeId. Dec 9, 2023

lib: Add a to_wc_name() function for MergedTreeId.

f7c87fd

To be used in a follow-up for `jj run`, as we use them for directory names. Documented as permanently unstable, as the representation will change in the future.

PhilipMetzger force-pushed the push-ytvunlzsxzvx branch from e689991 to f7c87fd Compare December 9, 2023 18:12

PhilipMetzger mentioned this pull request Jan 4, 2024

lib: Add the WorkingCopyStore trait and a default implementation. #2686

Draft

4 tasks

PhilipMetzger closed this Feb 4, 2024
