Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib: Add a to_wc_name() function for MergedTreeId. #2679

Closed
wants to merge 1 commit into from

Conversation

PhilipMetzger
Copy link
Contributor

To be used in a follow-up for jj run, as we use them for directory names. Documented as permanently unstable, as the representation will change in the future.

Probably missing some tests.

Checklist

If applicable:

  • I have added tests to cover my changes

@@ -201,6 +202,33 @@ impl MergedTreeId {
MergedTreeId::Merge(tree_ids) => tree_ids.clone(),
}
}

/// Represent a `MergeTreeId` in a user friendly way. This makes no
/// stability guarantee, as the format may change at any time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc, tree id isn't provided to user in any form, so I don't think the readability of tree id would matter. If we need a short unique id, we can use the ContentHash.

Perhaps, commit id would be better, but I don't know if that makes sense in jj run.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc, tree id isn't provided to user in any form, so I don't think the readability of tree id would matter. If we need a short unique id, we can use the ContentHash.

Not directly but as soon as run starts to create wc's in .jj/run/, I expect curious users to "(ab)use" them. So we need a opaque identifier which we internally map to a commit. See the my questions from Discord and the suggestion from Martin.

Perhaps, commit id would be better, but I don't know if that makes sense in jj run.

I'm not quite sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly but as soon as run starts to create wc's in .jj/run/, I expect curious users to "(ab)use" them. So we need a opaque identifier which we internally map to a commit.

I think abusing that is okay if the user knows what he's doing.

That said, do we need a way to reconstruct the original tree ids from the "opaque identifier"? If that's not needed, we won't need a chained +{id}-{id}+... string, which could be quite long and cause max path/argument length problem.

Copy link
Contributor Author

@PhilipMetzger PhilipMetzger Dec 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think abusing that is okay if the user knows what he's doing.

I don't consider that worth it. You also don't remove your browser's cache by doing a rm -rf on the cache directory, you use the provided UI for it.

If a user wants to blast the wc's away they have jj run --clean.

That said, do we need a way to reconstruct the original tree ids from the "opaque identifier"?

Yes, that's a requirement.

If that's not needed, we won't need a chained +{id}-{id}+... string, which could be quite long and cause max path/argument length problem.

The caller should truncate it at some point anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think abusing that is okay if the user knows what he's doing.

I don't consider that worth it. You also don't remove your browser's cache by doing a rm -rf on the cache directory, you use the provided UI for it.

Maybe I miss the point. I was wondering why it's not a commit id. You said "we need a opaque identifier", so I thought the point of using tree ids is to obfuscate the .jj/run/{tree_ids} structure.

I think .jj/run/{commit_id} is okay. The user might be able to abuse the data, but we don't have to care about that.

If a user wants to blast the wc's away they have jj run --clean.

That said, do we need a way to reconstruct the original tree ids from the "opaque identifier"?

Yes, that's a requirement.

If that's not needed, we won't need a chained +{id}-{id}+... string, which could be quite long and cause max path/argument length problem.

The caller should truncate it at some point anyway.

Hmm, if it's supposed to be truncated, maybe better to let the caller to generate a string from [TreeId]?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow the exact problem, but both commit id and tree id will change if the tree is updated. There may be multiple commits that shares the tree, and we'll have to materialize tree for each commit if .jj/run is keyed by commit_id, but I think that's practically okay?

Basically if we have a commit range of 200 to work through with (8/16/32 cores), we'll at some point reuse the existing directories without renaming the wc's directory name (the tree will still change). If there's another opaque id instead of a commit id, we won't misslead users if they nuke the directory.

That said, commit_id can't be used if we need to checkout an auto-merge parent. I don't know if that can be a problem. Another possible problem is to check out only subtree.

I don't understand enough of the problem, can you elaborate?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we have a commit range of 200 to work through with (8/16/32 cores), we'll at some point reuse the existing directories without renaming the wc's directory name (the tree will still change).

Ok, so the directory name (or the opaque id) doesn't represent the directory content, and we don't have to use commit id or tree ids. It could be anything like {random}-{worker_thread_id} for example.

fwiw, the external merge tools (cli/src/merge_tools/external.rs) just use $temp_dir/left/right for the checked-out contents, and left_state/right_state for the tree metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so the directory name (or the opaque id) doesn't represent the directory content, and we don't have to use commit id or tree ids. It could be anything like {random}-{worker_thread_id} for example.

Yes, thats also an option but I like the arbitary string we generate at the moment.

fwiw, the external merge tools (cli/src/merge_tools/external.rs) just use $temp_dir/left/right for the checked-out contents, and left_state/right_state for the tree metadata.

I'm aware but don't think that scales to jj run's use-case.

That said, commit_id can't be used if we need to checkout an auto-merge parent. I don't know if that can be a problem. Another possible problem is to check out only subtree.

I don't understand enough of the problem, can you elaborate?

I think I understood the problem, if external tools automatically create a merge we would have a unknown (please correct me if i'm wrong) commit_id, which definitely stands in the way of just running jj run -r 'my-old-pr..main' -j8 for a user. Yeah, that disqualifies the commit_id as an wc-directory name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, so the directory name (or the opaque id) doesn't represent the directory content, and we don't have to use commit id or tree ids. It could be anything like {random}-{worker_thread_id} for example.

Yes, thats also an option but I like the arbitary string we generate at the moment.

I don't know why {tree_ids} is better, but let's see.

fwiw, the external merge tools (cli/src/merge_tools/external.rs) just use $temp_dir/left/right for the checked-out contents, and left_state/right_state for the tree metadata.

I'm aware but don't think that scales to jj run's use-case.

I meant jj run would do that in a better or more abstracted way. For example, left/right above will be some opaque ids, and these directories will be reused.

That said, commit_id can't be used if we need to checkout an auto-merge parent. I don't know if that can be a problem. Another possible problem is to check out only subtree.

Oops, my point was that a merge commit is usually diff-ed against the tree of auto-merge parents, but we don't have a commit id for that tree. I thought this could be a problem if the opaque id represents the directory contents.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant jj run would do that in a better or more abstracted way. For example, left/right above will be some opaque ids, and these directories will be reused.

👍 , see the next patch in this stack which started doing this, but there's no method to reinitialize a working-copy from a commit yet.

Oops, my point was that a merge commit is usually diff-ed against the tree of auto-merge parents, but we don't have a commit id for that tree. I thought this could be a problem if the opaque id represents the directory contents.

Thanks for explaining it. That property makes a commit id a no-go for jj run.

MergedTreeId::Legacy(tree_id) => tree_id.hex(),
MergedTreeId::Merge(tree_ids) => {
let ids = tree_ids
.map(|id| id.hex())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match on self.to_merge().as_resolved() instead?

MergedTreeId::Merge(_) doesn't mean the tree has conflicts, and is equivalent to Legacy(tree_id) if tree_ids.len() == 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match on self.to_merge().as_resolved() instead?

Why does the tree have to be resolved for a working-copy name?

MergedTreeId::Merge(_) doesn't mean the tree has conflicts, and is equivalent to Legacy(tree_id) if tree_ids.len() == 1.

Makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match on self.to_merge().as_resolved() instead?

Why does the tree have to be resolved for a working-copy name?

It doesn't have to be. I just thought you would want just {id} (without + prefix) in that case.

Comment on lines 208 to 209
#[allow(clippy::inherent_to_string)]
pub fn to_string(&self) -> String {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to call it something else instead of allowing the clippy lint. But also agree with Yuya that we probably don't need this method at all, so...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think it's better to implement this in the module where it's used, at least until we see that there's a common need for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would propose to move the conversation to #2686 and closing this pr, after implementing the above.

@PhilipMetzger PhilipMetzger changed the title lib: Add a to_string() function for MergedTreeId. lib: Add a to_wc_name() function for MergedTreeId. Dec 9, 2023
To be used in a follow-up for `jj run`, as we use them for directory names.
Documented as permanently unstable, as the representation will change in the future.
@PhilipMetzger
Copy link
Contributor Author

You can find it's usage in #2686.

@PhilipMetzger
Copy link
Contributor Author

Closing this in favor #2686 as its next update contains this change.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants