-
Notifications
You must be signed in to change notification settings - Fork 343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
conflicts: escape conflict markers by making them longer #4969
base: main
Are you sure you want to change the base?
Conversation
c547024
to
d62d14c
Compare
Perhaps this should be tested by correctly parsing its own source code. |
Surprisingly, the source code for conflicts.rs doesn't actually contain any lines that look like conflict markers! The only two files that have lines which can be confused for conflict markers currently are tutorial.md and conflicts.md it looks like. |
d62d14c
to
6446b3d
Compare
6446b3d
to
9f0be3b
Compare
lib/src/conflicts.rs
Outdated
debug_assert!(expected_len >= MIN_CONFLICT_MARKER_LEN); | ||
|
||
parse_conflict_marker_any_len(line) | ||
.filter(|marker| marker.len == expected_len) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we need to accept marker.len >= expected_len
?
Suppose working-copy content was materialized with N = 10
, and user removed a literal "<<<<<<<<<" (N = 9) without resolving any other conflicts, the new expected_len
would drop. iirc, the working-copy content isn't updated to use the new expected_len
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I had originally, but I think it's too permissive actually. If we just had a minimum, and didn't enforce an exact number, then this could happen:
- Conflict materialized with length 12 markers since file contains length 8 marker
- User deletes length 8 marker
- Conflict is parsed expecting length >=12 markers successfully
- User adds length 8 marker back, since they made a mistake while editing
- Conflict is parsed expecting length >=7 markers now, so both the length 8 and length 12 markers are parsed, even though the length 8 marker isn't a real marker
To solve this, I made it so that only the longest length markers in a file are parsed. So the example would look like this:
- Conflict materialized with length 12 markers since file contains length 8 marker
- User deletes length 8 marker
- Conflict is parsed expecting length 12 markers successfully
- User adds length 8 marker back, since they made a mistake while editing
- Conflict is parsed expecting length 12 markers (since 12 is the longest marker present, and 12 >= 7), so the length 8 markers are correctly ignored
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, we can't know whether the re-added marker is a literal or conflict, right? My feeling is that it's safer to assume that a parsable marker is a conflict rather than invalidating any other conflicts by single addition of marker-looking line.
If we prefer stricter behavior, we'll probably need to store the materialized marker length to TreeState
on checkout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's a good point. Sorry for the long response; I was just trying to think through all the reasonable options there might be and summarize them. I think these are the variables we can use for setting a bound:
- R = length of real conflict markers in original materialization (would need to be stored in
TreeState
) - N = length of longest fake conflict markers in last snapshot's
Merge<FileId>
- K = increment added to longest fake marker to get materialized marker length
- C = length of longest conflict marker (real or fake) in the raw file contents being snapshotted
I think it's important that we have a simple rule that the user can follow while editing a file to ensure that the real conflict markers will continue to be parsed successfully. These are the 4 options I can think of, along with the expressions that we could use to determine if a marker is parsable:
Possible options
Option 1: "Don't add any new fake markers which are longer than any fake markers in the last snapshotted version of the file"
Either len >= max(7, N+K)
or len >= max(7, N+1)
would work here.
Option 2: "Don't add any new fake markers which are longer than any fake markers in the original materialized file"
I think len == max(7, N+K, C)
matches this rule, which is what I currently have implemented. The main difference with this and option 1 is that the user is allowed to add back fake conflict markers that they removed, and they still won't be parsed.
Option 3: "Don't add any new fake markers with length greater than or equal to the markers that jj
added"
I think len >= R
is the obvious implementation of this rule. It does require adding it to the TreeState
though.
I think len == max(7, N+1, C)
would also fall into this category, since the user could still add any fake markers that are shorter than the original materialized markers, since that would keep the original materialized markers as being the longest in the file, so it would continue parsing them successfully. This is different from len == max(7, N+K, C)
, because that rule would cause snapshotting to fail the next time after adding any fake markers of length N+1
through N+K-1
, since that would result in the real conflict markers becoming considered shorter than allowed in the following snapshot.
Option 4: "Don't add any new fake markers with the exact same length as the markers that jj
added"
This is probably the easiest rule to follow for the user, and it is what we currently have in jj 0.23.0
, except that the materialized length is always 7. This could be achieved with len == R
, which also requires adding it to the TreeState
.
If we don't store the materialized marker length in the TreeState
, then I believe adding a fake marker which is longer than the real materialized markers will always cause the real markers to not be parsed correctly. With some implementations it might fail to parse in the current snapshot (because it picked the wrong markers to parse), or with other implementations it might parse successfully in the current snapshot, but then fail in the next snapshot (because the inferred minimum conflict marker length increased as a result of there being new fake conflict markers added), or it could parse successfully but just have the wrong content (since some fake markers were treated as real ones and happened to form a valid conflict).
With that in mind, I feel like these are the 3 best options, each with different tradeoffs (I think this is basically what you were saying, with the first and last options being the safest choices):
len >= max(7, N+1)
, since it is the simplest to implement, and it handles all of the common cases reasonably well.len == max(7, N+1, C)
, since it allows adding any fake markers that are shorter than the real markers, which the user might expect to work.len == R
since it relies on the fewest assumptions (but it requires storing marker length in theTreeState
).
I think I'll look into adding it to the TreeState
to see how difficult it would be. If it's a small change, then probably that would be the best option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'll look into adding it to the TreeState to see how difficult it would be. If it's a small change, then probably that would be the best option?
It wouldn't be difficult, but I'm not pretty sure if that's better because we're adding a new persistent state invisible to user. (BTW, the materialized conflict style could be stored in the TreeState if supported styles had ambiguous markers.)
@ilyagr, @martinvonz, any comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- R = length of real conflict markers in original materialization (would need to be stored in
TreeState
)Do we need to store it in the
TreeState
? Can we not get it from the previous content in the tree object?
Do you mean getting it from the tree of the working-copy parent? It might work, but I'm not sure.
FWIW, I think it makes sense to trust TreeState
here because the materialization parameter is kinda working-copy state, not a state of the committed tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the PR to first implement parsing if >= materialized_marker_len
, and then I added conflict_marker_len
to the file state in another commit to see how it could work. If we decide not to add it to the file state, I can always drop the last commit from the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean getting it from the tree of the working-copy parent? It might work, but I'm not sure.
I meant getting it from the previous snapshot. But it seems like it would not handle the case in test_update_from_content_malformed_conflict
anyway (unless we allowed the snapshotting process to make the conflict markers in the working copy longer, but that seems like a really bad idea).
I was thinking that it's good if TreeState
is just a cache that can we can discard without impact other than in the corner case where a file has been modified in a way that the cache prevents us from detecting (currently if the mtime and size is preserved but e.g. inode is changed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking that it's good if
TreeState
is just a cache that can we can discard without impact other than in the corner case where a file has been modified in a way that the cache prevents us from detecting (currently if the mtime and size is preserved but e.g. inode is changed).
Currently, I have it set to default to parsing length 7 conflict markers if there's no materialized conflict data in the TreeState
, so this should still work in many cases since it will parse all conflict markers of length >= 7
if the TreeState
is reset, and the conflict markers should usually be length 7 anyways.
If we want to make it possible to reset the TreeState
in more cases, we could have the conflict marker length default to choose_materialized_conflict_marker_len(current_tree_merge)
instead maybe? There is a tradeoff though, because doing this could break already-materialized conflicts from older versions of jj
if a user upgrades and the new materialized markers are longer than the actual markers in the file (which would be length 7 always in older versions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to keep it simple for now.
lib/src/conflicts.rs
Outdated
// considered longer than necessary the next time the file is snapshotted. | ||
let expected_marker_len = materialized_marker_len | ||
.unwrap_or_else(|| choose_conflict_marker_len(&merge_hunk)) | ||
.max(max_content_marker_len); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we trust the marker length of the new content
? Doesn't it mean any existing conflicts can be invalidated if user added long <<<<<<<<<<<<<
literal by mistake?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, they would be invalidated if the user adds a marker with length greater than or equal to the materialized marker length. I'm not sure we can work around that though, since there will always be some length of marker that's invalid for them to add.
My goal was to make it so that if a file initially had fake markers of length N when it was materialized, then the user can always add/remove as many markers as they want with length <= N without breaking parsing, even if multiple snapshot operations happen during the process. I think it makes sense that adding markers of length > N could interfere with conflict resolution, since it makes it ambiguous whether the marker is a real one or not.
9f0be3b
to
cb50ba3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I thought I had already sent these comments last week
lib/src/conflicts.rs
Outdated
/// Represents a kind of conflict marker which can be materialized and parsed. | ||
#[derive(Clone, Copy, PartialEq, Eq)] | ||
#[repr(u8)] | ||
enum ConflictMarkerKind { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: How about ConflictMarkerSeparator
or ConflictMarkerSeparatorChar
since it just represents individual separators rather than the whole marker?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how I feel about "separator". I personally have always thought of each individual line like "<<<<<<<" as being markers, rather than just being a part of a marker. Maybe ConflictMarkerLine
and ConflictMarkerLineChar
would be clearer? I don't feel strongly though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ConflictMarkerChar
, LineChar
, LineKind
are good, if the original concern was that ConflictMarkerKind
sounds like a ConflictMarkerStyle
.
lib/src/conflicts.rs
Outdated
debug_assert!(expected_len >= MIN_CONFLICT_MARKER_LEN); | ||
|
||
parse_conflict_marker_any_len(line) | ||
.filter(|marker| marker.len == expected_len) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point about the invisible state. I think that's enough to convince me that one of the options is better. I'm leaning towards the first choice because it's simplest. But it's a good point about in the second option about presence of markers of different lengths. These would be real markers or markers that we would at least interpret as real. Maybe another option is to use option 1 but additionally require that all markers have the same length, or otherwise interpret the file as resolved. I don't strongly which of these three options we go with.
By the way, I think background snapshotting makes Scott's example at the start of this thread more important because you might save and then undo and save again and get two snapshots, which would then get an additional real conflict. OTOH, background snapshotting already makes it so a save while your editing conflict markers often results the file getting interpreted as resolved instead, so we're not making it much worse by adding this new risk of reinterpreting markers.
8b28066
to
5299c8f
Compare
"}; | ||
// On the first snapshot, it will parse as a conflict containing conflict | ||
// markers as text | ||
let new_conflict = parse(&conflict, new_conflict_contents.as_bytes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. This one is scary, and would probably provide a worse UX than the current state.
We'll have to either remember the materialized marker length or parse unpaired markers into some form of conflict.
lib/src/conflicts.rs
Outdated
/// Represents a kind of conflict marker which can be materialized and parsed. | ||
#[derive(Clone, Copy, PartialEq, Eq)] | ||
#[repr(u8)] | ||
enum ConflictMarkerKind { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think ConflictMarkerChar
, LineChar
, LineKind
are good, if the original concern was that ConflictMarkerKind
sounds like a ConflictMarkerStyle
.
5299c8f
to
82cad26
Compare
lib/src/conflicts.rs
Outdated
debug_assert!(expected_len >= MIN_CONFLICT_MARKER_LEN); | ||
|
||
parse_conflict_marker_any_len(line) | ||
.filter(|marker| marker.len == expected_len) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean getting it from the tree of the working-copy parent? It might work, but I'm not sure.
I meant getting it from the previous snapshot. But it seems like it would not handle the case in test_update_from_content_malformed_conflict
anyway (unless we allowed the snapshotting process to make the conflict markers in the working copy longer, but that seems like a really bad idea).
I was thinking that it's good if TreeState
is just a cache that can we can discard without impact other than in the corner case where a file has been modified in a way that the cache prevents us from detecting (currently if the mtime and size is preserved but e.g. inode is changed).
cf0c834
to
b99285a
Compare
These changes make the code a bit more readable, and they will make it easier to have conflict markers of different lengths in the next commit.
If a file contains lines which look like conflict markers, then we need to make the real conflict markers longer so that the materialized conflicts can be parsed unambiguously. When parsing the conflict, we require that the conflict markers are at least as long as the materialized conflict markers based on the current tree. This can lead to some unintuitive edge cases which will be solved in the next commit. For instance, if we have a file explaining the differences between Jujutsu's conflict markers and Git's conflict markers, it could produce a conflict with long markers like this: ``` <<<<<<<<<<< Conflict 1 of 1 %%%%%%%%%%% Changes from base to side jj-vcs#1 Jujutsu uses different conflict markers than Git, which just shows the -sides of a conflict without a diff. +sides of a conflict without a diff: + +<<<<<<< +left +||||||| +base +======= +right +>>>>>>> +++++++++++ Contents of side jj-vcs#2 Jujutsu uses different conflict markers than Git: <<<<<<< %%%%%%% -base +left +++++++ right >>>>>>> >>>>>>>>>>> Conflict 1 of 1 ends ``` We should support these options for "git" conflict marker style as well, since Git actually does support producing longer conflict markers in some cases through .gitattributes: https://git-scm.com/docs/gitattributes#_conflict_marker_size We may also want to support passing the conflict marker length to merge tools as well in the future, since Git supports a "%L" parameter to pass the conflict marker length to merge drivers: https://git-scm.com/docs/gitattributes#_defining_a_custom_merge_driver
Storing the conflict marker length in the working copy makes conflict parsing more consistent, and it allows us to parse valid conflict hunks even if the user left some invalid conflict markers in the file while resolving the conflicts.
b99285a
to
f2d6c28
Compare
Related to #3975 and #4088.
Checklist
If applicable:
CHANGELOG.md