lib conflicts: Materialize (sides with (conflict markers) or (no newline at EOF)) in a way that can be parsed #4088

ilyagr · 2024-07-15T03:43:41Z

There might still be rough spots I forgot about, but I think this is more or less ready to go.

User docs are TODO in this or a following PR.

The plan is to address #3223 in a follow-up PR, by having empty files show up as something along the lines of \JJ There is no file at this path on this side of the conflict. It may have been deleted.

Checklist

If applicable:

I have updated CHANGELOG.md
I have updated the documentation (README.md, docs/, demos/)
n/a I have updated the config schema (cli/src/config-schema.json)
I have added tests to cover my changes

yuja · 2024-07-17T10:44:17Z

lib/src/conflicts.rs

+
+    // From now on, we assume some invariants guaranteed by the encoding function
+    // above. See its docstring for details.
+


As I said, I believe it's wrong to preprocess source texts to be merged. What we need here is to encode merge_result, right? Why do we have to do source -> serialize1 -> merge -> serialize2 instead of source -> merge -> serialize?

I thought the main problem we were discussing in #3977 (comment) was that the transformation wasn't reversible (or, more precisely, that two different sides could be mapped to the same serialized version, causing a conflict to disappear). Now, the transformation is perfectly reversible.

The problem with serializing once is that there are many corner cases that I find it hard to reason about. I do not currently have a clear idea how to do it (so that it works in every case and is human-readable), and even if I did know a way, I wouldn't know how to do it elegantly without a gigantic test suite (and maintenance burden) for a feature that only comes into play rarely.

Some corner cases:

Conflict markers inside a conflict vs conflict markers that are the same on all sides of the conflict.

"No newline at the end of file" outside a conflict (this one is pretty easy) vs on all sides of a conflict vs (on some sides of a conflict that is on the end of file for all sides) vs (on no sides of a conflict at the end of file) vs (one of the last two cases for a conflict that's only at the end of file for some of the sides).

Unresolvable conflict when merging with empty file #3223 (which this approach provides a natural solution for, as I described in the PR description)

By having an intermediate state, we don't have to worry about most corner cases, we only have to make sure the first serialization satisfies a few invariants and is reversible. This way, the implementation is much simpler and, more importantly, verifying that it works correctly in all cases is much simpler.
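To make the "first serialization only needs to satisfy a few invariants and be reversible" argument concrete, here is a minimal Rust sketch of such an escaping step. The prefix text, the marker list, and the names `needs_escape`, `encode`, and `decode` are hypothetical and chosen for illustration; the PR's actual encoding may differ.

```rust
/// Hypothetical escape prefix, modeled on the `\JJ Verbatim Line:` idea from this
/// thread. The real encoding in the PR may use different text.
const VERBATIM_PREFIX: &str = "\\JJ Verbatim Line:";

/// Assumed set of marker prefixes that must never start an encoded line
/// (the Git-style markers from this thread plus jj's `%%%%%%%` and `+++++++`).
const MARKERS: [&str; 5] = ["<<<<<<<", "%%%%%%%", "+++++++", "=======", ">>>>>>>"];

/// A line needs escaping if it looks like a conflict marker, or if it already
/// starts with the escape prefix (escaping those keeps decoding unambiguous).
fn needs_escape(line: &str) -> bool {
    line.starts_with(VERBATIM_PREFIX) || MARKERS.iter().any(|&m| line.starts_with(m))
}

/// First serialization: prefix every problematic line. Afterwards no line starts
/// with a conflict marker, which is the invariant later parsing can rely on.
fn encode(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for line in text.split_inclusive('\n') {
        if needs_escape(line) {
            out.push_str(VERBATIM_PREFIX);
        }
        out.push_str(line);
    }
    out
}

/// Inverse of `encode`: strip exactly one prefix from each escaped line.
fn decode(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for line in text.split_inclusive('\n') {
        out.push_str(line.strip_prefix(VERBATIM_PREFIX).unwrap_or(line));
    }
    out
}
```

Because `decode(encode(x)) == x` for every input in this sketch, two different sides can never collapse into the same encoded text, which was the concern raised in #3977.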

Hmm, but we have to instead design the serialize1 part robust against line-by-line merging logic, right? Does it handle merge of two "no-eol" hunks for example? I suspect matching "no-eol" markers would be rendered out of the conflict region.

I didn't think about corner cases thoroughly, but I think nested conflict markers can be represented as a hunk (i.e. padded with " ") instead of a "base" content.

Hmm, but we have to instead design the serialize1 part robust against line-by-line merging logic, right?

Yes, that's what I think I did. Or do you mean something else?

Does it handle merge of two "no-eol" hunks for example? I suspect matching "no-eol" markers would be rendered out of the conflict region.

Yes, and then jj should import it correctly. I can add a test; it should look like:

<<<<<<<
aaa
...
>>>>>>>
\JJ: No newline at end of file

For comparison, if the last line is "bbb" and not conflicted, it'd look like:

<<<<<<<
aaa
...
>>>>>>>
bbb
\JJ: No newline at end of file

Finally, see the test in eb0f55d#diff-a63c4cb277c53acb282fa66016517078d2d8b452259d061ba2449a27cf9ef277R390 for a conflict over whether there is a newline at the end of the file.
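The following Rust sketch illustrates why the `\JJ: No newline at end of file` line ends up after `>>>>>>>` in the examples above. Each side without a trailing newline gets a newline plus a marker line; when every side ends that way, the marker line is part of the common suffix, and a typical line-based merge leaves common trailing lines outside the conflict region. The names `encode_eol`, `decode_eol`, and `common_suffix_lines` are hypothetical, and the sketch ignores a side that genuinely ends with the marker text (a real implementation would need escaping to rule that out).

```rust
/// Hypothetical marker line, copied from the examples above.
const NO_EOL: &str = "\\JJ: No newline at end of file\n";

/// Encodes one side: if it lacks a trailing newline, add one plus the marker line.
fn encode_eol(side: &str) -> String {
    if side.is_empty() || side.ends_with('\n') {
        side.to_string()
    } else {
        format!("{}\n{}", side, NO_EOL)
    }
}

/// Reverses `encode_eol`: drop the marker line and the newline added before it.
fn decode_eol(text: &str) -> String {
    match text.strip_suffix(NO_EOL) {
        Some(rest) => rest.strip_suffix('\n').unwrap_or(rest).to_string(),
        None => text.to_string(),
    }
}

/// Counts trailing lines that are identical on every side. In a typical
/// line-based merge these lines are rendered outside the conflict markers.
fn common_suffix_lines(sides: &[String]) -> usize {
    if sides.is_empty() {
        return 0;
    }
    let split: Vec<Vec<&str>> = sides.iter().map(|s| s.split_inclusive('\n').collect()).collect();
    let mut n = 0;
    loop {
        let mut current: Option<&str> = None;
        for lines in &split {
            if n >= lines.len() {
                return n;
            }
            let line = lines[lines.len() - 1 - n];
            match current {
                None => current = Some(line),
                Some(c) if c == line => {}
                Some(_) => return n,
            }
        }
        n += 1;
    }
}
```

With sides `aaa` and `bbb`, both lacking a final newline, only the marker line is shared, so it lands after the conflict rather than inside it.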

Ooh, so \JJ: lines are file-global, not hunk-local?

I would say it's super easy for users to lose track of them. I wouldn't expect to have to fix up lines outside the loud conflict markers.

If the file is in conflict state, maybe we'll need to represent verbatim <<<<<<< as a fake 1-way conflict (to escape)?

This is similar to a previous idea I had. I ultimately decided against it because I thought it would look super-messy and would be relatively complicated to implement.

The idea was to allow the opening conflict marker to have more than 7 <s. If it has, say, 9 <s, then all conflict markers in that conflict are required to be 9 characters long. So, a sequence of 7 <s would be considered as just part of the file.
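A minimal Rust sketch of the length-matching rule described above: a conflict opened by an N-character `<` marker is closed only by an N-character `>` marker, and marker-like lines of any other length in between are ordinary text. The names `marker_len` and `find_conflicts` are hypothetical; the sketch ignores `=======` separators and jj's other markers, which under this rule would also have to be N characters long.

```rust
/// Returns the run length if `line` is a marker made of at least 7 `ch`
/// characters, followed by end of line or by a space and a label.
fn marker_len(line: &str, ch: char) -> Option<usize> {
    let run = line.chars().take_while(|&c| c == ch).count();
    // Byte offset where the run ends.
    let rest = &line[run * ch.len_utf8()..];
    (run >= 7 && (rest.is_empty() || rest.starts_with(' '))).then_some(run)
}

/// Finds conflicts as (start line, end line, marker length).
fn find_conflicts(lines: &[&str]) -> Vec<(usize, usize, usize)> {
    let mut found = Vec::new();
    let mut i = 0;
    while i < lines.len() {
        if let Some(n) = marker_len(lines[i], '<') {
            // Only a closing marker of exactly the same length ends this conflict.
            let closing = lines[i + 1..]
                .iter()
                .position(|l| marker_len(l, '>') == Some(n));
            if let Some(offset) = closing {
                let end = i + 1 + offset;
                found.push((i, end, n));
                i = end + 1;
                continue;
            }
        }
        i += 1;
    }
    found
}
```

Applied to a 10-character one-sided conflict wrapping a verbatim 7-character conflict, this finds a single conflict; the inner 7-character markers are just content.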

This solves the problem of conflict markers inside conflicts. Then, we'd need some syntax for one-sided conflicts outside conflicts, e.g.

<<<<<<<<<<
<<<<<<<
aaa
=======
bbb
>>>>>>>
>>>>>>>>>>

This has two issues:

The parser for conflict markers becomes more complicated. We couldn't parse it with regular expressions, and seemingly not with pest either, since the grammar isn't simple enough. We could do it manually or, probably, with another parser combinator library (I forget what the one I'm thinking of is called).

With the simplest implementation of materialization, the conflict example above would look much more complicated (though this is equivalent):

<<<<<<<<<<
<<<<<<<
>>>>>>>>>>
aaa
<<<<<<<<<<
=======
>>>>>>>>>>
bbb
<<<<<<<<<<
>>>>>>>
>>>>>>>>>>

Or, we could do a simple greedy implementation that surrounds huge regions of the file with these markers, making the markers hard to find.
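The "simplest implementation" mentioned above, which wraps each marker-like line in its own one-sided conflict, can be sketched in a few lines of Rust. The function names are hypothetical, the sketch assumes `\n` line endings and a trailing newline, and `outer` must be longer than any marker-like run in the text for the result to parse unambiguously.

```rust
/// Assumed check for lines that look like 7-character conflict markers.
fn looks_like_marker(line: &str) -> bool {
    ["<<<<<<<", "=======", ">>>>>>>"]
        .iter()
        .any(|&m| line.starts_with(m))
}

/// Wraps every marker-like line in its own one-sided conflict whose markers
/// are `outer` characters long; other lines are copied unchanged.
fn wrap_each_marker_line(text: &str, outer: usize) -> String {
    let open = "<".repeat(outer);
    let close = ">".repeat(outer);
    let mut out = String::new();
    for line in text.lines() {
        if looks_like_marker(line) {
            out.push_str(&format!("{open}\n{line}\n{close}\n"));
        } else {
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}
```

Fed the 7-character conflict from the earlier example with `outer = 10`, this produces exactly the noisier equivalent form shown above, which illustrates the readability cost.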

So, I ended up leaning away from the idea. OTOH, the second problem would only occur for a minority of files, so maybe it's not too terrible?

Just to keep count, I now have 3 ideas in mind: this PR's idea, this PR + explicit unsetting of the conflicted state of the file in some cases (to avoid jj automatically removing \JJ lines, have a command do it), or the long conflict markers. I think the current version is the simplest, but I'm not sure by how much.

The idea was to allow the opening conflict marker to have more than 7 <s. If it has, say, 9 <s, then all conflict markers in that conflict are required to be 9 characters long.

That seems not bad, and relatively simple to implement.

Let's say that if any of the base file contents have lines matching ^([<>]{7,}), we'll use len(max(matches)) + N <s as the conflict marker. When parsing, the (minimum?) number of the <s can be determined from the base files, I think.
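A minimal Rust sketch of the marker-length rule proposed above: scan every side for lines starting with a run of at least 7 `<`/`>` characters (mirroring `^([<>]{7,})`), and use the longest run plus `extra` (the `N` above) as the marker length, falling back to the normal 7. The names are hypothetical, and `extra` must be at least 1 for the chosen markers to be distinguishable from the content.

```rust
/// Length of the leading run of `<` or `>` characters, if it is at least 7 long.
fn leading_marker_run(line: &str) -> Option<usize> {
    let run = line.bytes().take_while(|&b| b == b'<' || b == b'>').count();
    (run >= 7).then_some(run)
}

/// Chooses the conflict marker length: 7 if no side contains marker-like
/// lines, otherwise the longest such run plus `extra`.
fn choose_marker_len(sides: &[&str], extra: usize) -> usize {
    sides
        .iter()
        .flat_map(|s| s.lines())
        .filter_map(leading_marker_run)
        .max()
        .map_or(7, |longest| longest + extra)
}
```

Since this is a pure function of the sides, the parser could recompute the same length from the in-tree conflict instead of reading it from the working-copy file.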

When parsing, the (minimum?) number of the <s can be determined from the base files, I think.

I initially forgot about this issue, and it's critical. Relying on the original conflict sides for this is a nice idea. I'm not entirely sure it will work without being too confusing, but it might.

To elaborate for others, for the "long conflict marker" approach to have any advantage over \JJ Verbatim Line:, jj must be able to tell the difference between:

a file that contains a 7-< conflict and

a file that used to have 9-< conflicts and text with embedded 7-< conflict markers (like one of the examples I gave just above in #4088 (comment)), but where the user has now removed the 9-< conflict markers, so the remaining 7-< conflict markers should be part of the text.

The text of the file is identical in the two situations, but jj has to know there are no conflicts in the latter file.

jj already does something similar: before parsing the file as a conflict, it currently has to know the number of sides expected in the conflict. jj gets it from the in-tree version of the conflict.
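The following Rust sketch shows how knowing the expected marker length from the in-tree conflict lets jj tell the two situations above apart, even though the file text is identical. It uses "at least this many `<`" semantics, following the "minimum length marker" suggestion in this thread; the function names are hypothetical.

```rust
/// True if `line` is an opening marker: at least `min_len` `<` characters,
/// followed by end of line or a space.
fn is_open_marker(line: &str, min_len: usize) -> bool {
    let run = line.bytes().take_while(|&b| b == b'<').count();
    run >= min_len && line.as_bytes().get(run).map_or(true, |&b| b == b' ')
}

/// Decides whether the working-copy text still contains a conflict, given the
/// marker length derived from the original conflict's sides.
fn still_conflicted(text: &str, expected_len: usize) -> bool {
    text.lines().any(|l| is_open_marker(l, expected_len))
}
```

The same text containing a 7-character conflict counts as conflicted when the sides imply length 7, but as resolved (with literal marker-like lines) when the sides imply length 9.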

I realized that if we get the required number of <s from the original conflict, there is no longer a technical need to surround unconflicted conflict markers with artificial "one-sided conflicts". OTOH, I'm worried that with or without those one-sided conflicts, they'll be confusing to the user.

Currently, my best idea is to add something like this to the beginning of the file:

<<<<<<<<< Conflict 1 of 5 (Informational, not a real conflict)
\\\\\\\\\ JJ: JJ conflict markers in this file are 9 characters long, longer than the normal 7.
\\\\\\\\\ JJ: This is because this file contains text that looks like conflict markers.
\\\\\\\\\ JJ: This fake conflict can be safely deleted, and must be deleted for the file to be considered resolved.
>>>>>>>>> End of informational conflict 1 of 5
Blah, blah. This wouldn't be a conflict:
<<<<<<<
aaa
=======
bbb
>>>>>>>
bleh bleh
(4 other conflicts should exist in this file somewhere)

I think, in any case, the user will need to resolve the last remaining conflict in a way that the file content is identical to the non-conflict materialization. Otherwise, the working-copy file would have diffs from the resolved version of the tree content.

I assumed the "minimum length marker" idea practically solves this problem, since the user can inline a verbatim <<<<<<< line if the base file contained a verbatim <<<<<<< line, and therefore the marker line must be at least 8 characters long.

lib/src/merge.rs

yuja · 2024-07-17T10:48:52Z

lib/src/conflicts.rs

-pub fn parse_conflict(input: &[u8], num_sides: usize) -> Option<Vec<Merge<ContentHunk>>> {
+pub fn parse_merge_result(input: &[u8], num_sides: usize) -> Option<Merge<ContentHunk>> {
+    let hunks = parse_conflict_into_list_of_hunks(input, num_sides)?;
+    let mut result: Merge<Vec<u8>> = Merge::from_vec(vec![vec![]; num_sides * 2 - 1]);


nit: you can build Merge<ContentHunk>, or Vec<ContentHunk> and apply Merge::from_vec() at the end.
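A sketch of the suggested shape: accumulate a plain `Vec` of terms and call `Merge::from_vec()` only once at the end. The `Merge` and `ContentHunk` types below are hypothetical, simplified stand-ins (a list of `2 * num_sides - 1` terms, as in the `num_sides * 2 - 1` line above), not jj's real definitions, and `concat_hunks` is an illustrative name.

```rust
/// Hypothetical, simplified stand-in for jj's `Merge<T>`: just its terms.
#[derive(Debug, PartialEq)]
struct Merge<T> {
    values: Vec<T>,
}

impl<T> Merge<T> {
    fn from_vec(values: Vec<T>) -> Self {
        assert!(values.len() % 2 == 1, "a merge needs an odd number of terms");
        Merge { values }
    }
}

/// Hypothetical stand-in for jj's `ContentHunk`.
#[derive(Debug, PartialEq, Clone, Default)]
struct ContentHunk(Vec<u8>);

/// Concatenates parsed hunks term by term. Each hunk is assumed to already
/// have `2 * num_sides - 1` terms (a resolved hunk repeated for every term).
fn concat_hunks(hunks: &[Vec<Vec<u8>>], num_sides: usize) -> Merge<ContentHunk> {
    let mut terms = vec![ContentHunk::default(); num_sides * 2 - 1];
    for hunk in hunks {
        debug_assert_eq!(hunk.len(), terms.len());
        for (term, bytes) in terms.iter_mut().zip(hunk) {
            term.0.extend_from_slice(bytes);
        }
    }
    // Wrap in a Merge only once all terms are fully built.
    Merge::from_vec(terms)
}
```

Building the `Vec` first avoids mutating a `Merge` in place and keeps the odd-length invariant check in a single spot.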

I see you're trying to help out with this in #4106 . Thanks! I may clean it up in this PR, or we can do it separately; in any case we should figure out the bigger issues first.


ilyagr mentioned this pull request Jul 15, 2024

conflicts: insert trailing newline when materializing conflicts #3977

Closed


ilyagr changed the title ~~Materialize files with conflict markers in a way that can be parsed~~ Materialize files with conflict markers or no newline at EOF in a way that can be parsed Jul 15, 2024

ilyagr force-pushed the unmat branch from 03e088a to de1d3c0 Compare July 15, 2024 03:46

ilyagr changed the title ~~Materialize files with conflict markers or no newline at EOF in a way that can be parsed~~ Materialize ((unconflicted files) with (conflict markers) or (no newline at EOF)) in a way that can be parsed Jul 15, 2024

ilyagr changed the title ~~Materialize ((unconflicted files) with (conflict markers) or (no newline at EOF)) in a way that can be parsed~~ Materialize conflicts containing (sides with (conflict markers) or (no newline at EOF)) in a way that can be parsed Jul 15, 2024

ilyagr changed the title ~~Materialize conflicts containing (sides with (conflict markers) or (no newline at EOF)) in a way that can be parsed~~ lib conflicts: Materialize (sides with (conflict markers) or (no newline at EOF)) in a way that can be parsed Jul 15, 2024

ilyagr force-pushed the unmat branch 11 times, most recently from 5a67ac6 to 04083d4 Compare July 15, 2024 18:44

ilyagr marked this pull request as ready for review July 15, 2024 18:44

ilyagr marked this pull request as draft July 15, 2024 18:44

ilyagr marked this pull request as ready for review July 15, 2024 18:48

ilyagr force-pushed the unmat branch 4 times, most recently from 1797b68 to eb0f55d Compare July 16, 2024 23:48

yuja reviewed Jul 17, 2024


ilyagr force-pushed the unmat branch from eb0f55d to ea85467 Compare July 18, 2024 04:34

ilyagr mentioned this pull request Jul 18, 2024

lib conflicts: minor bugfix + tests #4119

Merged

ilyagr force-pushed the unmat branch 3 times, most recently from ccbb160 to 5519ffd Compare July 25, 2024 04:29

ilyagr mentioned this pull request Jul 26, 2024

FR: Materialize files with conflict markers (or any files) in a way that can be parsed #3975

Open

ilyagr force-pushed the unmat branch 2 times, most recently from 18b3795 to 51d761e Compare August 5, 2024 04:58

ilyagr added 4 commits August 6, 2024 22:57

lib Merge: add Merge::map_owned method that consumes the map

b1bbb0e

lib conflicts: have parse_conflict return a single hunk, rename to

889cd8f

`parse_merge_result` Now, it matches `materialize_merge_result`

conflicts: Have materialize_merge_result take the object instead of a…

097a0bb

… reference

conflicts: encode unmaterializeable lines

1991006

Fixes martinvonz#3968 Fixes martinvonz#3975

ilyagr force-pushed the unmat branch from 51d761e to 1991006 Compare August 7, 2024 06:02

ilyagr marked this pull request as draft August 8, 2024 03:10

scott2000 mentioned this pull request Nov 25, 2024

conflicts: escape conflict markers by making them longer #4969

Open

