Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

conflicts: insert trailing newline when materializing conflicts #3977

Closed
wants to merge 3 commits into from

Conversation

ilyagr
Copy link
Collaborator

@ilyagr ilyagr commented Jun 26, 2024

This happens for non-empty files only; the materialization code can
handle files and hunks with 0 or more lines, but each line should
terminate in a newline.

This is an imperfect fix to #3968, a better fix is TODO; see #3975.

Checklist

If applicable:

  • I have updated CHANGELOG.md
  • (n/a) I have updated the documentation (README.md, docs/, demos/)
  • (n/a) I have updated the config schema (cli/src/config-schema.json)
  • I have added tests to cover my changes

@ilyagr ilyagr marked this pull request as ready for review June 27, 2024 00:01
@ilyagr ilyagr force-pushed the conflictnewlines branch 4 times, most recently from a3f28aa to c33166f Compare June 27, 2024 03:55
lib/src/merge.rs Outdated
@@ -415,6 +415,12 @@ impl<T> FromIterator<T> for MergeBuilder<T> {
}
}

impl<T> FromIterator<T> for Merge<T> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iirc, .collect() into Merge is not implemented by design, and the caller should use MergeBuilder instead.

Copy link
Collaborator Author

@ilyagr ilyagr Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the point was to avoid changing the length of the Merge. That is significant, thanks. [[Update: Actually, on second though, I'm not sure .collect() is any worse in this regard than the .collect::<MergeBuilder<_>>().build() it expands to, but having an iterator in a situation where the length is dangerous to change seems ugly regardless]]. Merge::from_vec still exists, but I guess we should eventually make it private.

I think it would be better to add a owned_map method to Merge (Or into_map? I'm not sure about the naming or how to make it look idiomatic. In principle, the current map could become something like .as_ref().map(), but I'm not sure that's nice). Using MergeBuilder when the length of the merge doesn't change feels ugly and unintuitive.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went with into_map. PTAL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I tried to explain the reason here:
https://github.com/martinvonz/jj/blob/eba9ac0b1b6cc8a564749598dd8101d60bceffe5/lib/src/merge.rs#L374-L379

However, I don't understand it myself now. Feel free to try to remove MergeBuilder and see how it goes. Maybe I was just confused (or maybe I'm missing something now).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also don't understand why Option matters, but I think the goal was to make .collect() not panic.

};
ContentHunk(side)
})
.collect();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't we do padding when writing hunks or added/removed contents? It will help implement "a better solution" later.

It might render eol-only change as "different", but I think that's correct.

Copy link
Collaborator Author

@ilyagr ilyagr Jun 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of padding do you mean?

Copy link
Collaborator

@yuja yuja Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something like

write_all(data);
if !data.ends_with(b'\n') {
    // TODO: better solution ...
    write_all(b"\n");
}

Edit: I assume it's a presentation problem rather than a bug in diff/merge logic.

Copy link
Collaborator Author

@ilyagr ilyagr Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First of all, what we are trying to avoid is:

<<<<<<< Conflict 1 of 1
%%%%%%% Changes from base to side #1
-base+++++++ Contents of side #2
right>>>>>>> Conflict 1 of 1 ends

The most naive interpretation of your suggestion (putting the padding at the end of the function) wouldn't help with that AFAICT.


What makes more sense (but I'm leaning against from) is to put the padding inside the conflict processing, around https://github.com/martinvonz/jj/pull/3977/files#diff-15b48dd183d7f89959f15b80cee2dcae0bb7099e1e934497f74c73384315fa6bR254 (this is what I tried doing first, the new line is a remnant of this). Note that each side of the hunk must be padded, and this will only matter if there is a conflict hunk at the very end of the file.

I think that approach is doable, and it has the advantage that, if there is no conflict at the end of the file, the file would not be mangled.

However, the reason I moved it to the beginning is that my plan was to replace padding with the more complicated transformation I described in #3795 (Update: I meant #3975 here and elsewhere). That transformation would need to be applied both inside and outside conflicts, so the beginning of the function seems to be the right place for it.


Is moving the padding inside conflict processing what you meant?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is moving the padding inside conflict processing what you meant?

Yes. If I understand the problem, the logic is correct to the point of let merge_result = files::merge(&slices), and the rendering code is broken. We'll need to fix output.write_all(&left.0)?;, write_diff_hunks(), etc. to be compatible with line-oriented conflict format, by inserting "\n" (or "\n\\No newline at end of file?)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the rendering code is internal, I was thinking that we can simply declare that there is a precondition for it to work correctly: the input has to be 0 or more newline-terminated lines.

In my mind, this is similar to the fact that our rendering code is broken (produces results that cannot be parsed correctly) when the input contains conflict markers.

To me, the rendering code is broken, so we should fix it rather than introducing lossy transformation. Perhaps, nested conflict markers can be escaped somehow (e.g. by using hunk syntax?)

I'm now thinking of just preparing the full solution from #3795, since it treats both issues.

nit: wrong issue number?

I'm worried that you'll oppose it since it's similar in spirit to this PR: each side is encoded to satisfy the 2 conditions (newlines & not having and conflict markers) before the file conflict is split into hunks.

I'm not sure why you want to preprocess the raw inputs. It might be slightly easier, but is wrong in that "foo" and "foo\n" no longer produce a conflict? That might be okay in practice, but would be weird if conflicted file had no conflicts inside.

If I remember correctly, one of the proposed solutions was to insert something like \ No newline at end of file and decode it by parse_conflict(). I think parse_conflict_hunk() will roughly do:

  1. split by conflict markers
  2. parse contents and hunks (where "No EOL" will be processed)
  3. collect them into Merge<ContentHunk>

And it makes sense that materialize_merge_result() does the reverse.

Copy link
Collaborator Author

@ilyagr ilyagr Jul 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: wrong issue number?

The correct issue number is #3975. Sorry for the confusion!

That approach would make the transformation lossless. Generally, I hope that my comments will make more sense if you look at that.

My current iteration for the plan for #3975 is to start special lines with \JJ, e.g. \JJ: No newline at the end of file or \JJ Verbatim line: <<<<<<<.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once that is done, my preliminary plan for #3223 is to materialize deleted sides as \JJ There is no file at this path on this side of the conflict. It may have been deleted.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That seems good (though I don't have any particular preference regarding syntax.) Anyway, my point is that the source hunks shouldn't be padded, which is lossy.

Copy link
Collaborator Author

@ilyagr ilyagr Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is wrong in that "foo" and "foo\n" no longer produce a conflict? That might be okay in practice, but would be weird if conflicted file had no conflicts inside.

By the way, this is a good point, I hadn't initially realized that there could be conflicts that are no longer conflicts after adding the newline at the end, potentially resulting in occasional problems like #3223.

I'll close this in favor in hopefully upcoming comprehensive solution.

@ilyagr ilyagr force-pushed the conflictnewlines branch 5 times, most recently from dba45bb to 0e065e8 Compare June 29, 2024 18:56
@ilyagr ilyagr force-pushed the conflictnewlines branch 2 times, most recently from 97373a3 to 2ed234e Compare July 7, 2024 07:37
ilyagr added 3 commits July 10, 2024 09:53
… reference

We are about to do some pre-processing of that value, and we
will do more in the near future.
This happens for non-empty files only; the materialization code can
handle files and hunks with 0 or more lines, but each line should
terminate in a newline.

This is an imperfect fix to jj-vcs#3968, a better fix is TODO; see jj-vcs#3975.

This also implements `Merge::into_map` to avoid distracting iterations
and MergeBuilders. I could split it into a separate commit, but then
it'd have no uses.
@ilyagr ilyagr force-pushed the conflictnewlines branch from 2ed234e to a6f8d2a Compare July 10, 2024 16:59
@ilyagr ilyagr closed this Jul 10, 2024
@ilyagr
Copy link
Collaborator Author

ilyagr commented Jul 15, 2024

Replaced by #4088

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants