Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tree merge with tree-related auto-resolution #1705

Merged
merged 20 commits into from
Dec 8, 2024
Merged

tree merge with tree-related auto-resolution #1705

merged 20 commits into from
Dec 8, 2024

Conversation

Byron
Copy link
Member

@Byron Byron commented Nov 25, 2024

Make it possible to resolve tree-conflicts with ours while making it possible to notice that a conflict would otherwise have occurred.

The idea is that one wants to produce a suitable tree while noticing that a conflict would have occurred if we wouldn't have resolved it.

Follow-up of #1661.

Tasks

  • design API so that current ancestor usecase and ours is covered
  • first simple tests
  • work through all remaining cases in the test-suite, each of them must be resolved to ours and ancestors with A-B and B-A order.
  • add support for tree-conflict resolution to gix.
  • add support for to gix merge tree/commit, start sharing settings in clap as well.
  • complex tests that replace tree with non-tree (nested) and non-tree with tree. The latter probably needs special care (see whatever/empty in simple)
  • clear out todo!() and dbg!
  • (gitoxide issue, which takes url::password() directly even though it's percent-encoded)
  • merge has to turn off empty-oid rename tracking, but only when finding individual files (directory tracking should still work)
    • rename-within-rename - needs forced resolution tests

Implementation Notes

  • We don't optimise for the ancestor case, so it's not always as good as it can possibly be, fortunately in what I would call 'niche' situations. In practice, it's only used for merging virtual merge bases.
  • The system can be thrown off if very unlikely renames are found where Git doesn't find it at all. Sometimes this leads to unexpected results and should be fixed by improving rename tracking to be closer to Git, taking names into consideration and having candidates.
  • merge.directoryRenames isn't implemented, even though Git has a handle for that. We could respect the setting by deciding what should be a conflict, even though right now we always track directory renames.

Possible Tasks

  • Submodule merges are also possible! Maybe outscope it though! libgit2 also doesn't try it.
  • textconv with context, see this gist for details.
    • There seem to be different 'tiers' of tools, some don't get GIT_DIR set, others do.
    • It also seems that diff-programs get too much context right now, but that depends on how much is passed to them by the caller as gix-command::Context.
  • How to model virtual-merge-bases? Can be none or many, user should have control over how this is done.
  • Actual tree-based merging
  • Make blob-merge based conflict handling possible in the tree merge from gix at least. - not needed for now
  • consider finding a way to more clearly say what happened, maybe with some additional explanation so it's more of a drill-down way of working, instead of a super-enum with a lot of very similar cases.
Archive

Previous Research (index creation)

  • record_conflicted_entries() sets the index to match what was recorded.
    • This happens right after checking out the new tree, which alters the index to match it and does all the safety-related work. The key here is that only the conflicted entries matter.
  • Git checks out the tree directly, which not only changes the working tree, but also adjusts the existing index efficiently by leaving the metadata uptodate. This is something we should offer as well so the index-adjustment would be done right after, with the index being written just once.
    • GB would have to checkout the tree-id then, load the index, apply the conflicting paths, to finally write the index a second time. This can be optimized later once gitoxide can checkout a tree as well (while keeping the index in memory).
  • checkout() by tree-id, with auto-updates to the index, in-memory.

Everything is about MergeORT.

ability to re-use object caches of a repo that has seen the base-tree already, but overall, who knows*

  • merge-options passed with -X ours for instance don't affect tree-related auto-resolutions, just the ones related to content. This could be implemented when there is demand though.
  • it uses an empty tree if there is no merge-base - we must allow the same.
  • it allows for multiple merge-bases, creating a virtual one by merging all merge-bases together using the same algorithm, recursively.
  • merges can have conflicts without a individual files being involved, for instance when directory renames clash
  • Note that merge-ORT cannot properly handle renames within renamed directories, ending up with the source of the subdir-rename still present.
❯ git ls-tree -r $(git merge-tree main feat)
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    a
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    git-sec-renamed/2
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    git-sec-renamed/7
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    git-sec-renamed/subdir/6
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    git-sec/subdir-renamed/6
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    git-sequencer
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    gix/5
100644 blob e69de29bb2d1d6434b8b29ae775ad8c2e48c5391    h
  • Must make sure that possible types of conflicts are properly communicated, to not degenerate information
  • It puts conflict-markers in the blobs of the result tree, with annotations to provide additional context
  • Need resolution configuration, see git2::MergeOptions.
  • data stored by path, and is interned in the map to allow pointer-based comparisons
    • merge-info with everything one needs to know, also related to renames
    • or conflict information
    • it uses a memory-pool/arena to get memory for many paths all at once (and also release it like that)
  • paths start out as conflicted, and then can later be changed to non-conflicting if a content-based merged succeeded.
    • If it remains conflicts, the meta-data is used to produce an 'as merged as possible' version with conflict markers that can be checked out to the working tree.
  • hunks can partially overlap, but can also be resolved line-by line to some extend.

… to indicate auto-resolution.

Previously it wasn't possible to detect auto-resolution, even though it can be assumed if a
resolution mode was provided.
@Byron Byron force-pushed the merge branch 2 times, most recently from 5169947 to 2b639aa Compare November 25, 2024 19:45
…conflicts`.

That way it's possible to steer how to resolve tree-related conflicts while
making it possible to detect that a conflict happened.
@Byron Byron force-pushed the merge branch 7 times, most recently from c5c6512 to a82f3f2 Compare December 2, 2024 15:12
@Byron Byron marked this pull request as ready for review December 2, 2024 15:20
@Byron Byron force-pushed the merge branch 3 times, most recently from 35226fc to ad9d484 Compare December 5, 2024 11:45
Byron added 9 commits December 5, 2024 20:08
That way it's possible to control how tree-conflicts should be auto-resolved.
With it one can decide which side to favor in case of
irreconcilable tree-conflicts.
Previously, the lock location would block all writers from executing
a fixture even though they wouldn't step on each others feet.

Now, a script destination is used to assure locks are created close
to the destination when creating writable fixtures, typically removing
the need for multiple writers to wait on each other unnecessarily.
… `Id`.

This makes these types easier to use as it's enough to pass a wrapped type
to perform more actions on the underlying repository.
Previously it was impossible to tell if rename tracking was disabled, or
if it was unset, which leads to incorrect logic.

This changes the signature of `diff::new_rewrites()` to also provide information about
whether or not it was configured.
The problem here is that our implementation avoids using a hashmap,
but pays the price by having bad performance if there are too many
possible candidates (binary search doesn't seem to be up to the task).

This can lead to impossibly long runtimes, so for now, limit it explicitly.
Byron added 2 commits December 7, 2024 19:23
…ave unconflicted *and* conflicted entries.

This is a massive footgun currently where incorrect usage is very likely while causing all kinds of mistakes.
Byron added 3 commits December 7, 2024 20:53
Previously it was possible to mark perfectly working content merges as conflicting
if we would choose 'our' or 'their' resolution.
Empty files are equivalent to having no content, which also means
such files have no identity to speak off. This definitely helps
with false positives of `.gitignore` for instance, which can be empty
to tell Git to track a directory.

On top of that, Git has a heuristic to do rename tracking of small files
by similarity as the similarity may be off of files just have a couple of
lines to speak about.

Note that empty files that are renamed as part of a whole directory will still
be tracked as renames.
@Byron Byron enabled auto-merge December 8, 2024 18:40
@Byron Byron merged commit 520c832 into main Dec 8, 2024
20 checks passed
@Byron Byron deleted the merge branch December 8, 2024 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

1 participant