Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add padding to JSON to avoid git merge conflicts #133

Open
stipsan opened this issue Sep 20, 2016 · 29 comments
Open

Add padding to JSON to avoid git merge conflicts #133

stipsan opened this issue Sep 20, 2016 · 29 comments

Comments

@stipsan
Copy link

stipsan commented Sep 20, 2016

Hey,

We've noticed that the more people we have working on our JS SPA app views, all of which are translated, the more painful git merge conflicts we get.

A simple solution to avoid them can be seen here: http://stackoverflow.com/a/33262179

In other words, have a linebreak between each translation in the JS.

Great work on l10ns btw! We reviewed Polyglot.js and many other solutions but we found l10ns to be superior 👍

@tinganho
Copy link
Owner

Can you give an example of one merge conflict?

@stipsan
Copy link
Author

stipsan commented Sep 20, 2016

Bob the dev starts working on feature A while Kyle the engineer starts working on feature B.
Both of them add translations and the en_US.json file (and others) gets new entries added.

Now Bob submits a PR, all is well and it's merged without trouble.
But now Kyle will get a merge conflict when he pulls in master now that Bobs changes to the json files is added.
Kyle have to resolve them manually and it's quite the pain.

The repo we're experience this on isn't public, but I can set one up for you if you want to investigate further and see the problem in action.

@tinganho
Copy link
Owner

I was meaning something specific to our l10ns's json files? What you are describing is just a general merge conflict.

L10ns currently order both entries and properties to make it more git friendly. If you are changing the same line of course a manual resolve is needed. Note, adding line padding will not resolve that.

@stipsan
Copy link
Author

stipsan commented Sep 21, 2016

Ah I see, no we're not changing the same lines. We simply add new strings to the language files at the same time.
In our project we're working on a React app. We use feature branching and require all strings in the views to be translated. Therefore we quite often add new strings to language files at the same time and that's the kind of merge conflict that would be avoided with the extra padding.
The padding would prevent git from being unsure if the changes are related or not.

If we edit the same lines there's still going to be a conflict with the padding, that's not the issue we're trying to address.

@tinganho
Copy link
Owner

I think I need a good isolated reproducible example. With the localization files from L10ns.

@stipsan
Copy link
Author

stipsan commented Sep 22, 2016

Ok sure, I'll make one and post the link in this issue 👍

@stipsan
Copy link
Author

stipsan commented Sep 27, 2016

I managed to (eventually) reproduce it here: https://github.com/stipsan/l10ns-merge-conflict/pull/7
But I went ahead and tried the proposed padding solution from stack overflow and it actually made no difference: https://github.com/stipsan/l10ns-merge-conflict/pull/16/files

It's clear that this is something that likely cannot be solved in l10ns itself, it's a problem with the default merge driver in git.
This module looks promising: https://www.npmjs.com/package/git-json-merge

I'll report back if it git-json-merge works out. If it does it may be worth adding to the l10ns readme as a protip 😃

@tinganho
Copy link
Owner

Thanks for doing the research.

@cadesalaberry
Copy link
Contributor

I was on the same path,

Sadly https://github.com/jonatanpedersen/git-json-merge/ makes no difference as it is hard for a diff engine to resolve conflicts in arrays.

Beware, when used with l10ns, the current git-json-merge plugin can lead to data destruction...

We are trying to come up with a solution jonatanpedersen/git-json-merge#3

@stipsan
Copy link
Author

stipsan commented Oct 3, 2016

@cadesalaberry thanks a lot for taking the time to share that!
I hadn't gotten around to test it out yet and your feedback helped me dodge that bullet 👌

@tinganho
Copy link
Owner

tinganho commented May 9, 2017

To summarize the problem:

Two user adds a localization entry in en.json:

  {
    "files": [
>>>>>> HEAD
      "Source/Content/Offline/InternalServerError.tsx"
    ],
    "id": "p7Le4KgRFed",
    "key": "INTERNAL_SERVER_ERROR--DESCRIPTION",
    "new": true,
    "timestamp": "2017-05-03T15:23:53.782Z",
    "value": "owijeoiwejfifwe.",
=======
      "Source/Content/Offline/InternalServerError.tsx"
    ],
    "id": "p7Le4KgRFed",
    "key": "INTERNAL_SERVER_ERROR--DESCRIPTION",
    "new": true,
    "timestamp": "2017-05-03T15:23:53.782Z",
    "value": "wefefwfwefwewfe.",
<<<<<<< 2i3hr98f2h
    "variables": []
  },

Because git is just chunking the unique code between the two commits, i.e. leaving out files and variables line. It makes it hard for us if we want to choose both of the chunks. Which happens quite often.

So we want git to be a bit smarter and maybe resolve the conflict automatically or just provide chunks that includes files and variables.

>>>>>> HEAD
  {
    "files": [
      "Source/Content/Offline/InternalServerError.tsx"
    ],
    "id": "p7Le4KgRFed",
    "key": "INTERNAL_SERVER_ERROR--DESCRIPTION",
    "new": true,
    "timestamp": "2017-05-03T15:23:53.782Z",
    "value": "owijeoiwejfifwe.",
    "variables": []
  },
=======
  {
    "files": [
      "Source/Content/Offline/InternalServerError.tsx"
    ],
    "id": "p7Le4KgRFed",
    "key": "INTERNAL_SERVER_ERROR--DESCRIPTION",
    "new": true,
    "timestamp": "2017-05-03T15:23:53.782Z",
    "value": "wefefwfwefwewfe.",
    "variables": []
  },
<<<<<<< 2i3hr98f2h

@qborreda
Copy link

qborreda commented May 11, 2017

Similar to this, the json generator writes new entries on top of each other, and git comes up with this mess when merging:

  {
    "files": [
      "src/localizations/dictionary.js"
    ],
<<<<<<< HEAD
    "id": "zeEXaBEzSX",
    "key": "GENERIC->NO_PERMISSION",
    "new": false,
    "timestamp": "2017-03-16T16:45:48.446Z",
    "value": "You don't have permissions to do this action.",
=======
    "id": "Royz94kns6",
    "key": "GENERIC->FILTER_BY",
    "new": false,
    "timestamp": "2017-05-04T09:38:05.490Z",
    "value": "Filter by",
>>>>>>> develop
    "variables": []
  },

and the process of resolving these conflicts is so painful when two different branches use translations heavily .. multiply for as many languages as you have (5 in our case) ..

@tinganho
Copy link
Owner

One solution is to create one file per localization string. Unless, we want every user of l10ns to add a proper merge driver for git to resolve these conflicts.

@qborreda
Copy link

Well, we have over 240 translation strings .. translated into 5 languages.
I'm thinking we might as well work on a db and some interface that outputs the json format, and exclude the json files from the git repo ..

@tinganho
Copy link
Owner

Anyone running into this problem might mitigate it by applying diff3 style to your git config, to show common ancestor: microsoft/TypeScript#16129.

@batiste
Copy link

batiste commented Sep 11, 2017

Wouldn't changing the format to use a map of keys solve this problem?
It seems that internally the code is using a map anyway.

@batiste
Copy link

batiste commented Sep 11, 2017

Here is a proposed PR #165

@tinganho
Copy link
Owner

tinganho commented Sep 11, 2017

I think it depends. If everything is one line yes(though, I don't think we want that).

{
    "UNIQUE_KEY_1": {
          "files": "same",
          "value": "unique-1",
          "variables": "same"
    },
    "UNIQUE_KEY_2": {
          "files": "same",
          "value": "unique-2",
          "variables": "same"
    },
    // ...
}

But, I think in the above case Git wants to collapse the "variables" line and the "}," line. Anyone that wants to just add both "UNIQUE_KEY_1" and "UNIQUE_KEY_2" to each store needs to re-add those lines to resolve the conflict.

We could actually have our own file format instead of using JSON and have unique lines in the start and end of each entry:

start "UNIQUE_KEY_1"
          "files": "same",
          "value": "unique-1",
          "variables": "same"
end "UNIQUE_KEY_1"
start "UNIQUE_KEY_2"
          "files": "same",
          "value": "unique-1",
          "variables": "same"
end "UNIQUE_KEY_2"

I think this will resolve the conflicts. But I need a POC first, to be sure.

@batiste
Copy link

batiste commented Sep 11, 2017

@tinganho you might be right the map doesn't really solves the problem. I an unsure about a new file format.. That seems like an over engineered solution. I my experience those conflicts happen a lot because all the new stuff is inserted at the top of the file so this a recipe for conflicts. If the insertion order in the array would be random that would reduce the chance for conflicts enormously.

I there any documentation out there on how to split the translations in different files?

@batiste
Copy link

batiste commented Sep 11, 2017

I believe a solution as simple as this would probably avoid the majority of conflicts:

batiste@5c201e8

If the key is random or pseudo random, then when the number of key increase the risk of conflict would decrease? Sorting by date might be the source of all those conflicts...

@cadesalaberry
Copy link
Contributor

cadesalaberry commented Sep 11, 2017

This might work pretty well with a custom git merger now that it is not and array anymore.

https://www.npmjs.com/package/git-json-merge

EDIT: NVM, I did not see I already mentionned it in this thread...

@tinganho
Copy link
Owner

@batiste I think you are right, sorting by date is the root cause of all the conflicts here. And if we sort by key, it would significantly reduce the conflicts. Though, there is still a slight chance of a conflict when it sorts each entry to be of the same row. Let me just think a bit about the implications. And I might add your commit.

@cadesalaberry I think I commented on custom git mergers before, but I think it would be a to daunting task to let every user of L10ns install it and configure git to use it. And L10ns should also be agnostic to which SCM you use. I haven't really used git-json-merge myself, but how does it deal with multiple type of files. Let say JS files and JSON files?

@batiste
Copy link

batiste commented Sep 18, 2017

No news on this? We could really benefit from a change in our team...

@tinganho
Copy link
Owner

I will have closer look later today or tomorrow, and probably merge it in if I cannot find any issue.

@cadesalaberry
Copy link
Contributor

@tinganho It supports reading js files, but it will always write the file as JSON.
I understand your worry, It's true it would only work for JSON.

@Sparted We ended up running this script after each translation and l10ns update. This is only temporary, but is so much better when it comes to merge conflicts.

./scripts/l10ns-inline.js

var path = require('path');
var fs = require('fs');

var l10nsConfig = require(path.resolve(__dirname, '..', 'l10ns')).projects.core;
var localizationsDir = path.resolve(__dirname, '..', l10nsConfig.store);

Object.keys(l10nsConfig.languages).forEach(processFile);

function processFile(lang) {
  var filepath = path.join(localizationsDir, lang + '.json');
  var json = JSON.parse(fs.readFileSync(filepath));

  try {
    fs.writeFileSync(filepath, processJSON(json));
  } catch (e) {
    console.error('Failed to process localization file ' + filepath);
    console.error(e.stack);
    process.exit(1);
  }
}

function processJSON(json) {
  var lines = json
    .sort(function(a, b) {
      var diff = new Date(b.timestamp) - new Date(a.timestamp);

      return diff || b.key.localeCompare(a.key);
    })
    .map(function(entry, i) {
      return [
        '  ',
        JSON.stringify(entry),
        (i !== json.length - 1 ? ',' : '')
      ].join('');
    });

  return ['[', lines.join('\n'), ']'].join('\n');
}

@tinganho
Copy link
Owner

I now published a new version of l10ns that sorts by key. Let me know if there is any problem. Thanks @cadesalaberry and @batiste to your prior research on this.

I'm still keeping this issue open. I'm a bit interested in introducing a new fileformat in the future to make this problem disappear for good.

@tinganho
Copy link
Owner

tinganho commented Oct 4, 2017

I've just tried to add unique lines on start and end of each entry. And it doesn't auto resolve the conflicts.

screen shot 2017-10-05 at 01 14 07

I think we have to resolve to @cadesalaberry's solution with a custom merge tool to git https://git-scm.com/docs/gitattributes#_defining_a_custom_merge_driver.

@tinganho
Copy link
Owner

tinganho commented Oct 4, 2017

Maybe the merge strategy union is a solution for this problem: https://git-scm.com/docs/gitattributes#gitattributes-union

@cadesalaberry
Copy link
Contributor

Yes, I had a look at it as well, but it is not always consistent when merging.

When you have duplicate keys, l10ns update will only keep the first occurrence (if I remember well), which is not always right.

We are looking at a way to interact with git to get a smoother merge experience. I will get back to you once we get results.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants