The readability of diff_cleanupSemantic could be improved. The README promotes it for cases where a human needs to read the diff, but I'm still seeing words that are partially split. I think the algorithm should always try to keep diffs aligned to complete words and phrases rather than breaking in the middle of a word.
The use case is building a UI where semantic diffs are key to tracking before/after changes to a piece of text.
Example
Before: The dog was a little hungry.
After: The duck was a bit hungry.
Expected:
"dog" is completely crossed out in place of "duck", and "little" is completed crossed out in place of "bit".
Actual output:
The actual output takes a lot of mental energy to parse, so it isn't feasible for a human to rely on it to understand the diff of a large piece of text, since these partial-word splits occur frequently. Some way to split the diff more aggressively along word and phrase boundaries would be preferred here.
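For reference, the behaviour above can be reproduced with calls along these lines (a minimal sketch assuming the Python port; the exact diff tuples depend on the library version):

```python
# Minimal reproduction sketch, assuming the Python port of diff-match-patch.
from diff_match_patch import diff_match_patch

dmp = diff_match_patch()
diffs = dmp.diff_main("The dog was a little hungry.", "The duck was a bit hungry.")
dmp.diff_cleanupSemantic(diffs)
# Even after semantic cleanup, delete/insert spans may start or end mid-word,
# e.g. splitting "dog"/"duck" after their shared leading "d".
print(diffs)
```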
I've also tried diff_efficientCleanup with varying edit costs, but it doesn't seem to fully get rid of the problem, as the algorithm has no knowledge of word and phrase boundaries to split on for readability.
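For illustration, one way to get word-aligned diffs today is a word-level variant of the library's line-mode trick: map each word to a placeholder character, diff the placeholder strings, then expand back to words. The sketch below assumes the Python port; the helpers words_to_chars and chars_to_words are hypothetical, not part of the diff-match-patch API.

```python
# Word-mode diff sketch: encode words as single characters so diff_main can only
# cut on word boundaries, then decode before semantic cleanup.
import re
from diff_match_patch import diff_match_patch

def words_to_chars(text1, text2):
    word_array = [""]  # index 0 reserved so we never emit chr(0)
    word_hash = {}

    def encode(text):
        chars = []
        # Keep trailing whitespace attached so the round trip reproduces the text.
        for word in re.findall(r"\S+\s*", text):
            if word not in word_hash:
                word_array.append(word)
                word_hash[word] = len(word_array) - 1
            chars.append(chr(word_hash[word]))
        return "".join(chars)

    return encode(text1), encode(text2), word_array

def chars_to_words(diffs, word_array):
    return [(op, "".join(word_array[ord(ch)] for ch in data)) for op, data in diffs]

dmp = diff_match_patch()
chars1, chars2, word_array = words_to_chars("The dog was a little hungry.",
                                            "The duck was a bit hungry.")
diffs = chars_to_words(dmp.diff_main(chars1, chars2, False), word_array)
dmp.diff_cleanupSemantic(diffs)
print(diffs)  # edit boundaries now fall on whole words ("dog " -> "duck ", etc.)
```

This trades away character-level granularity inside a word, but for a human-facing UI that is usually the point.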