diff --git a/README.md b/README.md
index ff23860bf7..8621664086 100644
--- a/README.md
+++ b/README.md
@@ -117,6 +117,22 @@ AST diffing is a lossy process from the perspective of a text
 diff. Difftastic will ignore whitespace that isn't syntactically
 significant, but merging requires tracking whitespace.
 
+### Can difftastic ignore reordering?
+
+No. Difftastic always considers order to be important, so diffing
+e.g. `set(1, 2)` and `set(2, 1)` will show changes.
+
+If you're diffing JSON, consider sorting the keys before passing them
+to difftastic.
+
+```
+$ difft <(jq --sort-keys < file_1.json) <(jq --sort-keys < file_2.json)
+```
+
+See also [Tricky Cases: Unordered Data
+Types](https://difftastic.wilfred.me.uk/tricky_cases.html#unordered-data-types)
+in the manual.
+
 ### Can I use difftastic to check for syntactic changes without diffing?
 
 Yes. Difftastic can check if the two files have the same AST, without
diff --git a/manual/src/tricky_cases.md b/manual/src/tricky_cases.md
index 1e94dae2b3..7568ccb7cc 100644
--- a/manual/src/tricky_cases.md
+++ b/manual/src/tricky_cases.md
@@ -371,6 +371,39 @@ Syntactic diffing can ignore whitespace changes, but it has to assume
 punctuation is meaningful. This can lead to punctuation changes being
 highlighted, which may be quite far from the relevant content change.
 
+## Unordered Data Types
+
+```
+// Before
+set(1, 2)
+
+// After
+set(2, 1)
+```
+
+Users may expect difftastic to find no changes here. This is difficult
+for several reasons.
+
+For programming languages, side effects might make the order
+relevant. `set(foo(), bar())` might behave differently to `set(bar(),
+foo())`.
+
+For configuration languages like JSON or YAML, some parser
+implementations do actually expose ordering information
+(e.g. `object_pairs_hook=OrderedDict` in Python, or serde_json's
+`preserve_order` feature in Rust).
+
+To make matters worse, unordered tree diffing is NP-hard.
+
+> For the unordered case, it turns out that all of the problems in
+> general are NP-hard. Indeed, the tree edit distance and alignment
+> distance problems are even MAX SNP-hard.
+>
+> -- [A survey on tree edit distance and related problems](https://doi.org/10.1016/j.tcs.2004.12.030)
+
+**Difftastic**: Difftastic considers ordering to be meaningful
+everywhere, so it will always report ordering changes.
+
 ## Novel Blank Lines
 
 Blank lines are challenging for syntactic diffs. We are comparing