diff --git a/README.md b/README.md
index 4eafd2e..6bb2854 100644
--- a/README.md
+++ b/README.md
@@ -141,7 +141,34 @@ This algorithm is usually used for optical character recognition (OCR) applicati
 It can also be used for keyboard typing auto-correction. Here the cost of substituting E and R is lower for example because these are located next to each other on an AZERTY or QWERTY keyboard. Hence the probability that the user mistyped the characters is higher.
-
+```cs
+using System;
+using F23.StringSimilarity;
+
+public class Program
+{
+    public static void Main(string[] args)
+    {
+        var l = new WeightedLevenshtein(new ExampleCharSub());
+
+        Console.WriteLine(l.Distance("String1", "String1"));
+        Console.WriteLine(l.Distance("String1", "Srring1"));
+        Console.WriteLine(l.Distance("String1", "Srring2"));
+    }
+}
+
+public class ExampleCharSub : ICharacterSubstitution
+{
+    public double Cost(char c1, char c2)
+    {
+        // The cost for substituting 't' and 'r' is considered smaller as these 2 are located next to each other on a keyboard
+        if (c1 == 't' && c2 == 'r') return 0.5;
+
+        // For most cases, the cost of substituting 2 characters is 1.0
+        return 1.0;
+    }
+}
+```
 ## Damerau-Levenshtein
 Similar to Levenshtein, Damerau-Levenshtein distance with transposition (also sometimes called unrestricted Damerau-Levenshtein distance) is the minimum number of operations needed to transform one string into the other, where an operation is defined as an insertion, deletion, or substitution of a single character, or a **transposition of two adjacent characters**.
diff --git a/src/F23.StringSimilarity/F23.StringSimilarity.csproj b/src/F23.StringSimilarity/F23.StringSimilarity.csproj
index 50bb102..8e0e467 100644
--- a/src/F23.StringSimilarity/F23.StringSimilarity.csproj
+++ b/src/F23.StringSimilarity/F23.StringSimilarity.csproj
@@ -1,23 +1,35 @@
- netstandard1.0
+ netstandard2.0;net45
  F23.StringSimilarity
- 3.1.0
+ 4.0.0
+ string;similarity;distance;levenshtein;jaro-winkler;lcs;cosine
  StringSimilarity.NET
  James Blair, Paul Irwin
  Copyright 2018 feature[23]
  A .NET port of java-string-similarity.
  A .NET port of java-string-similarity (https://github.com/tdebatty/java-string-similarity). A library implementing different string similarity and distance measures. Several algorithms (including Levenshtein edit distance and siblings, Jaro-Winkler, Longest Common Subsequence, cosine similarity etc.) are currently implemented.
  https://github.com/feature23/StringSimilarity.NET
- https://raw.githubusercontent.com/feature23/StringSimilarity.NET/master/LICENSE
- https://raw.githubusercontent.com/feature23/StringSimilarity.NET/master/logo.png
+ MIT
+ logo.png
  false
  string similarity distance cosine damerau jaccard jaro-winkler levenshtein ngram qgram shingle sift4
+ true
+ true
+ snupkg
+
+
+
+
+
+
+
+
- bin\Release\netstandard1.0\F23.StringSimilarity.xml
+ bin\Release\netstandard2.0\F23.StringSimilarity.xml
\ No newline at end of file
diff --git a/logo.png b/src/F23.StringSimilarity/logo.png
similarity index 100%
rename from logo.png
rename to src/F23.StringSimilarity/logo.png
diff --git a/test/F23.StringSimilarity.Tests/F23.StringSimilarity.Tests.csproj b/test/F23.StringSimilarity.Tests/F23.StringSimilarity.Tests.csproj
index 66de828..1f0ee0f 100644
--- a/test/F23.StringSimilarity.Tests/F23.StringSimilarity.Tests.csproj
+++ b/test/F23.StringSimilarity.Tests/F23.StringSimilarity.Tests.csproj
@@ -1,7 +1,7 @@
- netcoreapp2.0
+ netcoreapp3.1