Lingua 1.5.0
Features
- The new method LanguageDetector.detect_multiple_languages_of() has been introduced. It makes it possible to detect multiple languages in mixed-language text. (#1)
- The new method LanguageDetectorBuilder.with_low_accuracy_mode() has been introduced. Activating it reduces detection accuracy for short text in favor of a smaller memory footprint and faster detection. (#119)
- The new method LanguageDetector.compute_language_confidence() has been introduced. Given the input text, it retrieves the confidence value for one specific language only. (#102)
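What it means for confidence values to "behave more like real probabilities" (#120), and what it means to read the value for just one language (#102), can be illustrated with a minimal, self-contained Rust sketch. The sketch assumes raw per-language scores such as summed log-probabilities and normalizes them with softmax. The Lang enum, the softmax and confidence_for functions, and the scores are all hypothetical. They are neither lingua's API nor its actual implementation.

```rust
/// Hypothetical stand-in for a language identifier (not lingua's `Language` enum).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Lang {
    English,
    French,
    German,
}

/// Softmax over raw scores: exp(s_i - max) / sum_j exp(s_j - max).
/// The results are non-negative and sum to 1, like a probability distribution.
fn softmax(scores: &[(Lang, f64)]) -> Vec<(Lang, f64)> {
    // Subtracting the maximum keeps `exp` from underflowing to zero
    // for very negative log-probabilities.
    let max = scores
        .iter()
        .map(|&(_, s)| s)
        .fold(f64::NEG_INFINITY, f64::max);
    let exps: Vec<(Lang, f64)> = scores
        .iter()
        .map(|&(lang, s)| (lang, (s - max).exp()))
        .collect();
    let sum: f64 = exps.iter().map(|&(_, e)| e).sum();
    exps.into_iter().map(|(lang, e)| (lang, e / sum)).collect()
}

/// Hypothetical helper for a single-language query: normalize all scores,
/// then return the entry for `target` (0.0 if that language was not scored).
fn confidence_for(scores: &[(Lang, f64)], target: Lang) -> f64 {
    softmax(scores)
        .into_iter()
        .find(|&(lang, _)| lang == target)
        .map_or(0.0, |(_, c)| c)
}

fn main() {
    // Made-up raw scores (e.g. summed log-probabilities) for one input text.
    let scores = [(Lang::English, -12.0), (Lang::French, -13.0), (Lang::German, -15.0)];
    for (lang, confidence) in softmax(&scores) {
        println!("{:?}: {:.4}", lang, confidence);
    }
    // Only one language is of interest: normalize everything, then pick one entry.
    println!("English only: {:.4}", confidence_for(&scores, Lang::English));
}
```

Subtracting the maximum score before exponentiating does not change the result, because the common factor e^(-max) cancels between numerator and denominator. Note that a single language's normalized value still depends on which other languages were scored.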
Improvements
- The computation of the confidence values has been revised. The softmax function is now applied to them, which makes them easier to compare because they behave more like real probabilities. (#120)
- The WASM API has been revised. It now uses the same builder pattern as the Rust API. (#122)
- The language model files are now compressed with the Brotli algorithm, which reduces their size by 15% on average. (#189)
- The language model ngrams are now stored using the CompactString type, which reduces memory consumption by 20%. (#198)
- Several performance optimizations have been applied, which make the library nearly twice as fast as the previous version. Big thanks go out to @serega and @koute for their help. (#82, #148, #177)
- The enums IsoCode639_1 and IsoCode639_3 now implement some new traits such as Copy, Hash, and Serde's Serialize and Deserialize. The enum Language now implements Copy as well. (#175)
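The following sketch gives a rough idea of what implementing Copy and Hash enables for an enum. It uses a hypothetical stand-in enum (Lang), not lingua's own types. Values can be copied out of collections without calling clone(), and they can be used directly as HashMap keys. Serde's Serialize and Deserialize are left out so that the sketch has no external dependencies.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in enum. `Hash` together with `Eq` (and `PartialEq`)
/// is what a type needs to be used as a `HashMap` key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
enum Lang {
    English,
    French,
    German,
}

/// Counts how often each language occurs. The `&lang` pattern copies each
/// value out of the slice, which only compiles because `Lang: Copy`.
fn count(detected: &[Lang]) -> HashMap<Lang, usize> {
    let mut counts = HashMap::new();
    for &lang in detected {
        *counts.entry(lang).or_insert(0) += 1;
    }
    counts
}

fn main() {
    let detected = vec![Lang::English, Lang::German, Lang::English];
    let counts = count(&detected);
    // Without `Copy`, this would fail with "cannot move out of index".
    let first = detected[0];
    println!("{:?} occurs {} time(s)", first, counts[&first]);
    println!("French occurs {} time(s)", counts.get(&Lang::French).copied().unwrap_or(0));
    println!("{} results are still in the vector", detected.len());
}
```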